Processor Limitations and Known Bugs
This page is a non-exhaustive list of PA-RISC CPU features and quirks to keep in mind when writing PA-RISC code, especially kernel code.
Despite what the architecture manual says (a congruence stride of 16MB), the congruence stride on all manufactured PA-RISC processors is 4MB. This means that any virtual addresses that are equal modulo 4MB have the same cache colour, regardless of space id. (See this thread).
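The modulo-4MB rule can be illustrated with a small sketch. The names `CONGRUENCE_STRIDE`, `congruence_offset` and `same_cache_colour` are hypothetical helpers made up for this example, not existing kernel interfaces, and the space id is deliberately ignored because the text says it does not affect the colour.

```c
#include <stdbool.h>
#include <stdint.h>

/* Congruence stride on manufactured PA-RISC CPUs, per the text above: 4MB. */
#define CONGRUENCE_STRIDE (UINT64_C(4) * 1024 * 1024)

/* Hypothetical helper: the offset of a virtual address within the stride.
 * Two addresses with the same offset are equal modulo 4MB. */
static inline uint64_t congruence_offset(uint64_t vaddr)
{
    return vaddr & (CONGRUENCE_STRIDE - 1);
}

/* Hypothetical helper: true when two virtual addresses are equal modulo
 * 4MB and therefore share a cache colour (space id is not considered). */
static inline bool same_cache_colour(uint64_t a, uint64_t b)
{
    return congruence_offset(a) == congruence_offset(b);
}
```

For instance, under these assumptions `same_cache_colour(0x1000, 0x401000)` holds, because the two addresses differ by exactly 4MB.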
Block TLB vs Variable page sizes
PA 1.1 does NOT support variable page sizes; only PA 2.0 processors do. Some PA 1.1 processor versions do, however, provide a "Block TLB" (BTLB). Use PDC calls (firmware, see ProcessorDependentCode) to set up the BTLB entries when available.
Specific issues of the PA 2.0 family
Unfortunately, 64-bit integer divides cannot be done using the FPU (the limit is about 52 bits). This means that we must call millicode any time a long is divided by another long. This is expensive because the DS instruction doesn't support 64-bit division; instead, the millicode must perform long division (remember how long that took you in grade school?) on a 64-bit integer. This can take as much as 500 cycles.
Suddenly we have gone from performing two divides of type long in 60 cycles to one divide of type long in 500 cycles. It is very important to cast the operands to unsigned int before dividing whenever the values are known to fit in 32 bits.
Keep that in mind whenever you see the "/" operator in code.
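As a minimal sketch of the cast advice above, the two helpers below show a divide narrowed to 32 bits. Both `div_known_u32` and `div_narrow_if_possible` are hypothetical names invented for this example; whether the runtime check in the second one pays off on a given PA 2.0 CPU is an assumption that would need measuring.

```c
#include <stdint.h>

/* Hypothetical helper: the caller guarantees that both operands fit in
 * 32 bits and that d is non-zero, so the divide is done on unsigned int
 * and the 64-bit millicode routine is not needed. */
static inline uint64_t div_known_u32(uint64_t n, uint64_t d)
{
    return (uint32_t)n / (uint32_t)d;
}

/* Hypothetical helper: check at run time whether the narrow divide is
 * safe, and fall back to the full 64-bit divide otherwise.
 * d must be non-zero. */
static inline uint64_t div_narrow_if_possible(uint64_t n, uint64_t d)
{
    if (n <= UINT32_MAX && d <= UINT32_MAX)
        return (uint32_t)n / (uint32_t)d;   /* cheap 32-bit divide */
    return n / d;                           /* full 64-bit divide  */
}
```

The first form matches the advice in the text most directly: when the value ranges are known at the call site, the cast costs nothing and no check is needed.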
Unsigned vs Signed ints
Under LP64, pointers are 64-bit quantities and integers are 32-bit quantities. In a typical array reference, the index (an integer) must be added to the base address of the array (a pointer) to obtain the address of the item being accessed. Before a 32-bit item can be added to a 64-bit item, the 32-bit item must be sign extended to clear the garbage out of the top 32 bits of the register. This sign extension is done with an extract (`EXTRD`) operation and takes one cycle. Here is the typical array reference:
register int *a;
register int i;
x = a[i];
and the assembly code under PA1.1:
And finally the assembly code under PA2.0:
EXTRD,S index,63,32,temp_reg
LDWX temp_reg(basereg),x
The cost of a typical array access is now 4 cycles instead of 3 cycles. On average, applications doing this will suffer a 5-10% performance loss due to these extra extracts.
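To make the effect of the extra extract concrete, here is a rough C model of what `EXTRD,S index,63,32,temp_reg` computes and how the result feeds the address of `a[i]`. The function names are hypothetical, the register is modelled as a plain `uint64_t`, and the element size is assumed to be 4 bytes (a 32-bit int); this is a sketch of the semantics, not of the instruction timing.

```c
#include <stdint.h>

/* Model of EXTRD,S reg,63,32,result: keep the low 32 bits of the
 * register and sign-extend them to 64 bits, discarding whatever garbage
 * was in the top half. The uint32_t -> int32_t conversion assumes the
 * usual two's-complement behaviour of mainstream compilers. */
static inline int64_t sign_extend_low32(uint64_t reg)
{
    return (int64_t)(int32_t)(uint32_t)reg;
}

/* Address of a[i] for an array of 32-bit ints: the cleaned-up index is
 * scaled by the element size and added to the 64-bit base address. */
static inline uint64_t element_address(uint64_t base, uint64_t index_reg)
{
    int64_t index = sign_extend_low32(index_reg);
    return base + (uint64_t)(index * (int64_t)sizeof(int32_t));
}
```

With garbage in the upper half of the index register, for example `0xDEADBEEF00000002`, the model still yields index 2, which is the guarantee the extra instruction provides.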