Kernel Hotspots and Problems
Much of this discussion came about during OLS 2005 and the PA-RISC mini-BoF we held on Thursday, July 21st, 2005.
Problem the first - More Crash Info -
When the kernel fails, we need more information to debug the problem. Possible solutions include:
- Linux kernel crash dump (dump crash data to the network).
- Kexec (launch a second kernel to debug in-core data).
Problem the second - Floating Point Registers -
We re-enabled the floating point registers around the 2.6.9 era, and this introduced a bug in our code. The bug was rather subtle and was fixed by James Bottomley. The issue was that our _switch_to and _switch_to_ret were not saving the callee-saved floating point registers. As a result, one of the floating point registers was getting trashed in the pa_memcpy code, which usually meant something died with an invalid copy.
There are still some stability issues and crashes in pa_memcpy. It seems that Joel Soete has had some limited success using GCC 4.0 to limit the use of floating point registers.
Update: With the 3.x kernels we switched to gcc-4.x and instructed it not to use floating point registers any longer. The problem no longer exists.