... in which I learn that when they say there isn't much stack space available in the Linux kernel, they really mean it.
Actually, the real reason it took me so long to debug this particular problem wasn't just the stack space; the compiler optimizer wanted to play as well. I'd foolishly put something large onto the stack, in preparation for some later code, but not actually used it, and so the compiler had quietly stripped it out. When I did eventually start using it, some time later, things began to go awry.
At this point, the usual rule of "cause of bug == most recent change to code" no longer applied, and as I don't have a debugging system set up yet, I pretty much had to resort to shotgun debugging.
In fact, a kernel debugger wouldn't actually have helped much in tracking down the problem. The key piece of evidence that I eventually twigged to was that the crash would appear and disappear as I included or removed chunks of the code, even though the code was never run. That is: if I added in code that referenced the big thing on the stack, the optimizer could no longer quietly elide that buffer, and the code would crash before it got anywhere near the new code. Doh, problem solved.
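For the curious, here is a minimal sketch of the effect in ordinary userspace C rather than the actual kernel code. The function names and the 8 KiB buffer size are made up for illustration. In the first function the buffer is never really used, so an optimizing compiler is free to drop it. In the second, a rarely taken branch touches the buffer, and a compiler will typically reserve the whole thing in the stack frame on every call, whether or not that branch ever runs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical size, chosen to be comparable to a small kernel stack. */
#define BIG_BUF_SIZE 8192

/* The buffer is declared but never meaningfully used, so with
 * optimization enabled the compiler may remove it from the frame. */
size_t handler_unused(const char *msg)
{
    char scratch[BIG_BUF_SIZE];

    (void)scratch; /* silences the unused-variable warning only */
    assert(msg != NULL);
    return strlen(msg);
}

/* The same buffer is now written to on one branch. The frame usually
 * has to hold all BIG_BUF_SIZE bytes from function entry onwards,
 * even on calls where rare_path is 0 and the branch never executes. */
size_t handler_used(const char *msg, int rare_path)
{
    char scratch[BIG_BUF_SIZE];

    assert(msg != NULL);
    if (rare_path) {
        strncpy(scratch, msg, sizeof(scratch) - 1);
        scratch[sizeof(scratch) - 1] = '\0';
        return strlen(scratch);
    }
    return strlen(msg);
}
```

In userspace both versions run happily, because the stack is measured in megabytes. On a stack of only a few kilobytes, the second version can overflow on entry, which matches the "crashes before it gets anywhere near the new code" symptom. Building with GCC's `-Wframe-larger-than=` warning is one way to catch this sort of thing at compile time.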
On the plus side, I got paid today. When I first looked at the payslip, I had a worrying moment when I saw the size of the Tax number in the deductions column. Then I noticed it had a minus sign in front of it, which was just as well, given that the tax number was bigger than the pay number. So I ended up being paid about treble what I expected, presumably because of the mysterious machinations of the PAYE system.