Examples of debugging process
- A student complained that his program gave different results when it read its input from a URL via a Java library than when he fetched the same page with a browser and saved it to a file. He insisted that the program should work the same way in both cases and was mystified. There are only two possible conclusions: he made a mistake saving the file, or the browser and the Java library received different versions of the web page. Naturally, the browser sends cookies and had perhaps even logged the student in to Yahoo or whatever.
- The system was randomly crashing during JavaOne one year. I had people poking around, reading the code, trying to figure out what was wrong. We weren't getting diagnostics or anything; the system would simply stop and exit. I thought, you know, it's almost like someone is calling System.exit. Then I said, actually, it's the only possible explanation. One grep later and we found it in the code of some intern ;)
- When something goes wrong, think about what is different or what has changed. I know this sounds obvious, but it is a very powerful focusing technique. It is really tempting to freak out and try all kinds of fixes when the system becomes totally unstable. After our system crash in the bullet point above (before my trip to Paris), jGuru became super slow and unstable. The system was launching 700 threads, bringing the machine to a grinding halt. I kept thinking "what's changed?", but couldn't think of anything. The software was the same, I said! So I started building thread debugging tools. Then it turned out I did change something. Ah ha, I thought: I had made a minor change when trying to get the server back up after the crash, and it was causing portal.init() to be executed twice. The system seemed OK for a few hours, but then was right back to the huge number of threads. Finally, I realized that I had specifically coded the system so it could only be initialized once. It couldn't have been that. Using the "what has changed" focusing lens, I convinced myself that the software was the same (confirmed by the revision control system). Therefore, no matter how unlikely, there must be a data problem. Given that the server crashed, I would normally be suspicious of this immediately, but our database naturally has transactions and recovers nicely from power outages and so on. Well, it turns out the search database, which is different, got caught in the middle of a locked operation when the system died, leaving a file called commit.lock around. I copied this search database with the freeze-dried lock to the new drive, making the search database freak out. The search library waits about 3 seconds to see if the lock will free up before timing out. With all of the searches initiated on jGuru, this queued up a HUGE number of threads. The problem was solved literally by removing that lock file. The number of threads dropped before my eyes.
- A script that takes student info parameters and generates CPT/OPT letters sometimes worked and sometimes didn't! Same code. Why did it generate the correct letter only sometimes?
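
For the student's URL-versus-saved-file mystery, a quick way to test the "different versions of the page" conclusion is to compare the two captures directly instead of arguing about what the program should do. The following is only a minimal sketch: the class name PageDiff, the method firstDifference, and the sample page lines are hypothetical and do not come from the student's program.

```java
import java.util.List;

class PageDiff {
    /**
     * Returns the 0-based index of the first line at which the two
     * captures differ, or -1 if they are identical.
     */
    static int firstDifference(List<String> a, List<String> b) {
        int common = Math.min(a.size(), b.size());
        for (int i = 0; i < common; i++) {
            if (!a.get(i).equals(b.get(i))) {
                return i;
            }
        }
        // One capture is a prefix of the other: they differ where the shorter one ends.
        return a.size() == b.size() ? -1 : common;
    }

    public static void main(String[] args) {
        // Hypothetical captures: what the library saw vs. what the browser saved.
        List<String> fromLibrary = List.of("<html>", "<p>Please sign in</p>", "</html>");
        List<String> fromBrowser = List.of("<html>", "<p>Welcome back</p>", "</html>");
        System.out.println("first differing line: " + firstDifference(fromLibrary, fromBrowser));
    }
}
```

If the captures differ, the program is not the suspect; the inputs are. That turns "it should work the same way" into a question about cookies and login state.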
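
The commit.lock story suggests a cheap startup check: a lock file that is older than the current process cannot belong to any thread in that process. The sketch below illustrates the idea only. StaleLockCheck and hasStaleLock are hypothetical names, not part of jGuru or of any search library, and the check assumes no other process shares the index directory (another live process could legitimately hold an older lock).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Instant;

class StaleLockCheck {
    /**
     * Returns true if lockName exists in indexDir and was last modified
     * before processStart, meaning this process cannot have created it.
     * Assumption: no other process uses the same directory.
     */
    static boolean hasStaleLock(Path indexDir, String lockName, Instant processStart)
            throws IOException {
        Path lock = indexDir.resolve(lockName);
        if (!Files.exists(lock)) {
            return false;
        }
        Instant modified = Files.getLastModifiedTime(lock).toInstant();
        return modified.isBefore(processStart);
    }

    public static void main(String[] args) throws IOException {
        // Simulate a lock left behind by a crash: create it, then pretend
        // the server process started a minute later.
        Path dir = Files.createTempDirectory("search-index");
        Files.createFile(dir.resolve("commit.lock"));
        Instant startedLater = Instant.now().plusSeconds(60);
        System.out.println("stale lock present: " + hasStaleLock(dir, "commit.lock", startedLater));
    }
}
```

Logging a warning when such a check fires would have pointed at the data problem directly, rather than at the 700 threads it produced.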