WinDbg: Conditional breakpoints

Last week I found a very strange bug in a kernel driver I do some work on. It only seems to appear when Google Desktop or MSN Desktop Search is installed. After some period of time the graphics display starts doing some wacky stuff: fonts don’t display, repaints don’t work quite right, just general wackiness.

Since this only happens after a machine has been running for some time, we started looking at resource usage. Using the pooltag tool from Sysinternals, we found one particular tag whose allocations looked out of control. The tag was FSrN, which according to the lists I could find is “File-system runtime”. Helpful, no?

So that brings me to the meat of the story. I wanted to find out who was allocating that memory and what the call stack looked like at the time. To do this I needed to put a breakpoint on ExAllocatePoolWithTag, but that gets called ALL the time. I only wanted to break when we hit the right tag value. So I came up with the following command to set the breakpoint in WinDbg:

bp nt!ExAllocatePoolWithTag "j (Poi(ss:esp+c) = 'NrSF') 'kb; db ss:esp+c'; 'gc'"

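You can check out the WinDbg help for more information on conditional breakpoints; it explains the above command in detail. In short, the j command evaluates the condition. On 32-bit x86, esp+c at function entry holds the third argument, which is the tag. When it matches, the breakpoint prints the call stack (kb), dumps the tag bytes (db), and stays stopped; otherwise gc resumes execution. I wanted to post it since I couldn’t find any good examples of how to do this with a non-integer value.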
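One detail worth spelling out is why the tool shows the tag as FSrN while the breakpoint compares against 'NrSF'. The small user-mode C sketch below illustrates the byte order. It assumes the debugger evaluates a four-character constant with the first character in the most significant byte, which is consistent with the command above but is an assumption here. The helper names pack_tag and tag_as_stored are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack four characters into a 32-bit value with the first character in
   the most significant byte. This is how the constant 'NrSF' is ASSUMED
   to be evaluated; pack_tag is a hypothetical helper, not a debugger API. */
uint32_t pack_tag(const char *s)
{
    assert(s != NULL);
    return ((uint32_t)(unsigned char)s[0] << 24) |
           ((uint32_t)(unsigned char)s[1] << 16) |
           ((uint32_t)(unsigned char)s[2] << 8)  |
            (uint32_t)(unsigned char)s[3];
}

/* Write the value out lowest byte first, which is the order a
   little-endian machine such as x86 stores it in memory, and therefore
   the order a memory dump or a pool tag tool displays it in. */
void tag_as_stored(uint32_t value, char out[5])
{
    assert(out != NULL);
    for (int i = 0; i < 4; i++)
        out[i] = (char)((value >> (8 * i)) & 0xFF);
    out[4] = '\0';
}
```

Calling tag_as_stored(pack_tag("NrSF"), buf) leaves "FSrN" in buf, which is the tag as it shows up in the tool.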

Programming Virtue: Complete Coding

One programming virtue that I have found very useful, and have been trying to discipline myself to practice, is what I like to call complete coding. This is the practice of writing shipping-quality code ALL the time.

Most programmers try out an algorithm or put a thought into code very quickly to test the idea. There are two major problems with this practice. All too often the code never gets cleaned up and sits in the live project until somebody happens to find a bug in it. Or the code gets tested in its quick and dirty form, then cleaned up and checked in without further testing. Both of these habits put bugs into the code that don’t need to be there. (The bugs are just a result of sloppy procedures and are fairly easy to avoid.)

So with these thoughts in mind, I have a few recommendations to make to programmers (myself included).

1- Always write error handling code as you are writing the main code paths. If you don’t, it probably won’t get done, and it will be a while before the bugs get found, since those error code paths by definition don’t get executed during normal operation.

2- Always write the code in such a way that you can understand it easily. If you have to think about it to write it, you will have to think about it to maintain it. And most importantly, it will be much more difficult to be sure you wrote it correctly the first time. (Thinking is just too darn hard, he he.)

3- Consider reuse of the code in the initial writing. Sometimes this means hiding implementation details behind an interface. Sometimes it means doing the refactoring work NOW, even though you desperately want to just finish the piece of code you’re working on.

4- Always run through the code in your head (or even better, in a debugger) with sample input to make sure the code is complete. Think about the what-ifs (the things you never think will happen, but sooner or later they will).
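
As a small illustration of the first recommendation, here is a sketch in C of a function whose error paths were written together with the main path. The function read_whole_file and its errno-style return convention are invented for this example; treat it as one possible shape, not a prescription.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read a whole file into a newly allocated,
   NUL-terminated buffer. Returns 0 on success or an errno-style code
   on failure. On failure *out_data is NULL and *out_size is 0. */
int read_whole_file(const char *path, char **out_data, size_t *out_size)
{
    assert(path != NULL && out_data != NULL && out_size != NULL);

    int status = 0;
    char *data = NULL;
    long length = -1;

    *out_data = NULL;
    *out_size = 0;

    errno = 0;
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return errno ? errno : EIO;      /* open failed */

    /* Find the file size; any seek or tell failure is an error. */
    if (fseek(fp, 0, SEEK_END) != 0 || (length = ftell(fp)) < 0 ||
        fseek(fp, 0, SEEK_SET) != 0) {
        status = errno ? errno : EIO;
        goto cleanup;
    }

    data = malloc((size_t)length + 1);
    if (data == NULL) {
        status = ENOMEM;                 /* out of memory */
        goto cleanup;
    }

    if (fread(data, 1, (size_t)length, fp) != (size_t)length) {
        status = EIO;                    /* short read */
        goto cleanup;
    }

    data[length] = '\0';
    *out_data = data;
    *out_size = (size_t)length;
    data = NULL;                         /* ownership moved to the caller */

cleanup:
    free(data);                          /* no-op on the success path */
    fclose(fp);
    return status;
}
```

Every failure leaves through the single cleanup label, so the file handle and the buffer are released on the rarely executed paths as well as the common one.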

Well, there are probably many more, which is why there are entire books about this subject. But these are a few that I have found extremely useful. Let me know if you have other “complete coding” practices that others could benefit from.

A strange kernel-mode bug: PsLookupProcessByProcessId

Last week I was working on a bug in a Windows kernel-mode driver. The symptom was really quite strange, and once I found the problem I thought it might be useful to share, since I wasn’t able to find any information on it myself.

So first, the symptom. After running a system for a while, I broke in with the kernel debugger to examine what looked like a deadlock. When I did a “!process 0 7” to examine the processes on the system and see what might be deadlocked, I found that every single process that I had run on the machine was still sitting around in memory with no active threads.

Well, it turns out that the code was using the undocumented function PsLookupProcessByProcessId. Apparently, unlike its documented cousin PsGetCurrentProcess, PsLookupProcessByProcessId bumps the reference count on the EPROCESS object that it returns. The function thus requires a matching call to ObDereferenceObject to release the reference and allow the process object to be freed.

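Following is an example of one of the processes as it appears in the debugger. Note the zero ObjectTable, VadRoot and HandleCount, and the “No active threads” line: the process has exited, but its object is still being kept alive.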

PROCESS 890ce020 SessionId: 0 Cid: 07d0 Peb: 7ffdd000 ParentCid: 07c8
DirBase: 6dba0000 ObjectTable: 00000000 HandleCount: 0.
Image: cmd.exe
VadRoot 00000000 Vads 0 Clone 0 Private 0. Modified 17. Locked 0.
DeviceMap e12b4410
Token e3d58030
ElapsedTime 20:42:16.447
UserTime 00:00:00.062
KernelTime 00:00:00.546
QuotaPoolUsage[PagedPool] 0
QuotaPoolUsage[NonPagedPool] 0
Working Set Sizes (now,min,max) (4, 50, 345) (16KB, 200KB, 1380KB)
PeakWorkingSetSize 1120
VirtualSize 10 Mb
PeakVirtualSize 13 Mb
PageFaultCount 1299
MemoryPriority BACKGROUND
BasePriority 8
CommitCharge 0

No active threads
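
The leak pattern can be modeled in a few lines of user-mode C. This is only a toy sketch: ToyProcess, toy_lookup_process and toy_dereference are invented stand-ins, not kernel APIs. In a real driver the pairing would be PsLookupProcessByProcessId with ObDereferenceObject.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a kernel process object; NOT the real EPROCESS. */
typedef struct {
    int id;
    int ref_count;   /* the object stays in memory while this is above zero */
    int exited;      /* set once the process has terminated */
} ToyProcess;

/* Hypothetical lookup: like the behavior described above, a successful
   lookup takes a reference that the caller is expected to release. */
ToyProcess *toy_lookup_process(ToyProcess *table, size_t count, int id)
{
    assert(table != NULL);
    for (size_t i = 0; i < count; i++) {
        if (table[i].id == id) {
            table[i].ref_count++;
            return &table[i];
        }
    }
    return NULL;
}

/* Hypothetical release, playing the role of the dereference call. */
void toy_dereference(ToyProcess *p)
{
    assert(p != NULL && p->ref_count > 0);
    p->ref_count--;
}

/* The object can go away only after exit and with no references left. */
int toy_can_be_freed(const ToyProcess *p)
{
    return p->exited && p->ref_count == 0;
}

/* Buggy caller: looks the process up and never releases the reference. */
int buggy_query(ToyProcess *table, size_t count, int id)
{
    ToyProcess *p = toy_lookup_process(table, count, id);
    return p != NULL;
}

/* Fixed caller: every successful lookup is paired with a dereference. */
int fixed_query(ToyProcess *table, size_t count, int id)
{
    ToyProcess *p = toy_lookup_process(table, count, id);
    if (p == NULL)
        return 0;
    /* ... use p here ... */
    toy_dereference(p);
    return 1;
}
```

After a single buggy_query, an exited process in this model can never be freed, which matches the symptom of dead processes piling up in the !process output.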