I had to debug an annoying little problem today that I thought might be worth writing about. I was interested in walking through some code that was failing, but the same code was getting called in a recursive loop, so there were literally hundreds of successful runs that I was not interested in prior to the single failure I did care about.

Now a normal user-mode developer might just add some special code at the point of failure to detect the failure and call the failing function again. Nice and easy. But that’s really not any fun, and when you’re doing kernel debugging, writing some new code and getting it running on the machine is not quite as simple (it’s not hard, just more time consuming).

Enter this neato debugging trick…

bp address "j (dwo(status)!=0) 'r @rip=fffff880`02b5bd1f'; 'gc'"

Basically this executes a conditional test (the “j” command) each time the breakpoint is hit. If the DWORD value represented by the variable named ‘status’ is non-zero, then I know I’ve hit the failure condition. In that case, I just move the instruction pointer back to just before the failing function call, which leaves me ready to trace into the function and watch the failure. Otherwise, the breakpoint essentially just hits ‘Go’ to continue on to the next hit.

The syntax here is a bit rough, and would have to be modified if your program isn’t always at the same code location (since I hard-coded the rip register). It could be replaced with an offset from the current location to be a bit more elegant. But since I was working on a driver, it was always in memory and at the same place, so I was lazy. (A habit that always pays off immediately.)

Quite some time back I wrote a blog entry on how to make conditional breakpoints. In particular, I was looking for a way to find when a certain pool tag allocation occurred. Well, it turns out there is a MUCH better way of doing it than what I posted in that blog entry.

The latest issue of “The NT Insider” from OSR just came out, and has an article on debugging techniques. Apparently there is a global value that you can set to a given pool tag that you would like to break on allocation for. So if I were looking for the tag ‘Test’ I could set the following value in the debugger:

kd> ed nt!PoolHitTag 'tseT'

Note that this technique will only work if pool tagging is enabled on the system. But nevertheless, this technique is a lot more efficient and faster than the method I showed previously. (Although the technique could still be good for other scenarios.)
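
The reversed spelling of the tag trips people up, so here is a small Python sketch of why ‘Test’ becomes ‘tseT’. It assumes the usual little-endian layout of the tag in memory; the helper names are my own and not part of any debugger API.

```python
def tag_to_dword(tag: str) -> int:
    """Return the 32-bit value a four-character pool tag has in memory."""
    raw = tag.encode("ascii")
    if len(raw) != 4:
        raise ValueError("pool tags are exactly four ASCII characters")
    # The first character of the tag sits in the lowest byte (little-endian).
    return int.from_bytes(raw, "little")


def debugger_literal(tag: str) -> str:
    """Return the reversed character literal to type in the debugger."""
    return "'" + tag[::-1] + "'"


print(debugger_literal("Test"))   # 'tseT'
print(hex(tag_to_dword("Test")))  # 0x74736554
```

The debugger reads a character literal with the first character as the most significant byte, which is why the tag has to be typed backwards to produce the same DWORD.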

I have had the opportunity to work for a number of different software companies, each of which has localized their product into different languages. These opportunities have led me to experience a number of different processes around localization, and a great deal of pain surrounding the whole issue. Localization sucks. However, I don’t believe that it has to suck, and I think there are processes that, if put in place, can make this relatively painless.

I believe the biggest problem area that I have seen in localization is the interweaving of responsibilities. Let me describe the process from one company I have worked for. The developers create their resources in an English-only version, and the sources for these resources are checked into source control. Then at some predetermined point, a developer packages up all the resource source files and sends them off to the localization team. They have some database which strings are loaded into, translated (often by a third party), and then new source files are generated for the various languages. The source files are then sent back to the developers, who have to check them in and test them to make sure that they compile. Net result: days of turnaround time for even trivial localization changes, and hours of wasted time.

The root issue here is that you have developers who are involved in localization, which is not their primary responsibility, and localizers who are involved in development (by producing source files that have to be compiled). I think that we can resolve these issues and streamline the process by simply separating the functions more distinctly.

First, we need to have a method whereby localization can take place during the build process without any intervention. My first thought here is that we could create a program that is part of the build system and localizes an English source file into another language source file. The data that this program would consume is a translation database that would be checked into the source control system. The localization team is responsible simply for checking in a new copy of the database when they have updates. This system is a good improvement on the original, but it still leaves the problem that the files have to be compiled after the localization process changes them. One wrong string (with a misplaced quote character, etc.) and the build fails.

A better way to solve this problem is to have the build system build the initial binaries. Then the localization program is pointed at the binary file containing English resources. It edits this binary file and produces a copy of it with updated resources. In this way, we have isolated the localization process from the development process as much as possible.

The program that is responsible for performing the localization of binaries should also produce, as part of its output, a file that shows what resources are new or updated. These resources need further localization work. There now needs to be a tool that the localizers use to edit the localization database. It should be able to consume this build output to import the new strings into their database. In this way the localizers never have to deal directly with the product. They only deal with the inputs to a tool that they own, and the outputs from a tool that they own.

Another side improvement to the general localization process is how to do a large degree of localization before any translation takes place. This process is what I have heard referred to as “pseudo-localization”. The localizing tool should be capable of producing a “pseudo-localized” binary, that does not have ANY real strings. Rather, it should take all existing string resources, generate enough gibberish to pad the value and make it long enough to simulate a translated string. (Usually strings grow by something like 30% or so when translated to certain languages.) The product can then be installed and tested with these pseudo-localized strings to find spacing issues, etc. long before any translation work is done.
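
To make the idea concrete, here is a minimal Python sketch of a pseudo-localizer. The function name, the 30% default, the accented-vowel swap and the bracket markers are all my own assumptions for illustration. Unlike the pure gibberish described above, this variation keeps the text readable so a tester can still find their way around the UI.

```python
import math


def pseudo_localize(text: str, growth: float = 0.30, pad_char: str = "~") -> str:
    """Return a padded, visibly altered stand-in for a translated string."""
    # Swap plain vowels for accented ones so untranslated strings stand out.
    accented = str.maketrans("AEIOUaeiou", "ÀÉÎÕÜàéîõü")
    # Pad by the assumed growth factor, rounding up.
    extra = math.ceil(len(text) * growth)
    # Brackets make clipped strings easy to spot on screen.
    return "[" + text.translate(accented) + pad_char * extra + "]"


print(pseudo_localize("Open file"))  # [Õpén fîlé~~~]
```

A real tool would also have to leave format specifiers and accelerator markers alone, which this sketch does not attempt.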

When writing a driver, there are times when you may want to call a function if it is available on the version of the operating system you are running on, but it may not always be available. For example, I recently came across a need to use the ZwRenameKey function, which was added in Windows XP. My driver also runs on Windows 2000, so I need to dynamically detect and use this routine if it is available. Enter the handy function MmGetSystemRoutineAddress. But wait… it doesn’t seem to work for ZwRenameKey, which is apparently not made public and therefore cannot be retrieved using that routine.

But since I really need to use it (don’t ask why… long story) I’m going to have to find another way to get the address of the routine. The first step is to get the address of the service descriptor table.

kd> x nt!KeServiceDescriptorTable
8089f460 nt!KeServiceDescriptorTable = <no type information>

This table actually has four entries, the first of which is used for the Native API. (See Microsoft Windows Internals, Fourth Edition, page 122 for more information about these structures.) So we get the address from the first entry.

kd> dd 8089f460 L4
8089f460 80830bb4 00000000 00000128 80831058

Now we just need to dump this table with symbols so we can find the routine we’re interested in.

kd> dps 80830bb4 L120
80830bb4 80917510 nt!NtAcceptConnectPort
80830bb8 80962516 nt!NtAccessCheck
80830bbc 809667ce nt!NtAccessCheckAndAuditAlarm
80830bc0 80962548 nt!NtAccessCheckByType
80830bc4 80966808 nt!NtAccessCheckByTypeAndAuditAlarm
80830bc8 8096257e nt!NtAccessCheckByTypeResultList
80830bcc 8096684c nt!NtAccessCheckByTypeResultListAndAuditAlarm
80830bd0 80966890 nt!NtAccessCheckByTypeResultListAndAuditAlarmByHandle

80830ed4 808b0f88 nt!NtRenameKey

And then a little bit of math will tell us the offset. With this offset we can write some code to go to this offset and get the address of the routine we need.

kd> ? (80830ed4 - 80830bb4) / 4
Evaluate expression: 200 = 000000c8

Note that this is not a great thing to have to do. These offsets are not guaranteed to stay the same, and they are definitely different between versions of the operating system.
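
For anyone who wants to double-check the arithmetic outside the debugger, here is the same calculation as a small Python sketch. The function name is mine, and the 4-byte entry size assumes the 32-bit system shown in the dumps above.

```python
ENTRY_SIZE = 4  # bytes per table entry on 32-bit x86 (assumption from the dump)


def service_index(table_base: int, entry_address: int) -> int:
    """Return the zero-based index of an entry in the service table."""
    offset = entry_address - table_base
    if offset < 0 or offset % ENTRY_SIZE:
        raise ValueError("address is not an entry in this table")
    return offset // ENTRY_SIZE


index = service_index(0x80830BB4, 0x80830ED4)
print(index, hex(index))  # 200 0xc8
```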

Sometimes you have an ASSERT in your code and for some reason it starts being hit all the time. It’s good to know about it the first time, but if it’s happening hundreds of times a second, it can make debugging (or just replacing the code with something that doesn’t ASSERT) very difficult. Enter windbg and the power to kill an ASSERT.

When the debugger breaks in on an ASSERT, look at your disassembly. You will probably see something similar to the following. Yours may look a little different, depending on what you’re ASSERTING on, but it will be similar.

f6bc9923 7414 jz driver!function+0x1d9 (f6bc9939)
f6bc9925 6819050000 push 0x519
f6bc992a 68c096bcf6 push 0xf6bc96c0
f6bc992f e874010300 call driver!DbgPrint (f6bf9aa8)
f6bc9934 83c408 add esp,0x8
f6bc9937 cd01 int 01

The key element here is the first line which is actually doing the test on the ASSERTion. If the ASSERT comparison evaluates to zero (i.e., the ASSERT succeeds), then it jumps over the call to output a debugger string and the int 01 which breaks into the debugger.

So what we want it to do is always skip that section. If we enter windbg into the assembler mode using the “a” command (of course telling it what address to assemble into), and then replace the jz instruction with a jmp instruction, that’s all we need.

kd> a f6bc9923
f6bc9923 jmp driver!function+0x1d9
jmp driver!function+0x1d9

kd> g

Note that the only effect on the binary code is to change the first byte from a 74 to an EB. You could accomplish the same thing by editing that byte directly instead of using the assembler, which is as simple as:

kd> eb f6bc9923 eb
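
If you want to convince yourself that the one-byte patch keeps the same jump target, here is a small Python sketch that works on the two instruction bytes from the disassembly above. The helper names are my own; the opcodes 74 (short jz) and EB (short jmp) are the ones shown in the dump.

```python
JZ_SHORT = 0x74   # jz rel8
JMP_SHORT = 0xEB  # jmp rel8


def short_jump_target(address: int, code: bytes) -> int:
    """Target of a two-byte short jump: next instruction plus signed rel8."""
    displacement = int.from_bytes(code[1:2], "little", signed=True)
    return address + 2 + displacement


def force_jump(code: bytes) -> bytes:
    """Turn a short jz into an unconditional short jmp, keeping the displacement."""
    if code[0] != JZ_SHORT:
        raise ValueError("expected a short jz (opcode 0x74)")
    return bytes([JMP_SHORT]) + code[1:]


original = bytes.fromhex("7414")
patched = force_jump(original)
print(patched.hex())                                # eb14
print(hex(short_jump_target(0xF6BC9923, patched)))  # 0xf6bc9939
```

Both encodings are two bytes long with the same displacement byte, which is why nothing after the patched instruction has to move.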

I was helping another developer with some work the other day and thought that what we came up with might be useful for others. We had a handle to a registry key that we got from somewhere (i.e., some other unrelated part of the code had opened it) and we needed to determine the name of the key.

In the past I have wrapped all my registry handling in some class objects that also maintain the name of the key. Each time a key is opened relative to another, it copies the key name from the parent key class to make up the key name of the new class. It certainly works, but it seems silly to maintain something that is already being stored somewhere down in the kernel.

So we created some code to use the native API function NtQueryObject. It returns the object name in the form that it’s stored in down in the kernel, but that was just perfect for what we needed. The name comes back in the form “\REGISTRY\MACHINE\SOFTWARE\Microsoft”.

A few interesting notes about this technique. It mostly tells you what the REAL name of the object is, so if you’re on a 64-bit system and you open HKLM\Software\Microsoft from a 32-bit process, and then query its name, you will see that it is \REGISTRY\MACHINE\SOFTWARE\Wow6432Node\Microsoft. However, if you open a file object with a short name and then query it, you will STILL get the short name back.
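
If you would rather show the familiar hive names, the kernel form is easy to map back. Here is a minimal Python sketch; the function name is hypothetical and it only handles the two roots I am sure of, \REGISTRY\MACHINE and \REGISTRY\USER. It is not the sample program, just an illustration of the name format.

```python
PREFIXES = (
    ("\\REGISTRY\\MACHINE", "HKEY_LOCAL_MACHINE"),
    ("\\REGISTRY\\USER", "HKEY_USERS"),
)


def to_win32_key_name(kernel_name: str) -> str:
    """Map a kernel registry object name to the usual hive-style name."""
    upper = kernel_name.upper()  # registry paths are case-insensitive
    for prefix, root in PREFIXES:
        if upper == prefix or upper.startswith(prefix + "\\"):
            return root + kernel_name[len(prefix):]
    return kernel_name  # unknown root: leave the name untouched


print(to_win32_key_name("\\REGISTRY\\MACHINE\\SOFTWARE\\Microsoft"))
# HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft
```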


Today I ran into a need to set a breakpoint that would only stop when a certain string was encountered. In the past I have just modified the code to test for the string, then updated my driver, rebooted, etc. A very time-consuming process. So today I decided that I wanted to figure out how to do it right in the debugger. I knew it was possible from comments, but didn’t know how to implement it.

First of all, since the breakpoint runs a non-trivial set of commands each time it is hit, I placed the commands in a secondary file. There may be a way to get this all into a single-line breakpoint command, but I don’t see it. So the breakpoint we create is just going to run the commands from the secondary file. The command to create the breakpoint is something like this:

bp driver!functionName "$$< C:\debugCommands.txt"

Then comes the important part: the actual commands that get executed. We need to evaluate a string against a pattern, which the MASM expression evaluator can handle using the $spat operator. The hard part is that at first glance it only appears to work with string literals. So $spat( "Big string", "*str*" ) will work, while $spat( poi(variableName), "*str*" ) just laughs mockingly at you.

The key here is to assign the string to an alias which will then allow it to be evaluated by the $spat command. So using our example comparison, the commands in the secondary file look like:

as /mu ${/v:MyAlias} poi(variableName);
.if ( $spat( "${MyAlias}", "*str*" ) == 0 ) { g }

The commands evaluate the string. If a match is not found, the g[o] command is executed, otherwise execution will stop at the point when the pattern is found. Note that there are much more complicated pattern matching expressions available as well.
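
The stop-or-go decision can be sketched in a few lines of Python. This only approximates $spat with Python’s fnmatch wildcards; as far as I know $spat ignores case, so the sketch lowercases both sides, but treat that as an assumption. The function name is my own.

```python
from fnmatch import fnmatchcase


def should_stop(value: str, pattern: str) -> bool:
    """Return True when the string matches the pattern, i.e. the breakpoint should stop."""
    # Lowercase both sides to mimic a case-insensitive match (assumption).
    return fnmatchcase(value.lower(), pattern.lower())


for text in ("Big string", "Other text"):
    action = "break" if should_stop(text, "*str*") else "go"
    print(text, "->", action)
```

The first string matches and stops; the second does not, so the debugger would just continue.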

This last week I found a very strange bug in a kernel driver I do some work on. It only seems to appear when we have GoogleDesktop or the MSN Desktop Search installed. After some period of time the graphics display just starts doing some wacky stuff: fonts don’t display, repaints don’t work quite right, just general wackiness.

So since this is only happening after some period of time on a machine, we started looking at resources. Using the pooltag tool from sysinternals we found one particular tag that looked out of control. The tag was FSrN, which according to the lists I could find is “File-system runtime”. Helpful, no?

So that brings me to the meat of the story. I wanted to find out who was allocating that and what the call stack looked like at the time. In order to do this I needed to put a breakpoint on ExAllocatePoolWithTag, but that gets called ALL the time. I only wanted to break when we hit the right tag value. So I came up with the following command to set the breakpoint in windbg:

bp nt!ExAllocatePoolWithTag "j (Poi(ss:esp+c) = 'NrSF') 'kb; db ss:esp+c'; 'gc' "

You can check out the windbg help for more information on conditional breakpoints. It will explain what the above command means. I wanted to post it since I couldn’t find any good example of how to do this with a non-integer value.
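
In case the esp+c part looks like magic, here is a small Python sketch of where it comes from. It assumes a 32-bit stdcall function that is stopped at its very first instruction, so the return address is at esp and the arguments follow it. The helper name is mine; the parameter names are the documented ones for ExAllocatePoolWithTag.

```python
POINTER_SIZE = 4  # bytes per stack slot on 32-bit x86


def stdcall_arg_offset(arg_index: int) -> int:
    """Offset from esp at function entry to the zero-based argument."""
    if arg_index < 0:
        raise ValueError("argument index must be non-negative")
    # Slot 0 holds the return address, so arguments start one slot up.
    return POINTER_SIZE * (arg_index + 1)


ARGS = ("PoolType", "NumberOfBytes", "Tag")
for i, name in enumerate(ARGS):
    print(f"{name}: esp+{stdcall_arg_offset(i):x}")
```

Tag is the third argument, which lands at esp+c. The same trick does not carry over to x64, where the first arguments are passed in registers.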

One of the virtues of programming that I have found very useful and have been trying to discipline myself to use is what I like to call complete coding. This is the practice of writing shipping quality code ALL the time.

Most programmers try out an algorithm or put a thought into code very quickly to test the idea. There are two major problems with this practice. All too often the code ends up not being cleaned up and sits in the live project until somebody happens to find a bug in it. Or, the code gets tested in its quick and dirty form, then cleaned up, and checked in without further testing. Both of these practices put bugs into the code when they don’t need to be there. (The bugs are just a result of sloppy procedures and are fairly easy to avoid.)

So with these thoughts in mind, I have a few recommendations to make to programmers (myself included).

1- Always write error handling code as you are writing the main code paths. If you don’t it probably won’t get done, and it will be a while before the bugs get found since those error code paths by definition don’t get executed during normal operation.

2- Always write the code in such a way that you can understand it easily. If you have to think about it to write it, you will have to think about it to maintain it. And most importantly, it will be much more difficult to be sure you wrote it correctly the first time. (Thinking is just too darn hard, he he.)

3- Consider reuse of the code in the initial writing. Sometimes this means hiding implementation details behind an interface. Sometimes it means doing the refactoring work NOW, even though you desperately want to just finish the piece of code you’re working on.

4- Always run through the code in your head (or even better, in a debugger) with sample input to make sure the code is complete. Think about the what-ifs (the things you never think will happen, but sooner or later they will).

Well, there are probably many more, which is why there are entire books about this subject. But these are a few that I have noticed as being extremely useful. Let me know if you have other “complete coding” practices that others could benefit from.

So I was working on a bug this last week in a Windows kernel-mode driver. It was really quite a strange symptom and once I found the problem I thought it might be useful to share since I wasn’t able to find any information on it myself.

So first, the symptom. After running a system for a while, I broke in with the kernel debugger to examine what looked like a deadlock. When I did a “!process 0 7” to examine the processes on the system and see what might be deadlocked, I found that every single process that I had run on the machine was still sitting around in memory with no active threads.

Well, it turns out that the code was using the undocumented function PsLookupProcessByProcessId. Apparently, unlike its documented cousin PsGetCurrentProcess, PsLookupProcessByProcessId bumps the reference count on the EPROCESS object that it returns. The function thus requires a matching call to ObDereferenceObject to release the reference and allow the process object to be freed.
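
Here is a toy Python model of what was going on. It is not kernel code, and the class and function names are made up; it only shows how one unreleased reference keeps an object alive after its process has exited.

```python
class ProcessObject:
    """Toy stand-in for a reference-counted process object."""

    def __init__(self, pid: int) -> None:
        self.pid = pid
        self.ref_count = 1  # reference held while the process is running
        self.freed = False

    def dereference(self) -> None:
        self.ref_count -= 1
        if self.ref_count == 0:
            self.freed = True  # last reference gone, object can be freed


def lookup_by_id(table: dict, pid: int) -> ProcessObject:
    """Return the object with one extra reference, like the lookup described above."""
    proc = table[pid]
    proc.ref_count += 1
    return proc


table = {0x7D0: ProcessObject(0x7D0)}
leaked = lookup_by_id(table, 0x7D0)  # the driver looks the process up...
table[0x7D0].dereference()           # ...and later the process exits
print(leaked.freed)                  # False: still in memory, no threads
leaked.dereference()                 # the missing dereference
print(leaked.freed)                  # True
```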

Following is an example of one of the processes as it appears in the debugger.

PROCESS 890ce020 SessionId: 0 Cid: 07d0 Peb: 7ffdd000 ParentCid: 07c8
DirBase: 6dba0000 ObjectTable: 00000000 HandleCount: 0.
Image: cmd.exe
VadRoot 00000000 Vads 0 Clone 0 Private 0. Modified 17. Locked 0.
DeviceMap e12b4410
Token e3d58030
ElapsedTime 20:42:16.447
UserTime 00:00:00.062
KernelTime 00:00:00.546
QuotaPoolUsage[PagedPool] 0
QuotaPoolUsage[NonPagedPool] 0
Working Set Sizes (now,min,max) (4, 50, 345) (16KB, 200KB, 1380KB)
PeakWorkingSetSize 1120
VirtualSize 10 Mb
PeakVirtualSize 13 Mb
PageFaultCount 1299
MemoryPriority BACKGROUND
BasePriority 8
CommitCharge 0

No active threads