Debug Break on Access

I had to diagnose a crash in my driver code recently. Every time it happened, it was because a particular pointer value inside a structure had been decremented incorrectly. Decrementing the actual pointer is something that should never happen, and I couldn’t figure out how the hell it was. I had tried breakpoints in all the code that I thought was potentially problematic, and turned up nothing. Finally I came up with the following gem…

ba w 8 poi(frxdrvvt!ActiveRedirections)+0x78 ".if (poi(poi(frxdrvvt!ActiveRedirections)+0x78) % 2 == 0) { g }"

Basically, it says break on write access to the pointer in question, and if the value is divisible by 2 (since pointers have to be aligned on 8-byte boundaries) then just continue. It broke on the decrement I suspected was there and was immediately obvious what was happening. (It was my new code, but it was code I never suspected of being problematic because it was so simple, lol.)

Finding the size of a list in Kernel debugger

When I am kernel debugging, I use the !list command a LOT. It basically walks a doubly-linked list and dumps the memory of each list entry. You can also have it dump a more readable type, or have it run commands for each list entry. Today I ran across a list that was extremely large. I wasn’t really interested in the data so much as understanding just how large it was. So I had to figure out how to count the number of list entries and I figured I would share.

r $t0 = 0; !list " -x \"dd @$extret L0; r $t0 = @$t0 + 1\" 0xffffffff'0a1bcdef "; ? @$t0

Gibberish? Let’s dig in and analyze the statement. For starters, it’s a multi-part command, separated by semi-colons. The first statement sets a pseudo-register to 0, the next part uses the !list command to hit every list entry, and the last part just evaluates the pseudo-registry and dumps it’s value afterward.

r $t0 = 0
!list " -x \"dd @$extret L0; r $t0 = @$t0 + 1\" 0xffffffff'0a1bcdef "
? @$t0

Now the guts of this is in the !list command, which runs a multi-part command for each list entry. The first part of the command is just there to use up the @$extret parameter, so windbg doesn’t try to tack it onto the end of our second command, which is simply incrementing the value in the pseudo-register.

Voila! Now I know that my list has 21,000 or so entries in it, and I now understand why there just might be a performance problem. 😉

Happy debugging!

Signing Kernel-mode Drivers with SHA-2/SHA-256

I hit a pretty frustrating problem the other day when I renewed my digital certificate for code signing and my drivers stopped working. When I renewed the certificate through VeriSign (Symantec), I was given the option of choosing a SHA-1 or a SHA-2 (with 256-bit digest) hashing algorithm for the certificate. After reading through the background information on these options, and understanding that the new and improved SHA-2 was supported on Vista and above, I determined that SHA-2 would meet my requirements and clicked the button.

I got the new cert, put it in my cert store, and reconfigured all my projects in Visual Studio to sign with the new cert. Everything was working perfectly. It installed and run fine on my test machine, but then one of my co-workers complained that the driver wouldn’t load on his machine. After a short while I realized that I had the same problem on my machine, I just wasn’t seeing it because I had a kernel debugger attached. (Note that there are some registry options that can enable mandatory kernel mode code signing even when the debugger is attached. Read more at MSDN.)

So now that I could reproduce the problem, I started troubleshooting the signing itself. Using “signtool.exe verify /kp”, I verified that the driver was signed, all certificates were valid, and the cross-certificate from Microsoft was also in the signing chain. Signtool claimed that everything was just fine, but the darn thing still wouldn’t load. I was getting the error ERROR_INVALID_IMAGE_HASH (0x80070241): “Windows cannot verify the digital signature for this file. A recent hardware or software change might have installed a file that is signed incorrectly or damaged, or that might be malicious software from an unknown source.” After searching the event logs for a bit, I found the Microsoft-Windows-CodeIntegrity log which contained an Event 3004 with basically the same message as the error code above.

After exhausting my Google-Fu skills with no results, we started brainstorming about what was different between the build yesterday that worked and the one today. Somebody mentioned encryption algorithms, and the light bulbs went on in my head. Adding the specific case of SHA-2 to my searching yielded a couple of pages: Practical Windows Code and Driver Signing and PiXCL: Signing Windows 8 Drivers.

Some of my own testing showed that I couldn’t get a driver built with Visual Studio and a SHA-2 certificate to load on both Windows 7 and Windows 8. Theoretically you could sign it twice with each algorithm (using the /fd switch of signtool), but Visual Studio with the WDK is not really set up to do that, I didn’t want to go monkeying with MSBuild stuff too much.

After calling Symantec support and describing the problem we were basically told, “oh yeah, the support for SHA-2 is not really there for a lot of systems.” You think? Maybe you could have mentioned this when we were renewing the certificate?

The moral of the story, dear reader, is stick with the SHA-1 hash algorithm for signing your kernel mode drivers, at least for the time being.

Registry Filters and Symbolic Links

I ran into an interesting case while working on a registry filter driver the other day. Boiled down to its most basic form, my driver is blocking access to registry keys by name. I watch for a name that matches one in my list then fail the open/create. The interesting case is when a registry key is a symbolic link (I.e., was created with REG_CREATE_LINK and has a REG_LINK value named SymbolicLinkValue that points to a second key).

All my filtering work takes place in the Pre-Open/Create path, and because the name coming in doesn’t actually represent what really will be opened, I can’t exactly rely on the name. So after digging into how these operations get handled, I have found some very odd behavior. I see two different behaviors for an open/create under these conditions. I created a sample filter driver to test this and tested on Windows 8 x64 and Windows 7 x64.

  1. When I try to open the link source (the key that links to a second key), I see the following:
  1. RegNtPreOpenKeyEx with CompleteName containing the name of the key – I return STATUS_SUCCESS to allow processing to continue.
  2. RegNtPostOpenKeyEx with PreInformation->CompleteName containing the key name from the SymbolicLinkValue. Status contains STATUS_REPARSE.
  3. RegNtPreOpenKeyEx with Complete Name containing the key name of the link target (from step b).
  4. RegNtPostOpenKeyEx with Status equal to STATUS_SUCCESS.
  • After a small number of repetitions (usually 2) the above behavior stops, and is replaced with the following:
    1. RegNtPreOpenKeyEx with CompleteName containing the name of the key – I return STATUS_SUCCESS to allow processing to continue.
    2. RegNtPostOpenKeyEx with Status equal to STATUS_SUCCESS. PreInformation->CompleteName contains the original name from step a, but the Object contains a pointer to a registry object for the link target key.

    Additionally, the behaviors above seem to revert after some period of time. If I run a simple registry open command for the link source a hundred times, I see behavior #1 twice and then #2 the remainder of the time. But after doing that if I let the machine sit idle for 4-5 minutes and run it again, I will again see behavior #1 twice, followed by #2 for as many times as I care to repeat it.

    I e-mailed some contacts at Microsoft and they confirmed what I have been seeing. The registry maintains a key lookup cache that is intended to speed up open/create operations. One thing that the cache can do is remember symbolic links and avoid the need to reparse them every time they are opened. This is a bit frustrating for me, but it probably makes a significant difference in registry performance (which, incidentally is a heck of a lot faster than file system performance).

    Unfortunately it would seem that the way to solve this problem is to store some of the data I need internally to my driver based on watching the systems behavior. So, for example, I can watch for the STATUS_REPARSE result to come back, and do something with the CompleteName value. So I could add a context to the call in the pre-open, put the original complete name in the context, then in the post-open when I see STATUS_REPARSE, I can add an entry to a list that says “Name A links to Name B”, which I can thereafter use in my pre-open. I then also have to deal with cases where the link key gets removed (I can watch for a pre-open with OBJ_OPENLINK, place a context on the object, then watch for the delete, and clean up my internal cache). I also have to deal with the case where the link gets retargeted to something else (I can watch for the pre-open with OBJ_OPENLINK, context the object, watch for the pre-set-value for REG_LINK named SymbolicLinkValue and update my cache).

    I won’t show the solution code here, since its fairly complicated, and it is the intellectual property of my company, but the solution is fairly straight-forward once you understand what needs to be done. I will, however, show some of the code that I actually used to test this and grok the behavior.

    Continue reading “Registry Filters and Symbolic Links”

    Modern C++ in the Windows Kernel

    In the past few months I have been playing around with Microsoft Visual Studio 2013 and trying out a bunch of the new C++ language features that are supported. Many of these were already in 2012, but I have been making a focused effort to make sure I understand all the new language features. I am loving the new stuff, and the way it makes my code easier to write.

    Unfortunately, when it comes to using all these cool new features in daily work, I am a kernel developer, and most of this seems to be problematic in the kernel. Specifically, I wish there were a kernel-friendly Standard Template Library that we could use in kernel-land. There have long been problems documented with using C++ in the Windows kernel. The Windows compiler team has done some work in 2012 to add a /kernel switch that is supposed to help with some of this, but really it seems to do no more than make sure you don’t use C++ exceptions.

    What I really believe the Windows kernel community needs, however, is a concerted effort to make all the modern C++ language features including STL available to kernel developers. I believe that it really impacts the quality of the driver ecosystem to have every single developer writing their own lists and list processing code, and trying to create their own wrappers for things like locking, IPC, etc.

    One of the purposes of the new language features is to make it easy for developers to write good code, and to enable them to do the right thing. Kernel mode programming makes it difficult to do the right thing. And the worst part about that is that when you do the wrong thing in kernel mode, you crash the machine altogether (as opposed to user mode where you simply crash your own process and the system continues merrily on its way).

    I understand that there are inherent difficulties in kernel mode programming, and things that you don’t have to worry about in normal C++ code, such as controlling what memory gets paged vs. non-paged, or worrying about code that can only run at certain IRQLs. So I believe that in addition to needing modern C++ and STL, we would need some (probably Microsoft-specific) extensions to help deal with these extra little problems. For example, when you declare a template, the code gets generated when the template is actually used. If the template is used in a non-paged function, and in a paged function, then we probably need a way to deterministically say how the template code should be generated.

    This is stuff that can all be done, and Microsoft compiler guys are GOOD at it. They could make our lives so much easier: not just for programmers, but for consumers who are sick of having drivers that crash their machines. Come on guys
 do it for freedom. Do it for the children. Just do it. Please?

    Debugging User-mode Processes from the Kernel-mode Debugger

    In my day job, I almost exclusively use a kernel-mode debugger attached to a Hyper-V virtual machine as the target. Most of the serious issues I see are kernel-mode issues (naturally those are the worst because they tend to crash the machine, aka the Blue Screen of Death). But periodically I see something pop up in a user-mode component, and it’s honestly just a big pain in the butt to have to install debugging tools on the target machine, and then reproduce whatever is going on under that debugger. Especially since I am already in a debugger!

    I hit this just the other day when my user-mode component was failing an assertion, and I wanted to see what it was without having to jump through all those hoops.

    I recently figured out how to debug the user-mode process from my kernel mode debugger. It’s not perfect, but it works pretty well, and it’s certainly more efficient than the alternative. What you do is use the .process debugger command with a few parameters. So here are the series of commands that I used.

    !process 0 0
    .process /i /r /p ffffffff12345678
    g

    The first command is basic, and just lists all the processes on the machine. I used this just so I could get the process object pointer for use in the next command.

    The second command is where the magic happens. The switches to the process command attach to the process invasively, which means that the process will actually become active. Another does some address translation to make it easier to see the user-mode process space. The third reloads the user-mode symbols for the process. In my experience, sometimes I have to still do this manually.

    Finally, the go command is required because the target system actually has to activate the process. This will return almost immediately, and you will be running in the context of the user-mode process. You can now set breakpoints, step through code, etc. all without leaving the comfort of your kernel-mode debugger.

    Happy debugging!

    Basics of Pending I/O in Windows Kernel

    So I was troubleshooting a bug with a co-worker the other day and what we were seeing was a little confusing, and it turned out to be a difference in when I/O gets pended in the system. The basic scenario was that we were calling ZwQueryDirectoryFile to do some testing of a filter driver that was changing the directory query results. In testing without the driver installed, the return value from ZwQueryDirectoryFile was always STATUS_PENDING. So the assumption was made that this would continue to always be true.

    Well as soon as our filter was installed and we changed the results of the query, the api started returning immediately. Our code was assuming that then I/O would be pending and was waiting on the handle and checking the IoStatusBlock for the results, and it just wasn’t looking correct: the IoStatusBlock was showing STATUS_SUCCESS when we thought our filter had returned STATUS_NO_SUCH_FILE. After figuring out what was going on, I thought I would write a bit about how pending I/O works in the kernel.

    When an API call is made, there are times when it can be immediately determined what result should be returned. For example, if you pass a bad parameter, the system doesn’t need to pend the I/O because it can return STATUS_INVALID_PARAMETER immediately. In this case, ZwQueryDirectoryFile would just return STATUS_INVALID_PARAMETER as its result, and the IoStatusBlock should be ignored.

    If the API call returns STATUS_PENDING, only then should the caller wait on the handle to be signaled and then check the value of the IoStatusBlock parameter to determine the outcome of the I/O operation.

    In our specific case, our filter driver was populating the results and returning STATUS_NO_SUCH_FILE. This was changing the behavior of the test call to ZwQueryDirectoryFile from sending the request to disk and pending the I/O (the behavior without our driver) to returning the status code immediately.

    So it seems that a correct way to call one of these functions should look something like the following:

    IO_STATUS_BLOCK IoStatusBlock;
    NTSTATUS status = ZwQueryDirectoryFile( ... );
    if ( STATUS_PENDING == status )
    {
       ZwWaitForSingleObject( ... );
       status = IoStatusBlock.Status;
    }
    // Do something with the result of the query directory call
    

    ZwQueryInformationFile with FileStreamInformation equals an empty buffer

    Debugged a fun crash today. I had some driver code that was enumerating all the child streams of a file. Because of a bug it ended up calling the code for a directory rather than a file. What I would have expected to see is that a call to ZwQueryInformationFile for the FileStreamInformation class would have returned at least one FILE_STREAM_INFORMATION structure, or an error code such as a not found. Instead it seems that the call returns STATUS_SUCCESS, and the IoStatusBlock.Information contains zero bytes returned. From what I can tell this only happens on a directory.

    IoGetCurrentIrpStackLocation in the debugger

    Today I had occasion to debug a problem with some IRP handling code in my driver. In the particular debugger session I found myself in, I wanted to examine some of the Irp parameters, found in the current stack location. Unfortunately I had only a pointer to the Irp in this code, and therefore needed to figure out how to find the stack location pointer using the debugger. Fun!

    The function that does this in code is IoGetCurrentIrpStackLocation, which I decided to disassemble. The first part of the function basically just checks the StackCount and CurrentLocation members of the Irp to make sure that everything is ok. (This fires an assert if it doesn’t check out.) Then near the bottom of the function, we find

    drv!IoGetCurrentIrpStackLocation+0x42 [source language="removed"][/source]:
    23318 9dc43ea2 8b4d08          mov     ecx,dword ptr [ebp+8]
    23318 9dc43ea5 8b4160          mov     eax,dword ptr [ecx+60h]
    23319 9dc43ea8 8be5            mov     esp,ebp
    23319 9dc43eaa 5d              pop     ebp
    23319 9dc43eab c20400          ret     4
    

    So first this moves the Irp pointer into ecx, and then goes to offset 60 within that structure (which happens to be outside the range of the documented structure), and puts the pointer there into eax for return to the caller. So I try that in my debugger and compare with the output of the !irp command.

    kd> !irp 9f104f68
    Irp is active with 1 stacks 1 is current (= 0x9f104fd8)
     No Mdl: System buffer=9f08cbf0: Thread 88e05558:  Irp stack trace.  
         cmd  flg cl Device   File     Completion-Context
    >[  e, 0]   5  1 88dcbd18 88e05ab8 00000000-00000000    pending
    	       FileSystemFSLX
    			Args: 0000040c 00000000 94000004 00000000
    
    kd> dd 9f104f68+60 L1
    9f104fc8  9f104fd8
    
    kd> db 9f104fd8
    9f104fd8  0e 00 05 01 0c 04 00 00-00 00 00 00 04 00 00 94  ................
    9f104fe8  00 00 00 00 18 bd dc 88-b8 5a e0 88 00 00 00 00  .........Z......
    9f104ff8  00 00 00 00 15 15 15 15-?? ?? ?? ?? ?? ?? ?? ??  ........????????
    

    At this point, I now realized that this structure that I needed to look at really doesn’t provide anything that’s not already provided in the !irp output. The Args output of that command correspond to the members of the IO_STACK_LOCATION.Parameters union. In this case, I am looking at a device control Irp, so these parameters are OutputBufferLength (40c), InputBufferLength (0), IoControlCode (94000004), and Type3InputBuffer (0).

    So I guess the bottom line of this post is that !irp is cool and does just what you need it to, I just had to poke around a little bit before I realized it.

    Undocumented Query Directory Flags

    Last week I ran into a strange Windows file system behavior that I couldn’t find any information on anywhere. Since it’s always extremely frustrating to try to figure things out when there’s no information available, I thought I would share what I found. The bug we were experiencing had to do with a directory query operation over the network (e.g., when you run ‘dir \localhostc$Windows’ from a command window). If the directory doesn’t have many files in it this works just fine, but if it is a large directory, as in the example above then the IRPs that are issued to the file system are a bit strange. Our filter driver wasn’t handling these quite correctly, and the result was that if you queried the directory using the local name you’d get ~200 files, and if you used the UNC name you’d only get about ~150 files.

    After digging into this with a coworker, we found an unexpected style of IRP. When performing the directory query over the network the SRV kernel component issues a IRP_MN_QUERY_DIRECTORY with a IrpSp->Parameters.QueryDirectory.FileName and IrpSp->Parameters.QueryDirectory.FileIndex combination that seems to essentially reset the point at which the enumeration continues. The sequence we were seeing goes something like this:
    Continue reading “Undocumented Query Directory Flags”