I ran into an interesting case while working on a registry filter driver the other day. Boiled down to its most basic form, my driver is blocking access to registry keys by name. I watch for a name that matches one in my list then fail the open/create. The interesting case is when a registry key is a symbolic link (I.e., was created with REG_CREATE_LINK and has a REG_LINK value named SymbolicLinkValue that points to a second key).

All my filtering work takes place in the Pre-Open/Create path, and because the name coming in doesn’t actually represent what really will be opened, I can’t exactly rely on the name. So after digging into how these operations get handled, I have found some very odd behavior. I see two different behaviors for an open/create under these conditions. I created a sample filter driver to test this and tested on Windows 8 x64 and Windows 7 x64.

  1. When I try to open the link source (the key that links to a second key), I see the following:
  1. RegNtPreOpenKeyEx with CompleteName containing the name of the key – I return STATUS_SUCCESS to allow processing to continue.
  2. RegNtPostOpenKeyEx with PreInformation->CompleteName containing the key name from the SymbolicLinkValue. Status contains STATUS_REPARSE.
  3. RegNtPreOpenKeyEx with Complete Name containing the key name of the link target (from step b).
  4. RegNtPostOpenKeyEx with Status equal to STATUS_SUCCESS.
  • After a small number of repetitions (usually 2) the above behavior stops, and is replaced with the following:
    1. RegNtPreOpenKeyEx with CompleteName containing the name of the key – I return STATUS_SUCCESS to allow processing to continue.
    2. RegNtPostOpenKeyEx with Status equal to STATUS_SUCCESS. PreInformation->CompleteName contains the original name from step a, but the Object contains a pointer to a registry object for the link target key.

    Additionally, the behaviors above seem to revert after some period of time. If I run a simple registry open command for the link source a hundred times, I see behavior #1 twice and then #2 the remainder of the time. But after doing that if I let the machine sit idle for 4-5 minutes and run it again, I will again see behavior #1 twice, followed by #2 for as many times as I care to repeat it.

    I e-mailed some contacts at Microsoft and they confirmed what I have been seeing. The registry maintains a key lookup cache that is intended to speed up open/create operations. One thing that the cache can do is remember symbolic links and avoid the need to reparse them every time they are opened. This is a bit frustrating for me, but it probably makes a significant difference in registry performance (which, incidentally is a heck of a lot faster than file system performance).

    Unfortunately it would seem that the way to solve this problem is to store some of the data I need internally to my driver based on watching the systems behavior. So, for example, I can watch for the STATUS_REPARSE result to come back, and do something with the CompleteName value. So I could add a context to the call in the pre-open, put the original complete name in the context, then in the post-open when I see STATUS_REPARSE, I can add an entry to a list that says “Name A links to Name B”, which I can thereafter use in my pre-open. I then also have to deal with cases where the link key gets removed (I can watch for a pre-open with OBJ_OPENLINK, place a context on the object, then watch for the delete, and clean up my internal cache). I also have to deal with the case where the link gets retargeted to something else (I can watch for the pre-open with OBJ_OPENLINK, context the object, watch for the pre-set-value for REG_LINK named SymbolicLinkValue and update my cache).

    I won’t show the solution code here, since its fairly complicated, and it is the intellectual property of my company, but the solution is fairly straight-forward once you understand what needs to be done. I will, however, show some of the code that I actually used to test this and grok the behavior.

    Continue reading

    A while back I wrote a tool called NativeTest and open-sourced it on CodePlex. You can view the original post about this at the FSLogix Blog. Today I was troubleshooting a problem with registry links that required me to look at what was really going on in the registry, without it redirecting me through symbolic links. So I turned to the NativeTest project, which was able to help out.

    For those who don’t already know, the registry can have symbolic links where a key actually is just a pointer to another key. So, for example, the HKLM\Software\CurrentControlSet key is a symbolic link created during system boot that points to HKLM\Software\CurrentControlSet001 (or some other numbered control set, depending on your hardware configuration).

    Viewing a registry symbolic link involves opening the key with a special flag (REG_OPTION_OPEN_LINK) and then querying the value named “SymbolicLinkValue”. The value’s data will be a fully qualified native path to another registry key. The only difficult part about doing this with NativeTest is figuring out that the numeric option for REG_OPTION_OPEN_LINK is equivalent to 8, which you can find in the Windows SDK header files.

    The NativeTest session for viewing the link at HKLM\System\ControlSet001\Control\Print\Printers is as follows:

    > openkeyex -name \registry\machine\system\controlset001\control\print\printers -access kr -options 8
    Status = 0x00000000
    Handle = 40 (AUTO-0)
    > enumvaluekey -handle AUTO-0 -index 0 -class full -bufferlen 256
    Status = 0x00000000
    ResultLength = 210
    00 00 00 00 06 00 00 00     ........
    38 00 00 00 9a 00 00 00     8.......
    22 00 00 00 53 00 79 00     "...S.y.
    6d 00 62 00 6f 00 6c 00     m.b.o.l.
    69 00 63 00 4c 00 69 00     i.c.L.i.
    6e 00 6b 00 56 00 61 00     n.k.V.a.
    6c 00 75 00 65 00 46 00     l.u.e.F.
    5c 00 52 00 65 00 67 00     \.R.e.g.
    69 00 73 00 74 00 72 00     i.s.t.r.
    79 00 5c 00 4d 00 61 00     y.\.M.a.
    63 00 68 00 69 00 6e 00     c.h.i.n.
    65 00 5c 00 53 00 6f 00     e.\.S.o.
    66 00 74 00 77 00 61 00     f.t.w.a.
    72 00 65 00 5c 00 4d 00     r.e.\.M.
    69 00 63 00 72 00 6f 00     i.c.r.o.
    73 00 6f 00 66 00 74 00     s.o.f.t.
    5c 00 57 00 69 00 6e 00     \.W.i.n.
    64 00 6f 00 77 00 73 00     d.o.w.s.
    20 00 4e 00 54 00 5c 00      .N.T.\.
    43 00 75 00 72 00 72 00     C.u.r.r.
    65 00 6e 00 74 00 56 00     e.n.t.V.
    65 00 72 00 73 00 69 00     e.r.s.i.
    6f 00 6e 00 5c 00 50 00     o.n.\.P.
    72 00 69 00 6e 00 74 00     r.i.n.t.
    5c 00 50 00 72 00 69 00     \.P.r.i.
    6e 00 74 00 65 00 72 00     n.t.e.r.
    73 00                       s.
    > closekey -handle auto-0
    Status = 0x00000000

    As you can see, this key is really just a pointer to HKLM\Software\Microsoft\Windows NT\CurrentVersion\Print\Printers.

    The next task is how can I actually create one of these links for my own testing, just so I don’t accidentally mess up one of the system keys. Creating a symbolic registry link is pretty easy. You just have to create they key with a certain option (REG_OPTION_CREATE_LINK), and then set the value named “SymbolicLinkValue”, which needs to be of type REG_LINK. Again, the hard part is figuring out that REG_OPTION_CREATE_LINK is the equivalent of 2.

    The session looks like this:

    > createkey -name \registry\machine\software\Test -access ka -options 2
    Status = 0x00000000
    Disposition = Created
    Handle = 48 (AUTO-0)
    > setvaluekey -handle auto-0 -name SymbolicLinkValue -type link -data \\Registry\\Machine\\Software\\Test2
    Status = 0x00000000
    > queryvaluekey -handle auto-0 -name SymbolicLinkValue -class full -bufferlen 512
    Status = 0x00000000
    ResultLength = 120
    00 00 00 00 06 00 00 00     ........
    38 00 00 00 44 00 00 00     8...D...
    22 00 00 00 53 00 79 00     "...S.y.
    6d 00 62 00 6f 00 6c 00     m.b.o.l.
    69 00 63 00 4c 00 69 00     i.c.L.i.
    6e 00 6b 00 56 00 61 00     n.k.V.a.
    6c 00 75 00 65 00 00 00     l.u.e...
    5c 00 52 00 65 00 67 00     \.R.e.g.
    69 00 73 00 74 00 72 00     i.s.t.r.
    79 00 5c 00 4d 00 61 00     y.\.M.a.
    63 00 68 00 69 00 6e 00     c.h.i.n.
    65 00 5c 00 53 00 6f 00     e.\.S.o.
    66 00 74 00 77 00 61 00     f.t.w.a.
    72 00 65 00 5c 00 46 00     r.e.\.T.
    53 00 4c 00 6f 00 67 00     e.s.t.2.
    > closekey -handle auto-0
    Status = 0x00000000

    Once you have created it, it’s kind of fun to poke around at the key with regedit. If you, like me, decide to clean up your test key and delete it in regedit, you will find that deleting the link source actually follows the link and deletes the target. The link source remains and now displays an error about the system cannot find the file specified. In my testing, it seems to be impossible to delete the link source in regedit, so we need to look at one more thing in NativeTest: deleting the link source key. The only trick here is that you have to open the key with REG_OPTION_OPEN_LINK so that the system doesn’t try to follow the link (which, incidentally is what regedit is doing).

    The session looks like this:

    > openkeyex -name \registry\machine\software\test -access ka -options 8
    Status = 0x00000000
    Handle = 52 (AUTO-0)
    > deletekey -handle auto-0
    Status = 0x00000000
    > closekey -handle auto-0
    Status = 0x00000000

    A recent feature addition to my product at work is logging to a custom event log channel using ETW. I have used technology built on top of ETW before, such as WPP tracing, and while it is a bit cumbersome, the advantages are great, and once it’s set up, it’s not hard at all to use. I have also done classic Event Logging before.

    With this recent feature work, I wanted to use the current technology, and I wanted to be able to log from my service and driver. I thought I had everything working. I created a manifest .xml file that defined my custom channels, and the events I wanted to log. I wrote some code to register/unregister with ETW, and to log various messages with their accompanying data.

    But in my testing, I was able to log events from my service (running as system), but not from my driver. The calls to EtwRegister and EtwWrite looked like they were working perfectly. They returned a success code, but nothing every appeared in the log. The documentation says they will behave this way (returning success but not logging anything) if the event is not enabled.

    I spend the next few hours trying to figure out why my events were not enabled. The logging level (Error, Warning, Information) was fine. The keywords were fine. The manifest file and the message resources it pointed to were okay. I even went so far as to write a notification callback so I could see what the system was enabling. It all looked fine.

    I finally decided to try using a built-in channel instead of my own, and I selected the SYSTEM channel. Bam! Everything started working. Then I started to think that this might be a security issue. This was a bit counter-intuitive because the APIs weren’t failing. I would think that if you tried to write an event but didn’t have rights, you should get an access denied error, but apparently the system just happily eats your event and gives you a thumbs up.

    The key turned out to be the isolation attribute on my custom channel. Setting the isolation=”system” attribute on the channel made everything start working.

    I don’t understand why the logging worked from my service, but not from my driver, since both were running in the system context. But there it is. A very simple solution to a maddening problem (at least for a day).

    Windows Installer stores information about components, products, etc. that are installed on the system in the registry. Most of these elements are identified by a GUID, but the Installer doesn’t always store the GUID directly in the registry. Instead it uses what I call SQUIDs, or Squished GUIDs. Basically they just reformat the GUID a bit and remove some of the extra characters that don’t add anything (hyphens and braces). You can read more about this here: A Brief Note on Installer GUIDs.

    Periodically when I am working on some installer code I can end up needing to search the registry for stuff that is related to a component I am working on. It is very helpful to be able to convert between SQUIDs and GUIDs. I usually just do this by hand, but today I decided I needed a bit more help doing this and wrote a tiny vbscript piece of code that does the conversion. On the chance that it might provide value for someone else, I have decided to post it here.

    Function SquidToGuid(s)
      SquidToGuid = _
        StrReverse(Mid(s, 1, 8)) & "-" & _
        StrReverse(Mid(s, 9, 4)) & "-" & _
        StrReverse(Mid(s, 13, 4)) & "-" & _
        StrReverse(Mid(s, 17, 2)) & StrReverse(Mid(s, 19, 2)) & "-" & _
        StrReverse(Mid(s, 21, 2)) & StrReverse(Mid(s, 23, 2)) & StrReverse(Mid(s, 25, 2)) & _
        StrReverse(Mid(s, 27, 2)) & StrReverse(Mid(s, 29, 2)) & StrReverse(Mid(s, 31, 2))
    End Function
    Function GuidToSquid(g)
      Replace g, "{", ""
      Replace g, "}", ""
      GuidToSquid = _
        StrReverse(Mid(g, 1, 8)) & _
        StrReverse(Mid(g, 10, 4)) & _
        StrReverse(Mid(g, 15, 4)) & _
        StrReverse(Mid(g, 20, 2)) & StrReverse(Mid(g, 22, 2)) & _
        StrReverse(Mid(g, 25, 2)) & StrReverse(Mid(g, 27, 2)) & StrReverse(Mid(g, 29, 2)) & _
        StrReverse(Mid(g, 31, 2)) & StrReverse(Mid(g, 33, 2)) & StrReverse(Mid(g, 35, 2))
    End Function

    In the past few months I have been playing around with Microsoft Visual Studio 2013 and trying out a bunch of the new C++ language features that are supported. Many of these were already in 2012, but I have been making a focused effort to make sure I understand all the new language features. I am loving the new stuff, and the way it makes my code easier to write.

    Unfortunately, when it comes to using all these cool new features in daily work, I am a kernel developer, and most of this seems to be problematic in the kernel. Specifically, I wish there were a kernel-friendly Standard Template Library that we could use in kernel-land. There have long been problems documented with using C++ in the Windows kernel. The Windows compiler team has done some work in 2012 to add a /kernel switch that is supposed to help with some of this, but really it seems to do no more than make sure you don’t use C++ exceptions.

    What I really believe the Windows kernel community needs, however, is a concerted effort to make all the modern C++ language features including STL available to kernel developers. I believe that it really impacts the quality of the driver ecosystem to have every single developer writing their own lists and list processing code, and trying to create their own wrappers for things like locking, IPC, etc.

    One of the purposes of the new language features is to make it easy for developers to write good code, and to enable them to do the right thing. Kernel mode programming makes it difficult to do the right thing. And the worst part about that is that when you do the wrong thing in kernel mode, you crash the machine altogether (as opposed to user mode where you simply crash your own process and the system continues merrily on its way).

    I understand that there are inherent difficulties in kernel mode programming, and things that you don’t have to worry about in normal C++ code, such as controlling what memory gets paged vs. non-paged, or worrying about code that can only run at certain IRQLs. So I believe that in addition to needing modern C++ and STL, we would need some (probably Microsoft-specific) extensions to help deal with these extra little problems. For example, when you declare a template, the code gets generated when the template is actually used. If the template is used in a non-paged function, and in a paged function, then we probably need a way to deterministically say how the template code should be generated.

    This is stuff that can all be done, and Microsoft compiler guys are GOOD at it. They could make our lives so much easier: not just for programmers, but for consumers who are sick of having drivers that crash their machines. Come on guys… do it for freedom. Do it for the children. Just do it. Please?

    In my day job, I almost exclusively use a kernel-mode debugger attached to a Hyper-V virtual machine as the target. Most of the serious issues I see are kernel-mode issues (naturally those are the worst because they tend to crash the machine, aka the Blue Screen of Death). But periodically I see something pop up in a user-mode component, and it’s honestly just a big pain in the butt to have to install debugging tools on the target machine, and then reproduce whatever is going on under that debugger. Especially since I am already in a debugger!

    I hit this just the other day when my user-mode component was failing an assertion, and I wanted to see what it was without having to jump through all those hoops.

    I recently figured out how to debug the user-mode process from my kernel mode debugger. It’s not perfect, but it works pretty well, and it’s certainly more efficient than the alternative. What you do is use the .process debugger command with a few parameters. So here are the series of commands that I used.

    !process 0 0
    .process /i /r /p ffffffff12345678

    The first command is basic, and just lists all the processes on the machine. I used this just so I could get the process object pointer for use in the next command.

    The second command is where the magic happens. The switches to the process command attach to the process invasively, which means that the process will actually become active. Another does some address translation to make it easier to see the user-mode process space. The third reloads the user-mode symbols for the process. In my experience, sometimes I have to still do this manually.

    Finally, the go command is required because the target system actually has to activate the process. This will return almost immediately, and you will be running in the context of the user-mode process. You can now set breakpoints, step through code, etc. all without leaving the comfort of your kernel-mode debugger.

    Happy debugging!

    At work I have recently been working on some system software that integrates closely with the registry. This means I require some fairly in-depth knowledge about the Registry APIs, and I also need a way to test their functionality (both with and without my software involved).

    I have done this in the past by writing custom code, which then gets lost or out-of-date. This time around I decided to make a more interactive and reusable tool to help me accomplish this, and my company – FSLogix, Inc. – decided to release the tool as open source on CodePlex. I also wrote a blog article with more in-depth information about the tool over at the FSLogix Blog.

    If you have read my blog at all, you already know that I enjoy poking around at the assembly-code-level in a debugger. But I believe it’s really NOT just for the geek interested in the arcane. Understanding what your code looks like at the assembly level can help you figure out what’s really going on, and in some cases it can be a whole lot easier than the same task at a higher level language, such as C. (All you C# and Java guys are laughing at the idea that C is a high level language, but this idea can be applied to understanding the byte code generated from those languages as well.)

    As a first example, a good defensive programming technique is to use asserts to make sure that your assumptions about the code are true. In a case where your assumption turns out to be wrong, these asserts can make a debugging session a nightmare. In a complicated product, such as assert may make it impossible to step through code running on a different thread. Fortunately it is super-easy to disable such an assert right in the debugger by just changing an opcode. No re-compiling, re-deploying, getting the scenario set up to reproduce again. In the assembly code, you can find the test for your assertion, which will be followed by a jump instruction of some type. If you are testing the return value of a function for success, you may see something like:

    test eax, eax
    jne <address>

    If you look at the byte code for the jne (jump not equal) instruction it will likely begin with a 0x75 (assuming an x86 instruction set and a short jump). Disabling that assert is a simple as changing that 0x75 to an 0x74, which changes the jne instruction to a je (jump equal). Good bye assertion.

    Another very closely related example I ran into the other day. I forgot to put a logical negation in a statement. The if statement was testing for the exact opposite of what I intended. Well setting up the scenario to test it was about a 5 minute process. But using the same trick as with the assert, I was able to switch my code to do the right thing and continue debugging the remainder of the scenario.

    One more example that will be more applicable to you C# guys out there. In the latest version of the C# language and .NET Framework, Microsoft added a couple of new keywords: async and await. Well the descriptions of the functionality and the explanations I read didn’t quite add up. Being quite familiar with multi-threaded programming, I just didn’t feel like I was getting a full picture of what was really happening. (I actually asked about it at a developers conference, and the speaker said “I don’t know how they do it, it’s just magic.”) So I wrote a quick sample, fired up a debugger and looked at the generated byte code, to get the complete picture of what was happening. It works about as I expected it would, but without understanding assembly level code, I never would have been able to figure that out.

    So the next time you’re tempted to think that assembly language programming is a dead art, think again.

    So I was troubleshooting a bug with a co-worker the other day and what we were seeing was a little confusing, and it turned out to be a difference in when I/O gets pended in the system. The basic scenario was that we were calling ZwQueryDirectoryFile to do some testing of a filter driver that was changing the directory query results. In testing without the driver installed, the return value from ZwQueryDirectoryFile was always STATUS_PENDING. So the assumption was made that this would continue to always be true.

    Well as soon as our filter was installed and we changed the results of the query, the api started returning immediately. Our code was assuming that then I/O would be pending and was waiting on the handle and checking the IoStatusBlock for the results, and it just wasn’t looking correct: the IoStatusBlock was showing STATUS_SUCCESS when we thought our filter had returned STATUS_NO_SUCH_FILE. After figuring out what was going on, I thought I would write a bit about how pending I/O works in the kernel.

    When an API call is made, there are times when it can be immediately determined what result should be returned. For example, if you pass a bad parameter, the system doesn’t need to pend the I/O because it can return STATUS_INVALID_PARAMETER immediately. In this case, ZwQueryDirectoryFile would just return STATUS_INVALID_PARAMETER as its result, and the IoStatusBlock should be ignored.

    If the API call returns STATUS_PENDING, only then should the caller wait on the handle to be signaled and then check the value of the IoStatusBlock parameter to determine the outcome of the I/O operation.

    In our specific case, our filter driver was populating the results and returning STATUS_NO_SUCH_FILE. This was changing the behavior of the test call to ZwQueryDirectoryFile from sending the request to disk and pending the I/O (the behavior without our driver) to returning the status code immediately.

    So it seems that a correct way to call one of these functions should look something like the following:

    IO_STATUS_BLOCK IoStatusBlock;
    NTSTATUS status = ZwQueryDirectoryFile( ... );
    if ( STATUS_PENDING == status )
       ZwWaitForSingleObject( ... );
       status = IoStatusBlock.Status;
    // Do something with the result of the query directory call