A recent feature addition to my product at work is logging to a custom event log channel using ETW. I have used technology built on top of ETW before, such as WPP tracing, and while it is a bit cumbersome, the advantages are great, and once it’s set up, it’s not hard at all to use. I have also done classic Event Logging before.

With this recent feature work, I wanted to use the current technology, and I wanted to be able to log from my service and driver. I thought I had everything working. I created a manifest .xml file that defined my custom channels, and the events I wanted to log. I wrote some code to register/unregister with ETW, and to log various messages with their accompanying data.

But in my testing, I was able to log events from my service (running as system), but not from my driver. The calls to EtwRegister and EtwWrite looked like they were working perfectly. They returned a success code, but nothing every appeared in the log. The documentation says they will behave this way (returning success but not logging anything) if the event is not enabled.

I spend the next few hours trying to figure out why my events were not enabled. The logging level (Error, Warning, Information) was fine. The keywords were fine. The manifest file and the message resources it pointed to were okay. I even went so far as to write a notification callback so I could see what the system was enabling. It all looked fine.

I finally decided to try using a built-in channel instead of my own, and I selected the SYSTEM channel. Bam! Everything started working. Then I started to think that this might be a security issue. This was a bit counter-intuitive because the APIs weren’t failing. I would think that if you tried to write an event but didn’t have rights, you should get an access denied error, but apparently the system just happily eats your event and gives you a thumbs up.

The key turned out to be the isolation attribute on my custom channel. Setting the isolation=”system” attribute on the channel made everything start working.

I don’t understand why the logging worked from my service, but not from my driver, since both were running in the system context. But there it is. A very simple solution to a maddening problem (at least for a day).

Windows Installer stores information about components, products, etc. that are installed on the system in the registry. Most of these elements are identified by a GUID, but the Installer doesn’t always store the GUID directly in the registry. Instead it uses what I call SQUIDs, or Squished GUIDs. Basically they just reformat the GUID a bit and remove some of the extra characters that don’t add anything (hyphens and braces). You can read more about this here: A Brief Note on Installer GUIDs.

Periodically when I am working on some installer code I can end up needing to search the registry for stuff that is related to a component I am working on. It is very helpful to be able to convert between SQUIDs and GUIDs. I usually just do this by hand, but today I decided I needed a bit more help doing this and wrote a tiny vbscript piece of code that does the conversion. On the chance that it might provide value for someone else, I have decided to post it here.

Function SquidToGuid(s)
  SquidToGuid = _
    StrReverse(Mid(s, 1, 8)) & "-" & _
    StrReverse(Mid(s, 9, 4)) & "-" & _
    StrReverse(Mid(s, 13, 4)) & "-" & _
    StrReverse(Mid(s, 17, 2)) & StrReverse(Mid(s, 19, 2)) & "-" & _
    StrReverse(Mid(s, 21, 2)) & StrReverse(Mid(s, 23, 2)) & StrReverse(Mid(s, 25, 2)) & _
    StrReverse(Mid(s, 27, 2)) & StrReverse(Mid(s, 29, 2)) & StrReverse(Mid(s, 31, 2))
End Function

Function GuidToSquid(g)
  Replace g, "{", ""
  Replace g, "}", ""
  GuidToSquid = _
    StrReverse(Mid(g, 1, 8)) & _
    StrReverse(Mid(g, 10, 4)) & _
    StrReverse(Mid(g, 15, 4)) & _
    StrReverse(Mid(g, 20, 2)) & StrReverse(Mid(g, 22, 2)) & _
    StrReverse(Mid(g, 25, 2)) & StrReverse(Mid(g, 27, 2)) & StrReverse(Mid(g, 29, 2)) & _
    StrReverse(Mid(g, 31, 2)) & StrReverse(Mid(g, 33, 2)) & StrReverse(Mid(g, 35, 2))
End Function

In the past few months I have been playing around with Microsoft Visual Studio 2013 and trying out a bunch of the new C++ language features that are supported. Many of these were already in 2012, but I have been making a focused effort to make sure I understand all the new language features. I am loving the new stuff, and the way it makes my code easier to write.

Unfortunately, when it comes to using all these cool new features in daily work, I am a kernel developer, and most of this seems to be problematic in the kernel. Specifically, I wish there were a kernel-friendly Standard Template Library that we could use in kernel-land. There have long been problems documented with using C++ in the Windows kernel. The Windows compiler team has done some work in 2012 to add a /kernel switch that is supposed to help with some of this, but really it seems to do no more than make sure you don’t use C++ exceptions.

What I really believe the Windows kernel community needs, however, is a concerted effort to make all the modern C++ language features including STL available to kernel developers. I believe that it really impacts the quality of the driver ecosystem to have every single developer writing their own lists and list processing code, and trying to create their own wrappers for things like locking, IPC, etc.

One of the purposes of the new language features is to make it easy for developers to write good code, and to enable them to do the right thing. Kernel mode programming makes it difficult to do the right thing. And the worst part about that is that when you do the wrong thing in kernel mode, you crash the machine altogether (as opposed to user mode where you simply crash your own process and the system continues merrily on its way).

I understand that there are inherent difficulties in kernel mode programming, and things that you don’t have to worry about in normal C++ code, such as controlling what memory gets paged vs. non-paged, or worrying about code that can only run at certain IRQLs. So I believe that in addition to needing modern C++ and STL, we would need some (probably Microsoft-specific) extensions to help deal with these extra little problems. For example, when you declare a template, the code gets generated when the template is actually used. If the template is used in a non-paged function, and in a paged function, then we probably need a way to deterministically say how the template code should be generated.

This is stuff that can all be done, and Microsoft compiler guys are GOOD at it. They could make our lives so much easier: not just for programmers, but for consumers who are sick of having drivers that crash their machines. Come on guys… do it for freedom. Do it for the children. Just do it. Please?

So I was troubleshooting a bug with a co-worker the other day and what we were seeing was a little confusing, and it turned out to be a difference in when I/O gets pended in the system. The basic scenario was that we were calling ZwQueryDirectoryFile to do some testing of a filter driver that was changing the directory query results. In testing without the driver installed, the return value from ZwQueryDirectoryFile was always STATUS_PENDING. So the assumption was made that this would continue to always be true.

Well as soon as our filter was installed and we changed the results of the query, the api started returning immediately. Our code was assuming that then I/O would be pending and was waiting on the handle and checking the IoStatusBlock for the results, and it just wasn’t looking correct: the IoStatusBlock was showing STATUS_SUCCESS when we thought our filter had returned STATUS_NO_SUCH_FILE. After figuring out what was going on, I thought I would write a bit about how pending I/O works in the kernel.

When an API call is made, there are times when it can be immediately determined what result should be returned. For example, if you pass a bad parameter, the system doesn’t need to pend the I/O because it can return STATUS_INVALID_PARAMETER immediately. In this case, ZwQueryDirectoryFile would just return STATUS_INVALID_PARAMETER as its result, and the IoStatusBlock should be ignored.

If the API call returns STATUS_PENDING, only then should the caller wait on the handle to be signaled and then check the value of the IoStatusBlock parameter to determine the outcome of the I/O operation.

In our specific case, our filter driver was populating the results and returning STATUS_NO_SUCH_FILE. This was changing the behavior of the test call to ZwQueryDirectoryFile from sending the request to disk and pending the I/O (the behavior without our driver) to returning the status code immediately.

So it seems that a correct way to call one of these functions should look something like the following:

NTSTATUS status = ZwQueryDirectoryFile( ... );
if ( STATUS_PENDING == status )
   ZwWaitForSingleObject( ... );
   status = IoStatusBlock.Status;
// Do something with the result of the query directory call

Debugged a fun crash today. I had some driver code that was enumerating all the child streams of a file. Because of a bug it ended up calling the code for a directory rather than a file. What I would have expected to see is that a call to ZwQueryInformationFile for the FileStreamInformation class would have returned at least one FILE_STREAM_INFORMATION structure, or an error code such as a not found. Instead it seems that the call returns STATUS_SUCCESS, and the IoStatusBlock.Information contains zero bytes returned. From what I can tell this only happens on a directory.

Last week I ran into a strange Windows file system behavior that I couldn’t find any information on anywhere. Since it’s always extremely frustrating to try to figure things out when there’s no information available, I thought I would share what I found. The bug we were experiencing had to do with a directory query operation over the network (e.g., when you run ‘dir \localhostc$Windows’ from a command window). If the directory doesn’t have many files in it this works just fine, but if it is a large directory, as in the example above then the IRPs that are issued to the file system are a bit strange. Our filter driver wasn’t handling these quite correctly, and the result was that if you queried the directory using the local name you’d get ~200 files, and if you used the UNC name you’d only get about ~150 files.

After digging into this with a coworker, we found an unexpected style of IRP. When performing the directory query over the network the SRV kernel component issues a IRP_MN_QUERY_DIRECTORY with a IrpSp->Parameters.QueryDirectory.FileName and IrpSp->Parameters.QueryDirectory.FileIndex combination that seems to essentially reset the point at which the enumeration continues. The sequence we were seeing goes something like this:
Continue reading

I was helping another developer with some work the other day and thought that what we came up with might be useful for others. We had a handle to a registry key that we got from somewhere (i.e., some other unrelated part of the code had opened it) and we needed to determine the name of the key.

In the past I have wrapped all my registry handling in some class objects that also maintain the name of the key. Each time a key is opened relative to another, it copies the key name from the parent key class to make up the key name of the new class. It certainly works, but it seems silly to maintain something that is already being stored somewhere down in the kernel.

So we created some code to use the native API function NtQueryObject. It returns the object name in the form that it’s store in down in the kernel, but was just perfect for what we needed. The name comes back in the form “REGISTRYMACHINESOFTWAREMicrosoft”.

A few interesting notes about this technique. It mostly tells you what the REAL name of the object is, so if you’re on a 64-bit system and you open HKLMSoftwareMicrosoft, and then query its name, you will see that it is REGISTRYMACHINESOFTWAREWow6432NodeMicrosoft. However, if you open a file object with a short name and then query it, you will STILL get the short name back.

For a sample program, continue reading…

=> Read more!

So I was working on a bug this last week in a Windows kernel-mode driver. It was really quite a strange symptom and once I found the problem I thought it might be useful to share since I wasn’t able to find any information on it myself.

So first, the symptom. After running a system for a while, I broke in with the kernel debugger to examine what looked like a deadlock. When I did a “!process 0 7” to examine the processes on the system and see what might be deadlocked, I found that every single process that I had run on the machine was still sitting around in memory with no active threads.

Well it turns out that the code was using the undocumented function PsLookupProcessByProcessId. Apparently, unlike it’s documented cousin PsGetCurrentProcess, PsLookupProcessByProcessId bumps the reference count on the EPROCESS object that it returns. The function thus requires a call to ObDereferencePointer to release the ref count and allow the process object to be closed.

Following is an example of one of the processes as it appears in the debugger.

PROCESS 890ce020 SessionId: 0 Cid: 07d0 Peb: 7ffdd000 ParentCid: 07c8
DirBase: 6dba0000 ObjectTable: 00000000 HandleCount: 0.
Image: cmd.exe
VadRoot 00000000 Vads 0 Clone 0 Private 0. Modified 17. Locked 0.
DeviceMap e12b4410
Token e3d58030
ElapsedTime 20:42:16.447
UserTime 00:00:00.062
KernelTime 00:00:00.546
QuotaPoolUsage[PagedPool] 0
QuotaPoolUsage[NonPagedPool] 0
Working Set Sizes (now,min,max) (4, 50, 345) (16KB, 200KB, 1380KB)
PeakWorkingSetSize 1120
VirtualSize 10 Mb
PeakVirtualSize 13 Mb
PageFaultCount 1299
MemoryPriority BACKGROUND
BasePriority 8
CommitCharge 0

No active threads