Tracing Get.Workspace

Intro

In the process of building our sandbox for Office documents, the Streetfight team has gained quite a bit of experience handling malicious Microsoft Office files. Recently, we (and others) have observed an uptick in XLM macro malware which uses the obscure GET.WORKSPACE (stylized Get.Workspace) function to evade sandbox analysis. After reading Inquest’s excellent blog, we got to thinking about how Get.Workspace works and how to hook it.

Given that the functionality lies outside of VBA, it is likely that it is contained in excel.exe itself. This is a problem, as this binary is roughly 40 MB — too large to handle in any reasonable time with purely static analysis. To overcome this issue, we could:

  1. Guess at the APIs that Excel is likely to use in Get.Workspace and see if they are hit with breakpoints.
  2. Measure the code that is executed when Get.Workspace is called via dynamic binary instrumentation (henceforth DBI) with Intel PIN.

We chose option (2) and detail the steps taken below.

The Plan

  1. Generate two documents that exercise Get.Workspace(19) and Get.Workspace(42)
  2. Open the first document under PIN two times, logging the basic blocks that were hit during each run. Extract the blocks that are in common between runs
  3. Open the second document under PIN two times, logging the basic blocks that were hit during each run. Extract the blocks that are in common between runs
  4. Take the difference in exercised basic blocks from (2) and (3). This should give us the location of the code (likely a switch) that handles the Get.Workspace function
  5. Look at the function in Ghidra and try to make sense of it
  6. Develop a method to hook the functionality

Making the Files

We produced two XLS files with embedded XLM macros, each of which did the following:

  1. Run XLM on auto_open
  2. Call GET.WORKSPACE(19) / GET.WORKSPACE(42)
  3. Use XLM to exit the process

Obtaining Traces

To obtain the DBI traces, we used the pintool from lighthouse which appears to be inspired by the grugq’s old runtracer — check out Prospecting for Rootite if you have not seen it.

The environment we used was:

  1. CodeCoverage-v0.9-98189
  2. pin-3.13-98189-g60a6ef199-msvc-windows
  3. Excel 16.0.13029.20308

Lighthouse’s CodeCoverage tool builds to a DLL in Visual Studio. Once compiled, it can be run under PIN from the shell, using the -w switch to whitelist the module we care about:

pin.exe -t CodeCoverage.dll -w EXCEL.EXE -- "PATH TO EXCEL.EXE" <inputfile.xls>

This command will produce a file called trace.log, in binary drcov format, representing the basic blocks hit during the execution. We can modify a second tool called lighthouse drcov to convert this binary data into plain text and to suppress some of the tool's output, since we only care about the EXCEL.EXE module offsets.

We then ran this script like so:

python ..\convert_drcov_txt.py trace.log >> trace.txt
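The modified script is not reproduced in this post. As a rough sketch of what such a conversion involves, the Python below parses a drcov file and keeps only the offsets for one module. It assumes the version 2 drcov layout (a text header containing a "Columns:" line and a module table, then a "BB Table: N bbs" line followed by N packed 8-byte entries of 32-bit start offset, 16-bit size, and 16-bit module id). The function name parse_drcov is hypothetical, and the output format of our actual script may differ.

```python
import struct


def parse_drcov(data, module_name):
    """Return sorted unique basic-block offsets hit inside `module_name`.

    Assumes a version 2 drcov file: text header, then packed BB entries.
    """
    marker = b"BB Table: "
    pos = data.index(marker)
    header = data[:pos].decode("utf-8", errors="replace").splitlines()

    # The "Columns:" line names the fields of each module table row.
    col_idx = next(i for i, line in enumerate(header)
                   if line.startswith("Columns:"))
    columns = [c.strip() for c in header[col_idx].split(":", 1)[1].split(",")]

    # Collect the ids of every module whose file name matches.
    wanted = set()
    for row in header[col_idx + 1:]:
        if not row.strip():
            continue
        fields = [f.strip() for f in row.split(",", len(columns) - 1)]
        record = dict(zip(columns, fields))
        name = record["path"].replace("\\", "/").rsplit("/", 1)[-1]
        if name.lower() == module_name.lower():
            wanted.add(int(record["id"]))

    # "BB Table: N bbs" is followed by N little-endian 8-byte entries.
    line_end = data.index(b"\n", pos)
    count = int(data[pos + len(marker):line_end].split()[0])
    body = data[line_end + 1:line_end + 1 + count * 8]
    offsets = {start
               for start, _size, mod_id in struct.iter_unpack("<IHH", body)
               if mod_id in wanted}
    return sorted(offsets)
```

Calling parse_drcov(open("trace.log", "rb").read(), "EXCEL.EXE") and printing each offset in hex, one per line, would give a plain-text file suitable for the sort and comm steps later on.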

A file called trace.txt is produced, representing the basic blocks hit during the run. Because some 'noise' can occur during a tracing run, we find it useful to run the same file through PIN two times (or more) to obtain multiple plaintext outputs per file.
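Repeating the runs by hand gets tedious, so a small driver can help. The sketch below is not what we used; it simply wraps the pin.exe command line shown earlier and renames the log after each run. The function names, the label scheme, and the assumption that trace.log lands in the current working directory are all illustrative.

```python
import shutil
import subprocess
from pathlib import Path


def pin_command(pin, tool, excel, document):
    """Build the pin.exe argument list used for one tracing run."""
    return [str(pin), "-t", str(tool), "-w", "EXCEL.EXE",
            "--", str(excel), str(document)]


def trace_runs(pin, tool, excel, document, label, runs=2):
    """Trace `document` several times, keeping one log per run."""
    outputs = []
    for i in range(runs):
        # Excel's exit code is not meaningful here, so do not check it.
        subprocess.run(pin_command(pin, tool, excel, document), check=False)
        dest = Path(f"{label}_run_{i}.log")
        # Assumption: the pintool writes trace.log to the working directory.
        shutil.move("trace.log", dest)
        outputs.append(dest)
    return outputs
```

For example, trace_runs("pin.exe", "CodeCoverage.dll", excel_path, "wksp_19.xls", "wksp_19") would leave wksp_19_run_0.log and wksp_19_run_1.log ready for conversion.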

Processing Traces

We now have a set of plain text files for both GET.WORKSPACE(19) and GET.WORKSPACE(42). We need to find the offsets that remain stable throughout multiple runs of the same file. This can be accomplished in many better ways, but we just used bash on Windows.

$ sort wksp_19_run_0.txt >> wksp_19_run_0.sorted
$ sort wksp_19_run_1.txt >> wksp_19_run_1.sorted
$ comm -12 wksp_19_run_0.sorted wksp_19_run_1.sorted >> wksp_19_min.txt

The above process can be repeated for the Get.Workspace(42) files. Once we have our two minimized files, we need to remove the offsets that the two have in common.

$ comm -3 wksp_19_min.txt wksp_42_min.txt >> wksp_diff.txt

The final file should be pretty boring looking, but will contain a set of offsets which differ between opening the two files. One of these offsets should correspond to the different path through the binary taken when parsing Workspace 19 vs Workspace 42.
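The same minimize-then-diff logic can be expressed with Python sets, which avoids the sorting requirement of comm. This is only an equivalent sketch of the shell steps above, assuming one offset per line in each text file; the function names are ours for illustration.

```python
from functools import reduce


def read_offsets(path):
    """Load one trace file into a set of offset strings."""
    with open(path) as fh:
        return {line.strip() for line in fh if line.strip()}


def stable_blocks(runs):
    """Offsets present in every run of one document (like `comm -12`)."""
    return reduce(set.intersection, (set(run) for run in runs))


def differing_blocks(first, second):
    """Offsets hit for only one of the two documents (like `comm -3`)."""
    return sorted(set(first) ^ set(second))
```

With the file names used above, stable_blocks([read_offsets("wksp_19_run_0.txt"), read_offsets("wksp_19_run_1.txt")]) gives the minimized set for Get.Workspace(19), and differing_blocks applied to the two minimized sets gives the candidate offsets.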

Analyzing Output

We can open Ghidra and explore these offsets, as there are only roughly 50 of them. Visually, we are just looking for complexity in the graph that would indicate parsing for many Get.Workspace commands.

Upon browsing differences, we came across a function graph that looked nightmarish enough to be the switch statement we are looking for:

Zooming in to the differences confirms that this is indeed the giant switch statement handling Get.Workspace.

Workspace 42

The first relevant difference in flow occurs at EXCEL+0xA3F8BF, which was called in the pictured function above. We know from documentation that Get.Workspace(42) determines if audio is present on the host machine, so we should be on the lookout for any clues. Below is the jumptable case for 42:

This function at FUN_00a3f8bf gets a handle to WINMM.DLL via a call into Mso30Win32Client:

After a bit of setup, the function ends up calling mciSendCommand:

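mciSendCommand has the following parameters:

  1. IDDevice: the ID of the MCI device that receives the command
  2. uMsg: the command message, such as MCI_OPEN
  3. fdwCommand: flags for the command
  4. dwParam: a pointer to a structure holding the command's parameters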

When mciSendCommand is called with the command MCI_OPEN, the dwParam argument represents a pointer to an MCI_OPEN_PARMS structure.

With this knowledge, we now know that we should insert our hook in winmm.dll upon seeing it load via Mso30Win32Client!ordinal_330. Furthermore, upon intercepting the call, we should look for the parameters:

  1. IDDevice == 0
  2. uMsg == MCI_OPEN
  3. dwParam->lpstrDeviceType == "WaveAudio"

If these conditions are met, we should modify the return value to emulate the existence of an audio device.
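A real hook would live in native code inside the Excel process, but the decision it has to make is small enough to model in a few lines. The sketch below is a Python model of that logic only, not our sandbox's implementation. MciCall, should_fake_audio, and hooked_mci_send_command are hypothetical names, and the device type is assumed to have already been read out of the MCI_OPEN_PARMS structure as a string.

```python
from dataclasses import dataclass
from typing import Optional

MCI_OPEN = 0x0803  # command message value from the Windows multimedia headers


@dataclass
class MciCall:
    """The parts of an intercepted mciSendCommand call that we inspect."""
    id_device: int
    msg: int
    device_type: Optional[str]  # lpstrDeviceType, already decoded


def should_fake_audio(call):
    """True when the call matches the three conditions listed above."""
    return (call.id_device == 0
            and call.msg == MCI_OPEN
            and call.device_type is not None
            and call.device_type.lower() == "waveaudio")


def hooked_mci_send_command(call, real):
    """Report success for the audio probe, otherwise defer to `real`."""
    if should_fake_audio(call):
        return 0  # mciSendCommand returns zero on success
    return real(call)
```

The comparison is case-insensitive as a precaution. A complete hook would probably also need to deal with any later calls that reference the device opened here, which this model ignores.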

Workspace 19

The next difference is in the same jumptable, for case 19 at EXCEL+0xAE21F1. This is a much simpler setup and, according to the manual, should check if a mouse is present. The relevant disassembly is below:

; jumptable case 19
PUSH 0x13
CALL GetSystemMetrics

This code simply calls GetSystemMetrics with the parameter SM_MOUSEPRESENT. With this knowledge, our game plan can simply be:

  1. Hook user32!GetSystemMetrics at process start
  2. Emulate the mouse existence via the return value
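As with the audio check, the hook logic itself is tiny. The following is a Python model of the wrapper, assuming some hooking framework hands us the original function; make_hook is a hypothetical name, and the actual hook would be native code.

```python
SM_MOUSEPRESENT = 19  # 0x13, the value pushed in the disassembly above


def make_hook(real_get_system_metrics):
    """Wrap GetSystemMetrics so the mouse always appears to be present."""
    def hooked(n_index):
        if n_index == SM_MOUSEPRESENT:
            return 1  # nonzero means a mouse is installed
        # Every other metric is passed through untouched.
        return real_get_system_metrics(n_index)
    return hooked
```

Passing all other indexes through matters, since Excel calls GetSystemMetrics for many unrelated reasons and changing those answers could alter its behavior.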

Conclusion

Hooking and tracing functionality inside Excel.exe is terrible, but PIN can really help out. The Streetfight sandbox already hooks these functions to avoid detection. Sign up for the Office Sandbox waitlist or hire us to hack your stuff.
