Friday, August 24, 2007


Persistence of documents on file systems

Persistence of data on storage media is always an interesting topic. No one can predict how long a deleted file will remain on a hard disk.
Sometimes during an investigation it is necessary to present in court the history of resident or deleted documents. Today I would like to discuss the behavior of the Microsoft Word application and how it influences the creation of a timeline history of information. Some of the behaviors mentioned here are the same for other applications, such as AutoCAD. I will focus on a method of creating a timeline history of documents which were edited by users in the past. In this article I’ve used the NTFS file system, but similar behavior can be observed on FATx file systems. Analysis of document metadata is out of the scope of this article.

1. General behavior of Microsoft Word

Let’s say that we already have a file on a local file system. This means that several clusters are allocated to store the content of the file. For a better explanation I will use particular cluster numbers. The first run of our file starts at cluster number 0x15ca0 (89248).
When we add or remove at least one character to/from the .doc file and save the changes, a new MFT entry and new clusters are allocated to store the metadata and content of the updated file. The newly allocated FILE entry stores the original name of the file. The FILE entry of the previous version is also updated, because that file is renamed to ~WRL????.tmp. At this point we have 2 allocated FILE entries which point to different clusters. (There are more changes in the FILE entry, but they are not so important at this stage; of course, MAC times are always useful ;)).
When we close the file, the MFT entry and clusters of the ~WRL????.tmp file are freed. This means that the operating system can overwrite the content of the entry and the clusters at any time. The picture below shows the clusters (previously reserved for the first version of our file) which are now marked as unused. As I mentioned above, the first cluster number is 0x15ca0.

The content of the updated file is now stored in new clusters. The first cluster of the first run is 0x15cef (89327).
When we repeat the above activity (1. open the file, 2. change something, 3. close it), the situation is the same. A new entry in the MFT is allocated (very often a previously freed MFT entry is allocated once again, so usually only 2 FILE entries are used concurrently; I observed this behavior only for MFT entries, not for clusters). New clusters are also allocated for the updated file, and the old clusters are freed. In this case we can still recover previous versions of the file, even after hours or days, but as always there is a risk that the free clusters containing them will simply be allocated again by the operating system.
Either way, we have to use data carving techniques to find all .doc files on the file system (fortunately, the header of the .doc file is well known ;)).
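To look at these clusters in a raw image yourself, the cluster number only has to be multiplied by the cluster size. Below is a minimal sketch, assuming a 4096-byte cluster size (a common NTFS default; check the boot sector of the examined volume) and an image of the volume rather than of the whole disk (for a disk image, the partition's starting byte offset must be added). Both function names are hypothetical helpers, not part of any tool:

```cpp
#include <cstdint>
#include <fstream>
#include <vector>

// Byte offset of a cluster (LCN) inside a volume image. For a whole-disk
// image, pass the partition's starting byte offset as partition_offset.
std::uint64_t cluster_to_offset(std::uint64_t lcn, std::uint32_t cluster_size,
                                std::uint64_t partition_offset = 0)
{
    return partition_offset + lcn * cluster_size;
}

// Reads one cluster from a raw image; returns an empty vector on failure.
std::vector<char> read_cluster(const char *image_path, std::uint64_t lcn,
                               std::uint32_t cluster_size)
{
    std::ifstream img(image_path, std::ios::binary);
    if (!img)
        return {};
    img.seekg(static_cast<std::streamoff>(cluster_to_offset(lcn, cluster_size)));
    std::vector<char> buf(cluster_size);
    if (!img.read(buf.data(), static_cast<std::streamsize>(buf.size())))
        return {};
    return buf;
}
```

Under these assumptions, cluster 0x15ca0 starts at byte offset 365559808 of the volume image.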
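Carving for the header can be as simple as scanning cluster-aligned offsets for the OLE2 (Compound File) signature D0 CF 11 E0 A1 B1 1A E1, with which Word 97-2003 .doc files begin. The sketch below works on a buffer already read into memory, and the helper name is hypothetical. The same signature is shared by other OLE2 files (e.g. .xls or .ppt), so every hit still has to be verified:

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// OLE2 / Compound File signature found at the start of Word 97-2003 .doc files.
static const unsigned char OLE2_MAGIC[8] =
    {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};

// Returns the offsets (relative to buf) of every `step`-aligned block that
// begins with the OLE2 signature. Files on NTFS start on a cluster boundary,
// so step = 512 (or the cluster size) avoids most mid-file false positives.
std::vector<std::size_t> find_ole2_headers(const unsigned char *buf,
                                           std::size_t len, std::size_t step)
{
    std::vector<std::size_t> hits;
    if (step == 0)
        return hits;
    for (std::size_t off = 0; off + sizeof OLE2_MAGIC <= len; off += step)
        if (std::memcmp(buf + off, OLE2_MAGIC, sizeof OLE2_MAGIC) == 0)
            hits.push_back(off);
    return hits;
}
```

For a real image, the same loop would run over the image in chunks, with the chunk's base offset added to each hit.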

2. The Save button (CTRL + S) while editing documents

Every save invoked by the user creates a new file on the file system, and the previous version is renamed with the ~WRL prefix. For example, the dokument.doc file was edited for some period of time and the user saved changes to its content 4 times. The result is presented below:

This is true only when the user has changed the content of the document before saving (pressing the Save button or CTRL + S).
As we can see, all created files are visible during the editing session. The content of each file is stored in different allocated runs, which also means that each file has its own allocated FILE entry in the MFT.

The relevant part of the MFT is presented below:



The dokument.doc file is stored in clusters starting at 0x15c3b (89147). ~WRL0003.tmp starts at 0xfaa8 (64168), ~WRL0005.tmp at 0x15b06 (88838), ~WRL0656.tmp at 0x15bee (89070), and the last one, ~WRL1188.tmp, at 0x15ba1 (88993).
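When triaging recovered FILE entries, these temporary copies can be picked out by name. A minimal sketch with a hypothetical helper name; the assumption that the four characters after ~WRL are always decimal digits is based only on the examples above:

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// True if name has the form ~WRLnnnn.tmp (n = decimal digit, as in the
// examples above). The extension is compared case-insensitively because
// NTFS and FAT file names are case-insensitive.
bool is_wrl_temp(const std::string &name)
{
    if (name.size() != 12 || name.compare(0, 4, "~WRL") != 0)
        return false;
    for (std::size_t i = 4; i < 8; ++i)
        if (!std::isdigit(static_cast<unsigned char>(name[i])))
            return false;
    std::string ext = name.substr(8);
    for (char &c : ext)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return ext == ".tmp";
}
```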

When the user closes the file, only one file stays visible: dokument.doc. The remaining files are deleted automatically, which means that their MFT entries and clusters are freed.

Such behavior allows us to trace the history of the document. We can easily recover each file because we can identify its FILE entry in the MFT. We can also create a timeline history by analyzing the MAC times stored inside the FILE entries. It is worth mentioning that these entries and clusters can be reused by other users or processes at any time, because they are no longer allocated.
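Once the FILE entries are identified, building the timeline mostly means converting NTFS timestamps and sorting them. NTFS stores times as FILETIME values, i.e. 100-nanosecond intervals since 1601-01-01 UTC. The sketch below assumes the modification times have already been extracted from the FILE entries (parsing the MFT is not shown) and uses only one of the MAC times; the Version structure and function names are hypothetical:

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// One recovered version of the document (hypothetical container).
struct Version {
    std::string name;        // e.g. "~WRL0003.tmp" or "dokument.doc"
    std::uint64_t modified;  // last-modification time as an NTFS FILETIME
};

// FILETIME (100-ns ticks since 1601-01-01) -> Unix time (seconds since
// 1970-01-01); 11644473600 is the number of seconds between the two epochs.
std::int64_t filetime_to_unix(std::uint64_t ft)
{
    return static_cast<std::int64_t>(ft / 10000000ULL) - 11644473600LL;
}

// Orders the versions from the oldest to the newest modification time.
void sort_timeline(std::vector<Version> &versions)
{
    std::sort(versions.begin(), versions.end(),
              [](const Version &a, const Version &b) { return a.modified < b.modified; });
}
```

Note that the ~WRL numbers themselves (0003, 0005, 0656, 1188) should not be relied on as an ordering; the timestamps are what establish the sequence.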


3. Auto-save option


There is one more place from which documents edited by users in the past can be recovered. Microsoft Word has an auto-save feature enabled by default, which creates a copy of the document being edited. The default settings are presented below:

When the file is open and its content has been modified, Microsoft Word creates a copy in the “safe location” defined on the “File Locations” tab. The name of the file is “AutoRecovery save of .asd”.
Whenever the user modifies the content of the file, after some period of time (10 minutes by default) Microsoft Word automatically saves the changes to a new file with the same name (“AutoRecovery save of .asd”) and frees the clusters containing the content of the old file. The old FILE entry in the MFT is also freed. In brief, the behavior is similar to the activities described in the first part of this article (General behavior of Microsoft Word). The only difference is that Microsoft Word closes and reopens the .asd file in the background. It is worth mentioning that new clusters are allocated each time, so the same content exists in at least 2 different locations on the file system (the original and the backup location).
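Because the AutoRecovery copies follow a fixed naming pattern, recovered names can be flagged in the same way as the ~WRL files. A minimal sketch that checks only the “AutoRecovery save of ” prefix and the .asd extension quoted above; the function name is hypothetical:

```cpp
#include <string>

// True for names that begin with "AutoRecovery save of " and end in ".asd".
// The part of the name between prefix and extension is not checked.
bool is_autorecovery(const std::string &name)
{
    const std::string prefix = "AutoRecovery save of ";
    const std::string suffix = ".asd";
    return name.size() >= prefix.size() + suffix.size() &&
           name.compare(0, prefix.size(), prefix) == 0 &&
           name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}
```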


Friday, September 08, 2006

Partial file matching in host intrusion prevention systems

A few weeks ago Jesse Kornblum released the SSdeep [1] tool. Its main purpose is to identify similar files by calculating hashes and comparing them to known values that were computed previously and stored in a database.
The big difference between SSdeep and other well-known hashing tools (like md5sum) is that SSdeep calculates hash values for small chunks of the target file. So if someone modifies only a few bytes of a file, the newly calculated hash value will be similar to the previous one [2]. Compare the signature of the modified gg.exe (first line) with that of the original (second line):

49152:1xY5ndv7xb2OnhONkVCUDNl3lBB6U0ahgyFFvebjj:1xsdvH9RbMU0HyFFve3j,"gg.exe"
49152:PxY5ndv7xb2OnhONkVCUDNl3lBB6U0ahKvFBvebjj:PxsdvH9RbMU0VvFBve3j,"gg.exe"
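To illustrate why a small change alters only a few characters, here is a simplified sketch of context-triggered piecewise hashing, the technique behind SSdeep. It borrows the rolling hash and the FNV-style chunk hash described for spamsum/ssdeep, but it produces a single signature for one caller-chosen block size and is not byte-compatible with ssdeep output; the namespace and function names are hypothetical:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

namespace ctph {

constexpr std::size_t kWindow = 7;              // rolling-hash window (bytes)
constexpr std::uint32_t kFnvInit = 0x28021967u; // chunk-hash seed used by spamsum
constexpr std::uint32_t kFnvPrime = 0x01000193u;// 32-bit FNV prime
constexpr char kB64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Rolling hash over the last kWindow bytes; its value depends only on
// local content, so chunk boundaries do not depend on absolute offsets.
struct Roll {
    unsigned char window[kWindow] = {};
    std::uint32_t h1 = 0, h2 = 0, h3 = 0;
    std::size_t n = 0;

    std::uint32_t update(unsigned char c)
    {
        h2 -= h1;
        h2 += static_cast<std::uint32_t>(kWindow) * c;
        h1 += c;
        h1 -= window[n];           // drop the byte leaving the window
        window[n] = c;
        n = (n + 1) % kWindow;
        h3 = (h3 << 5) ^ c;        // a byte's influence is shifted out after 7 steps
        return h1 + h2 + h3;
    }
};

// Emits one base64 character per chunk. A chunk ends where the rolling hash
// satisfies rh % blocksize == blocksize - 1.
std::string signature(const unsigned char *buf, std::size_t len, std::uint32_t blocksize)
{
    std::string sig;
    if (blocksize == 0)
        return sig;
    Roll roll;
    std::uint32_t h = kFnvInit;  // hash of the current chunk
    bool pending = false;        // current chunk has bytes not yet emitted
    for (std::size_t i = 0; i < len; ++i) {
        h = (h * kFnvPrime) ^ buf[i];
        pending = true;
        if (roll.update(buf[i]) % blocksize == blocksize - 1) {
            sig += kB64[h % 64];
            h = kFnvInit;
            pending = false;
        }
    }
    if (pending)
        sig += kB64[h % 64];     // trailing partial chunk
    return sig;
}

}  // namespace ctph
```

Because a changed byte only influences the rolling hash for the next 7 positions and the hash of the chunk that contains it, the rest of the signature stays the same, which is the effect visible in the two gg.exe signatures.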


The main features of SSdeep (altered document matching and partial file matching) are very helpful during forensic analysis. But the partial file matching feature can also be used in host intrusion prevention systems, for example to block the execution of specific programs.

Using “normal” hash values in an IPS to prevent execution is rather useless, because such a solution can easily be cheated: after changing even one byte of an executable file, the new hash value is completely different. As you can guess, there are many places in an executable file which can be modified without affecting its execution.

Before a one-byte modification:

C:\ssdeep>md5sum gg.exe
\0323de930ed3e8e0552843db7e16dab7 *C:\\ssdeep\\gg.exe

After a one-byte modification:

C:\ssdeep>md5sum gg.exe
\bfa5aed4078c2a316786c1e7cb1e4f8e *C:\\ssdeep\\gg.exe

As mentioned above, SSdeep generates hash values for small blocks of the target file, so a few modifications of the file do not change the whole hash value, as presented below:

Before modifications:

C:\ssdeep>ssdeep -l gg.exe
ssdeep,1.0--blocksize:hash:hash,filename
49152:PxY5ndv7xb2OnhONkVCUDNl3lBB6U0ahKvFBvebjj:PxsdvH9RbMU0VvFBve3j,"gg.exe"

C:\ssdeep>ssdeep gg.exe > sum.txt
C:\ssdeep>ssdeep -m sum.txt gg.exe
C:\ssdeep\gg.exe matches C:\ssdeep\gg.exe (100)

After a few modifications in the .rsrc section:

C:\ssdeep>ssdeep -l gg.exe
ssdeep,1.0--blocksize:hash:hash,filename
49152:1xY5ndv7xb2OnhONkVCUDNl3lBB6U0ahgyFFvebjj:1xsdvH9RbMU0HyFFve3j,"gg.exe"

C:\ssdeep>ssdeep -m sum.txt -p gg.exe
C:\ssdeep\gg.exe matches C:\ssdeep\gg.exe (91)

Additionally, a similarity score from 0 to 100 is reported (the value in brackets).
By setting the threshold to 60 or 70, we can implement a quite effective method of blocking the execution of specific files.

This solution could block the execution of a particular program and even of its new versions, because new releases are very often based on the previous one.
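The blocking decision itself could look like the sketch below. The score here is a simplified stand-in (plain Levenshtein distance scaled by the total signature length): ssdeep computes its score with a weighted edit distance and additional checks, so these numbers will not reproduce the 91 shown above. All function names, including the should_block policy hook, are hypothetical:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Plain Levenshtein distance (insert, delete, substitute all cost 1).
std::size_t edit_distance(const std::string &a, const std::string &b)
{
    std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j)
        prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, subst});
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}

// Similarity 0..100: 100 for identical signatures, lower as edits accumulate.
int similarity(const std::string &a, const std::string &b)
{
    if (a.empty() && b.empty())
        return 100;
    std::size_t d = edit_distance(a, b);
    int penalty = static_cast<int>(100 * d / (a.size() + b.size()));
    return std::max(0, 100 - penalty);
}

// Block execution when the candidate is at least `threshold` similar to a
// known signature (e.g. threshold 60 or 70, as suggested above).
bool should_block(const std::string &known_sig, const std::string &candidate_sig,
                  int threshold)
{
    return similarity(known_sig, candidate_sig) >= threshold;
}
```

In practice only signatures computed with the same block size should be compared, since that is what makes their characters correspond to the same kind of chunks.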

Useful links:

[1] http://ssdeep.sourceforge.net/

[2] http://www.dfrws.org/2006/proceedings/12-Kornblum.pdf

Tuesday, August 22, 2006

Grsecurity and forensic analysis

A few weeks ago grsecurity 2.1.9 was released [1]. It is worth mentioning because two of its new features affect how forensic analysis of Linux physical memory is performed.

Firstly, all physical memory pages are overwritten when they are freed: a new PaX feature zeroes out page frames as they are released. This means that it will be impossible to recover the content of freed pages, such as those of memory-mapped files, from memory images taken from /dev/mem or /proc/kcore. Still, we can use analysis methods based on interpreting internal kernel structures or on detecting and recovering hidden data [2].

Secondly, swap areas can be encrypted. This means that a bit-by-bit copy of the swap partition, taken from the hard disk of a compromised machine, is useless.

Useful links:
[1] http://www.grsecurity.net/news.php#grsec219
[2] http://forensic.seccure.net/pdf/mburdach_digital_forensics_of_physical_memory.pdf

Monday, August 21, 2006

“Memory forensics” related debugger extension DLLs for Microsoft Debuggers

One of the biggest problems with Windows “memory forensics” tools is that they have to be updated constantly because of new versions of the operating system and new service packs. I’m thinking of the offsets of fields inside various internal kernel structures, which can vary between versions. Sooner or later you will have to use the description of some internal structures to find digital evidence, and if you write your own script or tool to parse a physical memory image, you will have to take this problem into consideration. Even if you prepare signatures to grep for some objects, you will need information about offsets. Now think about a generic solution based on symbols, which can be downloaded automatically or manually from Microsoft servers. Instead of hard-coding information about Windows internal kernel structures and offsets for various versions of the operating system, you write one piece of code for all of them. Firstly, your code is smaller (and you avoid many mistakes). Secondly, you save a lot of time. So if you are lazy, this solution is exactly for you :).

I decided to take a closer look at Microsoft Debugging Tools for Windows and at debugger extensions, which are loaded by the Microsoft debuggers and add new debugger commands.

As we know, the physical memory device objects (\\.\PhysicalMemory and \\.\DebugMemory) in Windows operating systems represent raw data. An image of such a device object cannot be loaded into WinDbg or KD because the debuggers do not recognize this format. Fortunately, you can convert the raw data to a recognizable format (the crash dump format). Part of the dump header format is described by Andreas Schuster at [1].
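Before loading an image, it is easy to check whether it already has a crash dump header. Windows crash dumps begin with the signature PAGEDUMP (32-bit) or PAGEDU64 (64-bit), while a raw image of the physical memory device has no such header. A minimal sketch that inspects only the first 8 bytes; the function name is hypothetical and the rest of the header is not validated:

```cpp
#include <cstddef>
#include <cstring>

// Returns 32 or 64 if buf starts with a crash dump signature, or 0 for
// anything else (e.g. a raw physical memory image, which the debuggers
// will not open without conversion).
int crashdump_bits(const unsigned char *buf, std::size_t len)
{
    if (len < 8)
        return 0;
    if (std::memcmp(buf, "PAGEDUMP", 8) == 0)
        return 32;
    if (std::memcmp(buf, "PAGEDU64", 8) == 0)
        return 64;
    return 0;
}
```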


The next step is to download Debugging Tools for Windows and the symbols from [2]. If you have direct access to the Internet, you do not have to download the symbols because the debugger can download them for you automatically.

Debugger extension commands are exposed by DLLs. A short description of how to write new debugger commands can be found in the debugger.chm file, which is installed with Debugging Tools for Windows (you have to use a custom installation to install the examples).

The environment described above can be used to perform offline analysis of physical memory dumps. On the other hand, debugger extension DLLs can also be used to verify system integrity on a live system. You have to use the livekd tool from [3] to load them and execute the commands they export.

A few useful functions which allow you to resolve symbols and find offsets are described below (they are methods of the debugger engine's IDebugSymbols interface).
  1. GetOffsetByName(Symbol, Address), where Symbol is the name of a symbol such as PsInitialSystemProcess or PsLoadedModuleList. The address of the requested symbol is returned in the Address parameter.
  2. GetSymbolTypeId(Symbol, TypeId, Module), where TypeId receives the index of the type within the PDB file associated with the Module.
After that you can call other functions which take the TypeId and Module as parameters:
  • GetTypeSize(Module, TypeId, Size) to receive the size of the requested internal structure,
  • GetFieldName(Module, TypeId, Iteration, Name, MAX_PATH, NULL) to receive the Name of the field with index Iteration,
  • or GetFieldOffset(Module, TypeId, Name, Offset) to receive the offset of the field specified by Name.

I wrote the function “offset(structname, fieldname)”, which returns the offset of the requested field within the requested structure.

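ULONG offset(const CHAR *structname, const CHAR *fieldname)
{
    ULONG64 Module;
    ULONG i1, TypeId;
    CHAR Name[MAX_PATH];

    // Look up the type index of the structure (e.g. _EPROCESS) in the PDB
    if (g_ExtSymbols->GetSymbolTypeId(structname, &TypeId, &Module) != S_OK) {
        dprintf("GetSymbolTypeId failed for %s\n", structname);
        return 0;
    }

    // Enumerate the fields of the structure until the requested one is found
    for (i1 = 0; ; i1++) {
        HRESULT Hr1;
        ULONG Offset = 0;

        Hr1 = g_ExtSymbols->GetFieldName(Module, TypeId, i1, Name, MAX_PATH, NULL);
        if (Hr1 == S_OK) {
            if (strcmp(Name, fieldname) == 0 &&
                g_ExtSymbols->GetFieldOffset(Module, TypeId, Name, &Offset) == S_OK) {
                return Offset;
            }
        }
        else if (Hr1 == E_INVALIDARG) {
            break;      // no more fields
        }
        else {
            dprintf("GetFieldName Failed %lx\n", Hr1);
            break;
        }
    }
    return 0;           // not found (note: 0 is also a valid field offset)
}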

You can call this function in the following way:

ULONG OffsetAPL = offset("_EPROCESS","ActiveProcessLinks");

Of course, the “dt _EPROCESS” command gives the same result, so now something more useful. On my website http://forensic.seccure.net [4] you can find an extension DLL called hidden.dll, which allows you to detect all hidden processes, even those hidden by the DKOM method. The command to call the proper function has the form !<full path to the directory with hidden.dll>hidden.allprocesses, for example: kd> !c:\temp\hidden.allprocesses

You can record all executed commands and their results in an external file by using the command .logappend c:\forensics.log

As I mentioned above, the extension DLLs can be executed on any version of the Windows operating system, because the offsets are resolved from symbols. On a live system you have to use the livekd tool [3].
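For readers wondering how an offset such as the one returned by offset("_EPROCESS","ActiveProcessLinks") is used: the address of a LIST_ENTRY minus the field offset gives the base of its EPROCESS structure. The sketch below simulates one common cross-view approach to spotting DKOM-hidden processes (walk the ActiveProcessLinks list, then compare the result with EPROCESS blocks found some other way, e.g. by scanning). This is not necessarily how hidden.dll works. Memory is modelled as a map from LIST_ENTRY address to its Flink value, whereas a real extension would read these values from the dump; all names are hypothetical:

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Simulated kernel memory: address of a LIST_ENTRY -> its Flink value.
using FlinkMap = std::map<std::uint64_t, std::uint64_t>;

// CONTAINING_RECORD arithmetic: EPROCESS base = &ActiveProcessLinks - offset.
std::uint64_t eprocess_from_links(std::uint64_t links_addr, std::uint32_t apl_offset)
{
    return links_addr - apl_offset;
}

// Walks the circular list starting at the list head (PsActiveProcessHead)
// and returns the EPROCESS addresses it visits.
std::set<std::uint64_t> walk_active_list(const FlinkMap &mem, std::uint64_t head,
                                         std::uint32_t apl_offset)
{
    std::set<std::uint64_t> seen;
    auto it = mem.find(head);
    while (it != mem.end() && it->second != head) {
        std::uint64_t links = it->second;   // next process's ActiveProcessLinks
        if (!seen.insert(eprocess_from_links(links, apl_offset)).second)
            break;                          // loop protection for corrupted lists
        it = mem.find(links);
    }
    return seen;
}

// Cross-view diff: EPROCESS blocks found by other means but missing from
// the linked list are candidate hidden (unlinked) processes.
std::vector<std::uint64_t> hidden_candidates(const std::set<std::uint64_t> &listed,
                                             const std::vector<std::uint64_t> &scanned)
{
    std::vector<std::uint64_t> hidden;
    for (std::uint64_t p : scanned)
        if (listed.count(p) == 0)
            hidden.push_back(p);
    return hidden;
}
```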

Useful links:
[1] http://computer.forensikblog.de/en/2006/03/dmp_file_structure.html
[2] http://www.microsoft.com/whdc/devtools/debugging/default.mspx
[3] http://www.sysinternals.com/Utilities/LiveKd.html
[4] http://forensic.seccure.net/tools/hidden.zip