Persistence of documents on file systems
Persistence of data on storage media is always an interesting topic. No one can predict how long deleted file will be stored on hard disk.
Sometimes during investigation it is necessary to present at court the history of resident documents or deleted documents. Today I would like to discuss the behavior of Microsoft Word application and how it influences on creating timeline history of information. Some mentioned behaviors are the same for other applications like AutoCAD. I will focus on method of creating timeline history of documents which were edited by the users in the past. I this article I’ve used the NTFS file system but similar behavior can be observed on FATx file systems. Analyzing data from documents metadata are out of the scope of this document.
1. General behavior of Microsoft Word
Let’s say that we already have a file on local file system. It means that several clusters are allocated for storing the content of the file. For better explanation I will use particular cluster numbers. Our file has at least one run list which starts at cluster number 0x15ca0 (89248).
When we add or remove at least one character to/from the doc file and save changes then a new MFT entry and new clusters will be allocated for storing metadata and content of the updated file. New allocated FILE entry will store the original name of the file. The FILE entry with the previous version of the file is also updated because the file is renamed into ~WRL????.tmp. At this time we have 2 allocated FILE entries which points to different clusters. (There are more changes in the FILE entry but there are not so important at this stage – of course MAC times are always useful ;)).
If we close the file, the MFT entry and clusters of ~WRL????.tmp file will be freed. It means that the operating system can overwrite content of entry and clusters at any time. The picture below shows clusters (previously reserved for first version of our file) which now are marked as unused. As I mentioned above the first cluster number is 0x15ca0.
The content of updated file is now stored at new clusters. The first cluster of the first run list is 0x15cef (89327).
When we repeat above activity (1. open file, 2. change something and finally, 3. close it), the situation will be the same. It means that new entry in the MFT will be allocated (very often the MFT entry freed previously are allocated once again, so only 2 FILE entries are usually used concurrently – I observed this behavior only for the MFT entries – not for clusters). Also new clusters will be allocated for updated file and the old clusters will be freed. In this case we can still recover previous files, even after hours or days, but as always, there is a risk that free clusters which contain previous versions of file will be simply allocated by the operating system.
Anyhow, we have to use data carving techniques to find all doc files on the file system (as we know the header of the doc file is well known ;)).
2. The save button (ctrl + S) during editing documents
The above statements are true only when the user had changed the content of the document before invoking “save process” (save process = press the button save or press CTRL + S).
Such behavior allows us to trace the document history. We can easily recover each file because we can identify FILE entries in the MFT. We can also create the timeline history by analyzing MAC times which are written inside FILE entries. It is worth to mention that above entries and clusters can be allocated by other users or process (because there are not allocated).
It was rather useless to use “normal” hash values in IPS to prevent execution because such solution can be easily cheated. After changing even one byte in executable file the new hash value is completely different. As you can guess there are a lot of places in an executable file which can be modified and the exe file still will be executed without any problems.
Before one byte modification:
C:\ssdeep>md5sum gg.exe
\0323de930ed3e8e0552843db7e16dab7 *C:\\ssdeep\\gg.exe
After one byte modification:
C:\ssdeep>md5sum gg.exe
\bfa5aed4078c2a316786c1e7cb1e4f8e *C:\\ssdeep\\gg.exe
As mentioned above the SSDeep generates has values for small blocks of target file. So few modifications of target file will not change the whole value of generated hash as it is presented below:
Before modifications:
C:\ssdeep>ssdeep -l gg.exe
ssdeep,1.0--blocksize:hash:hash,filename
49152:PxY5ndv7xb2OnhONkVCUDNl3lBB6U0ahKvFBvebjj:PxsdvH9RbMU0VvFBve3j,"gg.exe"
C:\ssdeep>ssdeep gg.exe > sum.txt
C:\ssdeep>ssdeep -m sum.txt gg.exe
C:\ssdeep\gg.exe matches C:\ssdeep\gg.exe (100)
After few modifications in .rsrc section:
C:\ssdeep>ssdeep -l gg.exe
ssdeep,1.0--blocksize:hash:hash,filename
49152:1xY5ndv7xb2OnhONkVCUDNl3lBB6U0ahgyFFvebjj:1xsdvH9RbMU0HyFFve3j,"gg.exe"
C:\ssdeep>ssdeep -m sum.txt -p gg.exe
C:\ssdeep\gg.exe matches C:\ssdeep\gg.exe (91)
Additionally, the percentage value of similarity is generated (value in brackets).
By setting the value to 60 or 70 we can implement quite effective method of blocking execution of specific files.
This solution could block the execution of particular program and even new versions of it because very often new releases are based on the previous one.
A few weeks ago a new version of grsecurity 2.1.9 was released [1]. It is worth to mention about it because one new features affect how Linux physical memory forensic analysis will be performed.
Firstly, all physical memory pages which are freed are overwritten. During freeing page frames, a new PaX feature zeroes out them. It means that it will be impossible to recover content of pages such as memory mapped files from memory images which represent /dev/mem or /proc/kcore. Still, we can use methods of analysis which are based on interpreting internal kernel structures or trying to detect and recover hidden data [2].
Secondly, swap areas can be encrypted. It means that creating bit-by-bit copy of swap space partition from hard disk which was removed from compromised machine is useless.
Useful links:
[1] http://www.grsecurity.net/news.php#grsec219
[2] http://forensic.seccure.net/pdf/mburdach_digital_forensics_of_physical_memory.pdf