As part of my upcoming talk "A Critical Analysis of Sophos Antivirus", I've reverse engineered Sophos' signature scheme and written some tools to decode the signatures.
Sophos signatures are distributed as bytecode programs for a simple VM with an RPN-like stack for computation and six named locations (registers). I've reverse engineered their interpreter and written a disassembler for the bytecode.
Here are some sample opcodes, along with the mnemonics I use.
| Mnemonic | Opcode | Description |
|---|---|---|
| VDL_OP_CRC32 | 96 | Match crc32 n bytes (ones complement) |
| VDL_OP_NEXT | FA | Increment the file pointer. |
| VDL_OP_READSW | E1 | Read word onto stack. |
| VDL_OP_LOADIWSW | DE | Load immediate word onto stack. |
| VDL_OP_SEEKSW | E8 | Pop word, seek to absolute offset. |
| VDL_OP_SEEKIB | EB | Move file pointer forward n bytes. |
| VDL_OP_FADJUSTSW | CB | Adjust next value on stack. |
| VDL_OP_SUBSW | D6 | Pop two words, subtract, push result. |
| VDL_OP_SEEKIW | F8 | Seek to immediate offset. |
And so on. Using my disassembler, it's possible to examine arbitrary signatures; some sample output follows.
[CHUNK 401, TYPE IDE_CHUNK_TYPE_SIGNATURE (1), CLASS IDE_CHUNK_CLASS_SMALL (4), 37 BYTES]
0ba3: 41 23 0a 43 09 41 49 44 53 2d 38 30 36 34 42 01 A..C.AIDS.8064B.
0bb3: 01 49 12 45 10 fb 9a 00 96 1e aa af bf aa 96 20 .I.E............
0bc3: 6c 69 f5 7c ed li...
[CHUNK 402, TYPE IDE_CHUNK_TYPE_SIGFLAGS (10), CLASS IDE_CHUNK_CLASS_EMPTY (0), 1 BYTES]
0ba5: 0a .
[CHUNK 403, TYPE IDE_CHUNK_TYPE_PASCALSTRING (3), CLASS IDE_CHUNK_CLASS_SMALL (4), 11 BYTES]
0ba6: 43 09 41 49 44 53 2d 38 30 36 34 C.AIDS.8064
[CHUNK 404, TYPE IDE_CHUNK_TYPE_SUBCHUNKCOUNT (2), CLASS IDE_CHUNK_CLASS_SMALL (4), 3 BYTES]
0bb1: 42 01 01 B..
[CHUNK 405, TYPE IDE_CHUNK_TYPE_BYTECODEHEADER (9), CLASS IDE_CHUNK_CLASS_SMALL (4), 20 BYTES]
0bb4: 49 12 45 10 fb 9a 00 96 1e aa af bf aa 96 20 6c I.E............l
0bc4: 69 f5 7c ed i...
[CHUNK 406, TYPE IDE_CHUNK_TYPE_BYTECODE (5), CLASS IDE_CHUNK_CLASS_SMALL (4), 18 BYTES]
0bb6: 45 10 fb 9a 00 96 1e aa af bf aa 96 20 6c 69 f5 E............li.
0bc6: 7c ed ..
0000: fb 9a 00 literaliw 9a 00 ; match literal 16bit immediate
0003: 96 1e aa af bf aa crc32 1e aa af bf aa ; match crc32 n bytes (ones complement)
; generating 30 byte pre-image for crc 0xaaafbfaa...
0000: 36 36 44 ea f8 ec 36 36 66D...66
0008: 36 36 36 36 36 36 36 36 66666666
0010: 36 36 36 36 36 36 36 36 66666666
0018: 36 36 36 36 36 36 666666
0009: 96 20 6c 69 f5 7c crc32 20 6c 69 f5 7c ; match crc32 n bytes (ones complement)
; generating 32 byte pre-image for crc 0x6c69f57c...
0000: bc 32 28 c1 4e 4e 4e 4e .2..NNNN
0008: 4e 4e 4e 4e 4e 4e 4e 4e NNNNNNNN
0010: 4e 4e 4e 4e 4e 4e 4e 4e NNNNNNNN
0018: 4e 4e 4e 4e 4e 4e 4e 4e NNNNNNNN
000f: ed hlt ; end of program
Here we can see that the signature scans for the literal 16-bit pattern "9A 00", then tests the following 30 bytes against a CRC32, and then the 32 bytes after that against another CRC32. This is a fairly typical pattern for Sophos signatures (a two-byte literal followed by two CRC32s of varying length).
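As a rough illustration of how the listing above is decoded (this is not the actual disassembler), here is a minimal Python sketch covering only the five opcodes that appear in the sample output. The operand sizes are inferred from those listings, the real instruction set is larger, and the names `OPCODES` and `disassemble` are hypothetical.

```python
# Operand sizes are an assumption inferred from the sample listings:
# crc32 takes a length byte plus four CRC bytes, literaliw two bytes, etc.
OPCODES = {
    0x96: ("crc32", 5),      # 1 length byte + 4 CRC bytes
    0xFA: ("next", 0),       # move file pointer forward 1 byte
    0xFB: ("literaliw", 2),  # match literal 16-bit immediate
    0xFC: ("literalib", 1),  # match literal 8-bit immediate
    0xED: ("hlt", 0),        # end of program
}


def disassemble(code: bytes):
    """Return a list of (offset, text) pairs for the known opcodes."""
    out = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op not in OPCODES:
            raise ValueError(f"unknown opcode {op:02x} at offset {pc:04x}")
        name, nargs = OPCODES[op]
        args = code[pc + 1 : pc + 1 + nargs]
        if len(args) < nargs:
            raise ValueError(f"truncated operand at offset {pc:04x}")
        text = " ".join([name] + [f"{b:02x}" for b in args])
        out.append((pc, text))
        pc += 1 + nargs
        if name == "hlt":
            break
    return out


# Bytecode of the AIDS-8064 signature, copied from the chunk dump above.
AIDS_8064 = bytes.fromhex("fb9a00961eaaafbfaa96206c69f57ced")

for offset, text in disassemble(AIDS_8064):
    print(f"{offset:04x}: {text}")
```

Run on the AIDS-8064 bytecode, this yields the same four instructions at offsets 0000, 0003, 0009 and 000f as the listing.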
Because most signatures depend so heavily on CRC32, we can automatically generate pre-images for the majority of identities that Sophos ships. This allows some interesting attacks, but is also a fun demonstration:
$ printf "\x9a\x0066D\xea\xf8\xec666666666666666666666666\xbc\x32\x28\xc1NNNNNNNNNNNNNNNNNNNNNNNNNNNN" > VIRUSLOL.EXE
$ sav32cli.exe VIRUSLOL.EXE
Sophos Anti-Virus
Version 1.01.1 [Win32/Intel]
Quick Scanning
>>> Virus 'AIDS-8064' found in file VIRUSLOL.EXE
1 file swept in 5 seconds.
1 virus was discovered.
1 file out of 1 was infected.
Please send infected samples to Sophos for analysis.
For advice consult www.sophos.com, email support@sophos.com
or telephone +44 1235 559933
Ending Sophos Anti-Virus.
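Pre-image generation of this kind works because CRC32 is not a cryptographic hash: any four consecutive bytes can be chosen to force the checksum to an arbitrary value. The Python sketch below shows the general technique under one loud assumption: it targets the standard CRC-32 as implemented by `zlib.crc32`. The exact variant Sophos uses (the opcode table mentions a ones' complement) is not specified here, so this is not guaranteed to reproduce the bytes shown above. The function names are hypothetical.

```python
import zlib

POLY = 0xEDB88320  # reflected CRC-32 polynomial, as used by zlib


def _make_table():
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ POLY if c & 1 else c >> 1
        table.append(c)
    return table


TABLE = _make_table()
# The top byte of every table entry is unique, so it identifies the index.
REV = {entry >> 24: idx for idx, entry in enumerate(TABLE)}


def _step_back(state: int, byte: int) -> int:
    """Undo one CRC update: return the state before `byte` was processed."""
    idx = REV[state >> 24]
    return (((state ^ TABLE[idx]) << 8) & 0xFFFFFFFF) | (idx ^ byte)


def forge_crc32(prefix: bytes, suffix: bytes, target: int) -> bytes:
    """Return prefix + 4 patch bytes + suffix whose zlib CRC-32 is `target`."""
    cur = zlib.crc32(prefix) ^ 0xFFFFFFFF  # internal state after prefix
    want = target ^ 0xFFFFFFFF             # internal state needed at the end
    for b in reversed(suffix):             # rewind through the known suffix
        want = _step_back(want, b)
    for _ in range(4):                     # rewind through four zero bytes
        want = _step_back(want, 0)
    # Feeding bytes b0..b3 from state `cur` equals feeding four zero bytes
    # from `cur ^ little_endian(b0..b3)`, so the XOR gives the patch.
    patch = (want ^ cur).to_bytes(4, "little")
    return prefix + patch + suffix


# Same layout as the 30-byte pre-image above: 2 filler bytes, 4 patch
# bytes, 24 filler bytes. The target is the value from the disassembly.
data = forge_crc32(b"66", b"6" * 24, 0xAAAFBFAA)
assert zlib.crc32(data) == 0xAAAFBFAA
```

The patch bytes can sit anywhere in the buffer, which is consistent with the two pre-images above placing them at offset 2 and offset 0 respectively.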
In fact, many signatures are even weaker, often relying on single CRC32s, or even just a few literal bytes. It's quite surprising to me that patterns like this do not constantly suffer accidental collisions.
[CHUNK 245, TYPE IDE_CHUNK_TYPE_SIGNATURE (1), CLASS IDE_CHUNK_CLASS_SMALL (4), 30 BYTES]
0616: 41 1c 0a 43 08 50 72 65 67 6e 61 6e 74 42 01 01 A..C.PregnantB..
0626: 49 0c 45 0a fb b9 9f 96 05 f8 c1 72 2f ed I.E........r..
[CHUNK 250, TYPE IDE_CHUNK_TYPE_BYTECODE (5), CLASS IDE_CHUNK_CLASS_SMALL (4), 12 BYTES]
0628: 45 0a fb b9 9f 96 05 f8 c1 72 2f ed E........r..
0000: fb b9 9f literaliw b9 9f ; match literal 16bit immediate
0003: 96 05 f8 c1 72 2f crc32 05 f8 c1 72 2f ; match crc32 n bytes (ones complement)
0009: ed hlt ; end of program
[CHUNK 143, TYPE IDE_CHUNK_TYPE_SIGNATURE (1), CLASS IDE_CHUNK_CLASS_SMALL (4), 43 BYTES]
02ce: 41 29 0a 43 06 44 4d 2d 33 33 30 42 01 01 49 1b A..C.DM.330B..I.
02de: 45 19 fc b8 fa fa fb b9 37 fb 01 be fa fa fb 50 E.......7......P
02ee: 80 fc 34 fa fb 46 e2 fb fa c3 ed ..4..F.....
[CHUNK 148, TYPE IDE_CHUNK_TYPE_BYTECODE (5), CLASS IDE_CHUNK_CLASS_SMALL (4), 27 BYTES]
02de: 45 19 fc b8 fa fa fb b9 37 fb 01 be fa fa fb 50 E.......7......P
02ee: 80 fc 34 fa fb 46 e2 fb fa c3 ed ..4..F.....
0000: fc b8 literalib b8 ; match literal 8bit immediate
0002: fa next ; move file pointer forward 1 byte
0003: fa next ; move file pointer forward 1 byte
0004: fb b9 37 literaliw b9 37 ; match literal 16bit immediate
0007: fb 01 be literaliw 01 be ; match literal 16bit immediate
000a: fa next ; move file pointer forward 1 byte
000b: fa next ; move file pointer forward 1 byte
000c: fb 50 80 literaliw 50 80 ; match literal 16bit immediate
000f: fc 34 literalib 34 ; match literal 8bit immediate
0011: fa next ; move file pointer forward 1 byte
0012: fb 46 e2 literaliw 46 e2 ; match literal 16bit immediate
0015: fb fa c3 literaliw fa c3 ; match literal 16bit immediate
0018: ed hlt ; end of program
$ printf "\xb8AA\xb9\x37\x01\xbeAA\x50\x80\x34A\x46\xe2\xfa\xc3" > lololol.com
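For signatures built only from literals and skips, a matching file can be derived mechanically from the bytecode, which is how the `printf` string above lines up with the DM-330 disassembly. The following Python sketch is a hypothetical helper handling only the four opcodes seen in that listing (operand sizes inferred from it), emitting an arbitrary filler byte for each `next`.

```python
def synthesize_match(code: bytes, filler: bytes = b"A") -> bytes:
    """Build a byte string that satisfies a literal-only signature.

    Handles only literalib (fc), literaliw (fb), next (fa) and hlt (ed);
    any other opcode raises ValueError.
    """
    out = bytearray()
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == 0xFC:    # match literal 8-bit immediate
            out += code[pc + 1 : pc + 2]
            pc += 2
        elif op == 0xFB:  # match literal 16-bit immediate
            out += code[pc + 1 : pc + 3]
            pc += 3
        elif op == 0xFA:  # skip one byte: any value will do
            out += filler
            pc += 1
        elif op == 0xED:  # end of program
            break
        else:
            raise ValueError(f"unsupported opcode {op:02x} at {pc:04x}")
    return bytes(out)


# Bytecode of the DM-330 signature, copied from the chunk dump above.
DM_330 = bytes.fromhex("fcb8fafafbb937fb01befafafb5080fc34fafb46e2fbfac3ed")

with open("lololol.com", "wb") as f:
    f.write(synthesize_match(DM_330))
```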