Creating Pre-images from Sophos Antivirus signatures

Tavis Ormandy taviso@cmpxchg8b.com

As part of my upcoming talk "A Critical Analysis of Sophos Antivirus", I've reverse engineered Sophos' signature scheme and written some tools to decode the signatures.

Sophos signatures are distributed as bytecode programs for a simple VM with an RPN-like stack for computation and six named locations (registers). I've reverse engineered their interpreter and written a disassembler for the bytecode.

Here are some sample opcodes, along with the mnemonics I use.

Mnemonic          Opcode  Description
VDL_OP_CRC32      96      Match crc32 n bytes (ones complement)
VDL_OP_NEXT       FA      Increment the file pointer.
VDL_OP_READSW     E1      Read word onto stack.
VDL_OP_LOADIWSW   DE      Load immediate word onto stack.
VDL_OP_SEEKSW     E8      Pop word, seek to absolute offset.
VDL_OP_SEEKIB     EB      Move file pointer forward n bytes.
VDL_OP_FADJUSTSW  CB      Adjust next value on stack.
VDL_OP_SUBSW      D6      Pop two words, subtract, push result.
VDL_OP_SEEKIW     F8      Seek to immediate offset.

And so on. Using my disassembler, it's possible to examine arbitrary signatures; some sample output is below.

[CHUNK 401, TYPE IDE_CHUNK_TYPE_SIGNATURE (1), CLASS IDE_CHUNK_CLASS_SMALL (4), 37 BYTES]
    0ba3: 41 23 0a 43 09 41 49 44 53 2d 38 30 36 34 42 01 A..C.AIDS.8064B.
    0bb3: 01 49 12 45 10 fb 9a 00 96 1e aa af bf aa 96 20 .I.E............
    0bc3: 6c 69 f5 7c ed                                  li...
[CHUNK 402, TYPE IDE_CHUNK_TYPE_SIGFLAGS (10), CLASS IDE_CHUNK_CLASS_EMPTY (0), 1 BYTES]
    0ba5: 0a                                              .
[CHUNK 403, TYPE IDE_CHUNK_TYPE_PASCALSTRING (3), CLASS IDE_CHUNK_CLASS_SMALL (4), 11 BYTES]
    0ba6: 43 09 41 49 44 53 2d 38 30 36 34                C.AIDS.8064
[CHUNK 404, TYPE IDE_CHUNK_TYPE_SUBCHUNKCOUNT (2), CLASS IDE_CHUNK_CLASS_SMALL (4), 3 BYTES]
    0bb1: 42 01 01                                        B..
[CHUNK 405, TYPE IDE_CHUNK_TYPE_BYTECODEHEADER (9), CLASS IDE_CHUNK_CLASS_SMALL (4), 20 BYTES]
    0bb4: 49 12 45 10 fb 9a 00 96 1e aa af bf aa 96 20 6c I.E............l
    0bc4: 69 f5 7c ed                                     i...
[CHUNK 406, TYPE IDE_CHUNK_TYPE_BYTECODE (5), CLASS IDE_CHUNK_CLASS_SMALL (4), 18 BYTES]
    0bb6: 45 10 fb 9a 00 96 1e aa af bf aa 96 20 6c 69 f5 E............li.
    0bc6: 7c ed                                           ..
        0000: fb 9a 00                      literaliw       9a 00                      ; match literal 16bit immediate
        0003: 96 1e aa af bf aa             crc32           1e aa af bf aa             ; match crc32 n bytes (ones complement)
                ; generating 30 byte pre-image for crc 0xaaafbfaa...
                0000: 36 36 44 ea f8 ec 36 36 66D...66
                0008: 36 36 36 36 36 36 36 36 66666666
                0010: 36 36 36 36 36 36 36 36 66666666
                0018: 36 36 36 36 36 36       666666
        0009: 96 20 6c 69 f5 7c             crc32           20 6c 69 f5 7c             ; match crc32 n bytes (ones complement)
                ; generating 32 byte pre-image for crc 0x6c69f57c...
                0000: bc 32 28 c1 4e 4e 4e 4e .2..NNNN
                0008: 4e 4e 4e 4e 4e 4e 4e 4e NNNNNNNN
                0010: 4e 4e 4e 4e 4e 4e 4e 4e NNNNNNNN
                0018: 4e 4e 4e 4e 4e 4e 4e 4e NNNNNNNN
        000f: ed                            hlt                                        ; end of program

Here we can see that the signature scans for the literal 16-bit pattern "9A 00", then tests the following 30 bytes for a matching CRC32, and then the 32 bytes after that for another CRC32. This is a fairly typical pattern for Sophos signatures (a two-byte literal followed by two CRC32s of varying length).
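The decoding step behind that listing can be sketched in a few lines of Python. This is not the actual disassembler, only a minimal illustration: it covers just the five opcodes whose encodings are visible in the sample output, the operand sizes are inferred from that output, and the names OPCODES and disassemble are hypothetical.

```python
# Opcode map inferred from the sample disassembly only; it is NOT complete.
# Each entry is (mnemonic, number of operand bytes).
OPCODES = {
    0xFB: ("literaliw", 2),  # match literal 16-bit immediate
    0xFC: ("literalib", 1),  # match literal 8-bit immediate
    0x96: ("crc32", 5),      # one length byte followed by four CRC bytes
    0xFA: ("next", 0),       # move file pointer forward 1 byte
    0xED: ("hlt", 0),        # end of program
}


def disassemble(code: bytes):
    """Return a list of (offset, mnemonic, operand_bytes) tuples."""
    out = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op not in OPCODES:
            raise ValueError(f"unknown opcode {op:#04x} at offset {pc:#06x}")
        name, size = OPCODES[op]
        operands = code[pc + 1 : pc + 1 + size]
        if len(operands) != size:
            raise ValueError(f"truncated operand at offset {pc:#06x}")
        out.append((pc, name, operands))
        pc += 1 + size
        if name == "hlt":
            break
    return out


# Bytecode of the AIDS-8064 signature, without the two-byte chunk header.
aids = bytes.fromhex("fb9a00961eaaafbfaa96206c69f57ced")
for offset, name, operands in disassemble(aids):
    print(f"{offset:04x}: {name:<10} {operands.hex(' ')}")
```

The offsets it reports (0000, 0003, 0009, 000f) line up with the listing above. Any opcode outside the small table raises an error rather than being guessed at.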

As most signatures depend so heavily on CRC32, we can automatically generate pre-images for the majority of identities that Sophos ships. This allows some interesting attacks, but is also a fun demonstration:

$ printf "\x9a\x0066D\xea\xf8\xec666666666666666666666666\xbc\x32\x28\xc1NNNNNNNNNNNNNNNNNNNNNNNNNNNN" > VIRUSLOL.EXE
$ sav32cli.exe VIRUSLOL.EXE
Sophos Anti-Virus
Version 1.01.1 [Win32/Intel]

Quick Scanning

>>> Virus 'AIDS-8064' found in file VIRUSLOL.EXE

1 file swept in 5 seconds.
1 virus was discovered.
1 file out of 1 was infected.
Please send infected samples to Sophos for analysis.
For advice consult www.sophos.com, email support@sophos.com
or telephone +44 1235 559933
Ending Sophos Anti-Virus.
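
Pre-images like the ones printed in the disassembly are cheap to produce because CRC32 is affine over GF(2): for a fixed message length, flipping a given input bit always changes the checksum by the same XOR difference, so four free bytes are enough to reach any 32-bit target. The following is a minimal sketch in Python, assuming the standard zlib CRC-32. Sophos' variant (described above as "ones complement") may differ in its final inversion or byte order, so the output is not expected to be identical to the pre-images shown earlier. The function name crc32_preimage is hypothetical.

```python
import zlib


def crc32_preimage(target: int, length: int, pad: int = 0x41) -> bytes:
    """Return `length` bytes whose zlib CRC-32 equals `target`.

    The first four bytes are solved for; the rest is filled with `pad`.
    Assumes the standard zlib CRC-32, which may not be Sophos' exact variant.
    """
    if length < 4:
        raise ValueError("need at least four free bytes")
    tail = bytes([pad]) * (length - 4)
    base = zlib.crc32(bytes(4) + tail)

    # cols[i] is the XOR difference in the CRC caused by setting bit i
    # of the four-byte prefix (prefix taken as a little-endian integer).
    cols = [
        zlib.crc32((1 << i).to_bytes(4, "little") + tail) ^ base
        for i in range(32)
    ]

    # Build an XOR basis, remembering which columns make up each vector.
    basis = {}  # highest set bit -> (vector, mask of contributing columns)
    for i, vec in enumerate(cols):
        mask = 1 << i
        while vec:
            top = vec.bit_length() - 1
            if top not in basis:
                basis[top] = (vec, mask)
                break
            bvec, bmask = basis[top]
            vec ^= bvec
            mask ^= bmask

    # Express the required difference as a combination of columns.
    want = target ^ base
    chosen = 0
    while want:
        top = want.bit_length() - 1
        if top not in basis:
            raise ValueError("target not reachable")
        bvec, bmask = basis[top]
        want ^= bvec
        chosen ^= bmask

    return chosen.to_bytes(4, "little") + tail


forged = crc32_preimage(0xAAAFBFAA, 30)
assert zlib.crc32(forged) == 0xAAAFBFAA
print(forged.hex(" "))
```

Solving a 32x32 linear system is overkill compared with the usual table-driven trick, but it needs nothing beyond zlib.crc32 and makes the linearity argument explicit.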

In fact, many signatures are even weaker, often relying on a single CRC32, or even just a few literal bytes. It's quite surprising to me that patterns like this do not constantly suffer accidental collisions.

[CHUNK 245, TYPE IDE_CHUNK_TYPE_SIGNATURE (1), CLASS IDE_CHUNK_CLASS_SMALL (4), 30 BYTES]
    0616: 41 1c 0a 43 08 50 72 65 67 6e 61 6e 74 42 01 01 A..C.PregnantB..
    0626: 49 0c 45 0a fb b9 9f 96 05 f8 c1 72 2f ed       I.E........r..
[CHUNK 250, TYPE IDE_CHUNK_TYPE_BYTECODE (5), CLASS IDE_CHUNK_CLASS_SMALL (4), 12 BYTES]
    0628: 45 0a fb b9 9f 96 05 f8 c1 72 2f ed             E........r..
        0000: fb b9 9f                      literaliw       b9 9f                      ; match literal 16bit immediate
        0003: 96 05 f8 c1 72 2f             crc32           05 f8 c1 72 2f             ; match crc32 n bytes (ones complement)
        0009: ed                            hlt                                        ; end of program
[CHUNK 143, TYPE IDE_CHUNK_TYPE_SIGNATURE (1), CLASS IDE_CHUNK_CLASS_SMALL (4), 43 BYTES]
    02ce: 41 29 0a 43 06 44 4d 2d 33 33 30 42 01 01 49 1b A..C.DM.330B..I.
    02de: 45 19 fc b8 fa fa fb b9 37 fb 01 be fa fa fb 50 E.......7......P
    02ee: 80 fc 34 fa fb 46 e2 fb fa c3 ed                ..4..F.....
[CHUNK 148, TYPE IDE_CHUNK_TYPE_BYTECODE (5), CLASS IDE_CHUNK_CLASS_SMALL (4), 27 BYTES]
    02de: 45 19 fc b8 fa fa fb b9 37 fb 01 be fa fa fb 50 E.......7......P
    02ee: 80 fc 34 fa fb 46 e2 fb fa c3 ed                ..4..F.....
        0000: fc b8                         literalib       b8                         ; match literal 8bit immediate
        0002: fa                            next                                       ; move file pointer forward 1 byte
        0003: fa                            next                                       ; move file pointer forward 1 byte
        0004: fb b9 37                      literaliw       b9 37                      ; match literal 16bit immediate
        0007: fb 01 be                      literaliw       01 be                      ; match literal 16bit immediate
        000a: fa                            next                                       ; move file pointer forward 1 byte
        000b: fa                            next                                       ; move file pointer forward 1 byte
        000c: fb 50 80                      literaliw       50 80                      ; match literal 16bit immediate
        000f: fc 34                         literalib       34                         ; match literal 8bit immediate
        0011: fa                            next                                       ; move file pointer forward 1 byte
        0012: fb 46 e2                      literaliw       46 e2                      ; match literal 16bit immediate
        0015: fb fa c3                      literaliw       fa c3                      ; match literal 16bit immediate
        0018: ed                            hlt                                        ; end of program

$ printf "\xb8AA\xb9\x37\x01\xbeAA\x50\x80\x34A\x46\xe2\xfa\xc3" > lololol.com

http://www.virustotal.com/file-scan/report.html?id=9a9c4f64e550563a32f5dab07bb93d4508514be21b98cf57bf39ea95d4bc4671-1312028929
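
For a literal-only signature like DM-330, the matching file can be derived mechanically from the bytecode: copy each literal operand and emit one filler byte per "next". A minimal sketch in Python, handling only the four opcodes that appear in that listing (encodings taken from the disassembly above); the function name literal_preimage and the choice of "A" as filler are illustrative assumptions.

```python
def literal_preimage(code: bytes, filler: bytes = b"A") -> bytes:
    """Build file contents that satisfy a literal-only signature program."""
    out = bytearray()
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == 0xFB:    # literaliw: copy the two operand bytes
            out += code[pc + 1 : pc + 3]
            pc += 3
        elif op == 0xFC:  # literalib: copy the single operand byte
            out += code[pc + 1 : pc + 2]
            pc += 2
        elif op == 0xFA:  # next: any byte matches, so emit filler
            out += filler
            pc += 1
        elif op == 0xED:  # hlt: end of program
            break
        else:
            raise ValueError(f"unsupported opcode {op:#04x} at offset {pc:#06x}")
    return bytes(out)


# Bytecode of the DM-330 signature, without the two-byte chunk header.
dm330 = bytes.fromhex("fcb8fafafbb937fb01befafafb5080fc34fafb46e2fbfac3ed")
print(literal_preimage(dm330).hex(" "))
```

For the DM-330 bytecode this yields the same 17 bytes as the printf command above.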