
lcamtuf's blog: Automatically inferring file syntax with afl-analyze


The nice thing about the instrumentation used by American Fuzzy Lop is that it allows us to do much more than just, well, fuzzing stuff. For example, for a while now, the fuzzer has shipped with a standalone tool called afl-tmin, which allows you to take an interesting file and automatically shrink it - all while making sure that it still exercises the same functionality in the targeted binary (or triggers the same crash). Another tool, afl-cmin, pulls off the same trick to eliminate redundant files in large fuzzing corpora.

The latest release of AFL features another nifty new addition along these lines: afl-analyze. The tool takes an input file, sequentially flips bytes, and then gives you a human-readable report explaining the structure of the file, based on the observed changes to the execution path within the target binary. It can tell apart:

  • No-op blocks, such as comments.
  • Checksums, magic values, and atomically compared syntax tokens.
  • Blobs of checksummed or encrypted data.
  • "Pure" data blocks with no encryption or checksum guards.
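The byte-flipping loop behind this can be sketched in a few lines. This is a toy illustration only: the real afl-analyze runs an instrumented binary and compares AFL's edge-coverage maps, whereas here a made-up stand-in function, exec_signature, mimics how cut -d' ' -f1 "sees" its input - only delimiter positions affect its execution path.

```python
def exec_signature(data: bytes) -> tuple:
    # Stand-in for an execution trace: positions of spaces and newlines.
    # (In the real tool, this would be the coverage map of a target run.)
    return tuple(i for i, b in enumerate(data) if b in b' \n')

def analyze(data: bytes) -> list:
    # Flip each byte in turn; if the "execution path" changes, the byte
    # is critical, otherwise it behaves like a no-op.
    baseline = exec_signature(data)
    labels = []
    for i in range(len(data)):
        flipped = bytearray(data)
        flipped[i] ^= 0xFF  # invert every bit of one byte
        changed = exec_signature(bytes(flipped)) != baseline
        labels.append('critical' if changed else 'no-op')
    return labels

labels = analyze(b'hello world\n')
# Bytes 5 (space) and 11 (newline) come back 'critical'; the rest 'no-op'.
```

The real tool additionally groups adjacent bytes with similar behavior into blobs and applies heuristics to tell checksums and length fields apart from plain data.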

Here's a quick demo, showing afl-analyze figuring out that when running cut -d ' ' -f1, only the spaces and newlines really matter:

$ ./afl-analyze -i test ./cut -d' ' -f1
afl-analyze 1.97b by
[+] Read 30 bytes from 'test'.
[*] Performing dry run (mem limit = 25 MB, timeout = 1000 ms)...
[*] Analyzing input file...

[000000] h e l l o      `-> Apparent no-op blob (len = 5)
[000005] 20             `-> Critical byte (len = 1)
[000006] w o r l d      `-> Apparent no-op blob (len = 5)
[000011] 0a             `-> Critical byte (len = 1)
[000012] w h a t        `-> Apparent no-op blob (len = 4)
[000016] 20             `-> Critical byte (len = 1)
[000017] i s            `-> Apparent no-op blob (len = 2)
[000019] 20             `-> Critical byte (len = 1)
[000020] g o i n g      `-> Apparent no-op blob (len = 5)
[000025] 20             `-> Critical byte (len = 1)
[000026] o n ?          `-> Apparent no-op blob (len = 3)
[000029] 0a             `-> Critical byte (len = 1)

Interestingly, the fact that offset #19 is flagged as a "critical byte" also tells us that cut always tokenizes the entire line, even if all we're asking for is the first field.

Of course, the program is better suited for incomprehensible binary formats than for simple text utilities; it can also work with black-box binaries, thanks to the QEMU integration that AFL has supported for a while now. Let's try libpng instead:

[000000] 89 P N G       `-> Potential checksum or magic value (len = 4)
[000004] 0d 0a 1a 0a    `-> Atomically compared value (len = 4)
[000008] 00 00 00 0d    `-> Potential length field (len = 4)
[000012] I H D R        `-> Atomically compared value (len = 4)
[000016] 00 00 00 20 [width] `-> Critical data blob (len = 4)
...

This checks out: we have two four-byte signatures, followed by the chunk length, the four-byte chunk name, and the chunk data, which begins with the image width. Neat, right? Now, be warned that the tool shipped just moments ago and is still a bit experimental - field testing and feedback welcome!
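For reference, the fields in that dump can be pulled apart with a few lines of Python - a sketch using only the byte values shown above (the 32-pixel width corresponds to the 00 00 00 20 blob at offset 16):

```python
import struct

# First 20 bytes of the PNG from the dump: 8-byte signature,
# big-endian chunk length, 4-byte chunk name, then chunk data
# whose first field is the image width.
header = (b'\x89PNG\r\n\x1a\n'       # 89 50 4E 47 0D 0A 1A 0A
          + struct.pack('>I', 13)    # 00 00 00 0d - IHDR data length
          + b'IHDR'                  # chunk name
          + struct.pack('>I', 32))   # 00 00 00 20 - image width

signature = header[:8]
(length,) = struct.unpack('>I', header[8:12])
chunk_name = header[12:16]
(width,) = struct.unpack('>I', header[16:20])
# length == 13, chunk_name == b'IHDR', width == 32
```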

(The approach can likely be refined by looking at how much the execution path changes in response to input tweaks. I'm tempted to let the tool just generate a color-coded hex dump, based on the Hamming distance to the original exec map.)
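That refinement is easy to prototype. A hedged sketch (the function names and the four-shade palette are mine, not part of AFL): measure the bit-level Hamming distance between the baseline exec map and the map observed after each flip, then bucket the distance into a color for the hex dump.

```python
def hamming(a: bytes, b: bytes) -> int:
    # Bit-level Hamming distance between two equal-length exec maps.
    return sum(bin(x ^ y).count('1') for x, y in zip(a, b))

def color_bucket(dist: int, max_dist: int) -> str:
    # Hypothetical four-shade palette for a color-coded hex dump:
    # the hotter the shade, the more the execution path diverged
    # when this byte was flipped.
    shades = ('none', 'low', 'medium', 'high')
    if max_dist == 0:
        return shades[0]
    return shades[min(3, dist * len(shades) // (max_dist + 1))]
```

Bytes whose flips barely perturb the map would render in a cool shade, while checksums and magic values, whose corruption derails the whole run, would light up.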

