diff mbox

[11/14] fuzz/x86_emulate: Make input more compact

Message ID 20170825164343.29015-11-george.dunlap@citrix.com (mailing list archive)
State New, archived
Headers show

Commit Message

George Dunlap Aug. 25, 2017, 4:43 p.m. UTC
At the moment, AFL reckons that for any given input, 87% of it is
completely irrelevant: that is, it can change it as much as it wants
but have no impact on the result of the test; and yet it can't remove
it.

This is largely because we interpret the blob handed to us as a large
struct, including CR values, MSR values, segment registers, and a full
cpu_user_regs.

Instead, modify our interpretation to have a "set state" stanza at the
front.  Begin by reading a byte; if it is lower than a certain
threshold, set some state according to what byte it is, and repeat.
Continue until the byte is above a certain threshold.

This allows AFL to compact any given test case much smaller; to the
point where now it reckons there is not a single byte of the test file
which becomes irrelevant.  Testing have shown that this option both
allows AFL to reach coverage much faster, and to have a total coverage
higher than with the old format.

Make this an option (rather than a unilateral change) to enable
side-by-side performance comparison of the old and new formats.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
---
I'll reply to this e-mail with a graph of some tests I ran.

CC: Ian Jackson <ian.jackson@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
---
 tools/fuzz/x86_instruction_emulator/afl-harness.c | 13 +++-
 tools/fuzz/x86_instruction_emulator/fuzz-emul.c   | 94 ++++++++++++++++++++---
 2 files changed, 97 insertions(+), 10 deletions(-)

Comments

George Dunlap Aug. 25, 2017, 4:52 p.m. UTC | #1
On 08/25/2017 05:43 PM, George Dunlap wrote:
> At the moment, AFL reckons that for any given input, 87% of it is
> completely irrelevant: that is, it can change it as much as it wants
> but have no impact on the result of the test; and yet it can't remove
> it.
> 
> This is largely because we interpret the blob handed to us as a large
> struct, including CR values, MSR values, segment registers, and a full
> cpu_user_regs.
> 
> Instead, modify our interpretation to have a "set state" stanza at the
> front.  Begin by reading a byte; if it is lower than a certain
> threshold, set some state according to what byte it is, and repeat.
> Continue until the byte is above a certain threshold.
> 
> This allows AFL to compact any given test case much smaller; to the
> point where now it reckons there is not a single byte of the test file
> which becomes irrelevant.  Testing have shown that this option both
> allows AFL to reach coverage much faster, and to have a total coverage
> higher than with the old format.
> 
> Make this an option (rather than a unilateral change) to enable
> side-by-side performance comparison of the old and new formats.
> 
> Signed-off-by: George Dunlap <george.dunlap@citrix.com>
> ---
> I'll reply to this e-mail with a graph of some tests I ran.

Here are the results of my testing justifying the new defaults for
whether to use 'compact' state definition or not, and how many
instructions to execute.

'new-format' had --compact=1, 'old-format' was run with --compact=0.

no-limit was run with --instruction-limit=0, limit-1 with
--instruction-limit=1, and so on.

In each case I ran 4 parallel AFL instances on my workstation for 24
hours.

The attached graph shows the number of unique AFL 'tuples' touched
over time.  (See [1] for more information.)  Having the line be higher
overall at the end of the run (i.e., best branch coverage) is the
primary goal; having the line shift over to the left (quickest
discovery) is a secondary goal.  The two are related in the sense that
quicker discovery should allow more time to explore further search
space.

The combination that had both the quickest discovery and the highest
overall branch coverage was the new 'compact' format with no
instruction limit (the defaults set by this patch).

In the 'compact' format, limiting the number of instructions seemed
only to slow things down and reduce the final coverage amount
(although the final coverage for limit-1 did beat the final coverages
for other limits).

In the 'non-compact' format, having unlimited instructions seems very
much to slow down discovery; but it also increased the final coverage
count.

 -George

[1] http://lcamtuf.coredump.cx/afl/technical_details.txt
diff mbox

Patch

diff --git a/tools/fuzz/x86_instruction_emulator/afl-harness.c b/tools/fuzz/x86_instruction_emulator/afl-harness.c
index 79f8aec653..12b3765dcc 100644
--- a/tools/fuzz/x86_instruction_emulator/afl-harness.c
+++ b/tools/fuzz/x86_instruction_emulator/afl-harness.c
@@ -4,6 +4,7 @@ 
 #include <stdlib.h>
 #include <string.h>
 #include <getopt.h>
+#include <stdbool.h>
 
 extern int LLVMFuzzerInitialize(int *argc, char ***argv);
 extern int LLVMFuzzerTestOneInput(const uint8_t *data_p, size_t size);
@@ -12,6 +13,8 @@  extern unsigned int fuzz_minimal_input_size(void);
 #define INPUT_SIZE  4096
 static uint8_t input[INPUT_SIZE];
 
+extern bool opt_compact;
+
 int main(int argc, char **argv)
 {
     size_t size;
@@ -22,13 +25,17 @@  int main(int argc, char **argv)
     setbuf(stdin, NULL);
     setbuf(stdout, NULL);
 
+    opt_compact = true;
+
     while ( 1 )
     {
         enum {
             OPT_MIN_SIZE,
+            OPT_COMPACT,
         };
         static const struct option lopts[] = {
             { "min-input-size", no_argument, NULL, OPT_MIN_SIZE },
+            { "compact", required_argument, NULL, OPT_COMPACT },
             { 0, 0, 0, 0 }
         };
         int c = getopt_long_only(argc, argv, "", lopts, NULL);
@@ -43,8 +50,12 @@  int main(int argc, char **argv)
             exit(0);
             break;
 
+        case OPT_COMPACT:
+            opt_compact = atoi(optarg);
+            break;
+            
         case '?':
-            printf("Usage: %s $FILE [$FILE...] | [--min-input-size]\n", argv[0]);
+            printf("Usage: %s [--compact=0|1] $FILE [$FILE...] | [--min-input-size]\n", argv[0]);
             exit(-1);
             break;
 
diff --git a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
index 89d1714125..48b02f2bf6 100644
--- a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
+++ b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
@@ -53,6 +53,15 @@  struct fuzz_state
 };
 #define DATA_OFFSET offsetof(struct fuzz_state, corpus)
 
+bool opt_compact;
+
+unsigned int fuzz_minimal_input_size(void)
+{
+    if ( opt_compact )
+        return sizeof(unsigned long) + 1;
+    else
+        return DATA_OFFSET + 1;
+}
 
 static inline int davail(struct fuzz_state *s, size_t size)
 {
@@ -647,9 +656,81 @@  static void setup_state(struct x86_emulate_ctxt *ctxt)
 {
     struct fuzz_state *s = ctxt->data;
 
-    /* Fuzz all of the state in one go */
-    if (!dread(s, s, DATA_OFFSET))
-        exit(-1);
+    if ( !opt_compact )
+    {
+        /* Fuzz all of the state in one go */
+        if (!dread(s, s, DATA_OFFSET))
+            exit(-1);
+        return;
+    }
+
+    /* Modify only select bits of state */
+
+    /* Always read 'options' */
+    if ( !dread(s, &s->options, sizeof(s->options)) )
+        return;
+    
+    while(1) {
+        uint16_t offset;
+
+        /* Read 16 bits to decide what bit of state to modify */
+        if ( !dread(s, &offset, sizeof(offset)) )
+            return;
+
+        /* 
+         * Then decide if it's "pointing to" different bits of the
+         * state 
+         */
+
+        /* cr[]? */
+        if ( offset < 5 )
+        {
+            if ( !dread(s, s->cr + offset, sizeof(*s->cr)) )
+                return;
+            printf("Setting CR %d to %lx\n", offset, s->cr[offset]);
+            continue;
+        }
+        
+        offset -= 5;
+
+        /* msr[]? */
+        if ( offset < MSR_INDEX_MAX )
+        {
+            if ( !dread(s, s->msr + offset, sizeof(*s->msr)) )
+                return;
+            printf("Setting MSR i%d (%x) to %lx\n", offset,
+                   msr_index[offset], s->msr[offset]);
+            continue;
+        }
+
+        offset -= MSR_INDEX_MAX;
+
+        /* segments[]? */
+        if ( offset < SEG_NUM )
+        {
+            if ( !dread(s, s->segments + offset, sizeof(*s->segments)) )
+                return;
+            printf("Setting Segment %d\n", offset);
+            continue;
+            
+        }
+
+        offset -= SEG_NUM;
+
+        /* regs? */
+        if ( offset < sizeof(struct cpu_user_regs)
+             && offset + sizeof(uint64_t) <= sizeof(struct cpu_user_regs) )
+        {
+            if ( !dread(s, ((char *)ctxt->regs) + offset, sizeof(uint64_t)) )
+                return;
+            printf("Setting cpu_user_regs offset %x\n", offset);
+            continue;
+        }
+
+        /* None of the above -- take that as "start emulating" */
+        
+        return;
+    }
 }
 
 #define CANONICALIZE(x)                                   \
@@ -821,7 +902,7 @@  int LLVMFuzzerTestOneInput(const uint8_t *data_p, size_t size)
     /* Reset all global state variables */
     memset(&input, 0, sizeof(input));
 
-    if ( size <= DATA_OFFSET )
+    if ( size < fuzz_minimal_input_size() )
     {
         printf("Input too small\n");
         return 1;
@@ -858,11 +939,6 @@  int LLVMFuzzerTestOneInput(const uint8_t *data_p, size_t size)
     return 0;
 }
 
-unsigned int fuzz_minimal_input_size(void)
-{
-    return DATA_OFFSET + 1;
-}
-
 /*
  * Local variables:
  * mode: C