diff mbox

[v2,10/13] fuzz/x86_emulate: Make input more compact

Message ID 20170925142648.25959-10-george.dunlap@citrix.com (mailing list archive)
State New, archived
Headers show

Commit Message

George Dunlap Sept. 25, 2017, 2:26 p.m. UTC
At the moment, AFL reckons that for any given input, 87% of it is
completely irrelevant: that is, it can change it as much as it wants
but have no impact on the result of the test; and yet it can't remove
it.

This is largely because we interpret the blob handed to us as a large
struct, including CR values, MSR values, segment registers, and a full
cpu_user_regs.

Instead, modify our interpretation to have a "set state" stanza at the
front.  Begin by reading a byte; if it is lower than a certain
threshold, set some state according to what byte it is, and repeat.
Continue until the byte is above a certain threshold.

This allows AFL to compact any given test case much smaller; to the
point where now it reckons there is not a single byte of the test file
which becomes irrelevant.  Testing have shown that this option both
allows AFL to reach coverage much faster, and to have a total coverage
higher than with the old format.

Make this an option (rather than a unilateral change) to enable
side-by-side performance comparison of the old and new formats.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
---
v2: Port over previous changes

CC: Ian Jackson <ian.jackson@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
---
 tools/fuzz/x86_instruction_emulator/afl-harness.c | 13 +++-
 tools/fuzz/x86_instruction_emulator/fuzz-emul.c   | 94 ++++++++++++++++++++---
 2 files changed, 97 insertions(+), 10 deletions(-)

Comments

Jan Beulich Oct. 4, 2017, 8:26 a.m. UTC | #1
>>> On 25.09.17 at 16:26, <george.dunlap@citrix.com> wrote:
> @@ -22,13 +25,17 @@ int main(int argc, char **argv)
>      setbuf(stdin, NULL);
>      setbuf(stdout, NULL);
>  
> +    opt_compact = true;

How about giving the variable an initializer instead?

> --- a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
> +++ b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
> @@ -53,6 +53,15 @@ struct fuzz_state
>  };
>  #define DATA_OFFSET offsetof(struct fuzz_state, corpus)
>  
> +bool opt_compact;
> +
> +unsigned int fuzz_minimal_input_size(void)
> +{
> +    if ( opt_compact )
> +        return sizeof(unsigned long) + 1;

What is this value choice based on / derived from? Oh, judging from
code further down it may be one more than the size of the options
field, in which case it should be sizeof(...->options) here.

> +    else

I'd prefer if an else like this one was dropped.

> @@ -647,9 +656,81 @@ static void setup_state(struct x86_emulate_ctxt *ctxt)
>  {
>      struct fuzz_state *s = ctxt->data;
>  
> -    /* Fuzz all of the state in one go */
> -    if (!input_read(s, s, DATA_OFFSET))
> -        exit(-1);
> +    if ( !opt_compact )
> +    {
> +        /* Fuzz all of the state in one go */
> +        if (!input_read(s, s, DATA_OFFSET))

Missing blanks.

> +            exit(-1);
> +        return;
> +    }
> +
> +    /* Modify only select bits of state */
> +
> +    /* Always read 'options' */
> +    if ( !input_read(s, &s->options, sizeof(s->options)) )
> +        return;
> +    
> +    while(1) {

Style. And for compatibility (read: no warnings) with as wide a range
of compilers as possible, generally for ( ; ; ) is better to use.

> +        uint16_t offset;
> +
> +        /* Read 16 bits to decide what bit of state to modify */
> +        if ( !input_read(s, &offset, sizeof(offset)) )
> +            return;

Doesn't this suggest minimal input size wants to be one higher than
what you currently enforce? And isn't the use of uint16_t here in
conflict with the description talking about reading a byte every time?

> +        /* 
> +         * Then decide if it's "pointing to" different bits of the
> +         * state 
> +         */
> +
> +        /* cr[]? */
> +        if ( offset < 5 )

ARRAY_SIZE()

> +        {
> +            if ( !input_read(s, s->cr + offset, sizeof(*s->cr)) )
> +                return;
> +            printf("Setting CR %d to %lx\n", offset, s->cr[offset]);
> +            continue;
> +        }
> +        
> +        offset -= 5;

Same here then.

> +        /* msr[]? */
> +        if ( offset < MSR_INDEX_MAX )

Even here (and below) use of ARRAY_SIZE() may be better.

> +        {
> +            if ( !input_read(s, s->msr + offset, sizeof(*s->msr)) )
> +                return;
> +            printf("Setting MSR i%d (%x) to %lx\n", offset,
> +                   msr_index[offset], s->msr[offset]);
> +            continue;
> +        }
> +
> +        offset -= MSR_INDEX_MAX;
> +
> +        /* segments[]? */
> +        if ( offset < SEG_NUM )
> +        {
> +            if ( !input_read(s, s->segments + offset, sizeof(*s->segments)) )
> +                return;
> +            printf("Setting Segment %d\n", offset);
> +            continue;
> +            
> +        }
> +
> +        offset -= SEG_NUM;
> +
> +        /* regs? */
> +        if ( offset < sizeof(struct cpu_user_regs)
> +             && offset + sizeof(uint64_t) <= sizeof(struct cpu_user_regs) )
> +        {
> +            if ( !input_read(s, ((char *)ctxt->regs) + offset, sizeof(uint64_t)) )
> +                return;
> +            printf("Setting cpu_user_regs offset %x\n", offset);
> +            continue;
> +        }
> +
> +        /* None of the above -- take that as "start emulating" */
> +        
> +        return;
> +    }

Having come here I wonder whether the use of "byte" in the description
is right, and you mean "uint8_t offset" above, as you're far from
consuming the entire 256 value range.

Additionally, was the order of elements here chosen for any specific
reason? It would seem to me that elements having a more significant
effect on emulation may be worth filling first, and I'm not convinced
the "all CRs, all MSRs, all SREGs, all GPRs" order matches that.

Jan
George Dunlap Oct. 5, 2017, 3:04 p.m. UTC | #2
On 10/04/2017 09:26 AM, Jan Beulich wrote:
>>>> On 25.09.17 at 16:26, <george.dunlap@citrix.com> wrote:
>> @@ -22,13 +25,17 @@ int main(int argc, char **argv)
>>      setbuf(stdin, NULL);
>>      setbuf(stdout, NULL);
>>  
>> +    opt_compact = true;
> 
> How about giving the variable an initializer instead?

Actually, if we want fuzz-emul.c to be usable by itself (e.g., for the
Google ossfuz project), we *must* use a static initializer from within
fuzz-emul.c for it to have the correct defaults.  I'll change that...

> 
>> --- a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
>> +++ b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
>> @@ -53,6 +53,15 @@ struct fuzz_state
>>  };
>>  #define DATA_OFFSET offsetof(struct fuzz_state, corpus)
>>  
>> +bool opt_compact;
>> +
>> +unsigned int fuzz_minimal_input_size(void)
>> +{
>> +    if ( opt_compact )
>> +        return sizeof(unsigned long) + 1;
> 
> What is this value choice based on / derived from? Oh, judging from
> code further down it may be one more than the size of the options
> field, in which case it should be sizeof(...->options) here.

What about renaming DATA_OFFSET to DATA_SIZE_FULL, and adding
DATA_SIZE_COMPACT?

Then is could be:

    return (opt_compact ? DATA_SIZE_COMPACT : DATA_SIZE_FULL) + 1;

>> @@ -647,9 +656,81 @@ static void setup_state(struct x86_emulate_ctxt *ctxt)
>>  {
>>      struct fuzz_state *s = ctxt->data;
>>  
>> -    /* Fuzz all of the state in one go */
>> -    if (!input_read(s, s, DATA_OFFSET))
>> -        exit(-1);
>> +    if ( !opt_compact )
>> +    {
>> +        /* Fuzz all of the state in one go */
>> +        if (!input_read(s, s, DATA_OFFSET))
> 
> Missing blanks.

Ack

> 
>> +            exit(-1);
>> +        return;
>> +    }
>> +
>> +    /* Modify only select bits of state */
>> +
>> +    /* Always read 'options' */
>> +    if ( !input_read(s, &s->options, sizeof(s->options)) )
>> +        return;
>> +    
>> +    while(1) {
> 
> Style. And for compatibility (read: no warnings) with as wide a range
> of compilers as possible, generally for ( ; ; ) is better to use.

I can do that; but would you mind explaining?  What kinds of compilers
don't like while(1)?

>> +        uint16_t offset;
>> +
>> +        /* Read 16 bits to decide what bit of state to modify */
>> +        if ( !input_read(s, &offset, sizeof(offset)) )
>> +            return;
> 
> Doesn't this suggest minimal input size wants to be one higher than
> what you currently enforce? And isn't the use of uint16_t here in
> conflict with the description talking about reading a byte every time?

Hmm, actually it rather implies that it should be one less... with the
new format there's no way to guarantee that the very first insn_fetch
will have any data to read.

>> +        /* 
>> +         * Then decide if it's "pointing to" different bits of the
>> +         * state 
>> +         */
>> +
>> +        /* cr[]? */
>> +        if ( offset < 5 )
> 
> ARRAY_SIZE()

Ack

>> +        {
>> +            if ( !input_read(s, s->cr + offset, sizeof(*s->cr)) )
>> +                return;
>> +            printf("Setting CR %d to %lx\n", offset, s->cr[offset]);
>> +            continue;
>> +        }
>> +        
>> +        offset -= 5;
> 
> Same here then.
> 
>> +        /* msr[]? */
>> +        if ( offset < MSR_INDEX_MAX )
> 
> Even here (and below) use of ARRAY_SIZE() may be better.
> 
>> +        {
>> +            if ( !input_read(s, s->msr + offset, sizeof(*s->msr)) )
>> +                return;
>> +            printf("Setting MSR i%d (%x) to %lx\n", offset,
>> +                   msr_index[offset], s->msr[offset]);
>> +            continue;
>> +        }
>> +
>> +        offset -= MSR_INDEX_MAX;
>> +
>> +        /* segments[]? */
>> +        if ( offset < SEG_NUM )
>> +        {
>> +            if ( !input_read(s, s->segments + offset, sizeof(*s->segments)) )
>> +                return;
>> +            printf("Setting Segment %d\n", offset);
>> +            continue;
>> +            
>> +        }
>> +
>> +        offset -= SEG_NUM;
>> +
>> +        /* regs? */
>> +        if ( offset < sizeof(struct cpu_user_regs)
>> +             && offset + sizeof(uint64_t) <= sizeof(struct cpu_user_regs) )
>> +        {
>> +            if ( !input_read(s, ((char *)ctxt->regs) + offset, sizeof(uint64_t)) )
>> +                return;
>> +            printf("Setting cpu_user_regs offset %x\n", offset);
>> +            continue;
>> +        }
>> +
>> +        /* None of the above -- take that as "start emulating" */
>> +        
>> +        return;
>> +    }
> 
> Having come here I wonder whether the use of "byte" in the description
> is right, and you mean "uint8_t offset" above, as you're far from
> consuming the entire 256 value range.

Isn't cpu_user_regs larger than 256 bytes?  And in any case, the offset
will become larger than 256 bytes one we include the FPU state.

> Additionally, was the order of elements here chosen for any specific
> reason? It would seem to me that elements having a more significant
> effect on emulation may be worth filling first, and I'm not convinced
> the "all CRs, all MSRs, all SREGs, all GPRs" order matches that.

I'm not aware of any particular order; it's probably some combination of
"the order they were in the cpu_regs struct" and "the order in which I
found it useful to add them".  Given that the input will be more or less
random, I don't think the order in the struct will have too much of an
impact on the order in which AFL explores them.

If you have an alternative suggestion for an order you think would be
more logical I'm happy to rearrange the structure.

 -George
Jan Beulich Oct. 5, 2017, 3:37 p.m. UTC | #3
>>> On 05.10.17 at 17:04, <george.dunlap@citrix.com> wrote:
> On 10/04/2017 09:26 AM, Jan Beulich wrote:
>>>>> On 25.09.17 at 16:26, <george.dunlap@citrix.com> wrote:
>>> --- a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
>>> +++ b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
>>> @@ -53,6 +53,15 @@ struct fuzz_state
>>>  };
>>>  #define DATA_OFFSET offsetof(struct fuzz_state, corpus)
>>>  
>>> +bool opt_compact;
>>> +
>>> +unsigned int fuzz_minimal_input_size(void)
>>> +{
>>> +    if ( opt_compact )
>>> +        return sizeof(unsigned long) + 1;
>> 
>> What is this value choice based on / derived from? Oh, judging from
>> code further down it may be one more than the size of the options
>> field, in which case it should be sizeof(...->options) here.
> 
> What about renaming DATA_OFFSET to DATA_SIZE_FULL, and adding
> DATA_SIZE_COMPACT?
> 
> Then is could be:
> 
>     return (opt_compact ? DATA_SIZE_COMPACT : DATA_SIZE_FULL) + 1;

Looks fine.

>>> +            exit(-1);
>>> +        return;
>>> +    }
>>> +
>>> +    /* Modify only select bits of state */
>>> +
>>> +    /* Always read 'options' */
>>> +    if ( !input_read(s, &s->options, sizeof(s->options)) )
>>> +        return;
>>> +    
>>> +    while(1) {
>> 
>> Style. And for compatibility (read: no warnings) with as wide a range
>> of compilers as possible, generally for ( ; ; ) is better to use.
> 
> I can do that; but would you mind explaining?  What kinds of compilers
> don't like while(1)?

In various projects of my own I have on (and targeting) Windows
I see this with almost every compiler I happen to use there. Hence
I've started to avoid the construct many years ago.

>>> +        {
>>> +            if ( !input_read(s, s->msr + offset, sizeof(*s->msr)) )
>>> +                return;
>>> +            printf("Setting MSR i%d (%x) to %lx\n", offset,
>>> +                   msr_index[offset], s->msr[offset]);
>>> +            continue;
>>> +        }
>>> +
>>> +        offset -= MSR_INDEX_MAX;
>>> +
>>> +        /* segments[]? */
>>> +        if ( offset < SEG_NUM )
>>> +        {
>>> +            if ( !input_read(s, s->segments + offset, sizeof(*s->segments)) )
>>> +                return;
>>> +            printf("Setting Segment %d\n", offset);
>>> +            continue;
>>> +            
>>> +        }
>>> +
>>> +        offset -= SEG_NUM;
>>> +
>>> +        /* regs? */
>>> +        if ( offset < sizeof(struct cpu_user_regs)
>>> +             && offset + sizeof(uint64_t) <= sizeof(struct cpu_user_regs) )
>>> +        {
>>> +            if ( !input_read(s, ((char *)ctxt->regs) + offset, sizeof(uint64_t)) )
>>> +                return;
>>> +            printf("Setting cpu_user_regs offset %x\n", offset);
>>> +            continue;
>>> +        }
>>> +
>>> +        /* None of the above -- take that as "start emulating" */
>>> +        
>>> +        return;
>>> +    }
>> 
>> Having come here I wonder whether the use of "byte" in the description
>> is right, and you mean "uint8_t offset" above, as you're far from
>> consuming the entire 256 value range.
> 
> Isn't cpu_user_regs larger than 256 bytes?  And in any case, the offset
> will become larger than 256 bytes one we include the FPU state.

Oh, of course. I've somehow stopped summing at the point 
SEG_NUM is being subtracted.

>> Additionally, was the order of elements here chosen for any specific
>> reason? It would seem to me that elements having a more significant
>> effect on emulation may be worth filling first, and I'm not convinced
>> the "all CRs, all MSRs, all SREGs, all GPRs" order matches that.
> 
> I'm not aware of any particular order; it's probably some combination of
> "the order they were in the cpu_regs struct" and "the order in which I
> found it useful to add them".  Given that the input will be more or less
> random, I don't think the order in the struct will have too much of an
> impact on the order in which AFL explores them.

Well, yes, except for very small input (which will leave "higher"
parts unrandomized).

> If you have an alternative suggestion for an order you think would be
> more logical I'm happy to rearrange the structure.

Generally I'd expect GPRs to be most relevant to change value,
but them going first might be sort of ugly, as they're quite big.
For the others I'd say SREGs, CRs, then MSRs. But if it doesn't
really matter (i.e. if small input isn't of much concern, as you
suggest above), you may as well leave things the way they are.

Jan
diff mbox

Patch

diff --git a/tools/fuzz/x86_instruction_emulator/afl-harness.c b/tools/fuzz/x86_instruction_emulator/afl-harness.c
index 669f698711..806f54d606 100644
--- a/tools/fuzz/x86_instruction_emulator/afl-harness.c
+++ b/tools/fuzz/x86_instruction_emulator/afl-harness.c
@@ -4,6 +4,7 @@ 
 #include <stdlib.h>
 #include <string.h>
 #include <getopt.h>
+#include <stdbool.h>
 
 extern int LLVMFuzzerInitialize(int *argc, char ***argv);
 extern int LLVMFuzzerTestOneInput(const uint8_t *data_p, size_t size);
@@ -12,6 +13,8 @@  extern unsigned int fuzz_minimal_input_size(void);
 #define INPUT_SIZE  4096
 static uint8_t input[INPUT_SIZE];
 
+extern bool opt_compact;
+
 int main(int argc, char **argv)
 {
     size_t size;
@@ -22,13 +25,17 @@  int main(int argc, char **argv)
     setbuf(stdin, NULL);
     setbuf(stdout, NULL);
 
+    opt_compact = true;
+
     while ( 1 )
     {
         enum {
             OPT_MIN_SIZE,
+            OPT_COMPACT,
         };
         static const struct option lopts[] = {
             { "min-input-size", no_argument, NULL, OPT_MIN_SIZE },
+            { "compact", required_argument, NULL, OPT_COMPACT },
             { 0, 0, 0, 0 }
         };
         int c = getopt_long_only(argc, argv, "", lopts, NULL);
@@ -43,8 +50,12 @@  int main(int argc, char **argv)
             exit(0);
             break;
 
+        case OPT_COMPACT:
+            opt_compact = atoi(optarg);
+            break;
+            
         case '?':
-            printf("Usage: %s $FILE [$FILE...] | [--min-input-size]\n", argv[0]);
+            printf("Usage: %s [--compact=0|1] $FILE [$FILE...] | [--min-input-size]\n", argv[0]);
             exit(-1);
             break;
 
diff --git a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
index c8a5507f8b..d99a50d12c 100644
--- a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
+++ b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
@@ -53,6 +53,15 @@  struct fuzz_state
 };
 #define DATA_OFFSET offsetof(struct fuzz_state, corpus)
 
+bool opt_compact;
+
+unsigned int fuzz_minimal_input_size(void)
+{
+    if ( opt_compact )
+        return sizeof(unsigned long) + 1;
+    else
+        return DATA_OFFSET + 1;
+}
 
 static inline bool input_available(struct fuzz_state *s, size_t size)
 {
@@ -647,9 +656,81 @@  static void setup_state(struct x86_emulate_ctxt *ctxt)
 {
     struct fuzz_state *s = ctxt->data;
 
-    /* Fuzz all of the state in one go */
-    if (!input_read(s, s, DATA_OFFSET))
-        exit(-1);
+    if ( !opt_compact )
+    {
+        /* Fuzz all of the state in one go */
+        if (!input_read(s, s, DATA_OFFSET))
+            exit(-1);
+        return;
+    }
+
+    /* Modify only select bits of state */
+
+    /* Always read 'options' */
+    if ( !input_read(s, &s->options, sizeof(s->options)) )
+        return;
+    
+    while(1) {
+        uint16_t offset;
+
+        /* Read 16 bits to decide what bit of state to modify */
+        if ( !input_read(s, &offset, sizeof(offset)) )
+            return;
+
+        /* 
+         * Then decide if it's "pointing to" different bits of the
+         * state 
+         */
+
+        /* cr[]? */
+        if ( offset < 5 )
+        {
+            if ( !input_read(s, s->cr + offset, sizeof(*s->cr)) )
+                return;
+            printf("Setting CR %d to %lx\n", offset, s->cr[offset]);
+            continue;
+        }
+        
+        offset -= 5;
+
+        /* msr[]? */
+        if ( offset < MSR_INDEX_MAX )
+        {
+            if ( !input_read(s, s->msr + offset, sizeof(*s->msr)) )
+                return;
+            printf("Setting MSR i%d (%x) to %lx\n", offset,
+                   msr_index[offset], s->msr[offset]);
+            continue;
+        }
+
+        offset -= MSR_INDEX_MAX;
+
+        /* segments[]? */
+        if ( offset < SEG_NUM )
+        {
+            if ( !input_read(s, s->segments + offset, sizeof(*s->segments)) )
+                return;
+            printf("Setting Segment %d\n", offset);
+            continue;
+            
+        }
+
+        offset -= SEG_NUM;
+
+        /* regs? */
+        if ( offset < sizeof(struct cpu_user_regs)
+             && offset + sizeof(uint64_t) <= sizeof(struct cpu_user_regs) )
+        {
+            if ( !input_read(s, ((char *)ctxt->regs) + offset, sizeof(uint64_t)) )
+                return;
+            printf("Setting cpu_user_regs offset %x\n", offset);
+            continue;
+        }
+
+        /* None of the above -- take that as "start emulating" */
+        
+        return;
+    }
 }
 
 #define CANONICALIZE(x)                                   \
@@ -821,7 +902,7 @@  int LLVMFuzzerTestOneInput(const uint8_t *data_p, size_t size)
     /* Reset all global state variables */
     memset(&input, 0, sizeof(input));
 
-    if ( size <= DATA_OFFSET )
+    if ( size < fuzz_minimal_input_size() )
     {
         printf("Input too small\n");
         return 1;
@@ -858,11 +939,6 @@  int LLVMFuzzerTestOneInput(const uint8_t *data_p, size_t size)
     return 0;
 }
 
-unsigned int fuzz_minimal_input_size(void)
-{
-    return DATA_OFFSET + 1;
-}
-
 /*
  * Local variables:
  * mode: C