diff mbox

[v2,08/13] fuzz/x86_emulate: Take multiple test files for inputs

Message ID 20170925142648.25959-8-george.dunlap@citrix.com (mailing list archive)
State New, archived
Headers show

Commit Message

George Dunlap Sept. 25, 2017, 2:26 p.m. UTC
Finding aggregate coverage for a set of test files means running each
afl-generated test case through the harness.  At the moment, this is
done by re-executing afl-harness-cov with each input file.  When a
large number of test cases have been generated, this can take a
significant amonut of time; a recent test with 30k total files
generated by 4 parallel fuzzers took over 7 minutes.

The vast majority of this time is taken up with 'exec', however.
Since the harness is already designed to loop over multiple inputs for
llvm "persistent mode", just allow it to take a large number of inputs
on the same when *not* running in llvm "persistent mode"..  Then the
command can be efficiently executed like this:

  ls */queue/id* | xargs $path/afl-harness-cov

For the above-mentioned test on 30k files, the time to generate
coverage data was reduced from 7 minutes to under 30 seconds.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
---
v2:
- Make check for batch processing more clear

CC: Ian Jackson <ian.jackson@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
---
 tools/fuzz/README.afl                             |  7 +++++++
 tools/fuzz/x86_instruction_emulator/afl-harness.c | 24 ++++++++++++++++-------
 2 files changed, 24 insertions(+), 7 deletions(-)

Comments

Jan Beulich Oct. 4, 2017, 8:24 a.m. UTC | #1
>>> On 25.09.17 at 16:26, <george.dunlap@citrix.com> wrote:
> --- a/tools/fuzz/x86_instruction_emulator/afl-harness.c
> +++ b/tools/fuzz/x86_instruction_emulator/afl-harness.c
> @@ -16,6 +16,8 @@ int main(int argc, char **argv)
>  {
>      size_t size;
>      FILE *fp = NULL;
> +    int count = 0;
> +    int max;

Generally speaking these should be unsigned int, but I see how this
collides with the types of the variables max is being calculated from.
In any event both could go on a single line.

> @@ -66,11 +70,14 @@ int main(int argc, char **argv)
>      __AFL_INIT();
>  
>      while ( __AFL_LOOP(1000) )
> +#else
> +    for( count = 0; count < max; count++ )

Initially I've thought the initializer on count was pointless further
up because of the re-initialization here. Of course that's needed
because of the #if/#else this sits in. Hence I wonder whether omitting
the assignment here wouldn't be appropriate - it wouldn't really be
wromng for a compiler to warn about this redundancy.

Either way
Acked-by: Jan Beulich <jbeulich@suse.com>

Jan
George Dunlap Oct. 4, 2017, 4:58 p.m. UTC | #2
On 10/04/2017 09:24 AM, Jan Beulich wrote:
>>>> On 25.09.17 at 16:26, <george.dunlap@citrix.com> wrote:
>> --- a/tools/fuzz/x86_instruction_emulator/afl-harness.c
>> +++ b/tools/fuzz/x86_instruction_emulator/afl-harness.c
>> @@ -16,6 +16,8 @@ int main(int argc, char **argv)
>>  {
>>      size_t size;
>>      FILE *fp = NULL;
>> +    int count = 0;
>> +    int max;
> 
> Generally speaking these should be unsigned int, but I see how this
> collides with the types of the variables max is being calculated from.
> In any event both could go on a single line.

Ack

> 
>> @@ -66,11 +70,14 @@ int main(int argc, char **argv)
>>      __AFL_INIT();
>>  
>>      while ( __AFL_LOOP(1000) )
>> +#else
>> +    for( count = 0; count < max; count++ )
> 
> Initially I've thought the initializer on count was pointless further
> up because of the re-initialization here. Of course that's needed
> because of the #if/#else this sits in. Hence I wonder whether omitting
> the assignment here wouldn't be appropriate - it wouldn't really be
> wromng for a compiler to warn about this redundancy.

Could do that I suppose.  The other option would be to change the other
loop to `for ( count = 0; __AFL_LOOP(1000); )`

Let me know if you have any preference.

> Either way
> Acked-by: Jan Beulich <jbeulich@suse.com>

Thanks,
 -George
Jan Beulich Oct. 5, 2017, 9:08 a.m. UTC | #3
>>> On 04.10.17 at 18:58, <george.dunlap@citrix.com> wrote:
> On 10/04/2017 09:24 AM, Jan Beulich wrote:
>>>>> On 25.09.17 at 16:26, <george.dunlap@citrix.com> wrote:
>>> @@ -66,11 +70,14 @@ int main(int argc, char **argv)
>>>      __AFL_INIT();
>>>  
>>>      while ( __AFL_LOOP(1000) )
>>> +#else
>>> +    for( count = 0; count < max; count++ )
>> 
>> Initially I've thought the initializer on count was pointless further
>> up because of the re-initialization here. Of course that's needed
>> because of the #if/#else this sits in. Hence I wonder whether omitting
>> the assignment here wouldn't be appropriate - it wouldn't really be
>> wromng for a compiler to warn about this redundancy.
> 
> Could do that I suppose.  The other option would be to change the other
> loop to `for ( count = 0; __AFL_LOOP(1000); )`
> 
> Let me know if you have any preference.

No preference, your alternative is fine.

Jan
diff mbox

Patch

diff --git a/tools/fuzz/README.afl b/tools/fuzz/README.afl
index 0d955b2687..e8c23d734c 100644
--- a/tools/fuzz/README.afl
+++ b/tools/fuzz/README.afl
@@ -49,6 +49,13 @@  generate coverage data.  To do this, use the target `afl-cov`:
 
     $ make afl-cov #produces afl-harness-cov
 
+In order to speed up the process of checking total coverage,
+`afl-harness-cov` can take several test inputs on its command-line;
+the speed-up effect should be similar to that of using afl-clang-fast.
+You can use xargs to do this most efficiently, like so:
+
+    $ ls queue/id* | xargs $path/afl-harness-cov
+
 NOTE: Please also note that the coverage instrumentation hard-codes
 the absolute path for the instrumentation read and write files in the
 binary; so coverage data will always show up in the build directory no
diff --git a/tools/fuzz/x86_instruction_emulator/afl-harness.c b/tools/fuzz/x86_instruction_emulator/afl-harness.c
index b4d15451b5..669f698711 100644
--- a/tools/fuzz/x86_instruction_emulator/afl-harness.c
+++ b/tools/fuzz/x86_instruction_emulator/afl-harness.c
@@ -16,6 +16,8 @@  int main(int argc, char **argv)
 {
     size_t size;
     FILE *fp = NULL;
+    int count = 0;
+    int max;
 
     setbuf(stdin, NULL);
     setbuf(stdout, NULL);
@@ -42,8 +44,7 @@  int main(int argc, char **argv)
             break;
 
         case '?':
-        usage:
-            printf("Usage: %s $FILE | [--min-input-size]\n", argv[0]);
+            printf("Usage: %s $FILE [$FILE...] | [--min-input-size]\n", argv[0]);
             exit(-1);
             break;
 
@@ -54,10 +55,13 @@  int main(int argc, char **argv)
         }
     }
 
-    if ( optind == argc ) /* No positional parameters.  Use stdin. */
+    max = argc - optind;
+
+    if ( !max ) /* No positional parameters.  Use stdin. */
+    {
+        max = 1;
         fp = stdin;
-    else if ( optind != (argc - 1) )
-        goto usage;
+    }
 
     if ( LLVMFuzzerInitialize(&argc, &argv) )
         exit(-1);
@@ -66,11 +70,14 @@  int main(int argc, char **argv)
     __AFL_INIT();
 
     while ( __AFL_LOOP(1000) )
+#else
+    for( count = 0; count < max; count++ )
 #endif
     {
         if ( fp != stdin ) /* If not using stdin, open the provided file. */
         {
-            fp = fopen(argv[optind], "rb");
+            printf("Opening file %s\n", argv[optind]);
+            fp = fopen(argv[optind + count], "rb");
             if ( fp == NULL )
             {
                 perror("fopen");
@@ -89,7 +96,10 @@  int main(int argc, char **argv)
         if ( !feof(fp) )
         {
             printf("Input too large\n");
-            exit(-1);
+            /* Don't exit if we're doing batch processing */
+            if ( max == 1 )
+                exit(-1);
+            continue;
         }
 
         if ( fp != stdin )