diff mbox series

[v3,02/16] fuzz: Add general virtual-device fuzzer

Message ID 20200921022506.873303-3-alxndr@bu.edu (mailing list archive)
State New, archived
Headers show
Series Add a General Virtual Device Fuzzer | expand

Commit Message

Alexander Bulekov Sept. 21, 2020, 2:24 a.m. UTC
This is a generic fuzzer designed to fuzz a virtual device's
MemoryRegions, as long as they exist within the Memory or Port IO (if it
exists) AddressSpaces. The fuzzer's input is interpreted into a sequence
of qtest commands (outb, readw, etc). The interpreted commands are
separated by a magic seaparator, which should be easy for the fuzzer to
guess. Without ASan, the separator can be specified as a "dictionary
value" using the -dict argument (see libFuzzer documentation).

Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
---
 tests/qtest/fuzz/general_fuzz.c | 498 ++++++++++++++++++++++++++++++++
 tests/qtest/fuzz/meson.build    |   1 +
 2 files changed, 499 insertions(+)
 create mode 100644 tests/qtest/fuzz/general_fuzz.c

Comments

Philippe Mathieu-Daudé Sept. 21, 2020, 5:43 a.m. UTC | #1
Hi Alexander,

On 9/21/20 4:24 AM, Alexander Bulekov wrote:
> This is a generic fuzzer designed to fuzz a virtual device's
> MemoryRegions, as long as they exist within the Memory or Port IO (if it
> exists) AddressSpaces. The fuzzer's input is interpreted into a sequence
> of qtest commands (outb, readw, etc). The interpreted commands are
> separated by a magic seaparator, which should be easy for the fuzzer to
> guess. Without ASan, the separator can be specified as a "dictionary
> value" using the -dict argument (see libFuzzer documentation).
> 
> Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
> ---
>  tests/qtest/fuzz/general_fuzz.c | 498 ++++++++++++++++++++++++++++++++
>  tests/qtest/fuzz/meson.build    |   1 +
>  2 files changed, 499 insertions(+)
>  create mode 100644 tests/qtest/fuzz/general_fuzz.c
> 
> diff --git a/tests/qtest/fuzz/general_fuzz.c b/tests/qtest/fuzz/general_fuzz.c
> new file mode 100644
> index 0000000000..bf75b215ca
> --- /dev/null
> +++ b/tests/qtest/fuzz/general_fuzz.c
> @@ -0,0 +1,498 @@
> +/*
> + * General Virtual-Device Fuzzing Target
> + *
> + * Copyright Red Hat Inc., 2020
> + *
> + * Authors:
> + *  Alexander Bulekov   <alxndr@bu.edu>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +
> +#include <wordexp.h>
> +
> +#include "hw/core/cpu.h"
> +#include "tests/qtest/libqos/libqtest.h"
> +#include "fuzz.h"
> +#include "fork_fuzz.h"
> +#include "exec/address-spaces.h"
> +#include "string.h"
> +#include "exec/memory.h"
> +#include "exec/ramblock.h"
> +#include "exec/address-spaces.h"
> +#include "hw/qdev-core.h"
> +
> +/*
> + * SEPARATOR is used to separate "operations" in the fuzz input
> + */
> +#define SEPARATOR "FUZZ"

Why use a separator when all pkt sizes are known?

Can you fuzz writing "FUZZ" in memory? Like:
OP_WRITE(0x100000, "UsingLibFUZZerString")?

> +
> +enum cmds {
> +    OP_IN,
> +    OP_OUT,
> +    OP_READ,
> +    OP_WRITE,
> +    OP_CLOCK_STEP,
> +};
> +
> +#define DEFAULT_TIMEOUT_US 100000
> +#define USEC_IN_SEC 100000000

Are you sure this definition is correct?

> +
> +typedef struct {
> +    ram_addr_t addr;
> +    ram_addr_t size; /* The number of bytes until the end of the I/O region */
> +} address_range;
> +
> +static useconds_t timeout = 100000;
[...]
Alexander Bulekov Sept. 21, 2020, 2:34 p.m. UTC | #2
On 200921 0743, Philippe Mathieu-Daudé wrote:
> Hi Alexander,
> 
> On 9/21/20 4:24 AM, Alexander Bulekov wrote:
> > This is a generic fuzzer designed to fuzz a virtual device's
> > MemoryRegions, as long as they exist within the Memory or Port IO (if it
> > exists) AddressSpaces. The fuzzer's input is interpreted into a sequence
> > of qtest commands (outb, readw, etc). The interpreted commands are
> > separated by a magic seaparator, which should be easy for the fuzzer to
> > guess. Without ASan, the separator can be specified as a "dictionary
> > value" using the -dict argument (see libFuzzer documentation).
> > 
> > Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
> > ---
> >  tests/qtest/fuzz/general_fuzz.c | 498 ++++++++++++++++++++++++++++++++
> >  tests/qtest/fuzz/meson.build    |   1 +
> >  2 files changed, 499 insertions(+)
> >  create mode 100644 tests/qtest/fuzz/general_fuzz.c
> > 
> > diff --git a/tests/qtest/fuzz/general_fuzz.c b/tests/qtest/fuzz/general_fuzz.c
> > new file mode 100644
> > index 0000000000..bf75b215ca
> > --- /dev/null
> > +++ b/tests/qtest/fuzz/general_fuzz.c
> > @@ -0,0 +1,498 @@
> > +/*
> > + * General Virtual-Device Fuzzing Target
> > + *
> > + * Copyright Red Hat Inc., 2020
> > + *
> > + * Authors:
> > + *  Alexander Bulekov   <alxndr@bu.edu>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +
> > +#include <wordexp.h>
> > +
> > +#include "hw/core/cpu.h"
> > +#include "tests/qtest/libqos/libqtest.h"
> > +#include "fuzz.h"
> > +#include "fork_fuzz.h"
> > +#include "exec/address-spaces.h"
> > +#include "string.h"
> > +#include "exec/memory.h"
> > +#include "exec/ramblock.h"
> > +#include "exec/address-spaces.h"
> > +#include "hw/qdev-core.h"
> > +
> > +/*
> > + * SEPARATOR is used to separate "operations" in the fuzz input
> > + */
> > +#define SEPARATOR "FUZZ"
> 
> Why use a separator when all pkt sizes are known?
Good point. 
1. When we add the DMA Pattern OP in patch 04/16, we now have
variable-width OPs.
2. Even when everything has a known size, take for example the input:
Acb Bd Caaaa Effff
Where Operation A has size 3, B: size 2, C size 5 ...:
Simply by removing the first byte, we now have a completely different
sequence of operations:
Cbbdc Aaa Aef Ff...
Thus the separators "add some stability" to random mutations:
Cb FUZZ Bd FUZZ Caaaa FUZZ Effff ...
(Cb is now invalid/ignored, but the rest of the commands are still
intact)
There is some libfuzzer documentation about this technique:
https://github.com/google/fuzzing/blob/master/docs/split-inputs.md#magic-separator

There is also a promising "FuzzDataProvider" header library that lets
you directly call functions, such as ConsumeBytes, or
ConsumeIntegralInRange, but unfortunately it is a C++ header.
> 
> Can you fuzz writing "FUZZ" in memory? Like:
> OP_WRITE(0x100000, "UsingLibFUZZerString")?

No.. Hopefully that's not a huge problem.

> > +
> > +enum cmds {
> > +    OP_IN,
> > +    OP_OUT,
> > +    OP_READ,
> > +    OP_WRITE,
> > +    OP_CLOCK_STEP,
> > +};
> > +
> > +#define DEFAULT_TIMEOUT_US 100000
> > +#define USEC_IN_SEC 100000000
> 
> Are you sure this definition is correct?
> 
Thanks for the catch...

> > +
> > +typedef struct {
> > +    ram_addr_t addr;
> > +    ram_addr_t size; /* The number of bytes until the end of the I/O region */
> > +} address_range;
> > +
> > +static useconds_t timeout = 100000;
> [...]
>
Alexander Bulekov Sept. 22, 2020, 2:03 p.m. UTC | #3
On 200920 2224, Alexander Bulekov wrote:
[snip]
> +static int locate_fuzz_memory_regions(Object *child, void *opaque)
> +{
> +    const char *name;
> +    MemoryRegion *mr;
> +    if (object_dynamic_cast(child, TYPE_MEMORY_REGION)) {
> +        mr = MEMORY_REGION(child);
> +        if ((memory_region_is_ram(mr) ||
> +            memory_region_is_ram_device(mr) ||
> +            memory_region_is_rom(mr) ||
> +            memory_region_is_romd(mr)) == false) {
> +            name = object_get_canonical_path_component(child);

This isn't a great way to check whether MRs have ops with code that is
interesting to fuzz (for example the pflash MemoryRegions do not pass
these checks, so you can't fuzz the pflash device). Need to think of
some better checks to identify MRs that we are interested in fuzzing.

-Alex

> +            /*
> +             * We don't want duplicate pointers to the same MemoryRegion, so
> +             * try to remove copies of the pointer, before adding it.
> +             */
> +            g_hash_table_insert(fuzzable_memoryregions, mr, (gpointer)true);
> +        }
> +    }
> +    return 0;
> +}
> +static int locate_fuzz_objects(Object *child, void *opaque)
> +{
> +    char *pattern = opaque;
> +    if (g_pattern_match_simple(pattern, object_get_typename(child))) {
> +        /* Find and save ptrs to any child MemoryRegions */
> +        object_child_foreach_recursive(child, locate_fuzz_memory_regions, NULL);
> +
> +    } else if (object_dynamic_cast(OBJECT(child), TYPE_MEMORY_REGION)) {
> +        if (g_pattern_match_simple(pattern,
> +            object_get_canonical_path_component(child))) {
> +            MemoryRegion *mr;
> +            mr = MEMORY_REGION(child);
> +            if ((memory_region_is_ram(mr) ||
> +                 memory_region_is_ram_device(mr) ||
> +                 memory_region_is_rom(mr) ||
> +                 memory_region_is_romd(mr)) == false) {
> +                g_hash_table_insert(fuzzable_memoryregions, mr, (gpointer)true);
> +            }
> +        }
> +    }
> +    return 0;
> +}
Darren Kenny Oct. 1, 2020, 3:29 p.m. UTC | #4
Hi Alex,

On Monday, 2020-09-21 at 10:34:05 -04, Alexander Bulekov wrote:
> On 200921 0743, Philippe Mathieu-Daudé wrote:
>> Hi Alexander,
>> 
>> On 9/21/20 4:24 AM, Alexander Bulekov wrote:
>> > This is a generic fuzzer designed to fuzz a virtual device's
>> > MemoryRegions, as long as they exist within the Memory or Port IO (if it
>> > exists) AddressSpaces. The fuzzer's input is interpreted into a sequence
>> > of qtest commands (outb, readw, etc). The interpreted commands are
>> > separated by a magic seaparator, which should be easy for the fuzzer to
>> > guess. Without ASan, the separator can be specified as a "dictionary
>> > value" using the -dict argument (see libFuzzer documentation).
>> > 
>> > Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
>> > ---
>> >  tests/qtest/fuzz/general_fuzz.c | 498 ++++++++++++++++++++++++++++++++
>> >  tests/qtest/fuzz/meson.build    |   1 +
>> >  2 files changed, 499 insertions(+)
>> >  create mode 100644 tests/qtest/fuzz/general_fuzz.c
>> > 
>> > diff --git a/tests/qtest/fuzz/general_fuzz.c b/tests/qtest/fuzz/general_fuzz.c
>> > new file mode 100644
>> > index 0000000000..bf75b215ca
>> > --- /dev/null
>> > +++ b/tests/qtest/fuzz/general_fuzz.c
>> > @@ -0,0 +1,498 @@
>> > +/*
>> > + * General Virtual-Device Fuzzing Target
>> > + *
>> > + * Copyright Red Hat Inc., 2020
>> > + *
>> > + * Authors:
>> > + *  Alexander Bulekov   <alxndr@bu.edu>
>> > + *
>> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> > + * See the COPYING file in the top-level directory.
>> > + */
>> > +
>> > +#include "qemu/osdep.h"
>> > +
>> > +#include <wordexp.h>
>> > +
>> > +#include "hw/core/cpu.h"
>> > +#include "tests/qtest/libqos/libqtest.h"
>> > +#include "fuzz.h"
>> > +#include "fork_fuzz.h"
>> > +#include "exec/address-spaces.h"
>> > +#include "string.h"
>> > +#include "exec/memory.h"
>> > +#include "exec/ramblock.h"
>> > +#include "exec/address-spaces.h"
>> > +#include "hw/qdev-core.h"
>> > +
>> > +/*
>> > + * SEPARATOR is used to separate "operations" in the fuzz input
>> > + */
>> > +#define SEPARATOR "FUZZ"
>> 
>> Why use a separator when all pkt sizes are known?
> Good point. 
> 1. When we add the DMA Pattern OP in patch 04/16, we now have
> variable-width OPs.
> 2. Even when everything has a known size, take for example the input:
> Acb Bd Caaaa Effff
> Where Operation A has size 3, B: size 2, C size 5 ...:
> Simply by removing the first byte, we now have a completely different
> sequence of operations:
> Cbbdc Aaa Aef Ff...
> Thus the separators "add some stability" to random mutations:
> Cb FUZZ Bd FUZZ Caaaa FUZZ Effff ...
> (Cb is now invalid/ignored, but the rest of the commands are still
> intact)
> There is some libfuzzer documentation about this technique:
> https://github.com/google/fuzzing/blob/master/docs/split-inputs.md#magic-separator
>
> There is also a promising "FuzzDataProvider" header library that lets
> you directly call functions, such as ConsumeBytes, or
> ConsumeIntegralInRange, but unfortunately it is a C++ header.

It might make sense to put the definition of SEPARATOR and some variant
of the above the comments in patch 9 where you're adding this related
functionality?

It seems a little out of place here.

Thanks,

Darren.

>> 
>> Can you fuzz writing "FUZZ" in memory? Like:
>> OP_WRITE(0x100000, "UsingLibFUZZerString")?
>
> No.. Hopefully that's not a huge problem.
>
>> > +
>> > +enum cmds {
>> > +    OP_IN,
>> > +    OP_OUT,
>> > +    OP_READ,
>> > +    OP_WRITE,
>> > +    OP_CLOCK_STEP,
>> > +};
>> > +
>> > +#define DEFAULT_TIMEOUT_US 100000
>> > +#define USEC_IN_SEC 100000000
>> 
>> Are you sure this definition is correct?
>> 
> Thanks for the catch...
>
>> > +
>> > +typedef struct {
>> > +    ram_addr_t addr;
>> > +    ram_addr_t size; /* The number of bytes until the end of the I/O region */
>> > +} address_range;
>> > +
>> > +static useconds_t timeout = 100000;
>> [...]
>>
Alexander Bulekov Oct. 7, 2020, 1:39 p.m. UTC | #5
On 201001 1629, Darren Kenny wrote:
> Hi Alex,
> 
> On Monday, 2020-09-21 at 10:34:05 -04, Alexander Bulekov wrote:
> > On 200921 0743, Philippe Mathieu-Daudé wrote:
> >> Hi Alexander,
> >> 
> >> On 9/21/20 4:24 AM, Alexander Bulekov wrote:
> >> > This is a generic fuzzer designed to fuzz a virtual device's
> >> > MemoryRegions, as long as they exist within the Memory or Port IO (if it
> >> > exists) AddressSpaces. The fuzzer's input is interpreted into a sequence
> >> > of qtest commands (outb, readw, etc). The interpreted commands are
> >> > separated by a magic seaparator, which should be easy for the fuzzer to
> >> > guess. Without ASan, the separator can be specified as a "dictionary
> >> > value" using the -dict argument (see libFuzzer documentation).
> >> > 
> >> > Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
> >> > ---
> >> >  tests/qtest/fuzz/general_fuzz.c | 498 ++++++++++++++++++++++++++++++++
> >> >  tests/qtest/fuzz/meson.build    |   1 +
> >> >  2 files changed, 499 insertions(+)
> >> >  create mode 100644 tests/qtest/fuzz/general_fuzz.c
> >> > 
> >> > diff --git a/tests/qtest/fuzz/general_fuzz.c b/tests/qtest/fuzz/general_fuzz.c
> >> > new file mode 100644
> >> > index 0000000000..bf75b215ca
> >> > --- /dev/null
> >> > +++ b/tests/qtest/fuzz/general_fuzz.c
> >> > @@ -0,0 +1,498 @@
> >> > +/*
> >> > + * General Virtual-Device Fuzzing Target
> >> > + *
> >> > + * Copyright Red Hat Inc., 2020
> >> > + *
> >> > + * Authors:
> >> > + *  Alexander Bulekov   <alxndr@bu.edu>
> >> > + *
> >> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> >> > + * See the COPYING file in the top-level directory.
> >> > + */
> >> > +
> >> > +#include "qemu/osdep.h"
> >> > +
> >> > +#include <wordexp.h>
> >> > +
> >> > +#include "hw/core/cpu.h"
> >> > +#include "tests/qtest/libqos/libqtest.h"
> >> > +#include "fuzz.h"
> >> > +#include "fork_fuzz.h"
> >> > +#include "exec/address-spaces.h"
> >> > +#include "string.h"
> >> > +#include "exec/memory.h"
> >> > +#include "exec/ramblock.h"
> >> > +#include "exec/address-spaces.h"
> >> > +#include "hw/qdev-core.h"
> >> > +
> >> > +/*
> >> > + * SEPARATOR is used to separate "operations" in the fuzz input
> >> > + */
> >> > +#define SEPARATOR "FUZZ"
> >> 
> >> Why use a separator when all pkt sizes are known?
> > Good point. 
> > 1. When we add the DMA Pattern OP in patch 04/16, we now have
> > variable-width OPs.
> > 2. Even when everything has a known size, take for example the input:
> > Acb Bd Caaaa Effff
> > Where Operation A has size 3, B: size 2, C size 5 ...:
> > Simply by removing the first byte, we now have a completely different
> > sequence of operations:
> > Cbbdc Aaa Aef Ff...
> > Thus the separators "add some stability" to random mutations:
> > Cb FUZZ Bd FUZZ Caaaa FUZZ Effff ...
> > (Cb is now invalid/ignored, but the rest of the commands are still
> > intact)
> > There is some libfuzzer documentation about this technique:
> > https://github.com/google/fuzzing/blob/master/docs/split-inputs.md#magic-separator
> >
> > There is also a promising "FuzzDataProvider" header library that lets
> > you directly call functions, such as ConsumeBytes, or
> > ConsumeIntegralInRange, but unfortunately it is a C++ header.
> 
> It might make sense to put the definition of SEPARATOR and some variant
> of the above the comments in patch 9 where you're adding this related
> functionality?
> 
> It seems a little out of place here.
> 
> Thanks,
> 
> Darren.
> 

Hi Darren,
If I move the definition of SEPARATOR to Patch 9, I would need some
different way to parse commands here, to keep everything bisectable. I
don't think the separator is only important in the context of the
Crossover functionality (Patch 9) - it is useful in general as a
"stable" way to parse an input into multiple commands.
Is it OK if I keep SEPARATOR in this patch and add the comments you
mention to both this patch and patch 9?
Thanks
-Alex

> >> 
> >> Can you fuzz writing "FUZZ" in memory? Like:
> >> OP_WRITE(0x100000, "UsingLibFUZZerString")?
> >
> > No.. Hopefully that's not a huge problem.
> >
> >> > +
> >> > +enum cmds {
> >> > +    OP_IN,
> >> > +    OP_OUT,
> >> > +    OP_READ,
> >> > +    OP_WRITE,
> >> > +    OP_CLOCK_STEP,
> >> > +};
> >> > +
> >> > +#define DEFAULT_TIMEOUT_US 100000
> >> > +#define USEC_IN_SEC 100000000
> >> 
> >> Are you sure this definition is correct?
> >> 
> > Thanks for the catch...
> >
> >> > +
> >> > +typedef struct {
> >> > +    ram_addr_t addr;
> >> > +    ram_addr_t size; /* The number of bytes until the end of the I/O region */
> >> > +} address_range;
> >> > +
> >> > +static useconds_t timeout = 100000;
> >> [...]
> >>
Darren Kenny Oct. 7, 2020, 1:53 p.m. UTC | #6
On Wednesday, 2020-10-07 at 09:39:32 -04, Alexander Bulekov wrote:
> On 201001 1629, Darren Kenny wrote:

...

>>
>> It might make sense to put the definition of SEPARATOR and some variant
>> of the above the comments in patch 9 where you're adding this related
>> functionality?
>> 
>> It seems a little out of place here.
>> 
>> Thanks,
>> 
>> Darren.
>> 
>
> Hi Darren,
> If I move the definition of SEPARATOR to Patch 9, I would need some
> different way to parse commands here, to keep everything bisectable. I
> don't think the separator is only important in the context of the
> Crossover functionality (Patch 9) - it is useful in general as a
> "stable" way to parse an input into multiple commands.
> Is it OK if I keep SEPARATOR in this patch and add the comments you
> mention to both this patch and patch 9?

Sounds fine, it was just a suggestion since I hadn't seen it being used
in this file, but maybe I missed something.

Thanks,

Darren.

> Thanks
> -Alex
>
>> >> 
>> >> Can you fuzz writing "FUZZ" in memory? Like:
>> >> OP_WRITE(0x100000, "UsingLibFUZZerString")?
>> >
>> > No.. Hopefully that's not a huge problem.
>> >
>> >> > +
>> >> > +enum cmds {
>> >> > +    OP_IN,
>> >> > +    OP_OUT,
>> >> > +    OP_READ,
>> >> > +    OP_WRITE,
>> >> > +    OP_CLOCK_STEP,
>> >> > +};
>> >> > +
>> >> > +#define DEFAULT_TIMEOUT_US 100000
>> >> > +#define USEC_IN_SEC 100000000
>> >> 
>> >> Are you sure this definition is correct?
>> >> 
>> > Thanks for the catch...
>> >
>> >> > +
>> >> > +typedef struct {
>> >> > +    ram_addr_t addr;
>> >> > +    ram_addr_t size; /* The number of bytes until the end of the I/O region */
>> >> > +} address_range;
>> >> > +
>> >> > +static useconds_t timeout = 100000;
>> >> [...]
>> >>
Paolo Bonzini Oct. 8, 2020, 7:03 a.m. UTC | #7
On 21/09/20 16:34, Alexander Bulekov wrote:
>> Can you fuzz writing "FUZZ" in memory? Like:
>> OP_WRITE(0x100000, "UsingLibFUZZerString")?
> No.. Hopefully that's not a huge problem.
> 

Instead of always looking for a separator, can you:

1) skip over it if you find it naturally at the end of a command (that
is, "FUZZ" is like a comment command)

2) actively search for it only if you stumble upon an unrecognized command?

In that case, if you have

  AbcFUZZD0x100000UsingLibFUZZerFUZZ

The first and third instances would be ignored, while the second would
be part of the input.  On the other hand if you have

  bcFUZZD0x100000UsingLibFUZZerFUZZ

"b" is an invalid command and therefore you'd skip directly to "D".

Paolo
Paolo Bonzini Oct. 8, 2020, 7:04 a.m. UTC | #8
On 22/09/20 16:03, Alexander Bulekov wrote:
>> +        if ((memory_region_is_ram(mr) ||
>> +            memory_region_is_ram_device(mr) ||
>> +            memory_region_is_rom(mr) ||
>> +            memory_region_is_romd(mr)) == false) {
>> +            name = object_get_canonical_path_component(child);
> This isn't a great way to check whether MRs have ops with code that is
> interesting to fuzz (for example the pflash MemoryRegions do not pass
> these checks, so you can't fuzz the pflash device). Need to think of
> some better checks to identify MRs that we are interested in fuzzing.

I think you can simply remove the ROMD check.

Paolo
Alexander Bulekov Oct. 11, 2020, 3:35 p.m. UTC | #9
On 201008 0903, Paolo Bonzini wrote:
> On 21/09/20 16:34, Alexander Bulekov wrote:
> >> Can you fuzz writing "FUZZ" in memory? Like:
> >> OP_WRITE(0x100000, "UsingLibFUZZerString")?
> > No.. Hopefully that's not a huge problem.
> > 
> 
> Instead of always looking for a separator, can you:
> 
> 1) skip over it if you find it naturally at the end of a command (that
> is, "FUZZ" is like a comment command)
> 
> 2) actively search for it only if you stumble upon an unrecognized command?
> 

What is the end goal? Is it to be able to use the "FUZZ" bytes to fuzz
devices?
My concern is that we want to keep the "stability" added by the FUZZ
separators (ie removing a single byte shouldn't completely change the
sequence of operations).

> In that case, if you have
> 
>   AbcFUZZD0x100000UsingLibFUZZerFUZZ
> 
> The first and third instances would be ignored, while the second would
> be part of the input.  On the other hand if you have
> 
>   bcFUZZD0x100000UsingLibFUZZerFUZZ
> 
> "b" is an invalid command and therefore you'd skip directly to "D".

There aren't any invalid OPCodes, since we interpret the opcode modulo
the size of the OPcode table. We only have invalid/skipped commands when
there isn't enough data after the opcode to figure out what we should do.

> 
> Paolo
>
Paolo Bonzini Oct. 12, 2020, 7:02 a.m. UTC | #10
On 11/10/20 17:35, Alexander Bulekov wrote:
>> Instead of always looking for a separator, can you:
>>
>> 1) skip over it if you find it naturally at the end of a command (that
>> is, "FUZZ" is like a comment command)
>>
>> 2) actively search for it only if you stumble upon an unrecognized command?
>>
> What is the end goal? Is it to be able to use the "FUZZ" bytes to fuzz
> devices?

Yes, possibly, and perhaps also using a shorter separator that is easier
to learn.  But if the dictionary is enough to work around the learning
time and it's unlikely that crossover produces inputs like that, I guess
it's okay to have the separator.

Paolo
diff mbox series

Patch

diff --git a/tests/qtest/fuzz/general_fuzz.c b/tests/qtest/fuzz/general_fuzz.c
new file mode 100644
index 0000000000..bf75b215ca
--- /dev/null
+++ b/tests/qtest/fuzz/general_fuzz.c
@@ -0,0 +1,498 @@ 
+/*
+ * General Virtual-Device Fuzzing Target
+ *
+ * Copyright Red Hat Inc., 2020
+ *
+ * Authors:
+ *  Alexander Bulekov   <alxndr@bu.edu>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+
+#include <wordexp.h>
+
+#include "hw/core/cpu.h"
+#include "tests/qtest/libqos/libqtest.h"
+#include "fuzz.h"
+#include "fork_fuzz.h"
+#include "exec/address-spaces.h"
+#include "string.h"
+#include "exec/memory.h"
+#include "exec/ramblock.h"
+#include "exec/address-spaces.h"
+#include "hw/qdev-core.h"
+
+/*
+ * SEPARATOR is used to separate "operations" in the fuzz input
+ */
+#define SEPARATOR "FUZZ"
+
+enum cmds {
+    OP_IN,
+    OP_OUT,
+    OP_READ,
+    OP_WRITE,
+    OP_CLOCK_STEP,
+};
+
+#define DEFAULT_TIMEOUT_US 100000
+#define USEC_IN_SEC 100000000
+
+typedef struct {
+    ram_addr_t addr;
+    ram_addr_t size; /* The number of bytes until the end of the I/O region */
+} address_range;
+
+static useconds_t timeout = 100000;
+
+static bool qtest_log_enabled;
+
+/*
+ * List of memory regions that are children of QOM objects specified by the
+ * user for fuzzing.
+ */
+static GHashTable *fuzzable_memoryregions;
+
+struct get_io_cb_info {
+    int index;
+    int found;
+    address_range result;
+};
+
+static int get_io_address_cb(ram_addr_t start, ram_addr_t size,
+                          const MemoryRegion *mr, void *opaque) {
+    struct get_io_cb_info *info = opaque;
+    if (g_hash_table_lookup(fuzzable_memoryregions, mr)) {
+        if (info->index == 0) {
+            info->result.addr = start;
+            info->result.size = size;
+            info->found = 1;
+            return 1;
+        }
+        info->index--;
+    }
+    return 0;
+}
+
+/*
+ * Here we want to convert a fuzzer-provided [io-region-index, offset] to
+ * a physical address. To do this, we iterate over all of the matched
+ * MemoryRegions. Check whether each region exists within the particular io
+ * space. Return the absolute address of the offset within the index'th region
+ * that is a subregion of the io_space and the distance until the end of the
+ * memory region.
+ */
+static bool get_io_address(address_range *result, AddressSpace *as,
+                            uint8_t index,
+                            uint32_t offset) {
+    FlatView *view;
+    view = as->current_map;
+    g_assert(view);
+    struct get_io_cb_info cb_info = {};
+
+    cb_info.index = index;
+
+    /*
+     * Loop around the FlatView until we match "index" number of
+     * fuzzable_memoryregions, or until we know that there are no matching
+     * memory_regions.
+     */
+    do {
+        flatview_for_each_range(view, get_io_address_cb , &cb_info);
+    } while (cb_info.index != index && !cb_info.found);
+
+    *result = cb_info.result;
+    return cb_info.found;
+}
+static bool get_pio_address(address_range *result,
+                            uint8_t index, uint16_t offset)
+{
+    /*
+     * PIO BARs can be set past the maximum port address (0xFFFF). Thus, result
+     * can contain an addr that extends past the PIO space. When we pass this
+     * address to qtest_in/qtest_out, it is cast to a uint16_t, so we might end
+     * up fuzzing a completely different MemoryRegion/Device. Therefore, check
+     * that the address here is within the PIO space limits.
+     */
+    bool found = get_io_address(result, &address_space_io, index, offset);
+    return result->addr <= 0xFFFF ? found : false;
+}
+static bool get_mmio_address(address_range *result,
+                             uint8_t index, uint32_t offset)
+{
+    return get_io_address(result, &address_space_memory, index, offset);
+}
+
+static void op_in(QTestState *s, const unsigned char * data, size_t len)
+{
+    enum Sizes {Byte, Word, Long, end_sizes};
+    struct {
+        uint8_t size;
+        uint8_t base;
+        uint16_t offset;
+    } a;
+    address_range abs;
+
+    if (len < sizeof(a)) {
+        return;
+    }
+    memcpy(&a, data, sizeof(a));
+    if (get_pio_address(&abs, a.base, a.offset) == 0) {
+        return;
+    }
+
+    switch (a.size %= end_sizes) {
+    case Byte:
+        qtest_inb(s, abs.addr);
+        break;
+    case Word:
+        if (abs.size >= 2) {
+            qtest_inw(s, abs.addr);
+        }
+        break;
+    case Long:
+        if (abs.size >= 4) {
+            qtest_inl(s, abs.addr);
+        }
+        break;
+    }
+}
+
+static void op_out(QTestState *s, const unsigned char * data, size_t len)
+{
+    enum Sizes {Byte, Word, Long, end_sizes};
+    struct {
+        uint8_t size;
+        uint8_t base;
+        uint16_t offset;
+        uint32_t value;
+    } a;
+    address_range abs;
+
+    if (len < sizeof(a)) {
+        return;
+    }
+    memcpy(&a, data, sizeof(a));
+
+    if (get_pio_address(&abs, a.base, a.offset) == 0) {
+        return;
+    }
+
+    switch (a.size %= end_sizes) {
+    case Byte:
+        qtest_outb(s, abs.addr, a.value & 0xFF);
+        break;
+    case Word:
+        if (abs.size >= 2) {
+            qtest_outw(s, abs.addr, a.value & 0xFFFF);
+        }
+        break;
+    case Long:
+        if (abs.size >= 4) {
+            qtest_outl(s, abs.addr, a.value);
+        }
+        break;
+    }
+}
+
+static void op_read(QTestState *s, const unsigned char * data, size_t len)
+{
+    enum Sizes {Byte, Word, Long, Quad, end_sizes};
+    struct {
+        uint8_t size;
+        uint8_t base;
+        uint32_t offset;
+    } a;
+    address_range abs;
+
+    if (len < sizeof(a)) {
+        return;
+    }
+    memcpy(&a, data, sizeof(a));
+
+    if (get_mmio_address(&abs, a.base, a.offset) == 0) {
+        return;
+    }
+
+    switch (a.size %= end_sizes) {
+    case Byte:
+        qtest_readb(s, abs.addr);
+        break;
+    case Word:
+        if (abs.size >= 2) {
+            qtest_readw(s, abs.addr);
+        }
+        break;
+    case Long:
+        if (abs.size >= 4) {
+            qtest_readl(s, abs.addr);
+        }
+        break;
+    case Quad:
+        if (abs.size >= 8) {
+            qtest_readq(s, abs.addr);
+        }
+        break;
+    }
+}
+
+static void op_write(QTestState *s, const unsigned char * data, size_t len)
+{
+    enum Sizes {Byte, Word, Long, Quad, end_sizes};
+    struct {
+        uint8_t size;
+        uint8_t base;
+        uint32_t offset;
+        uint64_t value;
+    } a;
+    address_range abs;
+
+    if (len < sizeof(a)) {
+        return;
+    }
+    memcpy(&a, data, sizeof(a));
+
+    if (get_mmio_address(&abs, a.base, a.offset) == 0) {
+        return;
+    }
+
+    switch (a.size %= end_sizes) {
+    case Byte:
+            qtest_writeb(s, abs.addr, a.value & 0xFF);
+        break;
+    case Word:
+        if (abs.size >= 2) {
+            qtest_writew(s, abs.addr, a.value & 0xFFFF);
+        }
+        break;
+    case Long:
+        if (abs.size >= 4) {
+            qtest_writel(s, abs.addr, a.value & 0xFFFFFFFF);
+        }
+        break;
+    case Quad:
+        if (abs.size >= 8) {
+            qtest_writeq(s, abs.addr, a.value);
+        }
+        break;
+    }
+}
+static void op_clock_step(QTestState *s, const unsigned char *data, size_t len)
+{
+    qtest_clock_step_next(s);
+}
+
+static void handle_timeout(int sig)
+{
+    if (qtest_log_enabled) {
+        fprintf(stderr, "[Timeout]\n");
+        fflush(stderr);
+    }
+    _Exit(0);
+}
+
+/*
+ * Here, we interpret random bytes from the fuzzer, as a sequence of commands.
+ * Our commands are variable-width, so we use a separator, SEPARATOR, to specify
+ * the boundaries between commands. This is just a random 32-bit value, which
+ * is easily identified by libfuzzer+AddressSanitizer, as long as we use
+ * memmem. It can also be included in the fuzzer's dictionary. More details
+ * here:
+ * https://github.com/google/fuzzing/blob/master/docs/split-inputs.md
+ *
+ * As a result, the stream of bytes is converted into a sequence of commands.
+ * In a simplified example where SEPARATOR is 0xFF:
+ * 00 01 02 FF 03 04 05 06 FF 01 FF ...
+ * becomes this sequence of commands:
+ * 00 01 02    -> op00 (0102)   -> in (0102, 2)
+ * 03 04 05 06 -> op03 (040506) -> write (040506, 3)
+ * 01          -> op01 (-,0)    -> out (-,0)
+ * ...
+ *
+ * Note here that it is the job of the individual opcode functions to check
+ * that enough data was provided. I.e. in the last command out (,0), out needs
+ * to check that there is not enough data provided to select an address/value
+ * for the operation.
+ */
+static void general_fuzz(QTestState *s, const unsigned char *Data, size_t Size)
+{
+    void (*ops[]) (QTestState *s, const unsigned char* , size_t) = {
+        [OP_IN]                 = op_in,
+        [OP_OUT]                = op_out,
+        [OP_READ]               = op_read,
+        [OP_WRITE]              = op_write,
+        [OP_CLOCK_STEP]         = op_clock_step,
+    };
+    const unsigned char *cmd = Data;
+    const unsigned char *nextcmd;
+    size_t cmd_len;
+    uint8_t op;
+
+    if (fork() == 0) {
+        /*
+         * Sometimes the fuzzer will find inputs that take quite a long time to
+         * process. Often times, these inputs do not result in new coverage.
+         * Even if these inputs might be interesting, they can slow down the
+         * fuzzer, overall. Set a timeout to avoid hurting performance, too much
+         */
+        if (timeout) {
+            struct sigaction sact;
+            struct itimerval timer;
+
+            sigemptyset(&sact.sa_mask);
+            sact.sa_flags   = SA_NODEFER;
+            sact.sa_handler = handle_timeout;
+            sigaction(SIGALRM, &sact, NULL);
+
+            memset(&timer, 0, sizeof(timer));
+            timer.it_value.tv_sec = timeout / USEC_IN_SEC;
+            timer.it_value.tv_usec = timeout % USEC_IN_SEC;
+            setitimer(ITIMER_VIRTUAL, &timer, NULL);
+        }
+
+        while (cmd && Size) {
+            /* Get the length until the next command or end of input */
+            nextcmd = memmem(cmd, Size, SEPARATOR, strlen(SEPARATOR));
+            cmd_len = nextcmd ? nextcmd - cmd : Size;
+
+            if (cmd_len > 0) {
+                /* Interpret the first byte of the command as an opcode */
+                op = *cmd % (sizeof(ops) / sizeof((ops)[0]));
+                ops[op](s, cmd + 1, cmd_len - 1);
+
+                /* Run the main loop */
+                flush_events(s);
+            }
+            /* Advance to the next command */
+            cmd = nextcmd ? nextcmd + sizeof(SEPARATOR) - 1 : nextcmd;
+            Size = Size - (cmd_len + sizeof(SEPARATOR) - 1);
+        }
+        _Exit(0);
+    } else {
+        flush_events(s);
+        wait(0);
+    }
+}
+
+static void usage(void)
+{
+    printf("Please specify the following environment variables:\n");
+    printf("QEMU_FUZZ_ARGS= the command line arguments passed to qemu\n");
+    printf("QEMU_FUZZ_OBJECTS= "
+            "a space separated list of QOM type names for objects to fuzz\n");
+    printf("Optionally: QEMU_FUZZ_TIMEOUT= Specify a custom timeout (us). "
+            "0 to disable. %d by default\n", timeout);
+    exit(0);
+}
+
+static int locate_fuzz_memory_regions(Object *child, void *opaque)
+{
+    const char *name;
+    MemoryRegion *mr;
+    if (object_dynamic_cast(child, TYPE_MEMORY_REGION)) {
+        mr = MEMORY_REGION(child);
+        if ((memory_region_is_ram(mr) ||
+            memory_region_is_ram_device(mr) ||
+            memory_region_is_rom(mr) ||
+            memory_region_is_romd(mr)) == false) {
+            name = object_get_canonical_path_component(child);
+            /*
+             * We don't want duplicate pointers to the same MemoryRegion, so
+             * try to remove copies of the pointer, before adding it.
+             */
+            g_hash_table_insert(fuzzable_memoryregions, mr, (gpointer)true);
+        }
+    }
+    return 0;
+}
+static int locate_fuzz_objects(Object *child, void *opaque)
+{
+    char *pattern = opaque;
+    if (g_pattern_match_simple(pattern, object_get_typename(child))) {
+        /* Find and save ptrs to any child MemoryRegions */
+        object_child_foreach_recursive(child, locate_fuzz_memory_regions, NULL);
+
+    } else if (object_dynamic_cast(OBJECT(child), TYPE_MEMORY_REGION)) {
+        if (g_pattern_match_simple(pattern,
+            object_get_canonical_path_component(child))) {
+            MemoryRegion *mr;
+            mr = MEMORY_REGION(child);
+            if ((memory_region_is_ram(mr) ||
+                 memory_region_is_ram_device(mr) ||
+                 memory_region_is_rom(mr) ||
+                 memory_region_is_romd(mr)) == false) {
+                g_hash_table_insert(fuzzable_memoryregions, mr, (gpointer)true);
+            }
+        }
+    }
+    return 0;
+}
+
+static void general_pre_fuzz(QTestState *s)
+{
+    GHashTableIter iter;
+    MemoryRegion *mr;
+    char **result;
+
+    if (!getenv("QEMU_FUZZ_OBJECTS")) {
+        usage();
+    }
+    if (getenv("QTEST_LOG")) {
+        qtest_log_enabled = 1;
+    }
+    if (getenv("QEMU_FUZZ_TIMEOUT")) {
+        timeout = g_ascii_strtoll(getenv("QEMU_FUZZ_TIMEOUT"), NULL, 0);
+    }
+
+    fuzzable_memoryregions = g_hash_table_new(NULL, NULL);
+
+    result = g_strsplit(getenv("QEMU_FUZZ_OBJECTS"), " ", -1);
+    for (int i = 0; result[i] != NULL; i++) {
+        printf("Matching objects by name %s\n", result[i]);
+        object_child_foreach_recursive(qdev_get_machine(),
+                                    locate_fuzz_objects,
+                                    result[i]);
+    }
+    g_strfreev(result);
+    printf("This process will try to fuzz the following MemoryRegions:\n");
+
+    g_hash_table_iter_init(&iter, fuzzable_memoryregions);
+    while (g_hash_table_iter_next(&iter, (gpointer)&mr, NULL)) {
+        printf("  * %s (size %lx)\n",
+               object_get_canonical_path_component(&(mr->parent_obj)),
+               mr->addr);
+    }
+
+    if (!g_hash_table_size(fuzzable_memoryregions)) {
+        printf("No fuzzable memory regions found...\n");
+        exit(1);
+    }
+
+    counter_shm_init();
+}
+static GString *general_fuzz_cmdline(FuzzTarget *t)
+{
+    GString *cmd_line = g_string_new(TARGET_NAME);
+    if (!getenv("QEMU_FUZZ_ARGS")) {
+        usage();
+    }
+    g_string_append_printf(cmd_line, " -display none \
+                                      -machine accel=qtest, \
+                                      -m 64 %s ", getenv("QEMU_FUZZ_ARGS"));
+    return cmd_line;
+}
+
+static void register_general_fuzz_targets(void)
+{
+    fuzz_add_target(&(FuzzTarget){
+            .name = "general-fuzz",
+            .description = "Fuzz based on any qemu command-line args. ",
+            .get_init_cmdline = general_fuzz_cmdline,
+            .pre_fuzz = general_pre_fuzz,
+            .fuzz = general_fuzz});
+}
+
+fuzz_target_init(register_general_fuzz_targets);
diff --git a/tests/qtest/fuzz/meson.build b/tests/qtest/fuzz/meson.build
index b31ace7d5a..a59de6aa8c 100644
--- a/tests/qtest/fuzz/meson.build
+++ b/tests/qtest/fuzz/meson.build
@@ -5,6 +5,7 @@  specific_fuzz_ss.add(files('fuzz.c', 'fork_fuzz.c', 'qos_fuzz.c',
 specific_fuzz_ss.add(when: 'CONFIG_I440FX', if_true: files('i440fx_fuzz.c'))
 specific_fuzz_ss.add(when: 'CONFIG_VIRTIO_NET', if_true: files('virtio_net_fuzz.c'))
 specific_fuzz_ss.add(when: 'CONFIG_VIRTIO_SCSI', if_true: files('virtio_scsi_fuzz.c'))
+specific_fuzz_ss.add(files('general_fuzz.c'))
 
 fork_fuzz = declare_dependency(
   link_args: config_host['FUZZ_EXE_LDFLAGS'].split() +