[0/3] xen/domain: More structured teardown

Message ID 20201221181446.7791-1-andrew.cooper3@citrix.com
Series xen/domain: More structured teardown

Message

Andrew Cooper Dec. 21, 2020, 6:14 p.m. UTC
Follow-up from both the investigation leading to "Fix memory leak in
vcpu_create() error path" and the older work to deduplicate the domain
teardown paths.

Andrew Cooper (3):
  xen/domain: Reorder trivial initialisation in early domain_create()
  xen/domain: Introduce domain_teardown()
  xen/evtchn: Clean up teardown handling

 xen/common/domain.c        | 113 ++++++++++++++++++++++++++++++++++-----------
 xen/common/event_channel.c |   8 ++--
 xen/include/xen/sched.h    |  12 ++++-
 3 files changed, 101 insertions(+), 32 deletions(-)

Comments

Andrew Cooper Dec. 21, 2020, 7:36 p.m. UTC | #1
Hello,

We have some very complicated hypercalls - createdomain above all, with
max_vcpus a close second - of immense complexity and with very
hard-to-test error handling.

It is no surprise that the error handling is riddled with bugs.

Random failures from core functions are one way, but I'm not sure that
will be especially helpful.  In particular, we'd need a way to exclude
"dom0 critical" operations so we've got a usable system to run testing on.

As an alternative, how about adding a fault_ttl field into the hypercall?

The exact paths taken in {domain,vcpu}_create() are sensitive to the
hardware, Xen Kconfig, and other parameters passed into the
hypercall(s).  The testing logic doesn't really need to care about what
failed, only that the error was handled correctly.

So a test for this might look like:

cfg = { ... };
while ( xc_create_domain(xch, cfg) < 0 )
    cfg.fault_ttl++;
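
For illustration, the Xen side could be as simple as decrementing a
per-operation counter at each fail-able step and forcing the error path
when it reaches zero.  A minimal sketch (all names hypothetical, and not
part of this series):

#include <errno.h>
#include <stdbool.h>

/* Hypothetical sketch only - names and placement are illustrative. */
struct fault_state {
    unsigned int ttl;        /* Seeded from the new cfg.fault_ttl; 0 = off. */
};

/*
 * Decrement the TTL at each fail-able step.  When it reaches zero, force
 * the error path so the caller's unwinding logic gets exercised.
 */
static bool fault_hit(struct fault_state *fs)
{
    if ( !fs->ttl )
        return false;        /* Injection not armed. */

    return --fs->ttl == 0;   /* Fire exactly once, on the Nth step. */
}

/* Example use at one existing fail-able action in domain_create(): */
static int some_failable_step(struct fault_state *fs)
{
    if ( fault_hit(fs) )
        return -ENOMEM;      /* Pretend this step's allocation failed. */

    /* ... real work ... */
    return 0;
}

With that, each increment of fault_ttl in the loop above simply moves the
injected failure one step later through domain_create().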


The pros of this approach are that, for a specific build of Xen on a
piece of hardware, it ought to check every failure path in
domain_create(), until the ttl finally gets higher than the number of
fail-able actions required to construct a domain.  Also, the test
doesn't need changing as the complexity of domain_create() changes.

The main con will most likely be the invasiveness of the code in Xen, but
I suppose any fault injection is going to be invasive to a certain extent.

Fault injection like this would also want pairing with some other plans
I had for dalloc() & friends, which wrap the current allocators and
count (in struct domain) the number and/or size of domain allocations,
so we can a) check for leaks, and b) report how much memory a domain
object (and all its descendant objects) actually takes (seeing as we
don't currently know this value at all).
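
To illustrate the shape of that idea (everything here is hypothetical -
dalloc() doesn't exist yet, and the field names are made up):

#include <stdlib.h>

/* Hypothetical accounting state, imagined as fields in struct domain. */
struct domain_alloc_stats {
    unsigned long alloc_count;   /* Live allocations; should hit 0 at teardown. */
    unsigned long alloc_bytes;   /* Bytes currently allocated for the domain. */
};

/* dalloc(): allocate on behalf of a domain and account for it. */
static void *dalloc(struct domain_alloc_stats *d, size_t size)
{
    void *p = malloc(size);      /* Stand-in for Xen's real allocator. */

    if ( p )
    {
        d->alloc_count++;
        d->alloc_bytes += size;
    }

    return p;
}

/* dfree(): release and un-account a dalloc()'d allocation. */
static void dfree(struct domain_alloc_stats *d, void *p, size_t size)
{
    if ( !p )
        return;

    d->alloc_count--;
    d->alloc_bytes -= size;
    free(p);
}

A leak check then reduces to asserting alloc_count == 0 at the end of
domain teardown, and alloc_bytes gives the per-domain footprint for free.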

Thoughts?

~Andrew
Jan Beulich Dec. 22, 2020, 10 a.m. UTC | #2
On 21.12.2020 20:36, Andrew Cooper wrote:
> Hello,
> 
> We have some very complicated hypercalls, createdomain, and max_vcpus a
> close second, with immense complexity, and very hard-to-test error handling.
> 
> It is no surprise that the error handling is riddled with bugs.
> 
> Random failures from core functions is one way, but I'm not sure that
> will be especially helpful.  In particular, we'd need a way to exclude
> "dom0 critical" operations so we've got a usable system to run testing on.
> 
> As an alternative, how about adding a fault_ttl field into the hypercall?
> 
> The exact paths taken in {domain,vcpu}_create() are sensitive to the
> hardware, Xen Kconfig, and other parameters passed into the
> hypercall(s).  The testing logic doesn't really want to care about what
> failed; simply that the error was handled correctly.
> 
> So a test for this might look like:
> 
> cfg = { ... };
> while ( xc_create_domain(xch, cfg) < 0 )
>     cfg.fault_ttl++;
> 
> 
> The pro's of this approach is that for a specific build of Xen on a
> piece of hardware, it ought to check every failure path in
> domain_create(), until the ttl finally gets higher than the number of
> fail-able actions required to construct a domain.  Also, the test
> doesn't need changing as the complexity of domain_create() changes.
> 
> The main con will mostly likely be the invasiveness of code in Xen, but
> I suppose any fault injection is going to be invasive to a certain extent.

While I like the idea in principle, the innocent looking

cfg = { ... };

is quite a bit of a concern here as well: depending on the precise
settings, the paths taken in the hypervisor may vary heavily, and hence
such a test will only end up being useful if it covers a wide variety
of settings.  Even if the number of tests to execute turned out to
still be manageable today, it may quickly turn out not to be
sufficiently scalable as we add new settings controllable right at
domain creation (which I understand is the plan).

Jan
Andrew Cooper Dec. 22, 2020, 11:14 a.m. UTC | #3
On 22/12/2020 10:00, Jan Beulich wrote:
> On 21.12.2020 20:36, Andrew Cooper wrote:
>> Hello,
>>
>> We have some very complicated hypercalls, createdomain, and max_vcpus a
>> close second, with immense complexity, and very hard-to-test error handling.
>>
>> It is no surprise that the error handling is riddled with bugs.
>>
>> Random failures from core functions is one way, but I'm not sure that
>> will be especially helpful.  In particular, we'd need a way to exclude
>> "dom0 critical" operations so we've got a usable system to run testing on.
>>
>> As an alternative, how about adding a fault_ttl field into the hypercall?
>>
>> The exact paths taken in {domain,vcpu}_create() are sensitive to the
>> hardware, Xen Kconfig, and other parameters passed into the
>> hypercall(s).  The testing logic doesn't really want to care about what
>> failed; simply that the error was handled correctly.
>>
>> So a test for this might look like:
>>
>> cfg = { ... };
>> while ( xc_create_domain(xch, cfg) < 0 )
>>     cfg.fault_ttl++;
>>
>>
>> The pro's of this approach is that for a specific build of Xen on a
>> piece of hardware, it ought to check every failure path in
>> domain_create(), until the ttl finally gets higher than the number of
>> fail-able actions required to construct a domain.  Also, the test
>> doesn't need changing as the complexity of domain_create() changes.
>>
>> The main con will mostly likely be the invasiveness of code in Xen, but
>> I suppose any fault injection is going to be invasive to a certain extent.
> While I like the idea in principle, the innocent looking
>
> cfg = { ... };
>
> is quite a bit of a concern here as well: Depending on the precise
> settings, paths taken in the hypervisor may heavily vary, and hence
> such a test will only end up being useful if it covers a wide
> variety of settings. Even if the number of tests to execute turned
> out to still be manageable today, it may quickly turn out not
> sufficiently scalable as we add new settings controllable right at
> domain creation (which I understand is the plan).

Well - there are two aspects here.

First, 99% of all VMs in practice are one of 3 or 4 configurations.  An
individual configuration takes O(n) time to test with fault_ttl, with n
depending on the size of Xen's logic, and we absolutely want to be able
to test these deterministically and to completion.

For the plethora of other configurations, I agree that it is infeasible
to test them all.  However, a hypercall like this is easy to wire up
into a fuzzing harness.

TBH, I was thinking of something like
https://github.com/intel/kernel-fuzzer-for-xen-project with a PVH Xen
and XTF "dom0" poking just this hypercall.  All the other complicated
bits of wiring AFL up appear to have been done.

Perhaps when we exhaust that as a source of bugs, we move on to fuzzing
the L0 Xen, because running on native will give it more paths to
explore.  We'd need some way of reporting path/trace data back to AFL in
dom0, which might require a bit of plumbing.

~Andrew
Tamas K Lengyel Dec. 22, 2020, 3:47 p.m. UTC | #4
On Tue, Dec 22, 2020 at 6:14 AM Andrew Cooper <andrew.cooper3@citrix.com>
wrote:

> On 22/12/2020 10:00, Jan Beulich wrote:
> > On 21.12.2020 20:36, Andrew Cooper wrote:
> >> Hello,
> >>
> >> We have some very complicated hypercalls, createdomain, and max_vcpus a
> >> close second, with immense complexity, and very hard-to-test error
> handling.
> >>
> >> It is no surprise that the error handling is riddled with bugs.
> >>
> >> Random failures from core functions is one way, but I'm not sure that
> >> will be especially helpful.  In particular, we'd need a way to exclude
> >> "dom0 critical" operations so we've got a usable system to run testing
> on.
> >>
> >> As an alternative, how about adding a fault_ttl field into the
> hypercall?
> >>
> >> The exact paths taken in {domain,vcpu}_create() are sensitive to the
> >> hardware, Xen Kconfig, and other parameters passed into the
> >> hypercall(s).  The testing logic doesn't really want to care about what
> >> failed; simply that the error was handled correctly.
> >>
> >> So a test for this might look like:
> >>
> >> cfg = { ... };
> >> while ( xc_create_domain(xch, cfg) < 0 )
> >>     cfg.fault_ttl++;
> >>
> >>
> >> The pro's of this approach is that for a specific build of Xen on a
> >> piece of hardware, it ought to check every failure path in
> >> domain_create(), until the ttl finally gets higher than the number of
> >> fail-able actions required to construct a domain.  Also, the test
> >> doesn't need changing as the complexity of domain_create() changes.
> >>
> >> The main con will mostly likely be the invasiveness of code in Xen, but
> >> I suppose any fault injection is going to be invasive to a certain
> extent.
> > While I like the idea in principle, the innocent looking
> >
> > cfg = { ... };
> >
> > is quite a bit of a concern here as well: Depending on the precise
> > settings, paths taken in the hypervisor may heavily vary, and hence
> > such a test will only end up being useful if it covers a wide
> > variety of settings. Even if the number of tests to execute turned
> > out to still be manageable today, it may quickly turn out not
> > sufficiently scalable as we add new settings controllable right at
> > domain creation (which I understand is the plan).
>
> Well - there are two aspects here.
>
> First, 99% of all VMs in practice are one of 3 or 4 configurations.  An
> individual configuration is O(n) time complexity to test with fault_ttl,
> depending on the size of Xen's logic, and we absolutely want to be able
> to test these deterministically and to completion.
>
> For the plethora of other configurations, I agree that it is infeasible
> to test them all.  However, a hypercall like this is easy to wire up
> into a fuzzing harness.
>
> TBH, I was thinking of something like
> https://github.com/intel/kernel-fuzzer-for-xen-project with a PVH Xen
> and XTF "dom0" poking just this hypercall.  All the other complicated
> bits of wiring AFL up appear to have been done.
>
> Perhaps when we exhaust that as a source of bugs, we move onto fuzzing
> the L0 Xen, because running on native will give it more paths to
> explore.  We'd need some way of reporting path/trace data back to AFL in
> dom0 which might require a bit plumbing.


This is a pretty cool idea; I would be very interested in trying this
out.  If running Xen nested in an HVM domain is possible (my experiments
with nested setups using Xen have only worked on ancient hardware last
time I tried), then running the fuzzer would be entirely possible using
VM forks.  You don't even need a special "dom0": you could just add the
fuzzer's CPUID harness to Xen's hypercall handler, and the only thing
needed from the nested dom0 would be to trigger the hypercall with a
normal config.  The fuzzer would take it from there and replace the
config with the fuzzed version directly in VM forks.  What to report as
a "crash" to AFL would still need to be defined manually for Xen, as the
current sink points are Linux-specific (
https://github.com/intel/kernel-fuzzer-for-xen-project/blob/master/src/sink.h),
but that should be straightforward.
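
To make that concrete, a start/stop marker around the handler under test
might look roughly like the sketch below; the magic leaf value, helper
names, and handler are purely illustrative, not the project's actual
protocol:

#include <stdint.h>

/* Illustrative magic leaf only; the real harness protocol may differ. */
#define HARNESS_CPUID_LEAF 0x13371337u

/*
 * Executing CPUID with a magic leaf causes a VMEXIT that the external
 * fuzzer (watching the nested guest) can use to mark the fuzzed region.
 */
static inline void harness_marker(void)
{
    uint32_t eax = HARNESS_CPUID_LEAF, ebx = 0, ecx = 0, edx = 0;

    asm volatile ( "cpuid"
                   : "+a" (eax), "+b" (ebx), "+c" (ecx), "+d" (edx) );
}

/* Hypothetical stand-in for the real createdomain handler. */
static int createdomain_handler(const void *cfg)
{
    (void)cfg;
    return 0;
}

/* Sketch: wrap the handler in the L1 Xen under test. */
int harnessed_createdomain(const void *cfg)
{
    int rc;

    harness_marker();                /* Start of fuzzed region. */
    rc = createdomain_handler(cfg);  /* Fuzzer rewrites cfg in the VM fork. */
    harness_marker();                /* End of fuzzed region. */

    return rc;
}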

Also, running the fuzzer with PVH guests hasn't been tested, but since
all VM forking needs is EPT, it should work.

Tamas
Andrew Cooper Dec. 22, 2020, 5:17 p.m. UTC | #5
On 22/12/2020 15:47, Tamas K Lengyel wrote:
>
>
> On Tue, Dec 22, 2020 at 6:14 AM Andrew Cooper
> <andrew.cooper3@citrix.com> wrote:
>
>     On 22/12/2020 10:00, Jan Beulich wrote:
>     > On 21.12.2020 20:36, Andrew Cooper wrote:
>     >> Hello,
>     >>
>     >> We have some very complicated hypercalls, createdomain, and
>     max_vcpus a
>     >> close second, with immense complexity, and very hard-to-test
>     error handling.
>     >>
>     >> It is no surprise that the error handling is riddled with bugs.
>     >>
>     >> Random failures from core functions is one way, but I'm not
>     sure that
>     >> will be especially helpful.  In particular, we'd need a way to
>     exclude
>     >> "dom0 critical" operations so we've got a usable system to run
>     testing on.
>     >>
>     >> As an alternative, how about adding a fault_ttl field into the
>     hypercall?
>     >>
>     >> The exact paths taken in {domain,vcpu}_create() are sensitive
>     to the
>     >> hardware, Xen Kconfig, and other parameters passed into the
>     >> hypercall(s).  The testing logic doesn't really want to care
>     about what
>     >> failed; simply that the error was handled correctly.
>     >>
>     >> So a test for this might look like:
>     >>
>     >> cfg = { ... };
>     >> while ( xc_create_domain(xch, cfg) < 0 )
>     >>     cfg.fault_ttl++;
>     >>
>     >>
>     >> The pro's of this approach is that for a specific build of Xen on a
>     >> piece of hardware, it ought to check every failure path in
>     >> domain_create(), until the ttl finally gets higher than the
>     number of
>     >> fail-able actions required to construct a domain.  Also, the test
>     >> doesn't need changing as the complexity of domain_create() changes.
>     >>
>     >> The main con will mostly likely be the invasiveness of code in
>     Xen, but
>     >> I suppose any fault injection is going to be invasive to a
>     certain extent.
>     > While I like the idea in principle, the innocent looking
>     >
>     > cfg = { ... };
>     >
>     > is quite a bit of a concern here as well: Depending on the precise
>     > settings, paths taken in the hypervisor may heavily vary, and hence
>     > such a test will only end up being useful if it covers a wide
>     > variety of settings. Even if the number of tests to execute turned
>     > out to still be manageable today, it may quickly turn out not
>     > sufficiently scalable as we add new settings controllable right at
>     > domain creation (which I understand is the plan).
>
>     Well - there are two aspects here.
>
>     First, 99% of all VMs in practice are one of 3 or 4
>     configurations.  An
>     individual configuration is O(n) time complexity to test with
>     fault_ttl,
>     depending on the size of Xen's logic, and we absolutely want to be
>     able
>     to test these deterministically and to completion.
>
>     For the plethora of other configurations, I agree that it is
>     infeasible
>     to test them all.  However, a hypercall like this is easy to wire up
>     into a fuzzing harness.
>
>     TBH, I was thinking of something like
>     https://github.com/intel/kernel-fuzzer-for-xen-project with a PVH Xen
>     and XTF "dom0" poking just this hypercall.  All the other complicated
>     bits of wiring AFL up appear to have been done.
>
>     Perhaps when we exhaust that as a source of bugs, we move onto fuzzing
>     the L0 Xen, because running on native will give it more paths to
>     explore.  We'd need some way of reporting path/trace data back to
>     AFL in
>     dom0 which might require a bit plumbing.
>
>
> This is a pretty cool idea, I would be very interested in trying this
> out. If running Xen nested in a HVM domain is possible (my experiments
> with nested setups using Xen have only worked on ancient hw last time
> I tried) then running the fuzzer would be entirely possible using VM
> forks. You don't even need a special "dom0", you could just add the
> fuzzer's CPUID harness to Xen's hypercall handler and the only thing
> needed from the nested dom0 would be to trigger the hypercall with a
> normal config. The fuzzer would take it from there and replace the
> config with the fuzzed version directly in VM forks. Defining what to
> report as a "crash" to AFL would still need to be defined manually for
> Xen as the current sink points are Linux specific
> (https://github.com/intel/kernel-fuzzer-for-xen-project/blob/master/src/sink.h),
> but that should be straight forward.
>
> Also, running the fuzzer with PVH guests hasn't been tested but since
> all VM forking needs is EPT it should work.

Xen running inside Xen definitely works, and is even supported as far as
PV-Shim goes (i.e. no nested virt).  That would limit testing to just
creation of PV guests at L1, which is plenty to get started with.

Xen nested under Xen (i.e. with nested virt) does work to a first
approximation, and for the purposes of fuzzing areas other than the
nested-virt logic, it might even be OK.  (I use this configuration for a
fair chunk of HVM development.)

~Andrew
Tamas K Lengyel Dec. 22, 2020, 6:24 p.m. UTC | #6
On Tue, Dec 22, 2020 at 12:18 PM Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
>
> On 22/12/2020 15:47, Tamas K Lengyel wrote:
>
>
>
> On Tue, Dec 22, 2020 at 6:14 AM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>
>> On 22/12/2020 10:00, Jan Beulich wrote:
>> > On 21.12.2020 20:36, Andrew Cooper wrote:
>> >> Hello,
>> >>
>> >> We have some very complicated hypercalls, createdomain, and max_vcpus a
>> >> close second, with immense complexity, and very hard-to-test error handling.
>> >>
>> >> It is no surprise that the error handling is riddled with bugs.
>> >>
>> >> Random failures from core functions is one way, but I'm not sure that
>> >> will be especially helpful.  In particular, we'd need a way to exclude
>> >> "dom0 critical" operations so we've got a usable system to run testing on.
>> >>
>> >> As an alternative, how about adding a fault_ttl field into the hypercall?
>> >>
>> >> The exact paths taken in {domain,vcpu}_create() are sensitive to the
>> >> hardware, Xen Kconfig, and other parameters passed into the
>> >> hypercall(s).  The testing logic doesn't really want to care about what
>> >> failed; simply that the error was handled correctly.
>> >>
>> >> So a test for this might look like:
>> >>
>> >> cfg = { ... };
>> >> while ( xc_create_domain(xch, cfg) < 0 )
>> >>     cfg.fault_ttl++;
>> >>
>> >>
>> >> The pro's of this approach is that for a specific build of Xen on a
>> >> piece of hardware, it ought to check every failure path in
>> >> domain_create(), until the ttl finally gets higher than the number of
>> >> fail-able actions required to construct a domain.  Also, the test
>> >> doesn't need changing as the complexity of domain_create() changes.
>> >>
>> >> The main con will mostly likely be the invasiveness of code in Xen, but
>> >> I suppose any fault injection is going to be invasive to a certain extent.
>> > While I like the idea in principle, the innocent looking
>> >
>> > cfg = { ... };
>> >
>> > is quite a bit of a concern here as well: Depending on the precise
>> > settings, paths taken in the hypervisor may heavily vary, and hence
>> > such a test will only end up being useful if it covers a wide
>> > variety of settings. Even if the number of tests to execute turned
>> > out to still be manageable today, it may quickly turn out not
>> > sufficiently scalable as we add new settings controllable right at
>> > domain creation (which I understand is the plan).
>>
>> Well - there are two aspects here.
>>
>> First, 99% of all VMs in practice are one of 3 or 4 configurations.  An
>> individual configuration is O(n) time complexity to test with fault_ttl,
>> depending on the size of Xen's logic, and we absolutely want to be able
>> to test these deterministically and to completion.
>>
>> For the plethora of other configurations, I agree that it is infeasible
>> to test them all.  However, a hypercall like this is easy to wire up
>> into a fuzzing harness.
>>
>> TBH, I was thinking of something like
>> https://github.com/intel/kernel-fuzzer-for-xen-project with a PVH Xen
>> and XTF "dom0" poking just this hypercall.  All the other complicated
>> bits of wiring AFL up appear to have been done.
>>
>> Perhaps when we exhaust that as a source of bugs, we move onto fuzzing
>> the L0 Xen, because running on native will give it more paths to
>> explore.  We'd need some way of reporting path/trace data back to AFL in
>> dom0 which might require a bit plumbing.
>
>
> This is a pretty cool idea, I would be very interested in trying this out. If running Xen nested in a HVM domain is possible (my experiments with nested setups using Xen have only worked on ancient hw last time I tried) then running the fuzzer would be entirely possible using VM forks. You don't even need a special "dom0", you could just add the fuzzer's CPUID harness to Xen's hypercall handler and the only thing needed from the nested dom0 would be to trigger the hypercall with a normal config. The fuzzer would take it from there and replace the config with the fuzzed version directly in VM forks. Defining what to report as a "crash" to AFL would still need to be defined manually for Xen as the current sink points are Linux specific (https://github.com/intel/kernel-fuzzer-for-xen-project/blob/master/src/sink.h), but that should be straight forward.
>
> Also, running the fuzzer with PVH guests hasn't been tested but since all VM forking needs is EPT it should work.
>
>
> Xen running inside Xen definitely works, and is even supported as far as PV-Shim goes (i.e. no nested virt).  That would limit testing to just creation of PV guests at L1, which is plenty to get started with.
>
> Xen nested under Xen does work to a first approximation, and for the purposes of fuzzing areas other than the nested-virt logic, might even be ok.  (I use this configuration for a fair chunk of hvm development).
>

Sounds like there is no blocker on fuzzing any hypercall handler then.
Just have to add the harness, decide which code paths need to be treated
as sinks, and off you go.  It should work with PT-based coverage no
problem.  If you need to fuzz multiple hypercalls, that may require some
tweaking, as the PT-based coverage doesn't support a change in CR3 right
now (it's just a limitation of libxdc, which does the PT decoding).  You
could always fall back to the disassemble/breakpoint/singlestep coverage
option, but you would need to add the vmcall/vmenter instructions to the
control-flow instruction list here:
https://github.com/intel/kernel-fuzzer-for-xen-project/blob/master/src/tracer.c#L46.
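
For illustration only, extending a breakpoint-based tracer's notion of a
control-flow instruction could look roughly like this; the helper and
mnemonic list below are hypothetical, not the actual contents of
tracer.c:

#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: decide whether a disassembled mnemonic is a
 * control-flow transfer worth breakpointing for coverage purposes.
 */
static bool is_control_flow_insn(const char *mnemonic)
{
    static const char *const cf_insns[] = {
        "jmp", "call", "ret", "jz", "jnz",
        /* Extended so coverage follows guest<->hypervisor transitions
         * (the vmcall/vmenter instructions mentioned above; the exact
         * mnemonics depend on the disassembler in use). */
        "vmcall", "vmenter",
    };

    for ( size_t i = 0; i < sizeof(cf_insns) / sizeof(cf_insns[0]); i++ )
        if ( !strcmp(mnemonic, cf_insns[i]) )
            return true;

    return false;
}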

Tamas