diff mbox series

[4.17] x86/shadow: drop (replace) bogus assertions

Message ID e447da22-23d6-d3db-313d-4e4ca009c3df@suse.com (mailing list archive)
State New, archived
Headers show
Series [4.17] x86/shadow: drop (replace) bogus assertions | expand

Commit Message

Jan Beulich Oct. 14, 2022, 8:49 a.m. UTC
The addition of a call to shadow_blow_tables() from shadow_teardown()
has resulted in the "no vcpus" related assertion becoming triggerable:
If domain_create() fails with at least one page successfully allocated
in the course of shadow_enable(), or if domain_create() succeeds and
the domain is then killed without ever invoking XEN_DOMCTL_max_vcpus.

The assertion's comment was bogus anyway: Shadow mode has been getting
enabled before allocation of vCPU-s for quite some time. Convert the
assertion to a conditional: As long as there are no vCPU-s, there's
nothing to blow away.

Fixes: e7aa55c0aab3 ("x86/p2m: free the paging memory pool preemptively")
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>

A similar assertion/comment pair exists in _shadow_prealloc(); the
comment is similarly bogus, and the assertion could in principle trigger
e.g. when shadow_alloc_p2m_page() is called early enough. Replace those
at the same time by a similar early return, here indicating failure to
the caller (which will generally lead to the domain being crashed in
shadow_prealloc()).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
While in shadow_blow_tables() the option exists to simply remove the
assertion without adding a new conditional (the two loops simply will
do nothing), the same isn't true for _shadow_prealloc(): There we
would then trigger the ASSERT_UNREACHABLE() near the end of the
function.

Comments

Andrew Cooper Oct. 14, 2022, 10:02 a.m. UTC | #1
On 14/10/2022 09:49, Jan Beulich wrote:
> The addition of a call to shadow_blow_tables() from shadow_teardown()
> has resulted in the "no vcpus" related assertion becoming triggerable:
> If domain_create() fails with at least one page successfully allocated
> in the course of shadow_enable(), or if domain_create() succeeds and
> the domain is then killed without ever invoking XEN_DOMCTL_max_vcpus.

It warrants pointing out that are unit tests in the tree which do
exactly this.

> The assertion's comment was bogus anyway: Shadow mode has been getting
> enabled before allocation of vCPU-s for quite some time.

I agree with the principle of what you're saying, but I don't think it's
accurate.

Shadow (vs hap) has always been part of domain create.  But we've always
had a problem where we need to wait for a shadow op to allocate some
shadow memory before various domain-centric operations can proceed.

As with the ARM GICv2 issues, we do want to address this and let
domain_create() (or some continuable version of it) allocate the bare
minimum shadow pool size, which simplifies a load of other codepaths.

>  Convert the
> assertion to a conditional: As long as there are no vCPU-s, there's
> nothing to blow away.
>
> Fixes: e7aa55c0aab3 ("x86/p2m: free the paging memory pool preemptively")
> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
>
> A similar assertion/comment pair exists in _shadow_prealloc(); the
> comment is similarly bogus, and the assertion could in principle trigger
> e.g. when shadow_alloc_p2m_page() is called early enough. Replace those
> at the same time by a similar early return, here indicating failure to
> the caller (which will generally lead to the domain being crashed in
> shadow_prealloc()).
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> While in shadow_blow_tables() the option exists to simply remove the
> assertion without adding a new conditional (the two loops simply will
> do nothing), the same isn't true for _shadow_prealloc(): There we
> would then trigger the ASSERT_UNREACHABLE() near the end of the
> function.

IMO, blow_tables() has no business caring about vcpus.  It should be
idempotent, and ideally wants to be left in a state where it doesn't
need modifying when the aformentioned create changes happen.

For prealloc(), I'd argue that it shouldn't care, but as this is a
bugfix to an XSA, leaving it in this form for now is the safer course of
action.  Whomever cleans up the create path can do the work to ensure
that all prealloc() paths are safe before vcpus are allocated.

~Andrew
Roger Pau Monne Oct. 14, 2022, 10:30 a.m. UTC | #2
On Fri, Oct 14, 2022 at 10:49:55AM +0200, Jan Beulich wrote:
> The addition of a call to shadow_blow_tables() from shadow_teardown()
> has resulted in the "no vcpus" related assertion becoming triggerable:
> If domain_create() fails with at least one page successfully allocated
> in the course of shadow_enable(), or if domain_create() succeeds and
> the domain is then killed without ever invoking XEN_DOMCTL_max_vcpus.
> 
> The assertion's comment was bogus anyway: Shadow mode has been getting
> enabled before allocation of vCPU-s for quite some time. Convert the
> assertion to a conditional: As long as there are no vCPU-s, there's
> nothing to blow away.
> 
> Fixes: e7aa55c0aab3 ("x86/p2m: free the paging memory pool preemptively")
> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
> 
> A similar assertion/comment pair exists in _shadow_prealloc(); the
> comment is similarly bogus, and the assertion could in principle trigger
> e.g. when shadow_alloc_p2m_page() is called early enough. Replace those
> at the same time by a similar early return, here indicating failure to
> the caller (which will generally lead to the domain being crashed in
> shadow_prealloc()).

It's my understanding we do care about this because a control domain
could try to populate the p2m before calling XEN_DOMCTL_max_vcpus, and
hence could trigger the ASSERT, as otherwise asserting would be fine.

> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

> ---
> While in shadow_blow_tables() the option exists to simply remove the
> assertion without adding a new conditional (the two loops simply will
> do nothing), the same isn't true for _shadow_prealloc(): There we
> would then trigger the ASSERT_UNREACHABLE() near the end of the
> function.

I think it's fine to exit early.

Thanks, Roger.
Jan Beulich Oct. 14, 2022, 10:43 a.m. UTC | #3
On 14.10.2022 12:02, Andrew Cooper wrote:
> On 14/10/2022 09:49, Jan Beulich wrote:
>> The addition of a call to shadow_blow_tables() from shadow_teardown()
>> has resulted in the "no vcpus" related assertion becoming triggerable:
>> If domain_create() fails with at least one page successfully allocated
>> in the course of shadow_enable(), or if domain_create() succeeds and
>> the domain is then killed without ever invoking XEN_DOMCTL_max_vcpus.
> 
> It warrants pointing out that are unit tests in the tree which do
> exactly this.

Can do.

>> The assertion's comment was bogus anyway: Shadow mode has been getting
>> enabled before allocation of vCPU-s for quite some time.
> 
> I agree with the principle of what you're saying, but I don't think it's
> accurate.

I'm afraid I can't derive from ...

> Shadow (vs hap) has always been part of domain create.  But we've always
> had a problem where we need to wait for a shadow op to allocate some
> shadow memory before various domain-centric operations can proceed.
> 
> As with the ARM GICv2 issues, we do want to address this and let
> domain_create() (or some continuable version of it) allocate the bare
> minimum shadow pool size, which simplifies a load of other codepaths.

... this why the statement isn't accurate. What both functions are trying
to do is reclaim pages from in-use shadows. None can exist without vCPU-s.
Yet still shadow mode has been getting enabled before vCPU-s are allocated.

>> ---
>> While in shadow_blow_tables() the option exists to simply remove the
>> assertion without adding a new conditional (the two loops simply will
>> do nothing), the same isn't true for _shadow_prealloc(): There we
>> would then trigger the ASSERT_UNREACHABLE() near the end of the
>> function.
> 
> IMO, blow_tables() has no business caring about vcpus.  It should be
> idempotent, and ideally wants to be left in a state where it doesn't
> need modifying when the aformentioned create changes happen.

First: Both the change as done by the patch as well as the alternative
pointed out fulfill this requirement, afaict at least. Hence what you
say doesn't make clear whether you agree with the change as done or
whether you'd prefer the alternative (and if so, why).

Then the two functions do about the same thing, just with prealloc
having an early exit condition (once having reached the intended
count of available pages). Therefore ...

> For prealloc(), I'd argue that it shouldn't care, but as this is a
> bugfix to an XSA, leaving it in this form for now is the safer course of
> action.  Whomever cleans up the create path can do the work to ensure
> that all prealloc() paths are safe before vcpus are allocated.

... I think the two functions want to remain as closely in sync as
possible (I'm afraid I didn't fully have this in mind when writing
the remark - it should have been worded a little differently); really
I think it would be best if the duplicate code was folded. Hence to
me leaving alone that function (which is what I understand you
suggest) is not a good option, yet as explained in the post-commit-
message remark not replacing the assertion by an if() would have an
unwanted consequence.

Jan
Jan Beulich Oct. 14, 2022, 10:50 a.m. UTC | #4
On 14.10.2022 12:30, Roger Pau Monné wrote:
> On Fri, Oct 14, 2022 at 10:49:55AM +0200, Jan Beulich wrote:
>> The addition of a call to shadow_blow_tables() from shadow_teardown()
>> has resulted in the "no vcpus" related assertion becoming triggerable:
>> If domain_create() fails with at least one page successfully allocated
>> in the course of shadow_enable(), or if domain_create() succeeds and
>> the domain is then killed without ever invoking XEN_DOMCTL_max_vcpus.
>>
>> The assertion's comment was bogus anyway: Shadow mode has been getting
>> enabled before allocation of vCPU-s for quite some time. Convert the
>> assertion to a conditional: As long as there are no vCPU-s, there's
>> nothing to blow away.
>>
>> Fixes: e7aa55c0aab3 ("x86/p2m: free the paging memory pool preemptively")
>> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>
>> A similar assertion/comment pair exists in _shadow_prealloc(); the
>> comment is similarly bogus, and the assertion could in principle trigger
>> e.g. when shadow_alloc_p2m_page() is called early enough. Replace those
>> at the same time by a similar early return, here indicating failure to
>> the caller (which will generally lead to the domain being crashed in
>> shadow_prealloc()).
> 
> It's my understanding we do care about this because a control domain
> could try to populate the p2m before calling XEN_DOMCTL_max_vcpus, and
> hence could trigger the ASSERT, as otherwise asserting would be fine.

Yes, that's the scenario I had in mind.

>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, but Andrew and I will need to reach agreement before I can put
this (or whatever alternative) in.

Jan
Jan Beulich Oct. 24, 2022, 1:36 p.m. UTC | #5
On 14.10.2022 12:30, Roger Pau Monné wrote:
> On Fri, Oct 14, 2022 at 10:49:55AM +0200, Jan Beulich wrote:
>> The addition of a call to shadow_blow_tables() from shadow_teardown()
>> has resulted in the "no vcpus" related assertion becoming triggerable:
>> If domain_create() fails with at least one page successfully allocated
>> in the course of shadow_enable(), or if domain_create() succeeds and
>> the domain is then killed without ever invoking XEN_DOMCTL_max_vcpus.
>>
>> The assertion's comment was bogus anyway: Shadow mode has been getting
>> enabled before allocation of vCPU-s for quite some time. Convert the
>> assertion to a conditional: As long as there are no vCPU-s, there's
>> nothing to blow away.
>>
>> Fixes: e7aa55c0aab3 ("x86/p2m: free the paging memory pool preemptively")
>> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>
>> A similar assertion/comment pair exists in _shadow_prealloc(); the
>> comment is similarly bogus, and the assertion could in principle trigger
>> e.g. when shadow_alloc_p2m_page() is called early enough. Replace those
>> at the same time by a similar early return, here indicating failure to
>> the caller (which will generally lead to the domain being crashed in
>> shadow_prealloc()).
> 
> It's my understanding we do care about this because a control domain
> could try to populate the p2m before calling XEN_DOMCTL_max_vcpus, and
> hence could trigger the ASSERT, as otherwise asserting would be fine.
> 
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Acked-by: Roger Pau Monné <roger.pau@citrix.com>

In a discussion amongst maintainers we've settled Andrew's reservations.
May I please ask for a release-ack for this change, so it can go in (as
a bug fix on top of the recent batch of XSAs)?

Thanks, Jan
Henry Wang Oct. 24, 2022, 1:40 p.m. UTC | #6
Hi Jan,

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Subject: Re: [PATCH][4.17] x86/shadow: drop (replace) bogus assertions
> 
> On 14.10.2022 12:30, Roger Pau Monné wrote:
> > On Fri, Oct 14, 2022 at 10:49:55AM +0200, Jan Beulich wrote:
> >> The addition of a call to shadow_blow_tables() from shadow_teardown()
> >> has resulted in the "no vcpus" related assertion becoming triggerable:
> >> If domain_create() fails with at least one page successfully allocated
> >> in the course of shadow_enable(), or if domain_create() succeeds and
> >> the domain is then killed without ever invoking XEN_DOMCTL_max_vcpus.
> >>
> >> The assertion's comment was bogus anyway: Shadow mode has been
> getting
> >> enabled before allocation of vCPU-s for quite some time. Convert the
> >> assertion to a conditional: As long as there are no vCPU-s, there's
> >> nothing to blow away.
> >>
> >> Fixes: e7aa55c0aab3 ("x86/p2m: free the paging memory pool
> preemptively")
> >> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
> >>
> >> A similar assertion/comment pair exists in _shadow_prealloc(); the
> >> comment is similarly bogus, and the assertion could in principle trigger
> >> e.g. when shadow_alloc_p2m_page() is called early enough. Replace
> those
> >> at the same time by a similar early return, here indicating failure to
> >> the caller (which will generally lead to the domain being crashed in
> >> shadow_prealloc()).
> >
> > It's my understanding we do care about this because a control domain
> > could try to populate the p2m before calling XEN_DOMCTL_max_vcpus,
> and
> > hence could trigger the ASSERT, as otherwise asserting would be fine.
> >
> >> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> >
> > Acked-by: Roger Pau Monné <roger.pau@citrix.com>
> 
> In a discussion amongst maintainers we've settled Andrew's reservations.
> May I please ask for a release-ack for this change, so it can go in (as
> a bug fix on top of the recent batch of XSAs)?

Absolutely. Thanks for noticing.

Release-acked-by: Henry Wang <Henry.Wang@arm.com>

Kind regards,
Henry

> 
> Thanks, Jan
diff mbox series

Patch

--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -943,8 +943,9 @@  static bool __must_check _shadow_preallo
         /* No reclaim when the domain is dying, teardown will take care of it. */
         return false;
 
-    /* Shouldn't have enabled shadows if we've no vcpus. */
-    ASSERT(d->vcpu && d->vcpu[0]);
+    /* Nothing to reclaim when there are no vcpus yet. */
+    if ( !d->vcpu[0] )
+        return false;
 
     /* Stage one: walk the list of pinned pages, unpinning them */
     perfc_incr(shadow_prealloc_1);
@@ -1034,8 +1035,9 @@  void shadow_blow_tables(struct domain *d
     mfn_t smfn;
     int i;
 
-    /* Shouldn't have enabled shadows if we've no vcpus. */
-    ASSERT(d->vcpu && d->vcpu[0]);
+    /* Nothing to do when there are no vcpus yet. */
+    if ( !d->vcpu[0] )
+        return;
 
     /* Pass one: unpin all pinned pages */
     foreach_pinned_shadow(d, sp, t)