
[2/2] x86/hap: Resolve mm-lock order violations when forking VMs with nested p2m

Message ID 19aab6220bf191a31902488ed38c51d239572269.1609781242.git.tamas.lengyel@intel.com
Series [1/2] x86/mem_sharing: copy cpuid during vm forking

Commit Message

Lengyel, Tamas Jan. 4, 2021, 5:41 p.m. UTC
Several lock-order violations have been encountered while attempting to fork
VMs with nestedhvm=1 set. This patch resolves the issues.

The order violations stem from a call to p2m_flush_nestedp2m being performed
whenever the hostp2m changes. This function always takes the p2m lock of the
nested_p2m. However, with sharing, the p2m locks always have to be taken before
the sharing lock. To resolve this issue we avoid taking the sharing lock where
possible (it was actually unnecessary to begin with). We also make
p2m_flush_nestedp2m aware that the p2m lock may have already been taken, and we
preemptively take all nested_p2m locks before unsharing a page where taking the
sharing lock is necessary.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
 xen/arch/x86/mm/mem_sharing.c | 43 +++++++++++++++++++++++------------
 xen/arch/x86/mm/p2m.c         | 11 ++++++++-
 2 files changed, 39 insertions(+), 15 deletions(-)

Comments

Jan Beulich Jan. 6, 2021, 12:03 p.m. UTC | #1
On 04.01.2021 18:41, Tamas K Lengyel wrote:
> @@ -893,13 +894,11 @@ static int nominate_page(struct domain *d, gfn_t gfn,
>          goto out;
>  
>      /*
> -     * Now that the page is validated, we can lock it. There is no
> -     * race because we're holding the p2m entry, so no one else
> -     * could be nominating this gfn.
> +     * Now that the page is validated, we can make it shared. There is no race
> +     * because we're holding the p2m entry, so no one else could be nominating
> +     * this gfn and it is evidently not yet shared with any other VM, thus we
> +     * don't need to take the mem_sharing_page_lock here.
>       */
> -    ret = -ENOENT;
> -    if ( !mem_sharing_page_lock(page) )
> -        goto out;

Isn't it too limited to mention just nomination in the comment?
Unsharing, for example, also needs to be prevented (or else
the tail of sharing could race with the beginning of unsharing).
The other paths look to similarly hold the GFN, so the
reasoning is fine for them as well. Except maybe audit() - what
about races with that one?

> @@ -1214,7 +1212,7 @@ int __mem_sharing_unshare_page(struct domain *d,
>      p2m_type_t p2mt;
>      mfn_t mfn;
>      struct page_info *page, *old_page;
> -    int last_gfn;
> +    int last_gfn, rc = 0;

I consider this unhelpful: last_gfn really wants to be bool, and
hence wants to not share a declaration with rc. But you're the
maintainer ...

> @@ -1226,6 +1224,15 @@ int __mem_sharing_unshare_page(struct domain *d,
>          return 0;
>      }
>  
> +    /* lock nested p2ms to avoid lock-order violation */

Would you mind mentioning here the other side of the possible
violation, to aid the reader?

> +    if ( unlikely(nestedhvm_enabled(d)) )
> +    {
> +        int i;

unsigned int please (also further down), no matter that there may
be other similar examples of (bad) use of plain int.

> +        for ( i = 0; i < MAX_NESTEDP2M; i++ )
> +            p2m_lock(d->arch.nested_p2m[i]);

From a brief scan, this is the first instance of acquiring all
nested p2m locks in one go. Ordering these by index is perhaps
fine, but I think this wants spelling out in e.g. mm-locks.h. Of
course the question is if you really need to go this far, i.e.
whether really all of the locks need holding. This is even more
so with p2m_flush_table_locked() not really looking to be a
quick operation, when there are many pages accumulated for it.
I.e. the overall lock holding time may turn out even more
excessive this way than it apparently already is.

> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -1598,8 +1598,17 @@ void
>  p2m_flush_nestedp2m(struct domain *d)
>  {
>      int i;
> +    struct p2m_domain *p2m;
> +
>      for ( i = 0; i < MAX_NESTEDP2M; i++ )
> -        p2m_flush_table(d->arch.nested_p2m[i]);
> +    {
> +        p2m = d->arch.nested_p2m[i];

Please move the declaration here, making this the variable's
initializer (unless line length constraints make the latter
undesirable).

Jan
Tamas K Lengyel Jan. 6, 2021, 3:29 p.m. UTC | #2
On Wed, Jan 6, 2021 at 7:03 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 04.01.2021 18:41, Tamas K Lengyel wrote:
> > @@ -893,13 +894,11 @@ static int nominate_page(struct domain *d, gfn_t gfn,
> >          goto out;
> >
> >      /*
> > -     * Now that the page is validated, we can lock it. There is no
> > -     * race because we're holding the p2m entry, so no one else
> > -     * could be nominating this gfn.
> > +     * Now that the page is validated, we can make it shared. There is no race
> > +     * because we're holding the p2m entry, so no one else could be nominating
> > +     * this gfn and it is evidently not yet shared with any other VM, thus we
> > +     * don't need to take the mem_sharing_page_lock here.
> >       */
> > -    ret = -ENOENT;
> > -    if ( !mem_sharing_page_lock(page) )
> > -        goto out;
>
> Isn't it too limited to mention just nomination in the comment?
> Unsharing, for example, also needs to be prevented (or else
> the tail of sharing could race with the beginning of unsharing).
> The other paths look to similarly hold the GFN, so the
> reasoning is fine for them as well. Except maybe audit() - what
> about races with that one?

Audit is unused and should be removed. I don't plan on maintaining
that bit; it might already be broken and I don't see a use for it.

Unsharing is already protected. No other domain could be doing an
unshare since no other domain can have this page mapped as shared
before nominate finishes as you need a sharing ref for it that's
returned from nominate. If the same domain is triggering an unshare we
are protected because the first thing unshare_page() does is get_gfn,
which we hold already until nominate finishes. So we are all good.

>
> > @@ -1214,7 +1212,7 @@ int __mem_sharing_unshare_page(struct domain *d,
> >      p2m_type_t p2mt;
> >      mfn_t mfn;
> >      struct page_info *page, *old_page;
> > -    int last_gfn;
> > +    int last_gfn, rc = 0;
>
> I consider this unhelpful: last_gfn really wants to be bool, and
> hence wants to not share a declaration with rc. But you're the
> maintainer ...

Doesn't really matter in the end IMHO.

>
> > @@ -1226,6 +1224,15 @@ int __mem_sharing_unshare_page(struct domain *d,
> >          return 0;
> >      }
> >
> > +    /* lock nested p2ms to avoid lock-order violation */
>
> Would you mind mentioning here the other side of the possible
> violation, to aid the reader?

You mean what the nested p2m locks would conflict with? I think in the
context of mem_sharing it's clear that the only thing it can conflict
with is the mem_sharing mm lock.

>
> > +    if ( unlikely(nestedhvm_enabled(d)) )
> > +    {
> > +        int i;
>
> unsigned int please (also further down), no matter that there may
> be other similar examples of (bad) use of plain int.

IMHO this is the type of change request that makes absolutely 0
difference at the end.

>
> > +        for ( i = 0; i < MAX_NESTEDP2M; i++ )
> > +            p2m_lock(d->arch.nested_p2m[i]);
>
> From a brief scan, this is the first instance of acquiring all
> nested p2m locks in one go. Ordering these by index is perhaps
> fine, but I think this wants spelling out in e.g. mm-locks.h. Of
> course the question is if you really need to go this far, i.e.
> whether really all of the locks need holding. This is even more
> so with p2m_flush_table_locked() not really looking to be a
> quick operation, when there are many pages accumulated for it.
> I.e. the overall lock holding time may turn out even more
> excessive this way than it apparently already is.

I agree this is not ideal but it gets things working without Xen
crashing. I would prefer if we could get rid of the mm lock ordering
altogether in this context. We already hold the host p2m lock and the
sharing lock, that ought to suffice. But I don't have a better way to
work around it for now.

>
> > --- a/xen/arch/x86/mm/p2m.c
> > +++ b/xen/arch/x86/mm/p2m.c
> > @@ -1598,8 +1598,17 @@ void
> >  p2m_flush_nestedp2m(struct domain *d)
> >  {
> >      int i;
> > +    struct p2m_domain *p2m;
> > +
> >      for ( i = 0; i < MAX_NESTEDP2M; i++ )
> > -        p2m_flush_table(d->arch.nested_p2m[i]);
> > +    {
> > +        p2m = d->arch.nested_p2m[i];
>
> Please move the declaration here, making this the variable's
> initializer (unless line length constraints make the latter
> undesirable).

I really don't get what difference this would make.

Thanks for the review,
Tamas
Jan Beulich Jan. 6, 2021, 4:11 p.m. UTC | #3
On 06.01.2021 16:29, Tamas K Lengyel wrote:
> On Wed, Jan 6, 2021 at 7:03 AM Jan Beulich <jbeulich@suse.com> wrote:
>> On 04.01.2021 18:41, Tamas K Lengyel wrote:
>>> @@ -1226,6 +1224,15 @@ int __mem_sharing_unshare_page(struct domain *d,
>>>          return 0;
>>>      }
>>>
>>> +    /* lock nested p2ms to avoid lock-order violation */
>>
>> Would you mind mentioning here the other side of the possible
>> violation, to aid the reader?
> 
> You mean what the nested p2m locks would conflict with? I think in the
> context of mem_sharing it's clear that the only thing it can conflict
> with is the mem_sharing mm lock.

I don't think it's all this obvious. It wouldn't have been to me, at
least, without also having this change's description at hand.

>>> +    if ( unlikely(nestedhvm_enabled(d)) )
>>> +    {
>>> +        int i;
>>
>> unsigned int please (also further down), no matter that there may
>> be other similar examples of (bad) use of plain int.
> 
> IMHO this is the type of change request that makes absolutely 0
> difference at the end.

(see below, applies here as well)

>>> +        for ( i = 0; i < MAX_NESTEDP2M; i++ )
>>> +            p2m_lock(d->arch.nested_p2m[i]);
>>
>> From a brief scan, this is the first instance of acquiring all
>> nested p2m locks in one go. Ordering these by index is perhaps
>> fine, but I think this wants spelling out in e.g. mm-locks.h. Of
>> course the question is if you really need to go this far, i.e.
>> whether really all of the locks need holding. This is even more
>> so with p2m_flush_table_locked() not really looking to be a
>> quick operation, when there are many pages accumulated for it.
>> I.e. the overall lock holding time may turn out even more
>> excessive this way than it apparently already is.
> 
> I agree this is not ideal but it gets things working without Xen
> crashing. I would prefer if we could get rid of the mm lock ordering
> altogether in this context.

How would this do any good? You'd then be at risk of actually
hitting a lock order violation. These are often quite hard to
debug.

> We already hold the host p2m lock and the
> sharing lock, that ought to suffice.

I don't see how holding any locks can prevent lock order
violations when further ones get acquired. I also didn't think
the nested p2m locks were redundant with the host one.

>>> --- a/xen/arch/x86/mm/p2m.c
>>> +++ b/xen/arch/x86/mm/p2m.c
>>> @@ -1598,8 +1598,17 @@ void
>>>  p2m_flush_nestedp2m(struct domain *d)
>>>  {
>>>      int i;
>>> +    struct p2m_domain *p2m;
>>> +
>>>      for ( i = 0; i < MAX_NESTEDP2M; i++ )
>>> -        p2m_flush_table(d->arch.nested_p2m[i]);
>>> +    {
>>> +        p2m = d->arch.nested_p2m[i];
>>
>> Please move the declaration here, making this the variable's
>> initializer (unless line length constraints make the latter
>> undesirable).
> 
> I really don't get what difference this would make.

Both choice of (generally) inappropriate types (further up)
and placement of declarations (here) (and of course also
other style violations) can set bad precedents even if in a
specific case it may not matter much. So yes, it may be
good enough here, but it would violate our desire to
- use unsigned types when a variable will hold only non-
  negative values (which in the general case may improve
  generated code in particular on x86-64),
- limit the scopes of variables as much as possible, to
  more easily spot inappropriate uses (like bypassing
  initialization).

This code here actually demonstrates such a bad precedent,
using plain int for the loop induction variable. While I
can't be any way near sure, there's a certain chance you
actually took it and copied it to
__mem_sharing_unshare_page(). The chance of such happening
is what we'd like to reduce over time.

Jan
Tamas K Lengyel Jan. 6, 2021, 4:26 p.m. UTC | #4
On Wed, Jan 6, 2021 at 11:11 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 06.01.2021 16:29, Tamas K Lengyel wrote:
> > On Wed, Jan 6, 2021 at 7:03 AM Jan Beulich <jbeulich@suse.com> wrote:
> >> On 04.01.2021 18:41, Tamas K Lengyel wrote:
> >>> @@ -1226,6 +1224,15 @@ int __mem_sharing_unshare_page(struct domain *d,
> >>>          return 0;
> >>>      }
> >>>
> >>> +    /* lock nested p2ms to avoid lock-order violation */
> >>
> >> Would you mind mentioning here the other side of the possible
> >> violation, to aid the reader?
> >
> > You mean what the nested p2m locks would conflict with? I think in the
> > context of mem_sharing it's clear that the only thing it can conflict
> > with is the mem_sharing mm lock.
>
> I don't think it's all this obvious. It wouldn't have been to me, at
> least, without also having this change's description at hand.
>
> >>> +    if ( unlikely(nestedhvm_enabled(d)) )
> >>> +    {
> >>> +        int i;
> >>
> >> unsigned int please (also further down), no matter that there may
> >> be other similar examples of (bad) use of plain int.
> >
> > IMHO this is the type of change request that makes absolutely 0
> > difference at the end.
>
> (see below, applies here as well)
>
> >>> +        for ( i = 0; i < MAX_NESTEDP2M; i++ )
> >>> +            p2m_lock(d->arch.nested_p2m[i]);
> >>
> >> From a brief scan, this is the first instance of acquiring all
> >> nested p2m locks in one go. Ordering these by index is perhaps
> >> fine, but I think this wants spelling out in e.g. mm-locks.h. Of
> >> course the question is if you really need to go this far, i.e.
> >> whether really all of the locks need holding. This is even more
> >> so with p2m_flush_table_locked() not really looking to be a
> >> quick operation, when there are many pages accumulated for it.
> >> I.e. the overall lock holding time may turn out even more
> >> excessive this way than it apparently already is.
> >
> > I agree this is not ideal but it gets things working without Xen
> > crashing. I would prefer if we could get rid of the mm lock ordering
> > altogether in this context.
>
How would this do any good? You'd then be at risk of actually
> hitting a lock order violation. These are often quite hard to
> debug.

The whole lock ordering is just a pain and it gets us into situations
like this where we are forced to take a bunch of locks to just change
one thing. I don't have a better solution but I'm also not 100%
convinced that this lock ordering setup is even sane. Sometimes it
really ought to be enough to just take one "mm master lock" without
having to chase down all of them individually.

>
> > We already hold the host p2m lock and the
> > sharing lock, that ought to suffice.
>
> I don't see how holding any locks can prevent lock order
> violations when further ones get acquired. I also didn't think
> the nested p2m locks were redundant with the host one.
>
> >>> --- a/xen/arch/x86/mm/p2m.c
> >>> +++ b/xen/arch/x86/mm/p2m.c
> >>> @@ -1598,8 +1598,17 @@ void
> >>>  p2m_flush_nestedp2m(struct domain *d)
> >>>  {
> >>>      int i;
> >>> +    struct p2m_domain *p2m;
> >>> +
> >>>      for ( i = 0; i < MAX_NESTEDP2M; i++ )
> >>> -        p2m_flush_table(d->arch.nested_p2m[i]);
> >>> +    {
> >>> +        p2m = d->arch.nested_p2m[i];
> >>
> >> Please move the declaration here, making this the variable's
> >> initializer (unless line length constraints make the latter
> >> undesirable).
> >
> > I really don't get what difference this would make.
>
> Both choice of (generally) inappropriate types (further up)
> and placement of declarations (here) (and of course also
> other style violations) can set bad precedents even if in a
> specific case it may not matter much. So yes, it may be
> good enough here, but it would violate our desire to
> - use unsigned types when a variable will hold only non-
>   negative values (which in the general case may improve
>   generated code in particular on x86-64),
> - limit the scopes of variables as much as possible, to
>   more easily spot inappropriate uses (like bypassing
>   initialization).
>
> This code here actually demonstrates such a bad precedent,
> using plain int for the loop induction variable. While I
> can't be any way near sure, there's a certain chance you
> actually took it and copied it to
> __mem_sharing_unshare_page(). The chance of such happening
> is what we'd like to reduce over time.

Yes, I copied it from p2m.c. All I meant was that such minor changes
are generally speaking not worth a round-trip of sending new patches.
I obviously don't care whether this is signed or unsigned. Minor stuff
like that could be changed on commit and is not even worth having a
discussion about.

Tamas
Jan Beulich Jan. 7, 2021, 12:25 p.m. UTC | #5
On 06.01.2021 17:26, Tamas K Lengyel wrote:
> On Wed, Jan 6, 2021 at 11:11 AM Jan Beulich <jbeulich@suse.com> wrote:
>> On 06.01.2021 16:29, Tamas K Lengyel wrote:
>>> On Wed, Jan 6, 2021 at 7:03 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>> On 04.01.2021 18:41, Tamas K Lengyel wrote:
>>>>> +        for ( i = 0; i < MAX_NESTEDP2M; i++ )
>>>>> +            p2m_lock(d->arch.nested_p2m[i]);
>>>>
>>>> From a brief scan, this is the first instance of acquiring all
>>>> nested p2m locks in one go. Ordering these by index is perhaps
>>>> fine, but I think this wants spelling out in e.g. mm-locks.h. Of
>>>> course the question is if you really need to go this far, i.e.
>>>> whether really all of the locks need holding. This is even more
>>>> so with p2m_flush_table_locked() not really looking to be a
>>>> quick operation, when there are many pages accumulated for it.
>>>> I.e. the overall lock holding time may turn out even more
>>>> excessive this way than it apparently already is.
>>>
>>> I agree this is not ideal but it gets things working without Xen
>>> crashing. I would prefer if we could get rid of the mm lock ordering
>>> altogether in this context.
>>
>> How would this do any good? You'd then be at risk of actually
>> hitting a lock order violation. These are often quite hard to
>> debug.
> 
> The whole lock ordering is just a pain and it gets us into situations
> like this where we are forced to take a bunch of locks to just change
> one thing. I don't have a better solution but I'm also not 100%
> convinced that this lock ordering setup is even sane. Sometimes it
> really ought to be enough to just take one "mm master lock" without
> having to chase down all of them individually.

The concept of a "master lock" would imply all parties wanting to
make _any_ change anywhere (or even just not being certain whether
a change will need making) would need to hold it for writing. This
is clearly against the overall goal of reducing lock contention on
in particular the p2m lock. 

>>>>> --- a/xen/arch/x86/mm/p2m.c
>>>>> +++ b/xen/arch/x86/mm/p2m.c
>>>>> @@ -1598,8 +1598,17 @@ void
>>>>>  p2m_flush_nestedp2m(struct domain *d)
>>>>>  {
>>>>>      int i;
>>>>> +    struct p2m_domain *p2m;
>>>>> +
>>>>>      for ( i = 0; i < MAX_NESTEDP2M; i++ )
>>>>> -        p2m_flush_table(d->arch.nested_p2m[i]);
>>>>> +    {
>>>>> +        p2m = d->arch.nested_p2m[i];
>>>>
>>>> Please move the declaration here, making this the variable's
>>>> initializer (unless line length constraints make the latter
>>>> undesirable).
>>>
>>> I really don't get what difference this would make.
>>
>> Both choice of (generally) inappropriate types (further up)
>> and placement of declarations (here) (and of course also
>> other style violations) can set bad precedents even if in a
>> specific case it may not matter much. So yes, it may be
>> good enough here, but it would violate our desire to
>> - use unsigned types when a variable will hold only non-
>>   negative values (which in the general case may improve
>>   generated code in particular on x86-64),
>> - limit the scopes of variables as much as possible, to
>>   more easily spot inappropriate uses (like bypassing
>>   initialization).
>>
>> This code here actually demonstrates such a bad precedent,
>> using plain int for the loop induction variable. While I
>> can't be any way near sure, there's a certain chance you
>> actually took it and copied it to
>> __mem_sharing_unshare_page(). The chance of such happening
>> is what we'd like to reduce over time.
> 
> Yes, I copied it from p2m.c. All I meant was that such minor changes
> are generally speaking not worth a round-trip of sending new patches.
> I obviously don't care whether this is signed or unsigned. Minor stuff
> like that could be changed on commit and is not even worth having a
> discussion about.

I'm sorry, but no. Committing ought to be a purely mechanical
thing. It is a _courtesy_ of the committer if they offer to
make adjustments. If us offering this regularly gets taken as
"expected behavior", I'm afraid I personally would stop making
such an offer, and instead insist on further round trips.

Jan
Tamas K Lengyel Jan. 7, 2021, 12:43 p.m. UTC | #6
On Thu, Jan 7, 2021 at 7:26 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 06.01.2021 17:26, Tamas K Lengyel wrote:
> > On Wed, Jan 6, 2021 at 11:11 AM Jan Beulich <jbeulich@suse.com> wrote:
> >> On 06.01.2021 16:29, Tamas K Lengyel wrote:
> >>> On Wed, Jan 6, 2021 at 7:03 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>> On 04.01.2021 18:41, Tamas K Lengyel wrote:
> >>>>> +        for ( i = 0; i < MAX_NESTEDP2M; i++ )
> >>>>> +            p2m_lock(d->arch.nested_p2m[i]);
> >>>>
> >>>> From a brief scan, this is the first instance of acquiring all
> >>>> nested p2m locks in one go. Ordering these by index is perhaps
> >>>> fine, but I think this wants spelling out in e.g. mm-locks.h. Of
> >>>> course the question is if you really need to go this far, i.e.
> >>>> whether really all of the locks need holding. This is even more
> >>>> so with p2m_flush_table_locked() not really looking to be a
> >>>> quick operation, when there are many pages accumulated for it.
> >>>> I.e. the overall lock holding time may turn out even more
> >>>> excessive this way than it apparently already is.
> >>>
> >>> I agree this is not ideal but it gets things working without Xen
> >>> crashing. I would prefer if we could get rid of the mm lock ordering
> >>> altogether in this context.
> >>
> >> How would this do any good? You'd then be at risk of actually
> >> hitting a lock order violation. These are often quite hard to
> >> debug.
> >
> > The whole lock ordering is just a pain and it gets us into situations
> > like this where we are forced to take a bunch of locks to just change
> > one thing. I don't have a better solution but I'm also not 100%
> > convinced that this lock ordering setup is even sane. Sometimes it
> > really ought to be enough to just take one "mm master lock" without
> > having to chase down all of them individually.
>
> The concept of a "master lock" would imply all parties wanting to
> make _any_ change anywhere (or even just not being certain whether
> a change will need making) would need to hold it for writing. This
> is clearly against the overall goal of reducing lock contention on
> in particular the p2m lock.
>
> >>>>> --- a/xen/arch/x86/mm/p2m.c
> >>>>> +++ b/xen/arch/x86/mm/p2m.c
> >>>>> @@ -1598,8 +1598,17 @@ void
> >>>>>  p2m_flush_nestedp2m(struct domain *d)
> >>>>>  {
> >>>>>      int i;
> >>>>> +    struct p2m_domain *p2m;
> >>>>> +
> >>>>>      for ( i = 0; i < MAX_NESTEDP2M; i++ )
> >>>>> -        p2m_flush_table(d->arch.nested_p2m[i]);
> >>>>> +    {
> >>>>> +        p2m = d->arch.nested_p2m[i];
> >>>>
> >>>> Please move the declaration here, making this the variable's
> >>>> initializer (unless line length constraints make the latter
> >>>> undesirable).
> >>>
> >>> I really don't get what difference this would make.
> >>
> >> Both choice of (generally) inappropriate types (further up)
> >> and placement of declarations (here) (and of course also
> >> other style violations) can set bad precedents even if in a
> >> specific case it may not matter much. So yes, it may be
> >> good enough here, but it would violate our desire to
> >> - use unsigned types when a variable will hold only non-
> >>   negative values (which in the general case may improve
> >>   generated code in particular on x86-64),
> >> - limit the scopes of variables as much as possible, to
> >>   more easily spot inappropriate uses (like bypassing
> >>   initialization).
> >>
> >> This code here actually demonstrates such a bad precedent,
> >> using plain int for the loop induction variable. While I
> >> can't be any way near sure, there's a certain chance you
> >> actually took it and copied it to
> >> __mem_sharing_unshare_page(). The chance of such happening
> >> is what we'd like to reduce over time.
> >
> > Yes, I copied it from p2m.c. All I meant was that such minor changes
> > are generally speaking not worth a round-trip of sending new patches.
> > I obviously don't care whether this is signed or unsigned. Minor stuff
> > like that could be changed on commit and is not even worth having a
> > discussion about.
>
> I'm sorry, but no. Committing ought to be a purely mechanical
> thing. It is a _courtesy_ of the committer if they offer to
> make adjustments. If us offering this regularly gets taken as
> "expected behavior", I'm afraid I personally would stop making
> such an offer, and instead insist on further round trips.

So you requested changes I consider purely cosmetic, no objections to
any of them. It would be faster if you just made those changes -
literally 2 seconds - instead of insisting on this back and forth. I
really have no idea under what metric this saves *you* time. Now you
have to write emails to point out the issues and review the patch
twice, including the lag time of when the sender has time to do the
changes and resend the patches. I think this process is just bad for
everyone involved. And now you are saying out of principle you would
be insisting on more of this just to prove a point? Yea, that would
certainly solve the problem of commit lag and backlog of reviewing
patches. But it's your call, I really don't care enough to argue any
more about it.

Tamas
Jan Beulich Jan. 7, 2021, 12:56 p.m. UTC | #7
On 07.01.2021 13:43, Tamas K Lengyel wrote:
> On Thu, Jan 7, 2021 at 7:26 AM Jan Beulich <jbeulich@suse.com> wrote:
>> On 06.01.2021 17:26, Tamas K Lengyel wrote:
>>> On Wed, Jan 6, 2021 at 11:11 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>> On 06.01.2021 16:29, Tamas K Lengyel wrote:
>>>>> On Wed, Jan 6, 2021 at 7:03 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>> On 04.01.2021 18:41, Tamas K Lengyel wrote:
>>>>>>> --- a/xen/arch/x86/mm/p2m.c
>>>>>>> +++ b/xen/arch/x86/mm/p2m.c
>>>>>>> @@ -1598,8 +1598,17 @@ void
>>>>>>>  p2m_flush_nestedp2m(struct domain *d)
>>>>>>>  {
>>>>>>>      int i;
>>>>>>> +    struct p2m_domain *p2m;
>>>>>>> +
>>>>>>>      for ( i = 0; i < MAX_NESTEDP2M; i++ )
>>>>>>> -        p2m_flush_table(d->arch.nested_p2m[i]);
>>>>>>> +    {
>>>>>>> +        p2m = d->arch.nested_p2m[i];
>>>>>>
>>>>>> Please move the declaration here, making this the variable's
>>>>>> initializer (unless line length constraints make the latter
>>>>>> undesirable).
>>>>>
>>>>> I really don't get what difference this would make.
>>>>
>>>> Both choice of (generally) inappropriate types (further up)
>>>> and placement of declarations (here) (and of course also
>>>> other style violations) can set bad precedents even if in a
>>>> specific case it may not matter much. So yes, it may be
>>>> good enough here, but it would violate our desire to
>>>> - use unsigned types when a variable will hold only non-
>>>>   negative values (which in the general case may improve
>>>>   generated code in particular on x86-64),
>>>> - limit the scopes of variables as much as possible, to
>>>>   more easily spot inappropriate uses (like bypassing
>>>>   initialization).
>>>>
>>>> This code here actually demonstrates such a bad precedent,
>>>> using plain int for the loop induction variable. While I
>>>> can't be any way near sure, there's a certain chance you
>>>> actually took it and copied it to
>>>> __mem_sharing_unshare_page(). The chance of such happening
>>>> is what we'd like to reduce over time.
>>>
>>> Yes, I copied it from p2m.c. All I meant was that such minor changes
>>> are generally speaking not worth a round-trip of sending new patches.
>>> I obviously don't care whether this is signed or unsigned. Minor stuff
>>> like that could be changed on commit and is not even worth having a
>>> discussion about.
>>
>> I'm sorry, but no. Committing ought to be a purely mechanical
>> thing. It is a _courtesy_ of the committer if they offer to
>> make adjustments. If us offering this regularly gets taken as
>> "expected behavior", I'm afraid I personally would stop making
>> such an offer, and instead insist on further round trips.
> 
> So you requested changes I consider purely cosmetic, no objections to
> any of them. It would be faster if you just made those changes -
> literally 2 seconds - instead of insisting on this back and forth. I
> really have no idea under what metric this saves *you* time. Now you
> have to write emails to point out the issues and review the patch
> twice, including the lag time of when the sender has time to do the
> changes and resend the patches.

For one, I didn't talk about either process saving time, I don't
think. Then I had comments beyond these purely cosmetic ones.
Therefore I didn't think it was justified to offer making the
mechanical adjustments at commit time. Making such an offer will
please remain subject to the individual's judgement, without
having to justify _at all_ when not making such an offer.

As to time savings - even if I had offered making these changes,
I'd still have to give the respective comments. Both for your
awareness (after all I'd be changing your patch, and you might
not like me doing so), and to hopefully have the effect that in
future submissions you'd take care of such aspects yourself
right away (plus same for any possible observers of the thread).

> I think this process is just bad for
> everyone involved. And now you are saying out of principle you would
> be insisting on more of this just to prove a point? Yea, that would
> certainly solve the problem of commit lag and backlog of reviewing
> patches. But it's your call, I really don't care enough to argue any
> more about it.

I have to admit that I find this odd: If there's disagreement,
wouldn't it generally be better to get it resolved?

Jan
Tamas K Lengyel Jan. 7, 2021, 1:27 p.m. UTC | #8
On Thu, Jan 7, 2021 at 7:56 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 07.01.2021 13:43, Tamas K Lengyel wrote:
> > On Thu, Jan 7, 2021 at 7:26 AM Jan Beulich <jbeulich@suse.com> wrote:
> >> On 06.01.2021 17:26, Tamas K Lengyel wrote:
> >>> On Wed, Jan 6, 2021 at 11:11 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>> On 06.01.2021 16:29, Tamas K Lengyel wrote:
> >>>>> On Wed, Jan 6, 2021 at 7:03 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>>> On 04.01.2021 18:41, Tamas K Lengyel wrote:
> >>>>>>> --- a/xen/arch/x86/mm/p2m.c
> >>>>>>> +++ b/xen/arch/x86/mm/p2m.c
> >>>>>>> @@ -1598,8 +1598,17 @@ void
> >>>>>>>  p2m_flush_nestedp2m(struct domain *d)
> >>>>>>>  {
> >>>>>>>      int i;
> >>>>>>> +    struct p2m_domain *p2m;
> >>>>>>> +
> >>>>>>>      for ( i = 0; i < MAX_NESTEDP2M; i++ )
> >>>>>>> -        p2m_flush_table(d->arch.nested_p2m[i]);
> >>>>>>> +    {
> >>>>>>> +        p2m = d->arch.nested_p2m[i];
> >>>>>>
> >>>>>> Please move the declaration here, making this the variable's
> >>>>>> initializer (unless line length constraints make the latter
> >>>>>> undesirable).
> >>>>>
> >>>>> I really don't get what difference this would make.
> >>>>
> >>>> Both choice of (generally) inappropriate types (further up)
> >>>> and placement of declarations (here) (and of course also
> >>>> other style violations) can set bad precedents even if in a
> >>>> specific case it may not matter much. So yes, it may be
> >>>> good enough here, but it would violate our desire to
> >>>> - use unsigned types when a variable will hold only non-
> >>>>   negative values (which in the general case may improve
> >>>>   generated code in particular on x86-64),
> >>>> - limit the scopes of variables as much as possible, to
> >>>>   more easily spot inappropriate uses (like bypassing
> >>>>   initialization).
> >>>>
> >>>> This code here actually demonstrates such a bad precedent,
> >>>> using plain int for the loop induction variable. While I
> >>>> can't be any way near sure, there's a certain chance you
> >>>> actually took it and copied it to
> >>>> __mem_sharing_unshare_page(). The chance of such happening
> >>>> is what we'd like to reduce over time.
> >>>
> >>> Yes, I copied it from p2m.c. All I meant was that such minor changes
> >>> are generally speaking not worth a round-trip of sending new patches.
> >>> I obviously don't care whether this is signed or unsigned. Minor stuff
> >>> like that could be changed on commit and is not even worth having a
> >>> discussion about.
> >>
> >> I'm sorry, but no. Committing ought to be a purely mechanical
> >> thing. It is a _courtesy_ of the committer if they offer to
> >> make adjustments. If us offering this regularly gets taken as
> >> "expected behavior", I'm afraid I personally would stop making
> >> such an offer, and instead insist on further round trips.
> >
> > So you requested changes I consider purely cosmetic, no objections to
> > any of them. It would be faster if you just made those changes -
> > literally 2 seconds - instead of insisting on this back and forth. I
> > really have no idea under what metric this saves *you* time. Now you
> > have to write emails to point out the issues and review the patch
> > twice, including the lag time of when the sender has time to do the
> > changes and resend the patches.
>
> For one, I didn't talk about either process saving time, I don't
> think. Then I had comments beyond these purely cosmetic ones.
> Therefore I didn't think it was justified to offer making the
> mechanical adjustments at commit time. Making such an offer will
> please remain subject to the individual's judgement, without
> having to justify _at all_ when not making such an offer.
>
> As to time savings - even if I had offered making these changes,
> I'd still have to give the respective comments. Both for your
> awareness (after all I'd be changing your patch, and you might
> not like me doing so), and to hopefully have the effect that in
> future submissions you'd take care of such aspects yourself
> right away (plus same for any possible observers of the thread).
>
> > I think this process is just bad for
> > everyone involved. And now you are saying out of principle you would
> > be insisting on more of this just to prove a point? Yea, that would
> > certainly solve the problem of commit lag and backlog of reviewing
> > patches. But it's your call, I really don't care enough to argue any
> > more about it.
>
> I have to admit that I find this odd: If there's disagreement,
> wouldn't it generally be better to get it resolved?
>

I don't see where the disagreement was, I had no objections to the
changes you requested. I don't like this unnecessary back and forth on
trivia. But v2 is sent. I'm moving on.

Tamas
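[Editorial note: taking the two style points from the thread above together (an unsigned loop induction variable, and the p2m pointer declared in the narrowest possible scope), the loop under discussion might end up looking like the sketch below. The types and the `nested_p2m` layout here are simplified stand-ins for illustration, not the actual Xen structures.]

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NESTEDP2M 10

/* Hypothetical stand-ins for Xen's structures (real code uses
 * struct p2m_domain and d->arch.nested_p2m[]). */
struct p2m_domain { bool flushed; };
struct domain { struct p2m_domain *nested_p2m[MAX_NESTEDP2M]; };

static void p2m_flush_table(struct p2m_domain *p2m)
{
    p2m->flushed = true;
}

/*
 * Style points from the review, applied:
 *  - the induction variable is unsigned, since it only ever holds
 *    non-negative values;
 *  - the p2m pointer is declared inside the loop body, as its
 *    initializer, limiting its scope to where it is used.
 */
void p2m_flush_nestedp2m(struct domain *d)
{
    for ( unsigned int i = 0; i < MAX_NESTEDP2M; i++ )
    {
        struct p2m_domain *p2m = d->nested_p2m[i];

        p2m_flush_table(p2m);
    }
}
```
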

Patch

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 4a02c7d6f2..e2f3f50eef 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -39,6 +39,7 @@ 
 #include <asm/event.h>
 #include <asm/hap.h>
 #include <asm/hvm/hvm.h>
+#include <asm/hvm/nestedhvm.h>
 #include <xsm/xsm.h>
 
 #include <public/hvm/params.h>
@@ -893,13 +894,11 @@  static int nominate_page(struct domain *d, gfn_t gfn,
         goto out;
 
     /*
-     * Now that the page is validated, we can lock it. There is no
-     * race because we're holding the p2m entry, so no one else
-     * could be nominating this gfn.
+     * Now that the page is validated, we can make it shared. There is no race
+     * because we're holding the p2m entry, so no one else could be nominating
+     * this gfn & and it is evidently not yet shared with any other VM, thus we
+     * don't need to take the mem_sharing_page_lock here.
      */
-    ret = -ENOENT;
-    if ( !mem_sharing_page_lock(page) )
-        goto out;
 
     /* Initialize the shared state */
     ret = -ENOMEM;
@@ -935,7 +934,6 @@  static int nominate_page(struct domain *d, gfn_t gfn,
 
     *phandle = page->sharing->handle;
     audit_add_list(page);
-    mem_sharing_page_unlock(page);
     ret = 0;
 
 out:
@@ -1214,7 +1212,7 @@  int __mem_sharing_unshare_page(struct domain *d,
     p2m_type_t p2mt;
     mfn_t mfn;
     struct page_info *page, *old_page;
-    int last_gfn;
+    int last_gfn, rc = 0;
     gfn_info_t *gfn_info = NULL;
 
     mfn = get_gfn(d, gfn, &p2mt);
@@ -1226,6 +1224,15 @@  int __mem_sharing_unshare_page(struct domain *d,
         return 0;
     }
 
+    /* lock nested p2ms to avoid lock-order violation */
+    if ( unlikely(nestedhvm_enabled(d)) )
+    {
+        int i;
+
+        for ( i = 0; i < MAX_NESTEDP2M; i++ )
+            p2m_lock(d->arch.nested_p2m[i]);
+    }
+
     page = __grab_shared_page(mfn);
     if ( page == NULL )
     {
@@ -1276,9 +1283,7 @@  int __mem_sharing_unshare_page(struct domain *d,
             put_page_alloc_ref(page);
 
         put_page_and_type(page);
-        put_gfn(d, gfn);
-
-        return 0;
+        goto out;
     }
 
     if ( last_gfn )
@@ -1295,12 +1300,12 @@  int __mem_sharing_unshare_page(struct domain *d,
         /* Undo dec of nr_saved_mfns, as the retry will decrease again. */
         atomic_inc(&nr_saved_mfns);
         mem_sharing_page_unlock(old_page);
-        put_gfn(d, gfn);
         /*
          * Caller is responsible for placing an event
          * in the ring.
          */
-        return -ENOMEM;
+        rc = -ENOMEM;
+        goto out;
     }
 
     copy_domain_page(page_to_mfn(page), page_to_mfn(old_page));
@@ -1327,8 +1332,18 @@  int __mem_sharing_unshare_page(struct domain *d,
      */
     paging_mark_dirty(d, page_to_mfn(page));
     /* We do not need to unlock a private page */
+
+ out:
+    if ( unlikely(nestedhvm_enabled(d)) )
+    {
+        int i;
+
+        for ( i = 0; i < MAX_NESTEDP2M; i++ )
+            p2m_unlock(d->arch.nested_p2m[i]);
+    }
+
     put_gfn(d, gfn);
-    return 0;
+    return rc;
 }
 
 int relinquish_shared_pages(struct domain *d)
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index ad4bb94514..79a2b6762b 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1598,8 +1598,17 @@  void
 p2m_flush_nestedp2m(struct domain *d)
 {
     int i;
+    struct p2m_domain *p2m;
+
     for ( i = 0; i < MAX_NESTEDP2M; i++ )
-        p2m_flush_table(d->arch.nested_p2m[i]);
+    {
+        p2m = d->arch.nested_p2m[i];
+
+        if ( p2m_locked_by_me(p2m) )
+            p2m_flush_table_locked(p2m);
+        else
+            p2m_flush_table(p2m);
+    }
 }
 
 void np2m_flush_base(struct vcpu *v, unsigned long np2m_base)
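[Editorial note: the interaction the patch establishes — the unshare path pre-taking every nested p2m lock so the "p2m locks before sharing lock" order holds, and p2m_flush_nestedp2m() then tolerating already-held locks — can be sketched in isolation as below. The lock helpers, counters, and `unshare_page()` wrapper are toy stand-ins for illustration, not Xen's actual mm-lock primitives or __mem_sharing_unshare_page().]

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NESTEDP2M 4

/* Toy single-owner lock standing in for Xen's mm locks. */
struct p2m_domain { bool held; int flushes; };

static void p2m_lock(struct p2m_domain *p)   { assert(!p->held); p->held = true; }
static void p2m_unlock(struct p2m_domain *p) { assert(p->held); p->held = false; }
static bool p2m_locked_by_me(const struct p2m_domain *p) { return p->held; }

struct domain {
    bool nestedhvm;
    struct p2m_domain *nested_p2m[MAX_NESTEDP2M];
};

static bool nestedhvm_enabled(const struct domain *d) { return d->nestedhvm; }

/* Patched flush: don't try to re-take a lock the caller already holds. */
static void p2m_flush_nestedp2m(struct domain *d)
{
    for ( unsigned int i = 0; i < MAX_NESTEDP2M; i++ )
    {
        struct p2m_domain *p2m = d->nested_p2m[i];

        if ( p2m_locked_by_me(p2m) )
            p2m->flushes++;            /* p2m_flush_table_locked() path */
        else
        {
            p2m_lock(p2m);             /* p2m_flush_table() path */
            p2m->flushes++;
            p2m_unlock(p2m);
        }
    }
}

/*
 * Unshare path: with nested HVM enabled, pre-take every nested p2m
 * lock up front, so any flush triggered while the sharing lock is
 * held cannot acquire a p2m lock after it and violate the ordering.
 */
static int unshare_page(struct domain *d)
{
    if ( nestedhvm_enabled(d) )
        for ( unsigned int i = 0; i < MAX_NESTEDP2M; i++ )
            p2m_lock(d->nested_p2m[i]);

    p2m_flush_nestedp2m(d);            /* safe: locks already held */

    if ( nestedhvm_enabled(d) )
        for ( unsigned int i = 0; i < MAX_NESTEDP2M; i++ )
            p2m_unlock(d->nested_p2m[i]);

    return 0;
}
```

With nested HVM disabled, the same flush falls back to the original lock-flush-unlock behaviour, which is why the pre-locking is guarded by `nestedhvm_enabled()` in the real patch as well.
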