[for-4.14] x86/hap: use get_gfn_type in hap_update_paging_modes

Message ID 6a2ae3bae4a4ad32bc7caecd8af2655a76a9fb19.1592335579.git.tamas.lengyel@intel.com
State New

Commit Message

Tamas K Lengyel June 16, 2020, 7:31 p.m. UTC
While forking VMs running a small RTOS system (Zephyr), a Xen crash has been
observed due to an mm-lock order violation while copying the HVM CPU context
from the parent. The issue has been traced to hap_update_paging_modes taking a
lock on the gfn using get_gfn. This call also creates a shared entry in the
fork's memory map for the cr3 gfn. The function later calls hap_update_cr3
while holding the paging_lock, which results in the lock-order violation in
vmx_load_pdptrs when it tries to unshare the above entry.

This issue has not affected VMs running other OSes, as the call to
vmx_load_pdptrs is benign if PAE is not enabled or if EFER_LMA is set: it
returns before trying to unshare and map the page.

Using get_gfn_type to take the lock on the gfn avoids this problem, as we can
populate the fork's gfn with a copied page instead of a shared entry if it's
needed, thus avoiding the lock-order violation while holding the paging_lock.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
The bug seems to have been present since commit 4cb6c4f4941, and has only been
discovered now due to the heavy use of mem_sharing with VM forks. As this is a
simple bugfix, it would be nice to include it in the 4.14 release.
---
 xen/arch/x86/mm/hap/hap.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

Comments

Roger Pau Monné June 17, 2020, 8:23 a.m. UTC | #1
On Tue, Jun 16, 2020 at 12:31:06PM -0700, Tamas K Lengyel wrote:
> While forking VMs running a small RTOS systems (Zephyr) a Xen crash has been
> observed due to a mm-lock order violation while copying the HVM CPU context
> from the parent. This issue has been identified to be due to
> hap_update_paging_modes getting a lock on the gfn using get_gfn. This call also
> creates a shared entry in the fork's memory map for the cr3 gfn. The function
> later calls hap_update_cr3 while holding the paging_lock, which results in the
> lock-order violation in vmx_load_pdptrs when it tries to unshare the above entry.
> 
> This issue has not affected VMs running other OS's as a call to vmx_load_pdptrs
> is benign if PAE is not enabled or if EFER_LMA is set and returns before
> trying to unshare and map the page.
> 
> Using get_gfn_type to get a lock on the gfn avoids this problem as we can
> populate the fork's gfn with a copied page instead of a shared entry if its
> needed, thus avoiding the lock order violation while holding paging_lock.
> 
> Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
> ---
> The bug seems to have been present since commit 4cb6c4f4941, only discovered
> now due to the heavy use of mem_sharing with VM forks. As this is a simple
> bug-fix it would be nice to include it in the 4.14 release.

I agree it seems like a candidate bugfix to be included. I've added
Paul to the Cc so he is aware.

> ---
>  xen/arch/x86/mm/hap/hap.c | 17 ++++++++++++-----
>  1 file changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
> index 7f84d0c6ea..9ae4c3ae6e 100644
> --- a/xen/arch/x86/mm/hap/hap.c
> +++ b/xen/arch/x86/mm/hap/hap.c
> @@ -748,12 +748,19 @@ static void hap_update_paging_modes(struct vcpu *v)
>      struct domain *d = v->domain;
>      unsigned long cr3_gfn = v->arch.hvm.guest_cr[3] >> PAGE_SHIFT;
>      p2m_type_t t;
> +    p2m_query_t q = P2M_ALLOC;
>  
> -    /* We hold onto the cr3 as it may be modified later, and
> -     * we need to respect lock ordering. No need for 
> -     * checks here as they are performed by vmx_load_pdptrs
> -     * (the potential user of the cr3) */
> -    (void)get_gfn(d, cr3_gfn, &t);
> +    /*
> +     * We hold onto the cr3 as it may be modified later, and
> +     * we need to respect lock ordering. Unshare here if we have to as to avoid
> +     * a lock-order violation later while we are holding the paging_lock.
> +     * Further checks are performed by vmx_load_pdptrs (the potential user of
> +     * the cr3).
> +     */
> +    if ( hvm_pae_enabled(v) && !hvm_long_mode_active(v) )
> +        q |= P2M_UNSHARE;
> +
> +    (void)get_gfn_type(d, cr3_gfn, &t, q);

While there I think you can drop the cast to void.

In order for this to be more resilient, maybe it would be better to
just use get_gfn_unshare directly and avoid checking what paging mode
the guest is currently using?

Or would that be too expensive in terms of performance for the not
affected case?

I feel like relying on the internals of vmx_load_pdptrs here is
fragile.

Thanks, Roger.
Jan Beulich June 17, 2020, 9:58 a.m. UTC | #2
On 16.06.2020 21:31, Tamas K Lengyel wrote:
> While forking VMs running a small RTOS systems (Zephyr) a Xen crash has been
> observed due to a mm-lock order violation while copying the HVM CPU context
> from the parent. This issue has been identified to be due to
> hap_update_paging_modes getting a lock on the gfn using get_gfn. This call also
> creates a shared entry in the fork's memory map for the cr3 gfn. The function
> later calls hap_update_cr3 while holding the paging_lock, which results in the
> lock-order violation in vmx_load_pdptrs when it tries to unshare the above entry.
> 
> This issue has not affected VMs running other OS's as a call to vmx_load_pdptrs
> is benign if PAE is not enabled or if EFER_LMA is set and returns before
> trying to unshare and map the page.
> 
> Using get_gfn_type to get a lock on the gfn avoids this problem as we can
> populate the fork's gfn with a copied page instead of a shared entry if its
> needed, thus avoiding the lock order violation while holding paging_lock.
> 
> Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
> ---
> The bug seems to have been present since commit 4cb6c4f4941, only discovered
> now due to the heavy use of mem_sharing with VM forks.

I'm unconvinced that commit is the origin of the problem. Before that,
get_gfn_unshare() was used, which aiui had/has similar locking
properties. In fact I'm having trouble seeing how this works at all,
i.e. with or without mem-sharing: p2m_get_page_from_gfn() at the very
least uses p2m_read_lock(), which also doesn't nest inside the paging
lock. What am I overlooking?

> --- a/xen/arch/x86/mm/hap/hap.c
> +++ b/xen/arch/x86/mm/hap/hap.c
> @@ -748,12 +748,19 @@ static void hap_update_paging_modes(struct vcpu *v)
>      struct domain *d = v->domain;
>      unsigned long cr3_gfn = v->arch.hvm.guest_cr[3] >> PAGE_SHIFT;
>      p2m_type_t t;
> +    p2m_query_t q = P2M_ALLOC;
>  
> -    /* We hold onto the cr3 as it may be modified later, and
> -     * we need to respect lock ordering. No need for 
> -     * checks here as they are performed by vmx_load_pdptrs
> -     * (the potential user of the cr3) */
> -    (void)get_gfn(d, cr3_gfn, &t);
> +    /*
> +     * We hold onto the cr3 as it may be modified later, and
> +     * we need to respect lock ordering. Unshare here if we have to as to avoid
> +     * a lock-order violation later while we are holding the paging_lock.
> +     * Further checks are performed by vmx_load_pdptrs (the potential user of
> +     * the cr3).
> +     */

Nit: Please re-flow such that the first line isn't very noticeably
shorter than the later ones.

> +    if ( hvm_pae_enabled(v) && !hvm_long_mode_active(v) )
> +        q |= P2M_UNSHARE;
> +
> +    (void)get_gfn_type(d, cr3_gfn, &t, q);

Imo this then wants to be accompanied by dropping the unsharing
attempt from vmx_load_pdptrs(). By using get_gfn_query_unlocked()
there (assuming all code paths hold the paging lock) the lock
order issue could be addressed in full then. (If taking this
route, the comment ahead of get_gfn_query_unlocked()'s declaration
will want adjusting, to drop mentioning shadow mode explicitly as
leveraging already holding the paging lock.)

If there are code paths of both kinds, which approach to use in
vmx_load_pdptrs() may need to be chosen based on what
paging_locked_by_me() returns. Or perhaps an unlocked query is
fine in either case?

Jan
Tamas K Lengyel June 17, 2020, 12:49 p.m. UTC | #3
On Wed, Jun 17, 2020 at 2:25 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> On Tue, Jun 16, 2020 at 12:31:06PM -0700, Tamas K Lengyel wrote:
> > While forking VMs running a small RTOS systems (Zephyr) a Xen crash has been
> > observed due to a mm-lock order violation while copying the HVM CPU context
> > from the parent. This issue has been identified to be due to
> > hap_update_paging_modes getting a lock on the gfn using get_gfn. This call also
> > creates a shared entry in the fork's memory map for the cr3 gfn. The function
> > later calls hap_update_cr3 while holding the paging_lock, which results in the
> > lock-order violation in vmx_load_pdptrs when it tries to unshare the above entry.
> >
> > This issue has not affected VMs running other OS's as a call to vmx_load_pdptrs
> > is benign if PAE is not enabled or if EFER_LMA is set and returns before
> > trying to unshare and map the page.
> >
> > Using get_gfn_type to get a lock on the gfn avoids this problem as we can
> > populate the fork's gfn with a copied page instead of a shared entry if its
> > needed, thus avoiding the lock order violation while holding paging_lock.
> >
> > Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
> > ---
> > The bug seems to have been present since commit 4cb6c4f4941, only discovered
> > now due to the heavy use of mem_sharing with VM forks. As this is a simple
> > bug-fix it would be nice to include it in the 4.14 release.
>
> I agree it seems like a candidate bugfix to be included. I've added
> Paul to the Cc so he is aware.
>
> > ---
> >  xen/arch/x86/mm/hap/hap.c | 17 ++++++++++++-----
> >  1 file changed, 12 insertions(+), 5 deletions(-)
> >
> > diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
> > index 7f84d0c6ea..9ae4c3ae6e 100644
> > --- a/xen/arch/x86/mm/hap/hap.c
> > +++ b/xen/arch/x86/mm/hap/hap.c
> > @@ -748,12 +748,19 @@ static void hap_update_paging_modes(struct vcpu *v)
> >      struct domain *d = v->domain;
> >      unsigned long cr3_gfn = v->arch.hvm.guest_cr[3] >> PAGE_SHIFT;
> >      p2m_type_t t;
> > +    p2m_query_t q = P2M_ALLOC;
> >
> > -    /* We hold onto the cr3 as it may be modified later, and
> > -     * we need to respect lock ordering. No need for
> > -     * checks here as they are performed by vmx_load_pdptrs
> > -     * (the potential user of the cr3) */
> > -    (void)get_gfn(d, cr3_gfn, &t);
> > +    /*
> > +     * We hold onto the cr3 as it may be modified later, and
> > +     * we need to respect lock ordering. Unshare here if we have to as to avoid
> > +     * a lock-order violation later while we are holding the paging_lock.
> > +     * Further checks are performed by vmx_load_pdptrs (the potential user of
> > +     * the cr3).
> > +     */
> > +    if ( hvm_pae_enabled(v) && !hvm_long_mode_active(v) )
> > +        q |= P2M_UNSHARE;
> > +
> > +    (void)get_gfn_type(d, cr3_gfn, &t, q);
>
> While there I think you can drop the cast to void.

Sure.

>
> In order for this to be more resilient, maybe it would be better to
> just use get_gfn_unshare directly and avoid checking what paging mode
> the guest is currently using?
>
> Or would that be too expensive in terms of performance for the not
> affected case?

That's what I originally considered sending in, but yes, in the fuzzing
case it would mean a full-page copy for each iteration even in the
unaffected cases, instead of a one-time shared entry setup. That would
be a considerable waste.

>
> I feel like relying on the internals of vmx_load_pdptrs here is
> fragile.

I agree it's not ideal.

Tamas
Tamas K Lengyel June 17, 2020, 1 p.m. UTC | #4
On Wed, Jun 17, 2020 at 3:59 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 16.06.2020 21:31, Tamas K Lengyel wrote:
> > While forking VMs running a small RTOS systems (Zephyr) a Xen crash has been
> > observed due to a mm-lock order violation while copying the HVM CPU context
> > from the parent. This issue has been identified to be due to
> > hap_update_paging_modes getting a lock on the gfn using get_gfn. This call also
> > creates a shared entry in the fork's memory map for the cr3 gfn. The function
> > later calls hap_update_cr3 while holding the paging_lock, which results in the
> > lock-order violation in vmx_load_pdptrs when it tries to unshare the above entry.
> >
> > This issue has not affected VMs running other OS's as a call to vmx_load_pdptrs
> > is benign if PAE is not enabled or if EFER_LMA is set and returns before
> > trying to unshare and map the page.
> >
> > Using get_gfn_type to get a lock on the gfn avoids this problem as we can
> > populate the fork's gfn with a copied page instead of a shared entry if its
> > needed, thus avoiding the lock order violation while holding paging_lock.
> >
> > Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
> > ---
> > The bug seems to have been present since commit 4cb6c4f4941, only discovered
> > now due to the heavy use of mem_sharing with VM forks.
>
> I'm unconvinced that commit is the origin of the problem. Before that,
> get_gfn_unshare() was used, which aiui had/has similar locking
> properties. In fact I'm having trouble seeing how this works at all,
> i.e. with or without mem-sharing: p2m_get_page_from_gfn() at the very
> least uses p2m_read_lock(), which also doesn't nest inside the paging
> lock. What am I overlooking?

I haven't investigated past what git blame showed; that's why I said
"seems to have been present since". It's entirely possible it was broken
before as well due to the lock ordering. I'm not sure about
p2m_get_page_from_gfn; I haven't run into an issue there (yet).

>
> > --- a/xen/arch/x86/mm/hap/hap.c
> > +++ b/xen/arch/x86/mm/hap/hap.c
> > @@ -748,12 +748,19 @@ static void hap_update_paging_modes(struct vcpu *v)
> >      struct domain *d = v->domain;
> >      unsigned long cr3_gfn = v->arch.hvm.guest_cr[3] >> PAGE_SHIFT;
> >      p2m_type_t t;
> > +    p2m_query_t q = P2M_ALLOC;
> >
> > -    /* We hold onto the cr3 as it may be modified later, and
> > -     * we need to respect lock ordering. No need for
> > -     * checks here as they are performed by vmx_load_pdptrs
> > -     * (the potential user of the cr3) */
> > -    (void)get_gfn(d, cr3_gfn, &t);
> > +    /*
> > +     * We hold onto the cr3 as it may be modified later, and
> > +     * we need to respect lock ordering. Unshare here if we have to as to avoid
> > +     * a lock-order violation later while we are holding the paging_lock.
> > +     * Further checks are performed by vmx_load_pdptrs (the potential user of
> > +     * the cr3).
> > +     */
>
> Nit: Please re-flow such that the first line isn't very noticeably
> shorter than the later ones.

Sure.

>
> > +    if ( hvm_pae_enabled(v) && !hvm_long_mode_active(v) )
> > +        q |= P2M_UNSHARE;
> > +
> > +    (void)get_gfn_type(d, cr3_gfn, &t, q);
>
> Imo this then wants to be accompanied by dropping the unsharing
> attempt from vmx_load_pdptrs(). By using get_gfn_query_unlocked()
> there (assuming all code paths hold the paging lock) the lock
> order issue could be addressed in full then. (If taking this
> route, the comment ahead of get_gfn_query_unlocked()'s declaration
> will want adjusting, to drop mentioning shadow mode explicitly as
> leveraging already holding the paging lock.)

There is at least one other path to vmx_load_pdptrs, from
arch_set_info_hvm_guest, which doesn't hold the paging lock.

> If there are code paths of both kinds, which approach to use in
> vmx_load_pdptrs() may need to be chosen based on what
> paging_locked_by_me() returns. Or perhaps an unlocked query is
> fine in either case?

Perhaps adjusting vmx_load_pdptrs to choose the unlocked query would be
fine. But at that point what is the reason for having the lock
ordering at all? Why not just have a single recursive lock and avoid
issues like this altogether?

Tamas
Jan Beulich June 17, 2020, 1:04 p.m. UTC | #5
On 17.06.2020 15:00, Tamas K Lengyel wrote:
> On Wed, Jun 17, 2020 at 3:59 AM Jan Beulich <jbeulich@suse.com> wrote:
>> If there are code paths of both kinds, which approach to use in
>> vmx_load_pdptrs() may need to be chosen based on what
>> paging_locked_by_me() returns. Or perhaps an unlocked query is
>> fine in either case?
> 
> Perhaps adjusting vmx_load_pdptrs to chose the unlocked query would be
> fine. But at that point what is the reason for having the lock
> ordering at all? Why not just have a single recursive lock and avoid
> issues like this altogether?

With just a single lock, contention problems we already know we
have would be even worse. When the current locking model was
introduced, there was actually a plan to make gfn_lock() more
fine-grained (i.e. not simply "de-generate" to p2m_lock()), for
example.

Jan
Tamas K Lengyel June 17, 2020, 1:21 p.m. UTC | #6
On Wed, Jun 17, 2020 at 7:04 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 17.06.2020 15:00, Tamas K Lengyel wrote:
> > On Wed, Jun 17, 2020 at 3:59 AM Jan Beulich <jbeulich@suse.com> wrote:
> >> If there are code paths of both kinds, which approach to use in
> >> vmx_load_pdptrs() may need to be chosen based on what
> >> paging_locked_by_me() returns. Or perhaps an unlocked query is
> >> fine in either case?
> >
> > Perhaps adjusting vmx_load_pdptrs to chose the unlocked query would be
> > fine. But at that point what is the reason for having the lock
> > ordering at all? Why not just have a single recursive lock and avoid
> > issues like this altogether?
>
> With just a single lock, contention problems we already know we
> have would be even worse. When the current locking model was
> introduced, there was actually a plan to make gfn_lock() more
> fine-grained (i.e. not simply "de-generate" to p2m_lock()), for
> example.

Sigh. Well, I've been checking, and adjusting vmx_load_pdptrs to use an
unlocked query doesn't seem as straightforward because, well, there is
no unlocked version of p2m_get_page_from_gfn which would also do the
"fixups". What seems redundant to me though is that
hap_update_paging_modes takes both the p2m_lock via get_gfn PLUS the
paging_lock. Does it really need to take the paging_lock? If it only
held the gfn_lock/p2m_lock then we wouldn't have the lock-order
violation down the road.

Tamas
Jan Beulich June 17, 2020, 1:28 p.m. UTC | #7
On 17.06.2020 15:21, Tamas K Lengyel wrote:
> On Wed, Jun 17, 2020 at 7:04 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 17.06.2020 15:00, Tamas K Lengyel wrote:
>>> On Wed, Jun 17, 2020 at 3:59 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>> If there are code paths of both kinds, which approach to use in
>>>> vmx_load_pdptrs() may need to be chosen based on what
>>>> paging_locked_by_me() returns. Or perhaps an unlocked query is
>>>> fine in either case?
>>>
>>> Perhaps adjusting vmx_load_pdptrs to chose the unlocked query would be
>>> fine. But at that point what is the reason for having the lock
>>> ordering at all? Why not just have a single recursive lock and avoid
>>> issues like this altogether?
>>
>> With just a single lock, contention problems we already know we
>> have would be even worse. When the current locking model was
>> introduced, there was actually a plan to make gfn_lock() more
>> fine-grained (i.e. not simply "de-generate" to p2m_lock()), for
>> example.
> 
> Sigh. Well, I've been checking and adjust vmx_load_pdptrs to use an
> unlocked query doesn't seem as straightforward because, well, there is
> no unlocked version of p2m_get_page_from_gfn which would also do the
> "fixups".

Which fixups do we need here, in particular? Of course, whenever
any fixups get done, the operation can't be lock-less.

> What seems redundant to me though is that
> hap_update_paging_modes takes both the p2m_lock via get_gfn PLUS the
> paging_lock. Does it really need to take the paging_lock?

From mm-locks.h's comments:

 * For HAP, it protects the NPT/EPT tables and mode changes.

Jan
Tamas K Lengyel June 17, 2020, 1:31 p.m. UTC | #8
On Wed, Jun 17, 2020 at 7:28 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 17.06.2020 15:21, Tamas K Lengyel wrote:
> > On Wed, Jun 17, 2020 at 7:04 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 17.06.2020 15:00, Tamas K Lengyel wrote:
> >>> On Wed, Jun 17, 2020 at 3:59 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>> If there are code paths of both kinds, which approach to use in
> >>>> vmx_load_pdptrs() may need to be chosen based on what
> >>>> paging_locked_by_me() returns. Or perhaps an unlocked query is
> >>>> fine in either case?
> >>>
> >>> Perhaps adjusting vmx_load_pdptrs to chose the unlocked query would be
> >>> fine. But at that point what is the reason for having the lock
> >>> ordering at all? Why not just have a single recursive lock and avoid
> >>> issues like this altogether?
> >>
> >> With just a single lock, contention problems we already know we
> >> have would be even worse. When the current locking model was
> >> introduced, there was actually a plan to make gfn_lock() more
> >> fine-grained (i.e. not simply "de-generate" to p2m_lock()), for
> >> example.
> >
> > Sigh. Well, I've been checking and adjust vmx_load_pdptrs to use an
> > unlocked query doesn't seem as straightforward because, well, there is
> > no unlocked version of p2m_get_page_from_gfn which would also do the
> > "fixups".
>
> Which fixups do we need here, in particular? Of course, whenever
> any fixups get done, the operation can't be lock-less.
>
> > What seems redundant to me though is that
> > hap_update_paging_modes takes both the p2m_lock via get_gfn PLUS the
> > paging_lock. Does it really need to take the paging_lock?
>
> From mm-locks.h's comments:
>
>  * For HAP, it protects the NPT/EPT tables and mode changes.

We do the population of the EPT as part of fork_page() if there was a
hole in the p2m when the query was issued using P2M_ALLOC (or
P2M_UNSHARE). I checked, and without the paging lock held it trips
hap_alloc's ASSERT. So yes, currently I don't think we have a better
route than what I currently sent in. Perhaps the
"hvm_pae_enabled(v) && !hvm_long_mode_active(v)" check can be moved into
hvm.h and used by vmx_load_pdptrs as well, making it less fragile in
case it needs adjusting in the future.

Tamas
Jan Beulich June 17, 2020, 1:36 p.m. UTC | #9
On 17.06.2020 15:31, Tamas K Lengyel wrote:
> On Wed, Jun 17, 2020 at 7:28 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 17.06.2020 15:21, Tamas K Lengyel wrote:
>>> On Wed, Jun 17, 2020 at 7:04 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>
>>>> On 17.06.2020 15:00, Tamas K Lengyel wrote:
>>>>> On Wed, Jun 17, 2020 at 3:59 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>> If there are code paths of both kinds, which approach to use in
>>>>>> vmx_load_pdptrs() may need to be chosen based on what
>>>>>> paging_locked_by_me() returns. Or perhaps an unlocked query is
>>>>>> fine in either case?
>>>>>
>>>>> Perhaps adjusting vmx_load_pdptrs to chose the unlocked query would be
>>>>> fine. But at that point what is the reason for having the lock
>>>>> ordering at all? Why not just have a single recursive lock and avoid
>>>>> issues like this altogether?
>>>>
>>>> With just a single lock, contention problems we already know we
>>>> have would be even worse. When the current locking model was
>>>> introduced, there was actually a plan to make gfn_lock() more
>>>> fine-grained (i.e. not simply "de-generate" to p2m_lock()), for
>>>> example.
>>>
>>> Sigh. Well, I've been checking and adjust vmx_load_pdptrs to use an
>>> unlocked query doesn't seem as straightforward because, well, there is
>>> no unlocked version of p2m_get_page_from_gfn which would also do the
>>> "fixups".
>>
>> Which fixups do we need here, in particular? Of course, whenever
>> any fixups get done, the operation can't be lock-less.
>>
>>> What seems redundant to me though is that
>>> hap_update_paging_modes takes both the p2m_lock via get_gfn PLUS the
>>> paging_lock. Does it really need to take the paging_lock?
>>
>> From mm-locks.h's comments:
>>
>>  * For HAP, it protects the NPT/EPT tables and mode changes.
> 
> We do the population of the EPT as part of fork_page() if there was a
> hole in the p2m when the query was issued using P2M_ALLOC (or
> P2M_UNSHARE). I checked and without the paging lock held it throws up
> at hap_alloc's ASSERT.. So yea, currently I don't think we have a
> better route then what I currently sent in.

You didn't answer my question regarding the "fixups" needed, so
for the moment it's not clear to me yet whether indeed there's
no better way.

Jan
Tamas K Lengyel June 17, 2020, 1:43 p.m. UTC | #10
On Wed, Jun 17, 2020 at 7:36 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 17.06.2020 15:31, Tamas K Lengyel wrote:
> > On Wed, Jun 17, 2020 at 7:28 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 17.06.2020 15:21, Tamas K Lengyel wrote:
> >>> On Wed, Jun 17, 2020 at 7:04 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>
> >>>> On 17.06.2020 15:00, Tamas K Lengyel wrote:
> >>>>> On Wed, Jun 17, 2020 at 3:59 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>>> If there are code paths of both kinds, which approach to use in
> >>>>>> vmx_load_pdptrs() may need to be chosen based on what
> >>>>>> paging_locked_by_me() returns. Or perhaps an unlocked query is
> >>>>>> fine in either case?
> >>>>>
> >>>>> Perhaps adjusting vmx_load_pdptrs to chose the unlocked query would be
> >>>>> fine. But at that point what is the reason for having the lock
> >>>>> ordering at all? Why not just have a single recursive lock and avoid
> >>>>> issues like this altogether?
> >>>>
> >>>> With just a single lock, contention problems we already know we
> >>>> have would be even worse. When the current locking model was
> >>>> introduced, there was actually a plan to make gfn_lock() more
> >>>> fine-grained (i.e. not simply "de-generate" to p2m_lock()), for
> >>>> example.
> >>>
> >>> Sigh. Well, I've been checking and adjust vmx_load_pdptrs to use an
> >>> unlocked query doesn't seem as straightforward because, well, there is
> >>> no unlocked version of p2m_get_page_from_gfn which would also do the
> >>> "fixups".
> >>
> >> Which fixups do we need here, in particular? Of course, whenever
> >> any fixups get done, the operation can't be lock-less.
> >>
> >>> What seems redundant to me though is that
> >>> hap_update_paging_modes takes both the p2m_lock via get_gfn PLUS the
> >>> paging_lock. Does it really need to take the paging_lock?
> >>
> >> From mm-locks.h's comments:
> >>
> >>  * For HAP, it protects the NPT/EPT tables and mode changes.
> >
> > We do the population of the EPT as part of fork_page() if there was a
> > hole in the p2m when the query was issued using P2M_ALLOC (or
> > P2M_UNSHARE). I checked and without the paging lock held it throws up
> > at hap_alloc's ASSERT.. So yea, currently I don't think we have a
> > better route then what I currently sent in.
>
> You didn't answer my question regarding the "fixups" needed, so
> for the moment it's not clear to me yet whether indeed there's
> no better way.

Umm, I did. The fixups entail populating the EPT from the parent as I
described above.

Tamas
Jan Beulich June 17, 2020, 2:24 p.m. UTC | #11
On 17.06.2020 15:43, Tamas K Lengyel wrote:
> On Wed, Jun 17, 2020 at 7:36 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 17.06.2020 15:31, Tamas K Lengyel wrote:
>>> On Wed, Jun 17, 2020 at 7:28 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>
>>>> On 17.06.2020 15:21, Tamas K Lengyel wrote:
>>>>> On Wed, Jun 17, 2020 at 7:04 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>
>>>>>> On 17.06.2020 15:00, Tamas K Lengyel wrote:
>>>>>>> On Wed, Jun 17, 2020 at 3:59 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>>> If there are code paths of both kinds, which approach to use in
>>>>>>>> vmx_load_pdptrs() may need to be chosen based on what
>>>>>>>> paging_locked_by_me() returns. Or perhaps an unlocked query is
>>>>>>>> fine in either case?
>>>>>>>
>>>>>>> Perhaps adjusting vmx_load_pdptrs to chose the unlocked query would be
>>>>>>> fine. But at that point what is the reason for having the lock
>>>>>>> ordering at all? Why not just have a single recursive lock and avoid
>>>>>>> issues like this altogether?
>>>>>>
>>>>>> With just a single lock, contention problems we already know we
>>>>>> have would be even worse. When the current locking model was
>>>>>> introduced, there was actually a plan to make gfn_lock() more
>>>>>> fine-grained (i.e. not simply "de-generate" to p2m_lock()), for
>>>>>> example.
>>>>>
>>>>> Sigh. Well, I've been checking and adjust vmx_load_pdptrs to use an
>>>>> unlocked query doesn't seem as straightforward because, well, there is
>>>>> no unlocked version of p2m_get_page_from_gfn which would also do the
>>>>> "fixups".
>>>>
>>>> Which fixups do we need here, in particular? Of course, whenever
>>>> any fixups get done, the operation can't be lock-less.
>>>>
>>>>> What seems redundant to me though is that
>>>>> hap_update_paging_modes takes both the p2m_lock via get_gfn PLUS the
>>>>> paging_lock. Does it really need to take the paging_lock?
>>>>
>>>> From mm-locks.h's comments:
>>>>
>>>>  * For HAP, it protects the NPT/EPT tables and mode changes.
>>>
>>> We do the population of the EPT as part of fork_page() if there was a
>>> hole in the p2m when the query was issued using P2M_ALLOC (or
>>> P2M_UNSHARE). I checked and without the paging lock held it throws up
>>> at hap_alloc's ASSERT.. So yea, currently I don't think we have a
>>> better route then what I currently sent in.
>>
>> You didn't answer my question regarding the "fixups" needed, so
>> for the moment it's not clear to me yet whether indeed there's
>> no better way.
> 
> Umm, I did. The fixups entail populating the EPT from the parent as I
> described above.

Isn't this taken care of by the new call to get_gfn_type() which you add?
As said before, I think at the point we want to obtain the PDPTs all
other adjustments and arrangements should have been done already, by
higher layers. This code should have no need to do anything beyond a
simple lookup.

Jan
Roger Pau Monné June 17, 2020, 2:29 p.m. UTC | #12
On Wed, Jun 17, 2020 at 06:49:16AM -0600, Tamas K Lengyel wrote:
> On Wed, Jun 17, 2020 at 2:25 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> >
> > On Tue, Jun 16, 2020 at 12:31:06PM -0700, Tamas K Lengyel wrote:
> > > While forking VMs running a small RTOS systems (Zephyr) a Xen crash has been
> > > observed due to a mm-lock order violation while copying the HVM CPU context
> > > from the parent. This issue has been identified to be due to
> > > hap_update_paging_modes getting a lock on the gfn using get_gfn. This call also
> > > creates a shared entry in the fork's memory map for the cr3 gfn. The function
> > > later calls hap_update_cr3 while holding the paging_lock, which results in the
> > > lock-order violation in vmx_load_pdptrs when it tries to unshare the above entry.
> > >
> > > This issue has not affected VMs running other OS's as a call to vmx_load_pdptrs
> > > is benign if PAE is not enabled or if EFER_LMA is set and returns before
> > > trying to unshare and map the page.
> > >
> > > Using get_gfn_type to get a lock on the gfn avoids this problem as we can
> > > populate the fork's gfn with a copied page instead of a shared entry if its
> > > needed, thus avoiding the lock order violation while holding paging_lock.
> > >
> > > Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
> > > ---
> > > The bug seems to have been present since commit 4cb6c4f4941, only discovered
> > > now due to the heavy use of mem_sharing with VM forks. As this is a simple
> > > bug-fix it would be nice to include it in the 4.14 release.
> >
> > I agree it seems like a candidate bugfix to be included. I've added
> > Paul to the Cc so he is aware.
> >
> > > ---
> > >  xen/arch/x86/mm/hap/hap.c | 17 ++++++++++++-----
> > >  1 file changed, 12 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
> > > index 7f84d0c6ea..9ae4c3ae6e 100644
> > > --- a/xen/arch/x86/mm/hap/hap.c
> > > +++ b/xen/arch/x86/mm/hap/hap.c
> > > @@ -748,12 +748,19 @@ static void hap_update_paging_modes(struct vcpu *v)
> > >      struct domain *d = v->domain;
> > >      unsigned long cr3_gfn = v->arch.hvm.guest_cr[3] >> PAGE_SHIFT;
> > >      p2m_type_t t;
> > > +    p2m_query_t q = P2M_ALLOC;
> > >
> > > -    /* We hold onto the cr3 as it may be modified later, and
> > > -     * we need to respect lock ordering. No need for
> > > -     * checks here as they are performed by vmx_load_pdptrs
> > > -     * (the potential user of the cr3) */
> > > -    (void)get_gfn(d, cr3_gfn, &t);
> > > +    /*
> > > +     * We hold onto the cr3 as it may be modified later, and
> > > +     * we need to respect lock ordering. Unshare here if we have to as to avoid
> > > +     * a lock-order violation later while we are holding the paging_lock.
> > > +     * Further checks are performed by vmx_load_pdptrs (the potential user of
> > > +     * the cr3).
> > > +     */
> > > +    if ( hvm_pae_enabled(v) && !hvm_long_mode_active(v) )
> > > +        q |= P2M_UNSHARE;
> > > +
> > > +    (void)get_gfn_type(d, cr3_gfn, &t, q);
> >
> > While there I think you can drop the cast to void.
> 
> Sure.
> 
> >
> > In order for this to be more resilient, maybe it would be better to
> > just use get_gfn_unshare directly and avoid checking what paging mode
> > the guest is currently using?
> >
> > Or would that be too expensive in terms of performance for the not
> > affected case?
> 
> That's what I originally considered sending in but yes, in the fuzzing
> case it would mean a full-page copy for each iteration even on
> unaffected cases instead of a one-time shared entry setup. That would
> be a considerable waste.

Right, I'm afraid I don't really like the implementation details of
vmx_load_pdptrs leaking into hap_update_paging_modes which is a
generic function, so IMO a cleaner solution would be to always use
P2M_UNSHARE.

Thanks, Roger.
Tamas K Lengyel June 17, 2020, 2:45 p.m. UTC | #13
On Wed, Jun 17, 2020 at 8:29 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> Right, I'm afraid I don't really like the implementation details of
> vmx_load_pdptrs leaking into hap_update_paging_modes which is a
> generic function, so IMO a cleaner solution would be to always use
> P2M_UNSHARE.

That's not an acceptable route due to the overhead it brings to
unaffected use-cases. I find it unreasonable that we would be willing
to sacrifice performance for a purely stylistic preference. In that
case I'm going to stop sending fixes upstream and simply carry patches
out-of-tree.

Tamas
Tamas K Lengyel June 17, 2020, 2:49 p.m. UTC | #14
On Wed, Jun 17, 2020 at 8:24 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 17.06.2020 15:43, Tamas K Lengyel wrote:
> > On Wed, Jun 17, 2020 at 7:36 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 17.06.2020 15:31, Tamas K Lengyel wrote:
> >>> On Wed, Jun 17, 2020 at 7:28 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>
> >>>> On 17.06.2020 15:21, Tamas K Lengyel wrote:
> >>>>> On Wed, Jun 17, 2020 at 7:04 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>>>
> >>>>>> On 17.06.2020 15:00, Tamas K Lengyel wrote:
> >>>>>>> On Wed, Jun 17, 2020 at 3:59 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>>>>> If there are code paths of both kinds, which approach to use in
> >>>>>>>> vmx_load_pdptrs() may need to be chosen based on what
> >>>>>>>> paging_locked_by_me() returns. Or perhaps an unlocked query is
> >>>>>>>> fine in either case?
> >>>>>>>
> >>>>>>> Perhaps adjusting vmx_load_pdptrs to chose the unlocked query would be
> >>>>>>> fine. But at that point what is the reason for having the lock
> >>>>>>> ordering at all? Why not just have a single recursive lock and avoid
> >>>>>>> issues like this altogether?
> >>>>>>
> >>>>>> With just a single lock, contention problems we already know we
> >>>>>> have would be even worse. When the current locking model was
> >>>>>> introduced, there was actually a plan to make gfn_lock() more
> >>>>>> fine-grained (i.e. not simply "de-generate" to p2m_lock()), for
> >>>>>> example.
> >>>>>
> >>>>> Sigh. Well, I've been checking and adjust vmx_load_pdptrs to use an
> >>>>> unlocked query doesn't seem as straightforward because, well, there is
> >>>>> no unlocked version of p2m_get_page_from_gfn which would also do the
> >>>>> "fixups".
> >>>>
> >>>> Which fixups do we need here, in particular? Of course, whenever
> >>>> any fixups get done, the operation can't be lock-less.
> >>>>
> >>>>> What seems redundant to me though is that
> >>>>> hap_update_paging_modes takes both the p2m_lock via get_gfn PLUS the
> >>>>> paging_lock. Does it really need to take the paging_lock?
> >>>>
> >>>> From mm-locks.h's comments:
> >>>>
> >>>>  * For HAP, it protects the NPT/EPT tables and mode changes.
> >>>
> >>> We do the population of the EPT as part of fork_page() if there was a
> >>> hole in the p2m when the query was issued using P2M_ALLOC (or
> >>> P2M_UNSHARE). I checked and without the paging lock held it throws up
> >>> at hap_alloc's ASSERT.. So yea, currently I don't think we have a
> >>> better route then what I currently sent in.
> >>
> >> You didn't answer my question regarding the "fixups" needed, so
> >> for the moment it's not clear to me yet whether indeed there's
> >> no better way.
> >
> > Umm, I did. The fixups entail populating the EPT from the parent as I
> > described above.
>
> Isn't this taken care of by the new call to get_gfn_type() which you add?
> As said before, I think at the point we want to obtain the PDPTs all
> other adjustments and arrangements should have been done already, by
> higher layers. This code should have no need to do anything beyond a
> simple lookup.

I don't really know what else to say. There are multiple paths leading
to vmx_load_pdptrs, some take the paging_lock while some don't. In
this particular case we can do the fixups earlier as I do in this
patch because there happens to be a lookup before the paging_lock is
taken but in other cases there isn't such a route so removing
P2M_UNSHARE from vmx_load_pdptrs is not an option.

Tamas
Jan Beulich June 17, 2020, 3:46 p.m. UTC | #15
On 17.06.2020 16:49, Tamas K Lengyel wrote:
> 
> I don't really know what else to say. There are multiple paths leading
> to vmx_load_pdptrs, some take the paging_lock while some don't. In
> this particular case we can do the fixups earlier as I do in this
> patch because there happens to be a lookup before the paging_lock is
> taken but in other cases there isn't such a route so removing
> P2M_UNSHARE from vmx_load_pdptrs is not an option.

I disagree (because such missing unshare requests could be put
elsewhere), but let me ask another question then: Why is it that
vmx_load_pdptrs() needs to unshare? The function only reads from
the page. Even if the page's content changed later on, we wouldn't
care, as there's no coherence of the PDPTRs once loaded.

Jan
Tamas K Lengyel June 17, 2020, 3:54 p.m. UTC | #16
On Wed, Jun 17, 2020 at 9:46 AM Jan Beulich <jbeulich@suse.com> wrote:
> I disagree (because such missing unshare requests could be put
> elsewhere), but let me ask another question then: Why is it that
> vmx_load_pdptrs() needs to unshare? The function only reads from
> the page. Even if the page's content changed later on, we wouldn't
> care, as there's no coherence of the PDPTRs once loaded.

Ah, I see what you mean. It really just looks like it reads from the
page so P2M_ALLOC would suffice. Let me verify.

Tamas
Tamas K Lengyel June 17, 2020, 3:59 p.m. UTC | #17
On Wed, Jun 17, 2020 at 9:54 AM Tamas K Lengyel
<tamas.k.lengyel@gmail.com> wrote:
>
> On Wed, Jun 17, 2020 at 9:46 AM Jan Beulich <jbeulich@suse.com> wrote:
> > I disagree (because such missing unshare requests could be put
> > elsewhere), but let me ask another question then: Why is it that
> > vmx_load_pdptrs() needs to unshare? The function only reads from
> > the page. Even if the page's content changed later on, we wouldn't
> > care, as there's no coherence of the PDPTRs once loaded.
>
> Ah, I see what you mean. It really just looks like it reads from the
> page so P2M_ALLOC would suffice. Let me verify.

Alright, that works and it's an even better solution as we don't
unnecessarily unshare the page even for PAE guests.

Thanks, will send v2 shortly,
Tamas

Patch

diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index 7f84d0c6ea..9ae4c3ae6e 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -748,12 +748,19 @@  static void hap_update_paging_modes(struct vcpu *v)
     struct domain *d = v->domain;
     unsigned long cr3_gfn = v->arch.hvm.guest_cr[3] >> PAGE_SHIFT;
     p2m_type_t t;
+    p2m_query_t q = P2M_ALLOC;
 
-    /* We hold onto the cr3 as it may be modified later, and
-     * we need to respect lock ordering. No need for 
-     * checks here as they are performed by vmx_load_pdptrs
-     * (the potential user of the cr3) */
-    (void)get_gfn(d, cr3_gfn, &t);
+    /*
+     * We hold onto the cr3 as it may be modified later, and
+     * we need to respect lock ordering. Unshare here if we have to, so as to
+     * avoid a lock-order violation later while we are holding the paging_lock.
+     * Further checks are performed by vmx_load_pdptrs (the potential user of
+     * the cr3).
+     */
+    if ( hvm_pae_enabled(v) && !hvm_long_mode_active(v) )
+        q |= P2M_UNSHARE;
+
+    (void)get_gfn_type(d, cr3_gfn, &t, q);
     paging_lock(d);
 
     v->arch.paging.mode = hap_paging_get_mode(v);