
[v3,for-4.14] x86/vmx: use P2M_ALLOC in vmx_load_pdptrs instead of P2M_UNSHARE

Message ID: a7635e7423f834f44a132114bd3e039dd0435a00.1592490545.git.tamas.lengyel@intel.com
State: New, archived
Series: [v3,for-4.14] x86/vmx: use P2M_ALLOC in vmx_load_pdptrs instead of P2M_UNSHARE

Commit Message

Tamas K Lengyel June 18, 2020, 2:39 p.m. UTC
While forking VMs running a small RTOS (Zephyr), a Xen crash has been
observed due to an mm-lock order violation while copying the HVM CPU context
from the parent. The cause is that hap_update_paging_modes first takes a lock
on the gfn using get_gfn. This call also creates a shared entry in the fork's
memory map for the cr3 gfn. The function later calls hap_update_cr3 while
holding the paging_lock, which results in the lock-order violation in
vmx_load_pdptrs when it tries to unshare the above entry as it grabs the page
with the P2M_UNSHARE flag set.

Since vmx_load_pdptrs only reads from the page, its use of P2M_UNSHARE was
unnecessary to begin with. P2M_ALLOC is the appropriate flag to ensure the
p2m is properly populated.

Note that the lock-order violation is avoided because, before the paging_lock
is taken, a lookup is performed with P2M_ALLOC that forks the page; the second
lookup in vmx_load_pdptrs thus succeeds without having to perform the fork. We
keep P2M_ALLOC in vmx_load_pdptrs because there are code paths leading up to it
that don't take the paging_lock and have no previous lookup. Currently no
other code path leads there with the paging_lock taken, so no further
adjustments are necessary.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
v3: expand commit message to explain why there is no lock-order violation
---
 xen/arch/x86/hvm/vmx/vmx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
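
The lock-order reasoning in the commit message can be sketched with a toy model. This is only an illustration under stated assumptions, not Xen code: every name below (`model_lookup`, `model_update_paging_modes`, the `MODEL_*` constants) is hypothetical, the lock levels are made-up numbers, and the model assumes that both forking and unsharing a page need a lock that is ordered before the paging lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative lock levels (made-up values): a lock may only be taken
 * while every lock already held has a strictly lower level. */
enum model_level {
    MODEL_NO_LOCK = 0,
    MODEL_SHARING_LOCK = 1,   /* assumed to be needed to fork or unshare */
    MODEL_PAGING_LOCK = 2
};

/* Hypothetical stand-ins for the two query flags discussed in the patch. */
enum model_query { MODEL_ALLOC, MODEL_UNSHARE };

/* State of the cr3 gfn in the fork's memory map. */
enum model_entry { MODEL_HOLE, MODEL_SHARED, MODEL_PRIVATE };

struct model_ctx {
    enum model_level held;       /* highest lock level currently held */
    enum model_entry cr3_entry;  /* current state of the cr3 gfn entry */
};

/* Look up the cr3 page. Returns false on a lock-order violation. */
static bool model_lookup(struct model_ctx *c, enum model_query q)
{
    bool need_fork, need_unshare;

    assert(c != NULL);
    need_fork = (c->cr3_entry == MODEL_HOLE);
    need_unshare = (q == MODEL_UNSHARE && c->cr3_entry != MODEL_PRIVATE);

    /* Entry already in the requested state: no further lock is needed. */
    if (!need_fork && !need_unshare)
        return true;

    /* Forking or unsharing needs the lower-ordered lock; taking it while
     * an equal or higher lock is held violates the ordering. */
    if (c->held >= MODEL_SHARING_LOCK)
        return false;

    c->cr3_entry = need_unshare ? MODEL_PRIVATE : MODEL_SHARED;
    return true;
}

/* Models the sequence from the commit message: a first lookup before the
 * paging lock is taken, then a second lookup with the paging lock held. */
static bool model_update_paging_modes(enum model_query pdptr_query)
{
    struct model_ctx c = { MODEL_NO_LOCK, MODEL_HOLE };
    bool ok;

    if (!model_lookup(&c, MODEL_ALLOC))   /* populates a shared entry */
        return false;

    c.held = MODEL_PAGING_LOCK;
    ok = model_lookup(&c, pdptr_query);   /* the lookup this patch changes */
    c.held = MODEL_NO_LOCK;
    return ok;
}
```

In this model, `model_update_paging_modes(MODEL_UNSHARE)` returns false because the second lookup has to unshare with the paging lock held, while `model_update_paging_modes(MODEL_ALLOC)` returns true because the entry was already populated by the first lookup. A lookup with `MODEL_ALLOC` on an unpopulated entry while the paging lock is held would also fail, which is why the earlier lookup matters.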

Comments

Roger Pau Monné June 18, 2020, 3:46 p.m. UTC | #1
On Thu, Jun 18, 2020 at 07:39:04AM -0700, Tamas K Lengyel wrote:
> While forking VMs running a small RTOS system (Zephyr) a Xen crash has been
> observed due to a mm-lock order violation while copying the HVM CPU context
> from the parent. This issue has been identified to be due to
> hap_update_paging_modes first getting a lock on the gfn using get_gfn. This
> call also creates a shared entry in the fork's memory map for the cr3 gfn. The
> function later calls hap_update_cr3 while holding the paging_lock, which
> results in the lock-order violation in vmx_load_pdptrs when it tries to unshare
> the above entry when it grabs the page with the P2M_UNSHARE flag set.
> 
> Since vmx_load_pdptrs only reads from the page its usage of P2M_UNSHARE was
> unnecessary to start with. Using P2M_ALLOC is the appropriate flag to ensure
> the p2m is properly populated.
> 
> Note that the lock order violation is avoided because before the paging_lock is
> taken a lookup is performed with P2M_ALLOC that forks the page, thus the second
> lookup in vmx_load_pdptrs succeeds without having to perform the fork. We keep
> P2M_ALLOC in vmx_load_pdptrs because there are code-paths leading up to it
> which don't take the paging_lock and that have no previous lookup. Currently no
> other code-path exists leading there with the paging_lock taken, thus no
> further adjustments are necessary.
> 
> Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks!
Paul Durrant June 18, 2020, 3:53 p.m. UTC | #2
> -----Original Message-----
> From: Roger Pau Monné <roger.pau@citrix.com>
> Sent: 18 June 2020 16:46
> To: Tamas K Lengyel <tamas.lengyel@intel.com>
> Cc: xen-devel@lists.xenproject.org; Jun Nakajima <jun.nakajima@intel.com>; Kevin Tian
> <kevin.tian@intel.com>; Jan Beulich <jbeulich@suse.com>; Andrew Cooper <andrew.cooper3@citrix.com>;
> Wei Liu <wl@xen.org>; Paul Durrant <paul@xen.org>
> Subject: Re: [PATCH v3 for-4.14] x86/vmx: use P2M_ALLOC in vmx_load_pdptrs instead of P2M_UNSHARE
> 
> On Thu, Jun 18, 2020 at 07:39:04AM -0700, Tamas K Lengyel wrote:
> > While forking VMs running a small RTOS system (Zephyr) a Xen crash has been
> > observed due to a mm-lock order violation while copying the HVM CPU context
> > from the parent. This issue has been identified to be due to
> > hap_update_paging_modes first getting a lock on the gfn using get_gfn. This
> > call also creates a shared entry in the fork's memory map for the cr3 gfn. The
> > function later calls hap_update_cr3 while holding the paging_lock, which
> > results in the lock-order violation in vmx_load_pdptrs when it tries to unshare
> > the above entry when it grabs the page with the P2M_UNSHARE flag set.
> >
> > Since vmx_load_pdptrs only reads from the page its usage of P2M_UNSHARE was
> > unnecessary to start with. Using P2M_ALLOC is the appropriate flag to ensure
> > the p2m is properly populated.
> >
> > Note that the lock order violation is avoided because before the paging_lock is
> > taken a lookup is performed with P2M_ALLOC that forks the page, thus the second
> > lookup in vmx_load_pdptrs succeeds without having to perform the fork. We keep
> > P2M_ALLOC in vmx_load_pdptrs because there are code-paths leading up to it
> > which don't take the paging_lock and that have no previous lookup. Currently no
> > other code-path exists leading there with the paging_lock taken, thus no
> > further adjustments are necessary.
> >
> > Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
> 
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
> 

Release-acked-by: Paul Durrant <paul@xen.org>

> Thanks!
Tian, Kevin June 19, 2020, 1:27 a.m. UTC | #3
> From: Lengyel, Tamas <tamas.lengyel@intel.com>
> Sent: Thursday, June 18, 2020 10:39 PM
> 
> While forking VMs running a small RTOS system (Zephyr) a Xen crash has
> been
> observed due to a mm-lock order violation while copying the HVM CPU
> context
> from the parent. This issue has been identified to be due to
> hap_update_paging_modes first getting a lock on the gfn using get_gfn. This
> call also creates a shared entry in the fork's memory map for the cr3 gfn. The
> function later calls hap_update_cr3 while holding the paging_lock, which
> results in the lock-order violation in vmx_load_pdptrs when it tries to
> unshare
> the above entry when it grabs the page with the P2M_UNSHARE flag set.
> 
> Since vmx_load_pdptrs only reads from the page its usage of P2M_UNSHARE
> was
> unnecessary to start with. Using P2M_ALLOC is the appropriate flag to ensure
> the p2m is properly populated.
> 
> Note that the lock order violation is avoided because before the paging_lock
> is
> taken a lookup is performed with P2M_ALLOC that forks the page, thus the
> second
> lookup in vmx_load_pdptrs succeeds without having to perform the fork. We
> keep
> P2M_ALLOC in vmx_load_pdptrs because there are code-paths leading up to
> it
> which don't take the paging_lock and that have no previous lookup.
> Currently no
> other code-path exists leading there with the paging_lock taken, thus no
> further adjustments are necessary.
> 
> Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

> ---
> v3: expand commit message to explain why there is no lock-order violation
> ---
>  xen/arch/x86/hvm/vmx/vmx.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index ab19d9424e..cc6d4ece22 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -1325,7 +1325,7 @@ static void vmx_load_pdptrs(struct vcpu *v)
>      if ( (cr3 & 0x1fUL) && !hvm_pcid_enabled(v) )
>          goto crash;
> 
> -    page = get_page_from_gfn(v->domain, cr3 >> PAGE_SHIFT, &p2mt,
> P2M_UNSHARE);
> +    page = get_page_from_gfn(v->domain, cr3 >> PAGE_SHIFT, &p2mt,
> P2M_ALLOC);
>      if ( !page )
>      {
>          /* Ideally you don't want to crash but rather go into a wait
> --
> 2.25.1
Patch

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index ab19d9424e..cc6d4ece22 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1325,7 +1325,7 @@  static void vmx_load_pdptrs(struct vcpu *v)
     if ( (cr3 & 0x1fUL) && !hvm_pcid_enabled(v) )
         goto crash;
 
-    page = get_page_from_gfn(v->domain, cr3 >> PAGE_SHIFT, &p2mt, P2M_UNSHARE);
+    page = get_page_from_gfn(v->domain, cr3 >> PAGE_SHIFT, &p2mt, P2M_ALLOC);
     if ( !page )
     {
         /* Ideally you don't want to crash but rather go into a wait