Message ID | a7635e7423f834f44a132114bd3e039dd0435a00.1592490545.git.tamas.lengyel@intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | [v3,for-4.14] x86/vmx: use P2M_ALLOC in vmx_load_pdptrs instead of P2M_UNSHARE |
On Thu, Jun 18, 2020 at 07:39:04AM -0700, Tamas K Lengyel wrote:
> While forking VMs running a small RTOS system (Zephyr) a Xen crash has been
> observed due to a mm-lock order violation while copying the HVM CPU context
> from the parent. This issue has been identified to be due to
> hap_update_paging_modes first getting a lock on the gfn using get_gfn. This
> call also creates a shared entry in the fork's memory map for the cr3 gfn. The
> function later calls hap_update_cr3 while holding the paging_lock, which
> results in the lock-order violation in vmx_load_pdptrs when it tries to unshare
> the above entry when it grabs the page with the P2M_UNSHARE flag set.
>
> Since vmx_load_pdptrs only reads from the page its usage of P2M_UNSHARE was
> unnecessary to start with. Using P2M_ALLOC is the appropriate flag to ensure
> the p2m is properly populated.
>
> Note that the lock order violation is avoided because before the paging_lock is
> taken a lookup is performed with P2M_ALLOC that forks the page, thus the second
> lookup in vmx_load_pdptrs succeeds without having to perform the fork. We keep
> P2M_ALLOC in vmx_load_pdptrs because there are code-paths leading up to it
> which don't take the paging_lock and that have no previous lookup. Currently no
> other code-path exists leading there with the paging_lock taken, thus no
> further adjustments are necessary.
>
> Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks!
> -----Original Message-----
> From: Roger Pau Monné <roger.pau@citrix.com>
> Sent: 18 June 2020 16:46
> To: Tamas K Lengyel <tamas.lengyel@intel.com>
> Cc: xen-devel@lists.xenproject.org; Jun Nakajima <jun.nakajima@intel.com>; Kevin Tian
> <kevin.tian@intel.com>; Jan Beulich <jbeulich@suse.com>; Andrew Cooper <andrew.cooper3@citrix.com>;
> Wei Liu <wl@xen.org>; Paul Durrant <paul@xen.org>
> Subject: Re: [PATCH v3 for-4.14] x86/vmx: use P2M_ALLOC in vmx_load_pdptrs instead of P2M_UNSHARE
>
> On Thu, Jun 18, 2020 at 07:39:04AM -0700, Tamas K Lengyel wrote:
> > While forking VMs running a small RTOS system (Zephyr) a Xen crash has been
> > observed due to a mm-lock order violation while copying the HVM CPU context
> > from the parent. This issue has been identified to be due to
> > hap_update_paging_modes first getting a lock on the gfn using get_gfn. This
> > call also creates a shared entry in the fork's memory map for the cr3 gfn. The
> > function later calls hap_update_cr3 while holding the paging_lock, which
> > results in the lock-order violation in vmx_load_pdptrs when it tries to unshare
> > the above entry when it grabs the page with the P2M_UNSHARE flag set.
> >
> > Since vmx_load_pdptrs only reads from the page its usage of P2M_UNSHARE was
> > unnecessary to start with. Using P2M_ALLOC is the appropriate flag to ensure
> > the p2m is properly populated.
> >
> > Note that the lock order violation is avoided because before the paging_lock is
> > taken a lookup is performed with P2M_ALLOC that forks the page, thus the second
> > lookup in vmx_load_pdptrs succeeds without having to perform the fork. We keep
> > P2M_ALLOC in vmx_load_pdptrs because there are code-paths leading up to it
> > which don't take the paging_lock and that have no previous lookup. Currently no
> > other code-path exists leading there with the paging_lock taken, thus no
> > further adjustments are necessary.
> >
> > Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
>
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
>

Release-acked-by: Paul Durrant <paul@xen.org>

> Thanks!
> From: Lengyel, Tamas <tamas.lengyel@intel.com>
> Sent: Thursday, June 18, 2020 10:39 PM
>
> While forking VMs running a small RTOS system (Zephyr) a Xen crash has been
> observed due to a mm-lock order violation while copying the HVM CPU context
> from the parent. This issue has been identified to be due to
> hap_update_paging_modes first getting a lock on the gfn using get_gfn. This
> call also creates a shared entry in the fork's memory map for the cr3 gfn. The
> function later calls hap_update_cr3 while holding the paging_lock, which
> results in the lock-order violation in vmx_load_pdptrs when it tries to unshare
> the above entry when it grabs the page with the P2M_UNSHARE flag set.
>
> Since vmx_load_pdptrs only reads from the page its usage of P2M_UNSHARE was
> unnecessary to start with. Using P2M_ALLOC is the appropriate flag to ensure
> the p2m is properly populated.
>
> Note that the lock order violation is avoided because before the paging_lock is
> taken a lookup is performed with P2M_ALLOC that forks the page, thus the second
> lookup in vmx_load_pdptrs succeeds without having to perform the fork. We keep
> P2M_ALLOC in vmx_load_pdptrs because there are code-paths leading up to it
> which don't take the paging_lock and that have no previous lookup. Currently no
> other code-path exists leading there with the paging_lock taken, thus no
> further adjustments are necessary.
>
> Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

> ---
> v3: expand commit message to explain why there is no lock-order violation
> ---
>  xen/arch/x86/hvm/vmx/vmx.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index ab19d9424e..cc6d4ece22 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -1325,7 +1325,7 @@ static void vmx_load_pdptrs(struct vcpu *v)
>      if ( (cr3 & 0x1fUL) && !hvm_pcid_enabled(v) )
>          goto crash;
>
> -    page = get_page_from_gfn(v->domain, cr3 >> PAGE_SHIFT, &p2mt, P2M_UNSHARE);
> +    page = get_page_from_gfn(v->domain, cr3 >> PAGE_SHIFT, &p2mt, P2M_ALLOC);
>      if ( !page )
>      {
>          /* Ideally you don't want to crash but rather go into a wait
> --
> 2.25.1
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index ab19d9424e..cc6d4ece22 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1325,7 +1325,7 @@ static void vmx_load_pdptrs(struct vcpu *v)
     if ( (cr3 & 0x1fUL) && !hvm_pcid_enabled(v) )
         goto crash;
 
-    page = get_page_from_gfn(v->domain, cr3 >> PAGE_SHIFT, &p2mt, P2M_UNSHARE);
+    page = get_page_from_gfn(v->domain, cr3 >> PAGE_SHIFT, &p2mt, P2M_ALLOC);
     if ( !page )
     {
         /* Ideally you don't want to crash but rather go into a wait
While forking VMs running a small RTOS system (Zephyr) a Xen crash has been
observed due to a mm-lock order violation while copying the HVM CPU context
from the parent. This issue has been identified to be due to
hap_update_paging_modes first getting a lock on the gfn using get_gfn. This
call also creates a shared entry in the fork's memory map for the cr3 gfn. The
function later calls hap_update_cr3 while holding the paging_lock, which
results in the lock-order violation in vmx_load_pdptrs when it tries to unshare
the above entry when it grabs the page with the P2M_UNSHARE flag set.

Since vmx_load_pdptrs only reads from the page its usage of P2M_UNSHARE was
unnecessary to start with. Using P2M_ALLOC is the appropriate flag to ensure
the p2m is properly populated.

Note that the lock order violation is avoided because before the paging_lock is
taken a lookup is performed with P2M_ALLOC that forks the page, thus the second
lookup in vmx_load_pdptrs succeeds without having to perform the fork. We keep
P2M_ALLOC in vmx_load_pdptrs because there are code-paths leading up to it
which don't take the paging_lock and that have no previous lookup. Currently no
other code-path exists leading there with the paging_lock taken, thus no
further adjustments are necessary.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
v3: expand commit message to explain why there is no lock-order violation
---
 xen/arch/x86/hvm/vmx/vmx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
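
The ordering problem described in the commit message can be modelled in a few lines of C. This is a toy sketch, not Xen code: the lock levels, every `toy_*` name, and the rule that populating or unsharing an entry needs a lock ordered below the paging lock are assumptions made purely for illustration. It only shows why a second lookup that has to unshare trips an ordering check, while a read-only lookup of an already populated entry does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock levels: a lock may be taken only if its level is
 * strictly higher than the highest level already held. */
enum toy_level { TOY_NONE = 0, TOY_SHARING = 1, TOY_PAGING = 2 };

/* Hypothetical query flags, named after the ones in the patch. */
enum toy_query { TOY_ALLOC = 1, TOY_UNSHARE = 2 };

struct toy_state {
    int  held;       /* highest lock level currently held */
    bool populated;  /* entry exists in the fork's memory map */
    bool shared;     /* entry is still shared with the parent */
    bool violation;  /* recorded instead of crashing */
};

/* Try to take a lock; record a violation if the ordering rule is broken. */
static bool toy_lock(struct toy_state *s, int level)
{
    assert(s != NULL);
    if (level <= s->held) {
        s->violation = true;
        return false;
    }
    s->held = level;
    return true;
}

/* Look up the entry. Modifying it (populating or unsharing) is assumed
 * to need the lower-ordered TOY_SHARING lock; a plain read needs none. */
static bool toy_lookup(struct toy_state *s, int query)
{
    bool wants_unshare = (query & TOY_UNSHARE) != 0;
    bool must_modify = !s->populated || (wants_unshare && s->shared);

    if (must_modify) {
        int saved = s->held;
        if (!toy_lock(s, TOY_SHARING))
            return false;
        s->populated = true;
        s->shared = !wants_unshare;
        s->held = saved;  /* drop the lock again */
    }
    return true;
}

/* Returns true if the scenario hits a lock-order violation. */
static bool toy_scenario(int second_query)
{
    struct toy_state s = { TOY_NONE, false, false, false };

    toy_lookup(&s, TOY_ALLOC);     /* first lookup, paging lock not held */
    toy_lock(&s, TOY_PAGING);      /* paging lock taken */
    toy_lookup(&s, second_query);  /* second lookup under the paging lock */
    return s.violation;
}
```

Under these assumptions, `toy_scenario(TOY_UNSHARE)` records a violation and `toy_scenario(TOY_ALLOC)` does not, which mirrors the reasoning given for replacing P2M_UNSHARE with P2M_ALLOC.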