x86/vvmx: Fix deadlock with MSR bitmap merging

Message ID 20200311183455.23729-1-andrew.cooper3@citrix.com
State New

Commit Message

Andrew Cooper March 11, 2020, 6:34 p.m. UTC
c/s c47984aabead "nvmx: implement support for MSR bitmaps" introduced a use of
map_domain_page() which may occur in the middle of a context switch.

This is not safe, and causes Xen to deadlock on the mapcache lock:

  (XEN) Xen call trace:
  (XEN)    [<ffff82d08022d6ae>] R _spin_lock+0x34/0x5e
  (XEN)    [<ffff82d0803219d7>] F map_domain_page+0x250/0x527
  (XEN)    [<ffff82d080356332>] F do_page_fault+0x420/0x780
  (XEN)    [<ffff82d08038da3d>] F x86_64/entry.S#handle_exception_saved+0x68/0x94
  (XEN)    [<ffff82d08031729f>] F __find_next_zero_bit+0x28/0x69
  (XEN)    [<ffff82d080321a4d>] F map_domain_page+0x2c6/0x527
  (XEN)    [<ffff82d08029eeb2>] F nvmx_update_exec_control+0x1d7/0x323
  (XEN)    [<ffff82d080299f5a>] F vmx_update_cpu_exec_control+0x23/0x40
  (XEN)    [<ffff82d08029a3f7>] F arch/x86/hvm/vmx/vmx.c#vmx_ctxt_switch_from+0xb7/0x121
  (XEN)    [<ffff82d08031d796>] F arch/x86/domain.c#__context_switch+0x124/0x4a9
  (XEN)    [<ffff82d080320925>] F context_switch+0x154/0x62c
  (XEN)    [<ffff82d080252f3e>] F common/sched/core.c#sched_context_switch+0x16a/0x175
  (XEN)    [<ffff82d080253877>] F common/sched/core.c#schedule+0x2ad/0x2bc
  (XEN)    [<ffff82d08022cc97>] F common/softirq.c#__do_softirq+0xb7/0xc8
  (XEN)    [<ffff82d08022cd38>] F do_softirq+0x18/0x1a
  (XEN)    [<ffff82d0802a2fbb>] F vmx_asm_do_vmentry+0x2b/0x30

Convert the domheap page to a xenheap page.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Kevin Tian <kevin.tian@intel.com>

I suspect this is the not-quite-consistent-enough-to-bisect issue which
OSSTest is hitting, and which is interfering with pushes to master.
---
 xen/arch/x86/hvm/vmx/vvmx.c        | 19 ++++---------------
 xen/include/asm-x86/hvm/vmx/vvmx.h |  2 +-
 2 files changed, 5 insertions(+), 16 deletions(-)

Comments

Jan Beulich March 12, 2020, 9:21 a.m. UTC | #1
On 11.03.2020 19:34, Andrew Cooper wrote:
> c/s c47984aabead "nvmx: implement support for MSR bitmaps" introduced a use of
> map_domain_page() which may get used in the middle of context switch.
> 
> This is not safe, and causes Xen to deadlock on the mapcache lock:
> 
> [...]
> 
> Convert the domheap page into being a xenheap page.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

> I suspect this is the not-quite-consistent-enough-to-bisect issue which
> OSSTest is hitting and interfering with pushes to master.

Having looked at a number of (albeit not all) failures, I don't
think I've seen any sign of a crash like the one above. Do you
think there are more subtle manifestations of the issue? Also
it is my understanding that this issue shouldn't get in the
way of any non-nested tests (of which we've had varying sets of
failures).

Jan
Roger Pau Monné March 12, 2020, 9:30 a.m. UTC | #2
On Wed, Mar 11, 2020 at 06:34:55PM +0000, Andrew Cooper wrote:
> c/s c47984aabead "nvmx: implement support for MSR bitmaps" introduced a use of
> map_domain_page() which may get used in the middle of context switch.
> 
> This is not safe, and causes Xen to deadlock on the mapcache lock:
> 
> [...]
> 
> Convert the domheap page into being a xenheap page.

Fixes: c47984aabead5391 ('nvmx: implement support for MSR bitmaps')

> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> [...]
> diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
> index 926a11c15f..f049920196 100644
> --- a/xen/arch/x86/hvm/vmx/vvmx.c
> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> @@ -130,12 +130,9 @@ int nvmx_vcpu_initialise(struct vcpu *v)
>  
>      if ( cpu_has_vmx_msr_bitmap )
>      {
> -        nvmx->msr_merged = alloc_domheap_page(d, MEMF_no_owner);
> +        nvmx->msr_merged = alloc_xenheap_page();

Could we also use __map_domain_page_global here (keeping the domheap
allocation) in order to map the page on init and keep it mapped until
the domain is destroyed?

>          if ( !nvmx->msr_merged )
> -        {
> -            gdprintk(XENLOG_ERR, "nest: allocation for MSR bitmap failed\n");

I know the message is not especially helpful, but that seems to be in
line with how error handling is done in nvmx_vcpu_initialise.

Thanks, Roger.
Roger Pau Monné March 12, 2020, 10:58 a.m. UTC | #3
On Thu, Mar 12, 2020 at 10:30:35AM +0100, Roger Pau Monné wrote:
> On Wed, Mar 11, 2020 at 06:34:55PM +0000, Andrew Cooper wrote:
> > c/s c47984aabead "nvmx: implement support for MSR bitmaps" introduced a use of
> > map_domain_page() which may get used in the middle of context switch.
> > 
> > This is not safe, and causes Xen to deadlock on the mapcache lock:
> > 
> > [...]
> > 
> > Convert the domheap page into being a xenheap page.
> 
> Fixes: c47984aabead5391 ('nvmx: implement support for MSR bitmaps')
> 
> > Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> > ---
> > CC: Jan Beulich <JBeulich@suse.com>
> > CC: Wei Liu <wl@xen.org>
> > CC: Roger Pau Monné <roger.pau@citrix.com>
> > CC: Kevin Tian <kevin.tian@intel.com>
> > 
> > I suspect this is the not-quite-consistent-enough-to-bisect issue which
> > OSSTest is hitting and interfering with pushes to master.
> > ---
> >  xen/arch/x86/hvm/vmx/vvmx.c        | 19 ++++---------------
> >  xen/include/asm-x86/hvm/vmx/vvmx.h |  2 +-
> >  2 files changed, 5 insertions(+), 16 deletions(-)
> > 
> > diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
> > index 926a11c15f..f049920196 100644
> > --- a/xen/arch/x86/hvm/vmx/vvmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> > @@ -130,12 +130,9 @@ int nvmx_vcpu_initialise(struct vcpu *v)
> >  
> >      if ( cpu_has_vmx_msr_bitmap )
> >      {
> > -        nvmx->msr_merged = alloc_domheap_page(d, MEMF_no_owner);
> > +        nvmx->msr_merged = alloc_xenheap_page();
> 
> Could we also use __map_domain_page_global here (keeping the domheap
> allocation) in order to map the page on init and keep it mapped until
> the domain is destroyed?

Just read 'nvmx deadlock with MSR bitmaps' now and realized that you
mention using map_domain_page_global there as an option also, so I
guess you went with the xenheap page option because it was simpler.

Thanks, Roger.
Andrew Cooper March 12, 2020, 12:06 p.m. UTC | #4
On 12/03/2020 10:58, Roger Pau Monné wrote:
> On Thu, Mar 12, 2020 at 10:30:35AM +0100, Roger Pau Monné wrote:
>> On Wed, Mar 11, 2020 at 06:34:55PM +0000, Andrew Cooper wrote:
>>> c/s c47984aabead "nvmx: implement support for MSR bitmaps" introduced a use of
>>> map_domain_page() which may get used in the middle of context switch.
>>>
>>> This is not safe, and causes Xen to deadlock on the mapcache lock:
>>>
>>> [...]
>>>
>>> Convert the domheap page into being a xenheap page.
>> Fixes: c47984aabead5391 ('nvmx: implement support for MSR bitmaps')
>>
>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>> [...]
>>> diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
>>> index 926a11c15f..f049920196 100644
>>> --- a/xen/arch/x86/hvm/vmx/vvmx.c
>>> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
>>> @@ -130,12 +130,9 @@ int nvmx_vcpu_initialise(struct vcpu *v)
>>>  
>>>      if ( cpu_has_vmx_msr_bitmap )
>>>      {
>>> -        nvmx->msr_merged = alloc_domheap_page(d, MEMF_no_owner);
>>> +        nvmx->msr_merged = alloc_xenheap_page();
>> Could we also use __map_domain_page_global here (keeping the domheap
>> allocation) in order to map the page on init and keep it mapped until
>> the domain is destroyed?
> Just read 'nvmx deadlock with MSR bitmaps' now and realized that you
> mention using map_domain_page_global there as an option also, so I
> guess you went with the xenheap page option because it was simpler.

A domheap page which is mapped globally for its entire lifetime has
strictly greater overhead than a xenheap page, because it also consumes
vmap space.

Global domheap mappings are for cases where we need to maintain a mapping
for more than a single transient access, but don't know if/what/where at
the time the domain is created.

~Andrew
Andrew Cooper March 12, 2020, 12:21 p.m. UTC | #5
On 12/03/2020 09:21, Jan Beulich wrote:
> On 11.03.2020 19:34, Andrew Cooper wrote:
>> c/s c47984aabead "nvmx: implement support for MSR bitmaps" introduced a use of
>> map_domain_page() which may get used in the middle of context switch.
>>
>> This is not safe, and causes Xen to deadlock on the mapcache lock:
>>
>> [...]
>>
>> Convert the domheap page into being a xenheap page.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>
>> I suspect this is the not-quite-consistent-enough-to-bisect issue which
>> OSSTest is hitting and interfering with pushes to master.
> Having looked at a number of (albeit not all) failures, I don't
> think I've seen any sign of a crash like the one above. Do you
> think there are more subtle manifestations of the issue?

This stack trace was produced by an NMI watchdog timeout, and I thought
OSSTest didn't enable the NMI watchdog, but I see I'm wrong.

In which case this probably isn't what OSSTest is seeing, but it is a
genuine issue.

~Andrew
Jan Beulich March 12, 2020, 1:25 p.m. UTC | #6
On 12.03.2020 13:06, Andrew Cooper wrote:
> On 12/03/2020 10:58, Roger Pau Monné wrote:
>> On Thu, Mar 12, 2020 at 10:30:35AM +0100, Roger Pau Monné wrote:
>>> On Wed, Mar 11, 2020 at 06:34:55PM +0000, Andrew Cooper wrote:
>>>> c/s c47984aabead "nvmx: implement support for MSR bitmaps" introduced a use of
>>>> map_domain_page() which may get used in the middle of context switch.
>>>>
>>>> This is not safe, and causes Xen to deadlock on the mapcache lock:
>>>>
>>>> [...]
>>>>
>>>> Convert the domheap page into being a xenheap page.
>>> Fixes: c47984aabead5391 ('nvmx: implement support for MSR bitmaps')
>>>
>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>>> [...]
>>>> diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
>>>> index 926a11c15f..f049920196 100644
>>>> --- a/xen/arch/x86/hvm/vmx/vvmx.c
>>>> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
>>>> @@ -130,12 +130,9 @@ int nvmx_vcpu_initialise(struct vcpu *v)
>>>>  
>>>>      if ( cpu_has_vmx_msr_bitmap )
>>>>      {
>>>> -        nvmx->msr_merged = alloc_domheap_page(d, MEMF_no_owner);
>>>> +        nvmx->msr_merged = alloc_xenheap_page();
>>> Could we also use __map_domain_page_global here (keeping the domheap
>>> allocation) in order to map the page on init and keep it mapped until
>>> the domain is destroyed?
>> Just read 'nvmx deadlock with MSR bitmaps' now and realized that you
>> mention using map_domain_page_global there as an option also, so I
>> guess you went with the xenheap page option because it was simpler.
> 
> A domheap page which is mapped globally for its entire lifetime is
> strictly greater overhead than a xenheap page, because it also uses vmap
> space.
> 
> global domheap mappings are for where we need to maintain a mapping for
> more than a single transient access, but we don't know if/what/where at
> the time the domain is created.

I didn't think that's the only criterion: on large systems the
xenheap may be exhausted while the domheap isn't, and hence
using domheap pages (and global mappings) helps avoid
-ENOMEM.

Jan
Roger Pau Monné March 12, 2020, 1:32 p.m. UTC | #7
On Thu, Mar 12, 2020 at 12:21:29PM +0000, Andrew Cooper wrote:
> On 12/03/2020 09:21, Jan Beulich wrote:
> > On 11.03.2020 19:34, Andrew Cooper wrote:
> >> c/s c47984aabead "nvmx: implement support for MSR bitmaps" introduced a use of
> >> map_domain_page() which may get used in the middle of context switch.
> >>
> >> This is not safe, and causes Xen to deadlock on the mapcache lock:
> >>
> >> [...]
> >>
> >> Convert the domheap page into being a xenheap page.
> >>
> >> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> > Reviewed-by: Jan Beulich <jbeulich@suse.com>
> >
> >> I suspect this is the not-quite-consistent-enough-to-bisect issue which
> >> OSSTest is hitting and interfering with pushes to master.
> > Having looked at a number of (albeit not all) failures, I don't
> > think I've seen any sign of a crash like the one above. Do you
> > think there are more subtle manifestations of the issue?
> 
> This stack trace was produced by an NMI watchdog timeout, and I thought
> OSSTest didn't, but I see I'm wrong.
> 
> In which case this probably isn't what OSSTest is seeing, but it is a
> genuine issue.

osstest issue IIRC was L1 Xen hitting ASSERT(!sp || (peoi[sp -
1].vector < vector)) in do_IRQ_guest, which seems to mean L0 Xen
injects interrupts twice or some such?

Roger.
Andrew Cooper March 12, 2020, 1:34 p.m. UTC | #8
On 12/03/2020 13:25, Jan Beulich wrote:
> On 12.03.2020 13:06, Andrew Cooper wrote:
>> On 12/03/2020 10:58, Roger Pau Monné wrote:
>>> On Thu, Mar 12, 2020 at 10:30:35AM +0100, Roger Pau Monné wrote:
>>>> On Wed, Mar 11, 2020 at 06:34:55PM +0000, Andrew Cooper wrote:
>>>>> c/s c47984aabead "nvmx: implement support for MSR bitmaps" introduced a use of
>>>>> map_domain_page() which may get used in the middle of context switch.
>>>>>
>>>>> This is not safe, and causes Xen to deadlock on the mapcache lock:
>>>>>
>>>>> [...]
>>>>>
>>>>> Convert the domheap page into being a xenheap page.
>>>> Fixes: c47984aabead5391 ('nvmx: implement support for MSR bitmaps')
>>>>
>>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>>>> [...]
>>>>> diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
>>>>> index 926a11c15f..f049920196 100644
>>>>> --- a/xen/arch/x86/hvm/vmx/vvmx.c
>>>>> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
>>>>> @@ -130,12 +130,9 @@ int nvmx_vcpu_initialise(struct vcpu *v)
>>>>>  
>>>>>      if ( cpu_has_vmx_msr_bitmap )
>>>>>      {
>>>>> -        nvmx->msr_merged = alloc_domheap_page(d, MEMF_no_owner);
>>>>> +        nvmx->msr_merged = alloc_xenheap_page();
>>>> Could we also use __map_domain_page_global here (keeping the domheap
>>>> allocation) in order to map the page on init and keep it mapped until
>>>> the domain is destroyed?
>>> Just read 'nvmx deadlock with MSR bitmaps' now and realized that you
>>> mention using map_domain_page_global there as an option also, so I
>>> guess you went with the xenheap page option because it was simpler.
>> A domheap page which is mapped globally for its entire lifetime is
>> strictly greater overhead than a xenheap page, because it also uses vmap
>> space.
>>
>> global domheap mappings are for where we need to maintain a mapping for
>> more than a single transient access, but we don't know if/what/where at
>> the time the domain is created.
> I didn't think that's the only criteria:

It isn't the only criterion.  domheap+global also lets you get
working NUMA positioning.  However...

> One large systems the
> xenheap may be exhausted while the domheap isn't, and hence
> using domheap pages (and global mappings) allows to avoid
> -ENOMEM.

... on large systems, you are more likely to run out of vmap space than
xenheap space, seeing as the former is limited to 64G (inc iomap/fixmap
mappings), and the latter tops out at 4T.

~Andrew
Tian, Kevin March 13, 2020, 6:10 a.m. UTC | #9
> From: Andrew Cooper <andrew.cooper3@citrix.com>
> Sent: Thursday, March 12, 2020 2:35 AM
> 
> c/s c47984aabead "nvmx: implement support for MSR bitmaps" introduced a use of
> map_domain_page() which may get used in the middle of context switch.
> 
> This is not safe, and causes Xen to deadlock on the mapcache lock:
> 
> [...]
> 
> Convert the domheap page into being a xenheap page.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

Patch

diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 926a11c15f..f049920196 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -130,12 +130,9 @@  int nvmx_vcpu_initialise(struct vcpu *v)
 
     if ( cpu_has_vmx_msr_bitmap )
     {
-        nvmx->msr_merged = alloc_domheap_page(d, MEMF_no_owner);
+        nvmx->msr_merged = alloc_xenheap_page();
         if ( !nvmx->msr_merged )
-        {
-            gdprintk(XENLOG_ERR, "nest: allocation for MSR bitmap failed\n");
             return -ENOMEM;
-        }
     }
 
     nvmx->ept.enabled = 0;
@@ -198,11 +195,7 @@  static void vcpu_relinquish_resources(struct vcpu *v)
 {
     struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
 
-    if ( nvmx->msr_merged )
-    {
-        free_domheap_page(nvmx->msr_merged);
-        nvmx->msr_merged = NULL;
-    }
+    FREE_XENHEAP_PAGE(nvmx->msr_merged);
 }
 
 void nvmx_domain_relinquish_resources(struct domain *d)
@@ -575,14 +568,12 @@  unsigned long *_shadow_io_bitmap(struct vcpu *v)
 static void update_msrbitmap(struct vcpu *v, uint32_t shadow_ctrl)
 {
     struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
-    struct vmx_msr_bitmap *msr_bitmap;
+    struct vmx_msr_bitmap *msr_bitmap = nvmx->msr_merged;
 
     if ( !(shadow_ctrl & CPU_BASED_ACTIVATE_MSR_BITMAP) ||
          !nvmx->msrbitmap )
        return;
 
-    msr_bitmap = __map_domain_page(nvmx->msr_merged);
-
     bitmap_or(msr_bitmap->read_low, nvmx->msrbitmap->read_low,
               v->arch.hvm.vmx.msr_bitmap->read_low,
               sizeof(msr_bitmap->read_low) * 8);
@@ -603,9 +594,7 @@  static void update_msrbitmap(struct vcpu *v, uint32_t shadow_ctrl)
     bitmap_set(msr_bitmap->read_low, MSR_X2APIC_FIRST, 0x100);
     bitmap_set(msr_bitmap->write_low, MSR_X2APIC_FIRST, 0x100);
 
-    unmap_domain_page(msr_bitmap);
-
-    __vmwrite(MSR_BITMAP, page_to_maddr(nvmx->msr_merged));
+    __vmwrite(MSR_BITMAP, virt_to_maddr(nvmx->msr_merged));
 }
 
 void nvmx_update_exec_control(struct vcpu *v, u32 host_cntrl)
diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h b/xen/include/asm-x86/hvm/vmx/vvmx.h
index c41f089939..d5f68f30b1 100644
--- a/xen/include/asm-x86/hvm/vmx/vvmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vvmx.h
@@ -38,7 +38,7 @@  struct nestedvmx {
     paddr_t    vmxon_region_pa;
     void       *iobitmap[2];		/* map (va) of L1 guest I/O bitmap */
     struct vmx_msr_bitmap *msrbitmap;	/* map (va) of L1 guest MSR bitmap */
-    struct page_info *msr_merged;	/* merged L1 and L2 MSR bitmap */
+    struct vmx_msr_bitmap *msr_merged;	/* merged L1 and L2 MSR bitmap */
     /* deferred nested interrupt */
     struct {
         unsigned long intr_info;