
x86/vvmx: Fix deadlock with MSR bitmap merging

Message ID 20200311183455.23729-1-andrew.cooper3@citrix.com (mailing list archive)
State New, archived
Series x86/vvmx: Fix deadlock with MSR bitmap merging

Commit Message

Andrew Cooper March 11, 2020, 6:34 p.m. UTC
c/s c47984aabead "nvmx: implement support for MSR bitmaps" introduced a use of
map_domain_page() which may get used in the middle of context switch.

This is not safe, and causes Xen to deadlock on the mapcache lock:

  (XEN) Xen call trace:
  (XEN)    [<ffff82d08022d6ae>] R _spin_lock+0x34/0x5e
  (XEN)    [<ffff82d0803219d7>] F map_domain_page+0x250/0x527
  (XEN)    [<ffff82d080356332>] F do_page_fault+0x420/0x780
  (XEN)    [<ffff82d08038da3d>] F x86_64/entry.S#handle_exception_saved+0x68/0x94
  (XEN)    [<ffff82d08031729f>] F __find_next_zero_bit+0x28/0x69
  (XEN)    [<ffff82d080321a4d>] F map_domain_page+0x2c6/0x527
  (XEN)    [<ffff82d08029eeb2>] F nvmx_update_exec_control+0x1d7/0x323
  (XEN)    [<ffff82d080299f5a>] F vmx_update_cpu_exec_control+0x23/0x40
  (XEN)    [<ffff82d08029a3f7>] F arch/x86/hvm/vmx/vmx.c#vmx_ctxt_switch_from+0xb7/0x121
  (XEN)    [<ffff82d08031d796>] F arch/x86/domain.c#__context_switch+0x124/0x4a9
  (XEN)    [<ffff82d080320925>] F context_switch+0x154/0x62c
  (XEN)    [<ffff82d080252f3e>] F common/sched/core.c#sched_context_switch+0x16a/0x175
  (XEN)    [<ffff82d080253877>] F common/sched/core.c#schedule+0x2ad/0x2bc
  (XEN)    [<ffff82d08022cc97>] F common/softirq.c#__do_softirq+0xb7/0xc8
  (XEN)    [<ffff82d08022cd38>] F do_softirq+0x18/0x1a
  (XEN)    [<ffff82d0802a2fbb>] F vmx_asm_do_vmentry+0x2b/0x30

Convert the domheap page into being a xenheap page.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Kevin Tian <kevin.tian@intel.com>

I suspect this is the not-quite-consistent-enough-to-bisect issue which
OSSTest is hitting and interfering with pushes to master.
---
 xen/arch/x86/hvm/vmx/vvmx.c        | 19 ++++---------------
 xen/include/asm-x86/hvm/vmx/vvmx.h |  2 +-
 2 files changed, 5 insertions(+), 16 deletions(-)
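
For context, here is a minimal sketch contrasting the pattern being removed
with the one being introduced.  It is illustrative only: the function names
are invented, the header paths are from memory, and the real change is the
patch at the bottom of this page.

/*
 * Illustrative sketch, not part of the patch.  The real logic lives in
 * update_msrbitmap() in the patch below.
 */
#include <xen/domain_page.h>
#include <xen/mm.h>
#include <asm/hvm/vmx/vmcs.h>
#include <asm/hvm/vmx/vmx.h>

/* Before: the domheap page must be transiently mapped on every update. */
static void demo_update_domheap(struct page_info *pg)
{
    /* map_domain_page() can take the per-domain mapcache lock ... */
    struct vmx_msr_bitmap *bitmap = __map_domain_page(pg);

    /* ... merge the L1 and Xen MSR bitmaps into *bitmap ... */

    unmap_domain_page(bitmap);
    __vmwrite(MSR_BITMAP, page_to_maddr(pg));
}

/* After: a xenheap page is always reachable through the directmap. */
static void demo_update_xenheap(struct vmx_msr_bitmap *bitmap)
{
    /* ... merge the bitmaps directly; no mapping and no lock needed ... */

    __vmwrite(MSR_BITMAP, virt_to_maddr(bitmap));
}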

Comments

Jan Beulich March 12, 2020, 9:21 a.m. UTC | #1
On 11.03.2020 19:34, Andrew Cooper wrote:
> c/s c47984aabead "nvmx: implement support for MSR bitmaps" introduced a use of
> map_domain_page() which may get used in the middle of context switch.
> 
> [...]
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

> I suspect this is the not-quite-consistent-enough-to-bisect issue which
> OSSTest is hitting and interfering with pushes to master.

Having looked at a number of (albeit not all) failures, I don't
think I've seen any sign of a crash like the one above. Do you
think there are more subtle manifestations of the issue? Also
it is my understanding that this issue shouldn't get in the
way of any non-nested tests (of which we've had varying sets of
failures).

Jan
Roger Pau Monné March 12, 2020, 9:30 a.m. UTC | #2
On Wed, Mar 11, 2020 at 06:34:55PM +0000, Andrew Cooper wrote:
> c/s c47984aabead "nvmx: implement support for MSR bitmaps" introduced a use of
> map_domain_page() which may get used in the middle of context switch.
> 
> This is not safe, and causes Xen to deadlock on the mapcache lock:
> 
> [...]
> 
> Convert the domheap page into being a xenheap page.

Fixes: c47984aabead5391 ('nvmx: implement support for MSR bitmaps')

> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> [...]
> @@ -130,12 +130,9 @@ int nvmx_vcpu_initialise(struct vcpu *v)
>  
>      if ( cpu_has_vmx_msr_bitmap )
>      {
> -        nvmx->msr_merged = alloc_domheap_page(d, MEMF_no_owner);
> +        nvmx->msr_merged = alloc_xenheap_page();

Could we also use __map_domain_page_global here (keeping the domheap
allocation) in order to map the page on init and keep it mapped until
the domain is destroyed?

>          if ( !nvmx->msr_merged )
> -        {
> -            gdprintk(XENLOG_ERR, "nest: allocation for MSR bitmap failed\n");

I know the message is not especially helpful, but that seems to be
in line with how error handling is done in nvmx_vcpu_initialise.

Thanks, Roger.
Roger Pau Monné March 12, 2020, 10:58 a.m. UTC | #3
On Thu, Mar 12, 2020 at 10:30:35AM +0100, Roger Pau Monné wrote:
> On Wed, Mar 11, 2020 at 06:34:55PM +0000, Andrew Cooper wrote:
> > [...]
> > -        nvmx->msr_merged = alloc_domheap_page(d, MEMF_no_owner);
> > +        nvmx->msr_merged = alloc_xenheap_page();
> 
> Could we also use __map_domain_page_global here (keeping the domheap
> allocation) in order to map the page on init and keep it mapped until
> the domain is destroyed?

Just read 'nvmx deadlock with MSR bitmaps' now and realized that you
mention using map_domain_page_global there as an option also, so I
guess you went with the xenheap page option because it was simpler.

Thanks, Roger.
Andrew Cooper March 12, 2020, 12:06 p.m. UTC | #4
On 12/03/2020 10:58, Roger Pau Monné wrote:
> On Thu, Mar 12, 2020 at 10:30:35AM +0100, Roger Pau Monné wrote:
>> On Wed, Mar 11, 2020 at 06:34:55PM +0000, Andrew Cooper wrote:
>>> [...]
>>> -        nvmx->msr_merged = alloc_domheap_page(d, MEMF_no_owner);
>>> +        nvmx->msr_merged = alloc_xenheap_page();
>> Could we also use __map_domain_page_global here (keeping the domheap
>> allocation) in order to map the page on init and keep it mapped until
>> the domain is destroyed?
> Just read 'nvmx deadlock with MSR bitmaps' now and realized that you
> mention using map_domain_page_global there as an option also, so I
> guess you went with the xenheap page option because it was simpler.

A domheap page which is mapped globally for its entire lifetime is
strictly greater overhead than a xenheap page, because it also uses vmap
space.

global domheap mappings are for where we need to maintain a mapping for
more than a single transient access, but we don't know if/what/where at
the time the domain is created.

~Andrew
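
For completeness, a rough sketch of the __map_domain_page_global() alternative
discussed above.  This is hypothetical code, not something posted in the
thread: merged_bitmap_init()/merged_bitmap_free(), the container struct and
the header paths are invented for illustration.

/*
 * Hypothetical sketch of the alternative that was not taken: keep the
 * domheap allocation, but establish a permanent global mapping at
 * initialisation time so no transient map_domain_page() is needed later.
 */
#include <xen/domain_page.h>
#include <xen/errno.h>
#include <xen/mm.h>
#include <asm/hvm/vmx/vmcs.h>

struct merged_bitmap {          /* illustrative container, not the real struct nestedvmx */
    struct page_info *pg;
    struct vmx_msr_bitmap *va;
};

static int merged_bitmap_init(struct domain *d, struct merged_bitmap *m)
{
    m->pg = alloc_domheap_page(d, MEMF_no_owner);
    if ( !m->pg )
        return -ENOMEM;

    /* Consumes a slot of vmap space for as long as the mapping exists. */
    m->va = __map_domain_page_global(m->pg);
    if ( !m->va )
    {
        free_domheap_page(m->pg);
        m->pg = NULL;
        return -ENOMEM;
    }

    return 0;
}

static void merged_bitmap_free(struct merged_bitmap *m)
{
    if ( m->va )
        unmap_domain_page_global(m->va);
    if ( m->pg )
        free_domheap_page(m->pg);
    m->va = NULL;
    m->pg = NULL;
}

Later messages in the thread weigh this approach against the xenheap one in
terms of vmap usage, NUMA placement and xenheap exhaustion.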
Andrew Cooper March 12, 2020, 12:21 p.m. UTC | #5
On 12/03/2020 09:21, Jan Beulich wrote:
> On 11.03.2020 19:34, Andrew Cooper wrote:
>> [...]
>> I suspect this is the not-quite-consistent-enough-to-bisect issue which
>> OSSTest is hitting and interfering with pushes to master.
> Having looked at a number of (albeit not all) failures, I don't
> think I've seen any sign of a crash like the one above. Do you
> think there are more subtle manifestations of the issue?

This stack trace was produced by an NMI watchdog timeout, and I thought
OSSTest didn't use the NMI watchdog, but I see I'm wrong.

In which case this probably isn't what OSSTest is seeing, but it is a
genuine issue.

~Andrew
Jan Beulich March 12, 2020, 1:25 p.m. UTC | #6
On 12.03.2020 13:06, Andrew Cooper wrote:
> On 12/03/2020 10:58, Roger Pau Monné wrote:
>> On Thu, Mar 12, 2020 at 10:30:35AM +0100, Roger Pau Monné wrote:
>>> On Wed, Mar 11, 2020 at 06:34:55PM +0000, Andrew Cooper wrote:
>>>> [...]
>>> Could we also use __map_domain_page_global here (keeping the domheap
>>> allocation) in order to map the page on init and keep it mapped until
>>> the domain is destroyed?
>> Just read 'nvmx deadlock with MSR bitmaps' now and realized that you
>> mention using map_domain_page_global there as an option also, so I
>> guess you went with the xenheap page option because it was simpler.
> 
> A domheap page which is mapped globally for its entire lifetime is
> strictly greater overhead than a xenheap page, because it also uses vmap
> space.
> 
> global domheap mappings are for where we need to maintain a mapping for
> more than a single transient access, but we don't know if/what/where at
> the time the domain is created.

I didn't think that's the only criterion: on large systems the
xenheap may be exhausted while the domheap isn't, and hence
using domheap pages (and global mappings) allows us to avoid
-ENOMEM.

Jan
Roger Pau Monné March 12, 2020, 1:32 p.m. UTC | #7
On Thu, Mar 12, 2020 at 12:21:29PM +0000, Andrew Cooper wrote:
> On 12/03/2020 09:21, Jan Beulich wrote:
> > On 11.03.2020 19:34, Andrew Cooper wrote:
> >> [...]
> >> I suspect this is the not-quite-consistent-enough-to-bisect issue which
> >> OSSTest is hitting and interfering with pushes to master.
> > Having looked at a number of (albeit not all) failures, I don't
> > think I've seen any sign of a crash like the one above. Do you
> > think there are more subtle manifestations of the issue?
> 
> This stack trace was produced by an NMI watchdog timeout, and I thought
> OSSTest didn't use the NMI watchdog, but I see I'm wrong.
> 
> In which case this probably isn't what OSSTest is seeing, but it is a
> genuine issue.

The osstest issue IIRC was L1 Xen hitting
ASSERT(!sp || (peoi[sp - 1].vector < vector)) in do_IRQ_guest, which
seems to mean L0 Xen injects interrupts twice or some such?

Roger.
Andrew Cooper March 12, 2020, 1:34 p.m. UTC | #8
On 12/03/2020 13:25, Jan Beulich wrote:
> On 12.03.2020 13:06, Andrew Cooper wrote:
>> On 12/03/2020 10:58, Roger Pau Monné wrote:
>>> On Thu, Mar 12, 2020 at 10:30:35AM +0100, Roger Pau Monné wrote:
>>>> On Wed, Mar 11, 2020 at 06:34:55PM +0000, Andrew Cooper wrote:
>>>>> [...]
>>>> Could we also use __map_domain_page_global here (keeping the domheap
>>>> allocation) in order to map the page on init and keep it mapped until
>>>> the domain is destroyed?
>>> Just read 'nvmx deadlock with MSR bitmaps' now and realized that you
>>> mention using map_domain_page_global there as an option also, so I
>>> guess you went with the xenheap page option because it was simpler.
>> A domheap page which is mapped globally for its entire lifetime is
>> strictly greater overhead than a xenheap page, because it also uses vmap
>> space.
>>
>> global domheap mappings are for where we need to maintain a mapping for
>> more than a single transient access, but we don't know if/what/where at
>> the time the domain is created.
> I didn't think that's the only criterion:

It isn't the only criterion.  domheap+global does also let you get
working NUMA positioning.  However...

> on large systems the
> xenheap may be exhausted while the domheap isn't, and hence
> using domheap pages (and global mappings) allows us to avoid
> -ENOMEM.

... on large systems, you're more likely to run out of vmap space than
xenheap space, seeing as the former is limited to 64G (inc. iomap/fixmap
mappings), and the latter tops out at 4T.

~Andrew
Tian, Kevin March 13, 2020, 6:10 a.m. UTC | #9
> From: Andrew Cooper <andrew.cooper3@citrix.com>
> Sent: Thursday, March 12, 2020 2:35 AM
> 
> c/s c47984aabead "nvmx: implement support for MSR bitmaps" introduced a use of
> map_domain_page() which may get used in the middle of context switch.
> 
> [...]
> 
> Convert the domheap page into being a xenheap page.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

Patch

diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 926a11c15f..f049920196 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -130,12 +130,9 @@  int nvmx_vcpu_initialise(struct vcpu *v)
 
     if ( cpu_has_vmx_msr_bitmap )
     {
-        nvmx->msr_merged = alloc_domheap_page(d, MEMF_no_owner);
+        nvmx->msr_merged = alloc_xenheap_page();
         if ( !nvmx->msr_merged )
-        {
-            gdprintk(XENLOG_ERR, "nest: allocation for MSR bitmap failed\n");
             return -ENOMEM;
-        }
     }
 
     nvmx->ept.enabled = 0;
@@ -198,11 +195,7 @@  static void vcpu_relinquish_resources(struct vcpu *v)
 {
     struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
 
-    if ( nvmx->msr_merged )
-    {
-        free_domheap_page(nvmx->msr_merged);
-        nvmx->msr_merged = NULL;
-    }
+    FREE_XENHEAP_PAGE(nvmx->msr_merged);
 }
 
 void nvmx_domain_relinquish_resources(struct domain *d)
@@ -575,14 +568,12 @@  unsigned long *_shadow_io_bitmap(struct vcpu *v)
 static void update_msrbitmap(struct vcpu *v, uint32_t shadow_ctrl)
 {
     struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
-    struct vmx_msr_bitmap *msr_bitmap;
+    struct vmx_msr_bitmap *msr_bitmap = nvmx->msr_merged;
 
     if ( !(shadow_ctrl & CPU_BASED_ACTIVATE_MSR_BITMAP) ||
          !nvmx->msrbitmap )
        return;
 
-    msr_bitmap = __map_domain_page(nvmx->msr_merged);
-
     bitmap_or(msr_bitmap->read_low, nvmx->msrbitmap->read_low,
               v->arch.hvm.vmx.msr_bitmap->read_low,
               sizeof(msr_bitmap->read_low) * 8);
@@ -603,9 +594,7 @@  static void update_msrbitmap(struct vcpu *v, uint32_t shadow_ctrl)
     bitmap_set(msr_bitmap->read_low, MSR_X2APIC_FIRST, 0x100);
     bitmap_set(msr_bitmap->write_low, MSR_X2APIC_FIRST, 0x100);
 
-    unmap_domain_page(msr_bitmap);
-
-    __vmwrite(MSR_BITMAP, page_to_maddr(nvmx->msr_merged));
+    __vmwrite(MSR_BITMAP, virt_to_maddr(nvmx->msr_merged));
 }
 
 void nvmx_update_exec_control(struct vcpu *v, u32 host_cntrl)
diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h b/xen/include/asm-x86/hvm/vmx/vvmx.h
index c41f089939..d5f68f30b1 100644
--- a/xen/include/asm-x86/hvm/vmx/vvmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vvmx.h
@@ -38,7 +38,7 @@  struct nestedvmx {
     paddr_t    vmxon_region_pa;
     void       *iobitmap[2];		/* map (va) of L1 guest I/O bitmap */
     struct vmx_msr_bitmap *msrbitmap;	/* map (va) of L1 guest MSR bitmap */
-    struct page_info *msr_merged;	/* merged L1 and L2 MSR bitmap */
+    struct vmx_msr_bitmap *msr_merged;	/* merged L1 and L2 MSR bitmap */
     /* deferred nested interrupt */
     struct {
         unsigned long intr_info;