Message ID | 20200108123140.77999-1-roger.pau@citrix.com (mailing list archive)
---|---
State | Superseded
Series | nvmx: implement support for MSR bitmaps
On 08.01.2020 13:31, Roger Pau Monne wrote:
> --- a/xen/arch/x86/hvm/vmx/vvmx.c
> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> @@ -128,6 +128,16 @@ int nvmx_vcpu_initialise(struct vcpu *v)
>          unmap_domain_page(vw);
>      }
>
> +    if ( cpu_has_vmx_msr_bitmap )
> +    {
> +        nvmx->msr_merged = alloc_domheap_page(NULL, 0);

Despite this matching other code in the same file, it's not really
what you want, I think. Instead please consider using

    nvmx->msr_merged = alloc_domheap_page(d, MEMF_no_owner);

to honor d's NUMA properties.

> @@ -182,6 +192,11 @@ void nvmx_vcpu_destroy(struct vcpu *v)
>          free_domheap_page(v->arch.hvm.vmx.vmwrite_bitmap);
>          v->arch.hvm.vmx.vmwrite_bitmap = NULL;
>      }
> +    if ( nvmx->msr_merged )
> +    {
> +        free_domheap_page(nvmx->msr_merged);
> +        nvmx->msr_merged = NULL;

Hmm, I'm puzzled that we have FREE_XENHEAP_PAGE(), but no
FREE_DOMHEAP_PAGE().

> @@ -548,6 +563,50 @@ unsigned long *_shadow_io_bitmap(struct vcpu *v)
>      return nestedhvm_vcpu_iomap_get(port80, portED);
>  }
>
> +static void update_msrbitmap(struct vcpu *v)
> +{
> +    struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
> +    struct vmx_msr_bitmap *msr_bitmap;
> +    unsigned int msr;
> +
> +    ASSERT(__n2_exec_control(v) & CPU_BASED_ACTIVATE_MSR_BITMAP);
> +
> +    if ( !nvmx->msrbitmap )
> +        return;
> +
> +    msr_bitmap = __map_domain_page(nvmx->msr_merged);
> +
> +    bitmap_or(msr_bitmap->read_low, nvmx->msrbitmap->read_low,
> +              v->arch.hvm.vmx.msr_bitmap->read_low,
> +              sizeof(msr_bitmap->read_low) * 8);
> +    bitmap_or(msr_bitmap->read_high, nvmx->msrbitmap->read_high,
> +              v->arch.hvm.vmx.msr_bitmap->read_high,
> +              sizeof(msr_bitmap->read_high) * 8);
> +    bitmap_or(msr_bitmap->write_low, nvmx->msrbitmap->write_low,
> +              v->arch.hvm.vmx.msr_bitmap->write_low,
> +              sizeof(msr_bitmap->write_low) * 8);
> +    bitmap_or(msr_bitmap->write_high, nvmx->msrbitmap->write_high,
> +              v->arch.hvm.vmx.msr_bitmap->write_high,
> +              sizeof(msr_bitmap->write_high) * 8);
> +
> +    /*
> +     * Nested VMX doesn't support any x2APIC hardware virtualization, so
> +     * make sure all the x2APIC MSRs are trapped.
> +     */
> +    ASSERT(!(__n2_secondary_exec_control(v) &
> +             (SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE |
> +              SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY)) );
> +    for ( msr = MSR_X2APIC_FIRST; msr <= MSR_X2APIC_FIRST + 0xff; msr++ )
> +    {
> +        set_bit(msr, msr_bitmap->read_low);
> +        set_bit(msr, msr_bitmap->write_low);

Surely __set_bit() will suffice, if all the bitmap_or() above are
fine? Of course ultimately we will want to have something like
bitmap_fill_range() for purposes like this one.

> @@ -558,10 +617,15 @@ void nvmx_update_exec_control(struct vcpu *v, u32 host_cntrl)
>      shadow_cntrl = __n2_exec_control(v);
>      pio_cntrl &= shadow_cntrl;
>      /* Enforce the removed features */
> -    shadow_cntrl &= ~(CPU_BASED_ACTIVATE_MSR_BITMAP
> -                      | CPU_BASED_ACTIVATE_IO_BITMAP
> +    shadow_cntrl &= ~(CPU_BASED_ACTIVATE_IO_BITMAP
>                        | CPU_BASED_UNCOND_IO_EXITING);
> -    shadow_cntrl |= host_cntrl;
> +    /*
> +     * Do NOT enforce the MSR bitmap currently used by L1, as certain hardware
> +     * virtualization features require specific MSR bitmap settings, but without
> +     * using such features the bitmap could be leaking through unwanted MSR
> +     * accesses.

Perhaps "..., but without the guest also using these same features
..."? And then - why would a similar argument not apply to the I/O
bitmap as well?

> @@ -584,6 +648,9 @@ void nvmx_update_exec_control(struct vcpu *v, u32 host_cntrl)
>          __vmwrite(IO_BITMAP_B, virt_to_maddr(bitmap) + PAGE_SIZE);
>      }
>
> +    if ( shadow_cntrl & CPU_BASED_ACTIVATE_MSR_BITMAP )
> +        update_msrbitmap(v);

In the function you assert the bit to be set in the vVMCS, but ...

>      /* TODO: change L0 intr window to MTF or NMI window */
>      __vmwrite(CPU_BASED_VM_EXEC_CONTROL, shadow_cntrl);

... it gets written only here.

> @@ -1351,6 +1418,9 @@ static void virtual_vmexit(struct cpu_user_regs *regs)
>      vmx_update_secondary_exec_control(v);
>      vmx_update_exception_bitmap(v);
>
> +    if ( v->arch.hvm.vmx.exec_control & CPU_BASED_ACTIVATE_MSR_BITMAP )
> +        __vmwrite(MSR_BITMAP, virt_to_maddr(v->arch.hvm.vmx.msr_bitmap));
> +
>      load_vvmcs_host_state(v);

Wouldn't the addition better move into this function?

> --- a/xen/include/asm-x86/hvm/vmx/vvmx.h
> +++ b/xen/include/asm-x86/hvm/vmx/vvmx.h
> @@ -37,7 +37,8 @@ struct nestedvmx {
>       */
>      paddr_t vmxon_region_pa;
>      void *iobitmap[2];        /* map (va) of L1 guest I/O bitmap */
> -    void *msrbitmap;          /* map (va) of L1 guest MSR bitmap */
> +    struct vmx_msr_bitmap *msrbitmap; /* map (va) of L1 guest MSR bitmap */
> +    struct page_info *msr_merged; /* merged L1 and L1 guest MSR bitmap */

Either you convert the tab to spaces at least on the line you
change, or you use a tab (or two) as well on the line you add.

Jan
On Wed, Jan 08, 2020 at 01:55:42PM +0100, Jan Beulich wrote:
> On 08.01.2020 13:31, Roger Pau Monne wrote:
> > --- a/xen/arch/x86/hvm/vmx/vvmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> > @@ -128,6 +128,16 @@ int nvmx_vcpu_initialise(struct vcpu *v)
> >          unmap_domain_page(vw);
> >      }
> >
> > +    if ( cpu_has_vmx_msr_bitmap )
> > +    {
> > +        nvmx->msr_merged = alloc_domheap_page(NULL, 0);
>
> Despite this matching other code in the same file, it's not really
> what you want, I think. Instead please consider using
>
>     nvmx->msr_merged = alloc_domheap_page(d, MEMF_no_owner);
>
> to honor d's NUMA properties.

Right.

> > @@ -182,6 +192,11 @@ void nvmx_vcpu_destroy(struct vcpu *v)
> >          free_domheap_page(v->arch.hvm.vmx.vmwrite_bitmap);
> >          v->arch.hvm.vmx.vmwrite_bitmap = NULL;
> >      }
> > +    if ( nvmx->msr_merged )
> > +    {
> > +        free_domheap_page(nvmx->msr_merged);
> > +        nvmx->msr_merged = NULL;
>
> Hmm, I'm puzzled that we have FREE_XENHEAP_PAGE(), but no
> FREE_DOMHEAP_PAGE().
>
> > @@ -548,6 +563,50 @@ unsigned long *_shadow_io_bitmap(struct vcpu *v)
> >      return nestedhvm_vcpu_iomap_get(port80, portED);
> >  }
> >
> > +static void update_msrbitmap(struct vcpu *v)
> > +{
> > +    struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
> > +    struct vmx_msr_bitmap *msr_bitmap;
> > +    unsigned int msr;
> > +
> > +    ASSERT(__n2_exec_control(v) & CPU_BASED_ACTIVATE_MSR_BITMAP);
> > +
> > +    if ( !nvmx->msrbitmap )
> > +        return;
> > +
> > +    msr_bitmap = __map_domain_page(nvmx->msr_merged);
> > +
> > +    bitmap_or(msr_bitmap->read_low, nvmx->msrbitmap->read_low,
> > +              v->arch.hvm.vmx.msr_bitmap->read_low,
> > +              sizeof(msr_bitmap->read_low) * 8);
> > +    bitmap_or(msr_bitmap->read_high, nvmx->msrbitmap->read_high,
> > +              v->arch.hvm.vmx.msr_bitmap->read_high,
> > +              sizeof(msr_bitmap->read_high) * 8);
> > +    bitmap_or(msr_bitmap->write_low, nvmx->msrbitmap->write_low,
> > +              v->arch.hvm.vmx.msr_bitmap->write_low,
> > +              sizeof(msr_bitmap->write_low) * 8);
> > +    bitmap_or(msr_bitmap->write_high, nvmx->msrbitmap->write_high,
> > +              v->arch.hvm.vmx.msr_bitmap->write_high,
> > +              sizeof(msr_bitmap->write_high) * 8);
> > +
> > +    /*
> > +     * Nested VMX doesn't support any x2APIC hardware virtualization, so
> > +     * make sure all the x2APIC MSRs are trapped.
> > +     */
> > +    ASSERT(!(__n2_secondary_exec_control(v) &
> > +             (SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE |
> > +              SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY)) );
> > +    for ( msr = MSR_X2APIC_FIRST; msr <= MSR_X2APIC_FIRST + 0xff; msr++ )
> > +    {
> > +        set_bit(msr, msr_bitmap->read_low);
> > +        set_bit(msr, msr_bitmap->write_low);
>
> Surely __set_bit() will suffice, if all the bitmap_or() above are
> fine? Of course ultimately we will want to have something like
> bitmap_fill_range() for purposes like this one.

Oh sure, and I already looked for a bitmap_fill_range when coding
this :).

> > @@ -558,10 +617,15 @@ void nvmx_update_exec_control(struct vcpu *v, u32 host_cntrl)
> >      shadow_cntrl = __n2_exec_control(v);
> >      pio_cntrl &= shadow_cntrl;
> >      /* Enforce the removed features */
> > -    shadow_cntrl &= ~(CPU_BASED_ACTIVATE_MSR_BITMAP
> > -                      | CPU_BASED_ACTIVATE_IO_BITMAP
> > +    shadow_cntrl &= ~(CPU_BASED_ACTIVATE_IO_BITMAP
> >                        | CPU_BASED_UNCOND_IO_EXITING);
> > -    shadow_cntrl |= host_cntrl;
> > +    /*
> > +     * Do NOT enforce the MSR bitmap currently used by L1, as certain hardware
> > +     * virtualization features require specific MSR bitmap settings, but without
> > +     * using such features the bitmap could be leaking through unwanted MSR
> > +     * accesses.
>
> Perhaps "..., but without the guest also using these same features
> ..."?

Ack

> And then - why would a similar argument not apply to the I/O
> bitmap as well?

Well, there seems to be some handling of IO bitmaps, see
nvmx_update_exec_control (note I'm not claiming such handling is
correct).

> > @@ -584,6 +648,9 @@ void nvmx_update_exec_control(struct vcpu *v, u32 host_cntrl)
> >          __vmwrite(IO_BITMAP_B, virt_to_maddr(bitmap) + PAGE_SIZE);
> >      }
> >
> > +    if ( shadow_cntrl & CPU_BASED_ACTIVATE_MSR_BITMAP )
> > +        update_msrbitmap(v);
>
> In the function you assert the bit to be set in the vVMCS, but ...

No, I assert the bit in the vvmcs as returned by __n2_exec_control,
which is not what gets written to the actual vmcs below.

> >      /* TODO: change L0 intr window to MTF or NMI window */
> >      __vmwrite(CPU_BASED_VM_EXEC_CONTROL, shadow_cntrl);
>
> ... it gets written only here.
>
> > @@ -1351,6 +1418,9 @@ static void virtual_vmexit(struct cpu_user_regs *regs)
> >      vmx_update_secondary_exec_control(v);
> >      vmx_update_exception_bitmap(v);
> >
> > +    if ( v->arch.hvm.vmx.exec_control & CPU_BASED_ACTIVATE_MSR_BITMAP )
> > +        __vmwrite(MSR_BITMAP, virt_to_maddr(v->arch.hvm.vmx.msr_bitmap));
> > +
> >      load_vvmcs_host_state(v);
>
> Wouldn't the addition better move into this function?

I don't have a strong opinion TBH, it's all quite fuzzy as to which
function loads what IMO.

> > --- a/xen/include/asm-x86/hvm/vmx/vvmx.h
> > +++ b/xen/include/asm-x86/hvm/vmx/vvmx.h
> > @@ -37,7 +37,8 @@ struct nestedvmx {
> >       */
> >      paddr_t vmxon_region_pa;
> >      void *iobitmap[2];        /* map (va) of L1 guest I/O bitmap */
> > -    void *msrbitmap;          /* map (va) of L1 guest MSR bitmap */
> > +    struct vmx_msr_bitmap *msrbitmap; /* map (va) of L1 guest MSR bitmap */
> > +    struct page_info *msr_merged; /* merged L1 and L1 guest MSR bitmap */
>
> Either you convert the tab to spaces at least on the line you
> change, or you use a tab (or two) as well on the line you add.

Will just use a tab instead.

Thanks, Roger.
> From: Roger Pau Monne <roger.pau@citrix.com>
> Sent: Wednesday, January 8, 2020 8:32 PM
>
> Current implementation of nested VMX has a half baked handling of MSR
> bitmaps for the L1 VMM: it maps the L1 VMM provided MSR bitmap, but
> doesn't actually load it into the nested vmcs, and thus the nested
> guest vmcs ends up using the same MSR bitmap as the L1 VMM.
>
> This is wrong as there's no assurance that the set of features enabled
> for the L1 vmcs are the same that L1 itself is going to use in the
> nested vmcs, and thus can lead to misconfigurations.
>
> For example L1 vmcs can use x2APIC virtualization and virtual
> interrupt delivery, and thus some x2APIC MSRs won't be trapped so that
> they can be handled directly by the hardware using virtualization
> extensions. On the other hand, the nested vmcs created by L1 VMM might
> not use any of such features, so using a MSR bitmap that doesn't trap
> accesses to the x2APIC MSRs will be leaking them to the underlying
> hardware.
>
> Fix this by crafting a merged MSR bitmap between the one used by L1
> and the nested guest, and make sure a nested vmcs MSR bitmap always
> traps accesses to the x2APIC MSR range, since hardware assisted x2APIC
> virtualization or virtual interrupt delivery are never available to
> L1 VMM.

can you split x2APIC fix into a separate patch?

>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> This seems better than what's done currently, but TBH there's a lot of
> work to be done in nvmx in order to make it functional and secure that
> I'm not sure whether building on top of the current implementation is
> something sane to do, or it would be better to start from scratch and
> re-implement nvmx to just support the minimum required set of VTx
> features in a sane and safe way.
> ---
>  xen/arch/x86/hvm/vmx/vvmx.c        | 76 ++++++++++++++++++++++++++++--
>  xen/include/asm-x86/hvm/vmx/vvmx.h |  3 +-
>  2 files changed, 75 insertions(+), 4 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
> index af48b0beef..ad81de051b 100644
> --- a/xen/arch/x86/hvm/vmx/vvmx.c
> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> @@ -128,6 +128,16 @@ int nvmx_vcpu_initialise(struct vcpu *v)
>          unmap_domain_page(vw);
>      }
>
> +    if ( cpu_has_vmx_msr_bitmap )
> +    {
> +        nvmx->msr_merged = alloc_domheap_page(NULL, 0);
> +        if ( !nvmx->msr_merged )
> +        {
> +            gdprintk(XENLOG_ERR, "nest: allocation for MSR bitmap failed\n");
> +            return -ENOMEM;
> +        }
> +    }
> +
>      nvmx->ept.enabled = 0;
>      nvmx->guest_vpid = 0;
>      nvmx->vmxon_region_pa = INVALID_PADDR;
> @@ -182,6 +192,11 @@ void nvmx_vcpu_destroy(struct vcpu *v)
>          free_domheap_page(v->arch.hvm.vmx.vmwrite_bitmap);
>          v->arch.hvm.vmx.vmwrite_bitmap = NULL;
>      }
> +    if ( nvmx->msr_merged )
> +    {
> +        free_domheap_page(nvmx->msr_merged);
> +        nvmx->msr_merged = NULL;
> +    }
>  }
>
>  void nvmx_domain_relinquish_resources(struct domain *d)
> @@ -548,6 +563,50 @@ unsigned long *_shadow_io_bitmap(struct vcpu *v)
>      return nestedhvm_vcpu_iomap_get(port80, portED);
>  }
>
> +static void update_msrbitmap(struct vcpu *v)
> +{
> +    struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
> +    struct vmx_msr_bitmap *msr_bitmap;
> +    unsigned int msr;
> +
> +    ASSERT(__n2_exec_control(v) & CPU_BASED_ACTIVATE_MSR_BITMAP);
> +
> +    if ( !nvmx->msrbitmap )
> +        return;
> +
> +    msr_bitmap = __map_domain_page(nvmx->msr_merged);
> +
> +    bitmap_or(msr_bitmap->read_low, nvmx->msrbitmap->read_low,
> +              v->arch.hvm.vmx.msr_bitmap->read_low,
> +              sizeof(msr_bitmap->read_low) * 8);
> +    bitmap_or(msr_bitmap->read_high, nvmx->msrbitmap->read_high,
> +              v->arch.hvm.vmx.msr_bitmap->read_high,
> +              sizeof(msr_bitmap->read_high) * 8);
> +    bitmap_or(msr_bitmap->write_low, nvmx->msrbitmap->write_low,
> +              v->arch.hvm.vmx.msr_bitmap->write_low,
> +              sizeof(msr_bitmap->write_low) * 8);
> +    bitmap_or(msr_bitmap->write_high, nvmx->msrbitmap->write_high,
> +              v->arch.hvm.vmx.msr_bitmap->write_high,
> +              sizeof(msr_bitmap->write_high) * 8);
> +
> +    /*
> +     * Nested VMX doesn't support any x2APIC hardware virtualization, so
> +     * make sure all the x2APIC MSRs are trapped.
> +     */
> +    ASSERT(!(__n2_secondary_exec_control(v) &
> +             (SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE |
> +              SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY)) );
> +    for ( msr = MSR_X2APIC_FIRST; msr <= MSR_X2APIC_FIRST + 0xff; msr++ )
> +    {
> +        set_bit(msr, msr_bitmap->read_low);
> +        set_bit(msr, msr_bitmap->write_low);
> +    }
> +
> +    unmap_domain_page(msr_bitmap);
> +
> +    __vmwrite(MSR_BITMAP, page_to_maddr(nvmx->msr_merged));
> +}
> +
>  void nvmx_update_exec_control(struct vcpu *v, u32 host_cntrl)
>  {
>      u32 pio_cntrl = (CPU_BASED_ACTIVATE_IO_BITMAP
> @@ -558,10 +617,15 @@ void nvmx_update_exec_control(struct vcpu *v, u32 host_cntrl)
>      shadow_cntrl = __n2_exec_control(v);
>      pio_cntrl &= shadow_cntrl;
>      /* Enforce the removed features */
> -    shadow_cntrl &= ~(CPU_BASED_ACTIVATE_MSR_BITMAP
> -                      | CPU_BASED_ACTIVATE_IO_BITMAP
> +    shadow_cntrl &= ~(CPU_BASED_ACTIVATE_IO_BITMAP
>                        | CPU_BASED_UNCOND_IO_EXITING);
> -    shadow_cntrl |= host_cntrl;
> +    /*
> +     * Do NOT enforce the MSR bitmap currently used by L1, as certain hardware
> +     * virtualization features require specific MSR bitmap settings, but without
> +     * using such features the bitmap could be leaking through unwanted MSR
> +     * accesses.
> +     */
> +    shadow_cntrl |= (host_cntrl & ~CPU_BASED_ACTIVATE_MSR_BITMAP);
>      if ( pio_cntrl == CPU_BASED_UNCOND_IO_EXITING ) {
>          /* L1 VMM intercepts all I/O instructions */
>          shadow_cntrl |= CPU_BASED_UNCOND_IO_EXITING;
> @@ -584,6 +648,9 @@ void nvmx_update_exec_control(struct vcpu *v, u32 host_cntrl)
>          __vmwrite(IO_BITMAP_B, virt_to_maddr(bitmap) + PAGE_SIZE);
>      }
>
> +    if ( shadow_cntrl & CPU_BASED_ACTIVATE_MSR_BITMAP )
> +        update_msrbitmap(v);
> +
>      /* TODO: change L0 intr window to MTF or NMI window */
>      __vmwrite(CPU_BASED_VM_EXEC_CONTROL, shadow_cntrl);
>  }
> @@ -1351,6 +1418,9 @@ static void virtual_vmexit(struct cpu_user_regs *regs)
>      vmx_update_secondary_exec_control(v);
>      vmx_update_exception_bitmap(v);
>
> +    if ( v->arch.hvm.vmx.exec_control & CPU_BASED_ACTIVATE_MSR_BITMAP )
> +        __vmwrite(MSR_BITMAP, virt_to_maddr(v->arch.hvm.vmx.msr_bitmap));
> +
>      load_vvmcs_host_state(v);
>
>      if ( lm_l1 != lm_l2 )
> diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h b/xen/include/asm-x86/hvm/vmx/vvmx.h
> index 6b9c4ae0b2..812c1d49bd 100644
> --- a/xen/include/asm-x86/hvm/vmx/vvmx.h
> +++ b/xen/include/asm-x86/hvm/vmx/vvmx.h
> @@ -37,7 +37,8 @@ struct nestedvmx {
>       */
>      paddr_t vmxon_region_pa;
>      void *iobitmap[2];        /* map (va) of L1 guest I/O bitmap */
> -    void *msrbitmap;          /* map (va) of L1 guest MSR bitmap */
> +    struct vmx_msr_bitmap *msrbitmap; /* map (va) of L1 guest MSR bitmap */
> +    struct page_info *msr_merged; /* merged L1 and L1 guest MSR bitmap */
>      /* deferred nested interrupt */
>      struct {
>          unsigned long intr_info;
> --
> 2.24.1
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index af48b0beef..ad81de051b 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -128,6 +128,16 @@ int nvmx_vcpu_initialise(struct vcpu *v)
         unmap_domain_page(vw);
     }
 
+    if ( cpu_has_vmx_msr_bitmap )
+    {
+        nvmx->msr_merged = alloc_domheap_page(NULL, 0);
+        if ( !nvmx->msr_merged )
+        {
+            gdprintk(XENLOG_ERR, "nest: allocation for MSR bitmap failed\n");
+            return -ENOMEM;
+        }
+    }
+
     nvmx->ept.enabled = 0;
     nvmx->guest_vpid = 0;
     nvmx->vmxon_region_pa = INVALID_PADDR;
@@ -182,6 +192,11 @@ void nvmx_vcpu_destroy(struct vcpu *v)
         free_domheap_page(v->arch.hvm.vmx.vmwrite_bitmap);
         v->arch.hvm.vmx.vmwrite_bitmap = NULL;
     }
+    if ( nvmx->msr_merged )
+    {
+        free_domheap_page(nvmx->msr_merged);
+        nvmx->msr_merged = NULL;
+    }
 }
 
 void nvmx_domain_relinquish_resources(struct domain *d)
@@ -548,6 +563,50 @@ unsigned long *_shadow_io_bitmap(struct vcpu *v)
     return nestedhvm_vcpu_iomap_get(port80, portED);
 }
 
+static void update_msrbitmap(struct vcpu *v)
+{
+    struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
+    struct vmx_msr_bitmap *msr_bitmap;
+    unsigned int msr;
+
+    ASSERT(__n2_exec_control(v) & CPU_BASED_ACTIVATE_MSR_BITMAP);
+
+    if ( !nvmx->msrbitmap )
+        return;
+
+    msr_bitmap = __map_domain_page(nvmx->msr_merged);
+
+    bitmap_or(msr_bitmap->read_low, nvmx->msrbitmap->read_low,
+              v->arch.hvm.vmx.msr_bitmap->read_low,
+              sizeof(msr_bitmap->read_low) * 8);
+    bitmap_or(msr_bitmap->read_high, nvmx->msrbitmap->read_high,
+              v->arch.hvm.vmx.msr_bitmap->read_high,
+              sizeof(msr_bitmap->read_high) * 8);
+    bitmap_or(msr_bitmap->write_low, nvmx->msrbitmap->write_low,
+              v->arch.hvm.vmx.msr_bitmap->write_low,
+              sizeof(msr_bitmap->write_low) * 8);
+    bitmap_or(msr_bitmap->write_high, nvmx->msrbitmap->write_high,
+              v->arch.hvm.vmx.msr_bitmap->write_high,
+              sizeof(msr_bitmap->write_high) * 8);
+
+    /*
+     * Nested VMX doesn't support any x2APIC hardware virtualization, so
+     * make sure all the x2APIC MSRs are trapped.
+     */
+    ASSERT(!(__n2_secondary_exec_control(v) &
+             (SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE |
+              SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY)) );
+    for ( msr = MSR_X2APIC_FIRST; msr <= MSR_X2APIC_FIRST + 0xff; msr++ )
+    {
+        set_bit(msr, msr_bitmap->read_low);
+        set_bit(msr, msr_bitmap->write_low);
+    }
+
+    unmap_domain_page(msr_bitmap);
+
+    __vmwrite(MSR_BITMAP, page_to_maddr(nvmx->msr_merged));
+}
+
 void nvmx_update_exec_control(struct vcpu *v, u32 host_cntrl)
 {
     u32 pio_cntrl = (CPU_BASED_ACTIVATE_IO_BITMAP
@@ -558,10 +617,15 @@ void nvmx_update_exec_control(struct vcpu *v, u32 host_cntrl)
     shadow_cntrl = __n2_exec_control(v);
     pio_cntrl &= shadow_cntrl;
     /* Enforce the removed features */
-    shadow_cntrl &= ~(CPU_BASED_ACTIVATE_MSR_BITMAP
-                      | CPU_BASED_ACTIVATE_IO_BITMAP
+    shadow_cntrl &= ~(CPU_BASED_ACTIVATE_IO_BITMAP
                       | CPU_BASED_UNCOND_IO_EXITING);
-    shadow_cntrl |= host_cntrl;
+    /*
+     * Do NOT enforce the MSR bitmap currently used by L1, as certain hardware
+     * virtualization features require specific MSR bitmap settings, but without
+     * using such features the bitmap could be leaking through unwanted MSR
+     * accesses.
+     */
+    shadow_cntrl |= (host_cntrl & ~CPU_BASED_ACTIVATE_MSR_BITMAP);
     if ( pio_cntrl == CPU_BASED_UNCOND_IO_EXITING ) {
         /* L1 VMM intercepts all I/O instructions */
         shadow_cntrl |= CPU_BASED_UNCOND_IO_EXITING;
@@ -584,6 +648,9 @@ void nvmx_update_exec_control(struct vcpu *v, u32 host_cntrl)
         __vmwrite(IO_BITMAP_B, virt_to_maddr(bitmap) + PAGE_SIZE);
     }
 
+    if ( shadow_cntrl & CPU_BASED_ACTIVATE_MSR_BITMAP )
+        update_msrbitmap(v);
+
     /* TODO: change L0 intr window to MTF or NMI window */
     __vmwrite(CPU_BASED_VM_EXEC_CONTROL, shadow_cntrl);
 }
@@ -1351,6 +1418,9 @@ static void virtual_vmexit(struct cpu_user_regs *regs)
     vmx_update_secondary_exec_control(v);
     vmx_update_exception_bitmap(v);
 
+    if ( v->arch.hvm.vmx.exec_control & CPU_BASED_ACTIVATE_MSR_BITMAP )
+        __vmwrite(MSR_BITMAP, virt_to_maddr(v->arch.hvm.vmx.msr_bitmap));
+
     load_vvmcs_host_state(v);
 
     if ( lm_l1 != lm_l2 )
diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h b/xen/include/asm-x86/hvm/vmx/vvmx.h
index 6b9c4ae0b2..812c1d49bd 100644
--- a/xen/include/asm-x86/hvm/vmx/vvmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vvmx.h
@@ -37,7 +37,8 @@ struct nestedvmx {
      */
     paddr_t vmxon_region_pa;
     void *iobitmap[2];        /* map (va) of L1 guest I/O bitmap */
-    void *msrbitmap;          /* map (va) of L1 guest MSR bitmap */
+    struct vmx_msr_bitmap *msrbitmap; /* map (va) of L1 guest MSR bitmap */
+    struct page_info *msr_merged; /* merged L1 and L1 guest MSR bitmap */
     /* deferred nested interrupt */
     struct {
         unsigned long intr_info;
Current implementation of nested VMX has a half baked handling of MSR
bitmaps for the L1 VMM: it maps the L1 VMM provided MSR bitmap, but
doesn't actually load it into the nested vmcs, and thus the nested
guest vmcs ends up using the same MSR bitmap as the L1 VMM.

This is wrong as there's no assurance that the set of features enabled
for the L1 vmcs are the same that L1 itself is going to use in the
nested vmcs, and thus can lead to misconfigurations.

For example L1 vmcs can use x2APIC virtualization and virtual
interrupt delivery, and thus some x2APIC MSRs won't be trapped so that
they can be handled directly by the hardware using virtualization
extensions. On the other hand, the nested vmcs created by L1 VMM might
not use any of such features, so using a MSR bitmap that doesn't trap
accesses to the x2APIC MSRs will be leaking them to the underlying
hardware.

Fix this by crafting a merged MSR bitmap between the one used by L1
and the nested guest, and make sure a nested vmcs MSR bitmap always
traps accesses to the x2APIC MSR range, since hardware assisted x2APIC
virtualization or virtual interrupt delivery are never available to
L1 VMM.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
This seems better than what's done currently, but TBH there's a lot of
work to be done in nvmx in order to make it functional and secure that
I'm not sure whether building on top of the current implementation is
something sane to do, or it would be better to start from scratch and
re-implement nvmx to just support the minimum required set of VTx
features in a sane and safe way.
---
 xen/arch/x86/hvm/vmx/vvmx.c        | 76 ++++++++++++++++++++++++++++--
 xen/include/asm-x86/hvm/vmx/vvmx.h |  3 +-
 2 files changed, 75 insertions(+), 4 deletions(-)