mbox series

[v3,0/5] Handle monitor trap flag during instruction emulation

Message ID 20200207103608.110305-1-oupton@google.com (mailing list archive)
Headers show
Series Handle monitor trap flag during instruction emulation | expand

Message

Oliver Upton Feb. 7, 2020, 10:36 a.m. UTC
v1: http://lore.kernel.org/r/20200113221053.22053-1-oupton@google.com
v2: http://lore.kernel.org/r/20200128092715.69429-1-oupton@google.com

v1 => v2:
 - Don't split the #DB delivery by vendors. Unconditionally injecting
   #DB payloads into the 'pending debug exceptions' field will cause KVM
   to get stuck in a loop. Per the SDM, when hardware injects an event
   resulting from this field's value, it is checked against the
   exception interception bitmap.
 - Address Sean's comments by injecting the VM-exit into L1 from
   vmx_check_nested_events().
 - Added fix for nested INIT VM-exits + 'pending debug exceptions' field
   as it was noticed in implementing v2.
 - Drop Peter + Jim's Reviewed-by tags, as the patch set has changed
   since v1.

v2 => v3:
 - Merge the check/set_pending_dbg helpers into a single helper,
   vmx_update_pending_dbg(). Add clarifying comment to this helper.
 - Rewrite commit message, descriptive comment for change in 3/5 to
   explicitly describe the reason for mutating payload delivery
   behavior.
 - Undo the changes to kvm_vcpu_do_singlestep(). Instead, add a new hook
   to call for 'full' instruction emulation + 'fast' emulation.

KVM already provides guests the ability to use the 'monitor trap flag'
VM-execution control. Support for this flag is provided by the fact that
KVM unconditionally forwards MTF VM-exits to the guest (if requested),
as KVM doesn't utilize MTF. While this provides support during hardware
instruction execution, it is insufficient for instruction emulation.

Should L0 emulate an instruction on the behalf of L2, L0 should also
synthesize an MTF VM-exit into L1, should control be set.

The first patch corrects a nuanced difference between the definition of
a #DB exception payload field and DR6 register. Mask off bit 12 which is
defined in the 'pending debug exceptions' field when applying to DR6,
since the payload field is said to be compatible with the aforementioned
VMCS field.

The second patch sets the 'pending debug exceptions' VMCS field when
delivering an INIT signal VM-exit to L1, as described in the SDM. This
patch also introduces helpers for setting the 'pending debug exceptions'
VMCS field.

The third patch massages KVM's handling of exception payloads with
regard to API compatibility. Rather than immediately delivering the
payload w/o opt-in, instead defer the payload + inject
before completing a KVM_GET_VCPU_EVENTS. This maintains API
compatibility whilst correcting #DB behavior with regard to higher
priority VM-exit events.

Fourth patch introduces MTF implementation for emulated instructions.
Identify if an MTF is due on an instruction boundary from
kvm_vcpu_do_singlestep(), however only deliver this VM-exit from
vmx_check_nested_events() to respect the relative prioritization to
other VM-exits. Since this augments the nested state, introduce a new
flag for (de)serialization.

Last patch adds tests to kvm-unit-tests to assert the correctness of MTF
under several conditions (concurrent #DB trap, #DB fault, etc). These
tests pass under virtualization with this series as well as on
bare-metal.

Based on commit 2c2787938512 ("KVM: selftests: Stop memslot creation in
KVM internal memslot region").

Oliver Upton (4):
  KVM: x86: Mask off reserved bit from #DB exception payload
  KVM: nVMX: Handle pending #DB when injecting INIT VM-exit
  KVM: x86: Deliver exception payload on KVM_GET_VCPU_EVENTS
  KVM: nVMX: Emulate MTF when performing instruction emulation

 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/include/uapi/asm/kvm.h |  1 +
 arch/x86/kvm/svm.c              |  1 +
 arch/x86/kvm/vmx/nested.c       | 54 ++++++++++++++++++++++++++++++++-
 arch/x86/kvm/vmx/nested.h       |  5 +++
 arch/x86/kvm/vmx/vmx.c          | 37 +++++++++++++++++++++-
 arch/x86/kvm/vmx/vmx.h          |  3 ++
 arch/x86/kvm/x86.c              | 39 ++++++++++++++++--------
 8 files changed, 126 insertions(+), 15 deletions(-)

Comments

Paolo Bonzini Feb. 12, 2020, 11:34 a.m. UTC | #1
On 07/02/20 11:36, Oliver Upton wrote:
> v1: http://lore.kernel.org/r/20200113221053.22053-1-oupton@google.com
> v2: http://lore.kernel.org/r/20200128092715.69429-1-oupton@google.com
> 
> v1 => v2:
>  - Don't split the #DB delivery by vendors. Unconditionally injecting
>    #DB payloads into the 'pending debug exceptions' field will cause KVM
>    to get stuck in a loop. Per the SDM, when hardware injects an event
>    resulting from this field's value, it is checked against the
>    exception interception bitmap.
>  - Address Sean's comments by injecting the VM-exit into L1 from
>    vmx_check_nested_events().
>  - Added fix for nested INIT VM-exits + 'pending debug exceptions' field
>    as it was noticed in implementing v2.
>  - Drop Peter + Jim's Reviewed-by tags, as the patch set has changed
>    since v1.
> 
> v2 => v3:
>  - Merge the check/set_pending_dbg helpers into a single helper,
>    vmx_update_pending_dbg(). Add clarifying comment to this helper.
>  - Rewrite commit message, descriptive comment for change in 3/5 to
>    explicitly describe the reason for mutating payload delivery
>    behavior.
>  - Undo the changes to kvm_vcpu_do_singlestep(). Instead, add a new hook
>    to call for 'full' instruction emulation + 'fast' emulation.
> 
> KVM already provides guests the ability to use the 'monitor trap flag'
> VM-execution control. Support for this flag is provided by the fact that
> KVM unconditionally forwards MTF VM-exits to the guest (if requested),
> as KVM doesn't utilize MTF. While this provides support during hardware
> instruction execution, it is insufficient for instruction emulation.
> 
> Should L0 emulate an instruction on the behalf of L2, L0 should also
> synthesize an MTF VM-exit into L1, should control be set.
> 
> The first patch corrects a nuanced difference between the definition of
> a #DB exception payload field and DR6 register. Mask off bit 12 which is
> defined in the 'pending debug exceptions' field when applying to DR6,
> since the payload field is said to be compatible with the aforementioned
> VMCS field.
> 
> The second patch sets the 'pending debug exceptions' VMCS field when
> delivering an INIT signal VM-exit to L1, as described in the SDM. This
> patch also introduces helpers for setting the 'pending debug exceptions'
> VMCS field.
> 
> The third patch massages KVM's handling of exception payloads with
> regard to API compatibility. Rather than immediately delivering the
> payload w/o opt-in, instead defer the payload + inject
> before completing a KVM_GET_VCPU_EVENTS. This maintains API
> compatibility whilst correcting #DB behavior with regard to higher
> priority VM-exit events.
> 
> Fourth patch introduces MTF implementation for emulated instructions.
> Identify if an MTF is due on an instruction boundary from
> kvm_vcpu_do_singlestep(), however only deliver this VM-exit from
> vmx_check_nested_events() to respect the relative prioritization to
> other VM-exits. Since this augments the nested state, introduce a new
> flag for (de)serialization.
> 
> Last patch adds tests to kvm-unit-tests to assert the correctness of MTF
> under several conditions (concurrent #DB trap, #DB fault, etc). These
> tests pass under virtualization with this series as well as on
> bare-metal.
> 
> Based on commit 2c2787938512 ("KVM: selftests: Stop memslot creation in
> KVM internal memslot region").
> 
> Oliver Upton (4):
>   KVM: x86: Mask off reserved bit from #DB exception payload
>   KVM: nVMX: Handle pending #DB when injecting INIT VM-exit
>   KVM: x86: Deliver exception payload on KVM_GET_VCPU_EVENTS
>   KVM: nVMX: Emulate MTF when performing instruction emulation
> 
>  arch/x86/include/asm/kvm_host.h |  1 +
>  arch/x86/include/uapi/asm/kvm.h |  1 +
>  arch/x86/kvm/svm.c              |  1 +
>  arch/x86/kvm/vmx/nested.c       | 54 ++++++++++++++++++++++++++++++++-
>  arch/x86/kvm/vmx/nested.h       |  5 +++
>  arch/x86/kvm/vmx/vmx.c          | 37 +++++++++++++++++++++-
>  arch/x86/kvm/vmx/vmx.h          |  3 ++
>  arch/x86/kvm/x86.c              | 39 ++++++++++++++++--------
>  8 files changed, 126 insertions(+), 15 deletions(-)
> 

Queued (for 5.6-rc2), thanks.

Paolo
Oliver Upton Feb. 27, 2020, 12:22 a.m. UTC | #2
On Wed, Feb 12, 2020 at 3:34 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 07/02/20 11:36, Oliver Upton wrote:
> > v1: http://lore.kernel.org/r/20200113221053.22053-1-oupton@google.com
> > v2: http://lore.kernel.org/r/20200128092715.69429-1-oupton@google.com
> >
> > v1 => v2:
> >  - Don't split the #DB delivery by vendors. Unconditionally injecting
> >    #DB payloads into the 'pending debug exceptions' field will cause KVM
> >    to get stuck in a loop. Per the SDM, when hardware injects an event
> >    resulting from this field's value, it is checked against the
> >    exception interception bitmap.
> >  - Address Sean's comments by injecting the VM-exit into L1 from
> >    vmx_check_nested_events().
> >  - Added fix for nested INIT VM-exits + 'pending debug exceptions' field
> >    as it was noticed in implementing v2.
> >  - Drop Peter + Jim's Reviewed-by tags, as the patch set has changed
> >    since v1.
> >
> > v2 => v3:
> >  - Merge the check/set_pending_dbg helpers into a single helper,
> >    vmx_update_pending_dbg(). Add clarifying comment to this helper.
> >  - Rewrite commit message, descriptive comment for change in 3/5 to
> >    explicitly describe the reason for mutating payload delivery
> >    behavior.
> >  - Undo the changes to kvm_vcpu_do_singlestep(). Instead, add a new hook
> >    to call for 'full' instruction emulation + 'fast' emulation.
> >
> > KVM already provides guests the ability to use the 'monitor trap flag'
> > VM-execution control. Support for this flag is provided by the fact that
> > KVM unconditionally forwards MTF VM-exits to the guest (if requested),
> > as KVM doesn't utilize MTF. While this provides support during hardware
> > instruction execution, it is insufficient for instruction emulation.
> >
> > Should L0 emulate an instruction on the behalf of L2, L0 should also
> > synthesize an MTF VM-exit into L1, should control be set.
> >
> > The first patch corrects a nuanced difference between the definition of
> > a #DB exception payload field and DR6 register. Mask off bit 12 which is
> > defined in the 'pending debug exceptions' field when applying to DR6,
> > since the payload field is said to be compatible with the aforementioned
> > VMCS field.
> >
> > The second patch sets the 'pending debug exceptions' VMCS field when
> > delivering an INIT signal VM-exit to L1, as described in the SDM. This
> > patch also introduces helpers for setting the 'pending debug exceptions'
> > VMCS field.
> >
> > The third patch massages KVM's handling of exception payloads with
> > regard to API compatibility. Rather than immediately delivering the
> > payload w/o opt-in, instead defer the payload + inject
> > before completing a KVM_GET_VCPU_EVENTS. This maintains API
> > compatibility whilst correcting #DB behavior with regard to higher
> > priority VM-exit events.
> >
> > Fourth patch introduces MTF implementation for emulated instructions.
> > Identify if an MTF is due on an instruction boundary from
> > kvm_vcpu_do_singlestep(), however only deliver this VM-exit from
> > vmx_check_nested_events() to respect the relative prioritization to
> > other VM-exits. Since this augments the nested state, introduce a new
> > flag for (de)serialization.
> >
> > Last patch adds tests to kvm-unit-tests to assert the correctness of MTF
> > under several conditions (concurrent #DB trap, #DB fault, etc). These
> > tests pass under virtualization with this series as well as on
> > bare-metal.
> >
> > Based on commit 2c2787938512 ("KVM: selftests: Stop memslot creation in
> > KVM internal memslot region").
> >
> > Oliver Upton (4):
> >   KVM: x86: Mask off reserved bit from #DB exception payload
> >   KVM: nVMX: Handle pending #DB when injecting INIT VM-exit
> >   KVM: x86: Deliver exception payload on KVM_GET_VCPU_EVENTS
> >   KVM: nVMX: Emulate MTF when performing instruction emulation
> >
> >  arch/x86/include/asm/kvm_host.h |  1 +
> >  arch/x86/include/uapi/asm/kvm.h |  1 +
> >  arch/x86/kvm/svm.c              |  1 +
> >  arch/x86/kvm/vmx/nested.c       | 54 ++++++++++++++++++++++++++++++++-
> >  arch/x86/kvm/vmx/nested.h       |  5 +++
> >  arch/x86/kvm/vmx/vmx.c          | 37 +++++++++++++++++++++-
> >  arch/x86/kvm/vmx/vmx.h          |  3 ++
> >  arch/x86/kvm/x86.c              | 39 ++++++++++++++++--------
> >  8 files changed, 126 insertions(+), 15 deletions(-)
> >
>
> Queued (for 5.6-rc2), thanks.
>
> Paolo
>

Are there any strong opinions about how the newly introduced nested
state should be handled across live migrations? When applying this
patch set internally I realized live migration would be busted in the
case of a kernel rollback (i.e. a kernel with this patchset emits the
nested state, kernel w/o receives it + refuses).

Easy fix is to only turn on the feature once it is rollback-proof, but
I wonder if there is any room for improvement on this topic..

--
Thanks,
Oliver
Paolo Bonzini Feb. 27, 2020, 6:37 a.m. UTC | #3
On 27/02/20 01:22, Oliver Upton wrote:
> Are there any strong opinions about how the newly introduced nested
> state should be handled across live migrations? When applying this
> patch set internally I realized live migration would be busted in the
> case of a kernel rollback (i.e. a kernel with this patchset emits the
> nested state, kernel w/o receives it + refuses).

Only if you use MTF + emulation.  In this case it's a pure bugfix so
it's okay to break backwards migration.  If it's really a new feature,
it should support KVM_ENABLE_CAP to enable it.

Paolo

> Easy fix is to only turn on the feature once it is rollback-proof, but
> I wonder if there is any room for improvement on this topic..
Oliver Upton Feb. 27, 2020, 9:11 a.m. UTC | #4
On Wed, Feb 26, 2020 at 10:38 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 27/02/20 01:22, Oliver Upton wrote:
> > Are there any strong opinions about how the newly introduced nested
> > state should be handled across live migrations? When applying this
> > patch set internally I realized live migration would be busted in the
> > case of a kernel rollback (i.e. a kernel with this patchset emits the
> > nested state, kernel w/o receives it + refuses).
>
> Only if you use MTF + emulation.  In this case it's a pure bugfix so
> it's okay to break backwards migration.  If it's really a new feature,
> it should support KVM_ENABLE_CAP to enable it.
>
> Paolo
>

True. I suppose I've conflated the pure bugfix here with the fact that
MTF is new to us.

Thanks Paolo!

--
Best,
Oliver

> > Easy fix is to only turn on the feature once it is rollback-proof, but
> > I wonder if there is any room for improvement on this topic..
>