Message ID | 20181008182945.79957-5-jmattson@google.com (mailing list archive) |
---|---|
State | New, archived |
Series | [1/5] kvm: x86: Add payload to kvm_queued_exception and kvm_vcpu_events |
> On 8 Oct 2018, at 21:29, Jim Mattson <jmattson@google.com> wrote:
>
> This is a per-VM capability which can be enabled by userspace so that
> the faulting linear address will be included with the information
> about a pending #PF in L2, and the "new DR6 bits" will be included
> with the information about a pending #DB in L2. With this capability
> enabled, the L1 hypervisor can now intercept #PF before CR2 is
> modified. Under VMX, the L1 hypervisor can now intercept #DB before
> DR6 and DR7 are modified.
>
> When userspace has enabled KVM_CAP_EXCEPTION_PAYLOAD, it should
> generally provide an appropriate payload when injecting a #PF or #DB
> exception via KVM_SET_VCPU_EVENTS. However, to support restoring old
> checkpoints, this payload is not required.
>
> Note that bit 16 of the "new DR6 bits" is set to indicate that a debug
> exception (#DB) or a breakpoint exception (#BP) occurred inside an RTM
> region while advanced debugging of RTM transactional regions was
> enabled. This is the reverse of DR6.RTM, which is cleared in this
> scenario.
>
> Reported-by: Jim Mattson <jmattson@google.com>
> Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Jim Mattson <jmattson@google.com>
> Reviewed-by: Peter Shier <pshier@google.com>
> ---
> Documentation/virtual/kvm/api.txt | 22 ++++++++++++++++++++++
> arch/x86/kvm/x86.c | 5 +++++
> include/uapi/linux/kvm.h | 1 +
> 3 files changed, 28 insertions(+)
>
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 2df2cca81cf5..bb2b8bc0ffe0 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -4540,6 +4540,28 @@ With this capability, a guest may read the MSR_PLATFORM_INFO MSR. Otherwise,
>  a #GP would be raised when the guest tries to access. Currently, this
>  capability does not enable write permissions of this MSR for the guest.
>
> +7.15 KVM_CAP_EXCEPTION_PAYLOAD
> +
> +Architectures: x86
> +Parameters: args[0] whether feature should be enabled or not
> +
> +With this capability enabled, CR2 will not be modified prior to the
> +emulated VM-exit when L1 intercepts a #PF exception that occurs in
> +L2. Similarly, for kvm-intel only, DR6 will not be modified prior to
> +the emulated VM-exit when L1 intercepts a #DB exception that occurs in
> +L2. As a result, when KVM_GET_VCPU_EVENTS reports a pending #PF (or
> +#DB) exception for L2, exception.has_payload will be set and the
> +faulting address (or the new DR6 bits*) will be reported in the
> +exception_payload field. Similarly, when userspace injects a #PF (or
> +#DB) into L2 using KVM_SET_VCPU_EVENTS, it is expected to set
> +exception.has_payload and to put the faulting address (or the new DR6
> +bits*) in the exception_payload field.
> +
> +There is no change in behavior for exceptions that occur in L1.
> +
> +* For the new DR6 bits, note that bit 16 is set iff the #DB exception
> +  will clear DR6.RTM.
> +
>  8. Other capabilities.
>  ----------------------
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 33e171e6d067..bcfcfa813c90 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2994,6 +2994,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  	case KVM_CAP_IMMEDIATE_EXIT:
>  	case KVM_CAP_GET_MSR_FEATURES:
>  	case KVM_CAP_MSR_PLATFORM_INFO:
> +	case KVM_CAP_EXCEPTION_PAYLOAD:
>  		r = 1;
>  		break;
>  	case KVM_CAP_SYNC_REGS:
> @@ -4443,6 +4444,10 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>  		kvm->arch.guest_can_read_msr_platform_info = cap->args[0];
>  		r = 0;
>  		break;
> +	case KVM_CAP_EXCEPTION_PAYLOAD:
> +		kvm->arch.exception_payload_enabled = cap->args[0];
> +		r = 0;
> +		break;
>  	default:
>  		r = -EINVAL;
>  		break;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 251be353f950..531da3d1fd55 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -953,6 +953,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_NESTED_STATE 157
>  #define KVM_CAP_ARM_INJECT_SERROR_ESR 158
>  #define KVM_CAP_MSR_PLATFORM_INFO 159
> +#define KVM_CAP_EXCEPTION_PAYLOAD 160
>
>  #ifdef KVM_CAP_IRQ_ROUTING
>
> --
> 2.19.0.605.g01d371f741-goog
>

Patch itself looks fine:
Reviewed-by: Liran Alon <liran.alon@oracle.com>

A couple of general notes regarding series:
1) I saw that kvm-unit-test 414bd9d5ebd7 ("x86: nVMX: Basic test of #DB intercept in L1")
verifies that indeed intercept on #DB is delivered before DR6 is modified.
It would be nice to also have a kvm-unit-test that similarly verifies that intercept on #PF is delivered before CR2 is modified.
2) Similar kvm-unit-tests should also be written for nSVM.
3) I think now we have the needed framework to also fix
kvm_vcpu_ioctl_x86_get_vcpu_events() and kvm_vcpu_ioctl_x86_set_vcpu_events()
to pass exception.pending and exception.injected separately.
Do you think this work should be done at the end of this patch series or a separate one once this is applied?

-Liran
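The ordering that note (1) asks a test to verify can be sketched as a small toy model. This is only an illustration under stated assumptions: `struct toy_vcpu`, `toy_queue_pf()` and `toy_resolve_pf()` are hypothetical names, not KVM code, and the model assumes that without the capability CR2 is written as soon as the #PF is queued, which is how the patch description reads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model; none of these names exist in KVM. */
struct toy_vcpu {
	uint64_t cr2;          /* guest-visible CR2 */
	bool payload_enabled;  /* stands in for KVM_CAP_EXCEPTION_PAYLOAD */
	bool pf_pending;       /* a #PF is queued for L2 */
	bool has_payload;      /* the faulting address is held aside */
	uint64_t payload;      /* deferred faulting address */
};

/* Queue a #PF that was raised while L2 was running. */
static inline void toy_queue_pf(struct toy_vcpu *v, uint64_t fault_addr)
{
	assert(!v->pf_pending);  /* the model tracks one exception at a time */
	v->pf_pending = true;
	if (v->payload_enabled) {
		/* Capability on: keep the address as a payload, leave CR2 alone. */
		v->has_payload = true;
		v->payload = fault_addr;
	} else {
		/* Assumed legacy behaviour: side effect applied when queued. */
		v->has_payload = false;
		v->cr2 = fault_addr;
	}
}

/*
 * Resolve the queued #PF. Returns true if it turned into an emulated
 * VM-exit to L1, false if it was delivered to L2.
 */
static inline bool toy_resolve_pf(struct toy_vcpu *v, bool l1_intercepts_pf)
{
	assert(v->pf_pending);
	v->pf_pending = false;
	if (l1_intercepts_pf) {
		/* L1 would receive the address via exit information (not modelled). */
		v->has_payload = false;
		return true;
	}
	if (v->has_payload) {
		/* Not intercepted: the deferred payload finally reaches CR2. */
		v->cr2 = v->payload;
		v->has_payload = false;
	}
	return false;
}
```

In this model, an intercepted #PF leaves `cr2` unchanged only when `payload_enabled` is set, which is the property a kvm-unit-test for #PF would check on real hardware state.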
I'm happy to do the kvm-unit-tests for (1) and (2). The subtlety of
exception.pending and exception.injected is lost on me. We do need to
handle pending debug exceptions in a MOV-SS shadow, but I don't think
that's what you're talking about. Can you explain?
> On 9 Oct 2018, at 17:01, Jim Mattson <jmattson@google.com> wrote:
>
> I'm happy to do the kvm-unit-tests for (1) and (2). The subtlety of
> exception.pending and exception.injected is lost on me. We do need to
> handle pending debug exceptions in a MOV-SS shadow, but I don't think
> that's what you're talking about. Can you explain?

Today, KVM_GET_VCPU_EVENTS IOCTL returns in events->exception.injected
the value of (vcpu->arch.exception.pending || vcpu->arch.exception.injected).
Which means userspace has no way of knowing if the exception was re-injected
and thus cannot be intercepted by L1, or is still pending and therefore can be intercepted.

Similarly, KVM_SET_VCPU_EVENTS IOCTL sets vcpu->arch.exception.pending
to the value of events->exception.injected. Which means userspace only has the
ability to inject exceptions which cannot be intercepted by an L1 guest.

Before this series, userspace must have assumed that the exception cannot be intercepted,
as the exception side-effects (payload) are assumed to have already been applied by userspace.
Now, however, it is possible for userspace to set a pending exception via IOCTL together
with its payload, such that it can be correctly intercepted by L1.

-Liran
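The loss of information described in this message can be shown with a minimal model. The struct and function names below are hypothetical stand-ins, not the kernel's; the model follows the description above and additionally assumes that the set path leaves the internal `injected` flag clear.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel-internal exception state. */
struct model_vcpu_exception {
	bool pending;   /* queued; an L1 intercept is still possible */
	bool injected;  /* already being delivered; no L1 intercept */
};

/* Hypothetical stand-in for what userspace sees in the events struct. */
struct model_events_exception {
	bool injected;  /* the only flag exposed to userspace */
};

/* Models the get path: two internal flags collapse into one. */
static inline struct model_events_exception
model_get_events(struct model_vcpu_exception ex)
{
	struct model_events_exception ev = {
		.injected = ex.pending || ex.injected,
	};
	return ev;
}

/* Models the set path: the single flag is restored as "pending". */
static inline struct model_vcpu_exception
model_set_events(struct model_events_exception ev)
{
	struct model_vcpu_exception ex = {
		.pending = ev.injected,
		.injected = false,  /* assumption of this model */
	};
	return ex;
}

/* Save then restore, as a checkpoint or migration would. */
static inline struct model_vcpu_exception
model_round_trip(struct model_vcpu_exception ex)
{
	struct model_vcpu_exception out = model_set_events(model_get_events(ex));

	assert(!out.injected);  /* holds by construction in this model */
	return out;
}
```

Passing either `{ .pending = true }` or `{ .injected = true }` through `model_get_events()` yields the same userspace view, and `model_round_trip()` maps both to the same restored state, so the distinction cannot survive a save and restore.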
On Tue, Oct 9, 2018 at 10:04 AM, Liran Alon <liran.alon@oracle.com> wrote:

> Today, KVM_GET_VCPU_EVENTS IOCTL returns in events->exception.injected
> the value of (vcpu->arch.exception.pending || vcpu->arch.exception.injected).
> Which means userspace has no way of knowing if the exception was re-injected
> and thus cannot be intercepted by L1, or is still pending and therefore can be intercepted.
>
> Similarly, KVM_SET_VCPU_EVENTS IOCTL sets vcpu->arch.exception.pending
> to the value of events->exception.injected. Which means userspace only has the
> ability to inject exceptions which cannot be intercepted by an L1 guest.
>
> Before this series, userspace must have assumed that the exception cannot be intercepted,
> as the exception side-effects (payload) are assumed to have already been applied by userspace.
> Now, however, it is possible for userspace to set a pending exception via IOCTL together
> with its payload, such that it can be correctly intercepted by L1.

Yes, that seems like a mess that should be cleaned up and tied to the
new capability.
> On 9 Oct 2018, at 20:54, Jim Mattson <jmattson@google.com> wrote:
>
> On Tue, Oct 9, 2018 at 10:04 AM, Liran Alon <liran.alon@oracle.com> wrote:
>
>> Today, KVM_GET_VCPU_EVENTS IOCTL returns in events->exception.injected
>> the value of (vcpu->arch.exception.pending || vcpu->arch.exception.injected).
>> Which means userspace has no way of knowing if the exception was re-injected
>> and thus cannot be intercepted by L1, or is still pending and therefore can be intercepted.
>>
>> Similarly, KVM_SET_VCPU_EVENTS IOCTL sets vcpu->arch.exception.pending
>> to the value of events->exception.injected. Which means userspace only has the
>> ability to inject exceptions which cannot be intercepted by an L1 guest.
>>
>> Before this series, userspace must have assumed that the exception cannot be intercepted,
>> as the exception side-effects (payload) are assumed to have already been applied by userspace.
>> Now, however, it is possible for userspace to set a pending exception via IOCTL together
>> with its payload, such that it can be correctly intercepted by L1.
>
> Yes, that seems like a mess that should be cleaned up and tied to the
> new capability.

That’s why I asked if you plan to clean up this mess as well in v2 of this series
or should I in a separate one? :P

-Liran
On Tue, Oct 9, 2018 at 11:12 AM, Liran Alon <liran.alon@oracle.com> wrote:

> That’s why I asked if you plan to clean up this mess as well in v2 of this series
> or should I in a separate one? :P

I'll take a look in v2.
On 09/10/2018 19:54, Jim Mattson wrote:
>> Today, KVM_GET_VCPU_EVENTS IOCTL returns in events->exception.injected
>> the value of (vcpu->arch.exception.pending || vcpu->arch.exception.injected).
>> Which means userspace has no way of knowing if the exception was re-injected
>> and thus cannot be intercepted by L1, or is still pending and therefore can be intercepted.
>>
>> Similarly, KVM_SET_VCPU_EVENTS IOCTL sets vcpu->arch.exception.pending
>> to the value of events->exception.injected. Which means userspace only has the
>> ability to inject exceptions which cannot be intercepted by an L1 guest.
>>
>> Before this series, userspace must have assumed that the exception cannot be intercepted,
>> as the exception side-effects (payload) are assumed to have already been applied by userspace.
>> Now, however, it is possible for userspace to set a pending exception via IOCTL together
>> with its payload, such that it can be correctly intercepted by L1.
> Yes, that seems like a mess that should be cleaned up and tied to the
> new capability.

Or perhaps it _is_ actually better to have a KVM_GET/SET_VCPU_EVENTS2 (or
even just the "set") that is unrelated to the capability. Maybe that
would make the code cleaner.

Paolo
diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 2df2cca81cf5..bb2b8bc0ffe0 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -4540,6 +4540,28 @@ With this capability, a guest may read the MSR_PLATFORM_INFO MSR. Otherwise,
 a #GP would be raised when the guest tries to access. Currently, this
 capability does not enable write permissions of this MSR for the guest.

+7.15 KVM_CAP_EXCEPTION_PAYLOAD
+
+Architectures: x86
+Parameters: args[0] whether feature should be enabled or not
+
+With this capability enabled, CR2 will not be modified prior to the
+emulated VM-exit when L1 intercepts a #PF exception that occurs in
+L2. Similarly, for kvm-intel only, DR6 will not be modified prior to
+the emulated VM-exit when L1 intercepts a #DB exception that occurs in
+L2. As a result, when KVM_GET_VCPU_EVENTS reports a pending #PF (or
+#DB) exception for L2, exception.has_payload will be set and the
+faulting address (or the new DR6 bits*) will be reported in the
+exception_payload field. Similarly, when userspace injects a #PF (or
+#DB) into L2 using KVM_SET_VCPU_EVENTS, it is expected to set
+exception.has_payload and to put the faulting address (or the new DR6
+bits*) in the exception_payload field.
+
+There is no change in behavior for exceptions that occur in L1.
+
+* For the new DR6 bits, note that bit 16 is set iff the #DB exception
+  will clear DR6.RTM.
+
 8. Other capabilities.
 ----------------------

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 33e171e6d067..bcfcfa813c90 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2994,6 +2994,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_IMMEDIATE_EXIT:
 	case KVM_CAP_GET_MSR_FEATURES:
 	case KVM_CAP_MSR_PLATFORM_INFO:
+	case KVM_CAP_EXCEPTION_PAYLOAD:
 		r = 1;
 		break;
 	case KVM_CAP_SYNC_REGS:
@@ -4443,6 +4444,10 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		kvm->arch.guest_can_read_msr_platform_info = cap->args[0];
 		r = 0;
 		break;
+	case KVM_CAP_EXCEPTION_PAYLOAD:
+		kvm->arch.exception_payload_enabled = cap->args[0];
+		r = 0;
+		break;
 	default:
 		r = -EINVAL;
 		break;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 251be353f950..531da3d1fd55 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -953,6 +953,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_NESTED_STATE 157
 #define KVM_CAP_ARM_INJECT_SERROR_ESR 158
 #define KVM_CAP_MSR_PLATFORM_INFO 159
+#define KVM_CAP_EXCEPTION_PAYLOAD 160

 #ifdef KVM_CAP_IRQ_ROUTING
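The footnote about bit 16 in the documentation hunk is easy to misread, so here is a small sketch of how a "new DR6 bits" payload might be folded into DR6 once the #DB is actually delivered. It is not taken from this patch: `toy_apply_db_payload()` and the `TOY_` macros are hypothetical, and clearing B0-B3 and re-arming DR6.RTM before merging are assumptions of the sketch. The only property it relies on from the text above is that payload bit 16 being set means DR6.RTM ends up cleared.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_DR6_TRAP_BITS UINT64_C(0xf)        /* B0-B3 breakpoint hits */
#define TOY_DR6_RTM       (UINT64_C(1) << 16)  /* DR6.RTM, active low */

/* Fold a "new DR6 bits" payload into the current DR6 value. */
static inline uint64_t toy_apply_db_payload(uint64_t dr6, uint64_t payload)
{
	/* Model precondition: the upper half of DR6 is reserved. */
	assert((payload >> 32) == 0);

	dr6 &= ~TOY_DR6_TRAP_BITS;     /* assumption: drop stale B0-B3 */
	dr6 |= TOY_DR6_RTM;            /* assumption: default is "not in RTM" */
	dr6 |= payload;                /* set every bit the payload reports */
	dr6 ^= payload & TOY_DR6_RTM;  /* payload bit 16 set => RTM cleared */
	return dr6;
}
```

With a starting value of `0xFFFF0FF0`, a payload of `0x1` gives `0xFFFF0FF1`, while a payload of `0x10001` gives `0xFFFE0FF1`: the same breakpoint hit, but with DR6.RTM cleared because the payload's bit 16 was set.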