[5/5] kvm: x86: Introduce KVM_CAP_EXCEPTION_PAYLOAD

Message ID	20181008182945.79957-5-jmattson@google.com (mailing list archive)
State	New, archived
Headers	Return-Path: <kvm-owner@kernel.org>
	Date: Mon, 8 Oct 2018 11:29:45 -0700
	In-Reply-To: <20181008182945.79957-1-jmattson@google.com>
	Message-Id: <20181008182945.79957-5-jmattson@google.com>
	Mime-Version: 1.0
	References: <20181008182945.79957-1-jmattson@google.com>
	Subject: [PATCH 5/5] kvm: x86: Introduce KVM_CAP_EXCEPTION_PAYLOAD
	From: Jim Mattson <jmattson@google.com>
	To: kvm@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>, Peter Shier <pshier@google.com>
	Cc: Jim Mattson <jmattson@google.com>
	Content-Type: text/plain; charset="UTF-8"
	Sender: kvm-owner@vger.kernel.org
	Precedence: bulk
Series	[1/5] kvm: x86: Add payload to kvm_queued_exception and kvm_vcpu_events
	[2/5] kvm: x86: Add payload operands to kvm_multiple_exception
	[3/5] kvm: x86: Defer setting of CR2 until #PF delivery
	[4/5] kvm: vmx: Defer setting of DR6 until #DB delivery
	[5/5] kvm: x86: Introduce KVM_CAP_EXCEPTION_PAYLOAD

Jim Mattson Oct. 8, 2018, 6:29 p.m. UTC

This is a per-VM capability which can be enabled by userspace so that
the faulting linear address will be included with the information
about a pending #PF in L2, and the "new DR6 bits" will be included
with the information about a pending #DB in L2. With this capability
enabled, the L1 hypervisor can now intercept #PF before CR2 is
modified. Under VMX, the L1 hypervisor can now intercept #DB before
DR6 and DR7 are modified.
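
For illustration, a VMM might turn the capability on roughly as follows. This is a minimal sketch, not part of the patch: the helper names are hypothetical, vm_fd is assumed to come from KVM_CREATE_VM, and the fallback value 160 is only the number proposed in this patch, so a released header's definition should take precedence.

```c
#include <linux/kvm.h>
#include <string.h>
#include <sys/ioctl.h>

/* Fallback for headers that predate the capability. 160 is the number
 * proposed in this patch; a released kernel header may assign a different
 * one, so the header's definition wins whenever it exists. */
#ifndef KVM_CAP_EXCEPTION_PAYLOAD
#define KVM_CAP_EXCEPTION_PAYLOAD 160
#endif

/* Hypothetical helper: build the KVM_ENABLE_CAP argument.
 * args[0] is the on/off switch described in the documentation hunk. */
struct kvm_enable_cap exception_payload_cap(int enable)
{
	struct kvm_enable_cap cap;

	memset(&cap, 0, sizeof(cap));
	cap.cap = KVM_CAP_EXCEPTION_PAYLOAD;
	cap.args[0] = enable ? 1 : 0;
	return cap;
}

/* Hypothetical helper: probe for the capability, then enable it on the VM.
 * Returns 0 on success and -1 otherwise. */
int enable_exception_payload(int vm_fd)
{
	struct kvm_enable_cap cap = exception_payload_cap(1);

	if (ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_EXCEPTION_PAYLOAD) <= 0)
		return -1;
	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}
```

Because the capability is per-VM, the call is made once on the VM file descriptor rather than on each vCPU.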

When userspace has enabled KVM_CAP_EXCEPTION_PAYLOAD, it should
generally provide an appropriate payload when injecting a #PF or #DB
exception via KVM_SET_VCPU_EVENTS. However, to support restoring old
checkpoints, this payload is not required.
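
The two cases above (a fresh injection that carries a payload, and an old checkpoint that does not) can be sketched as below. The struct is a hypothetical stand-in, not the uapi layout: the real fields are defined by patch 1/5 of this series, and the error code that accompanies a #PF is left out for brevity.

```c
#include <stdbool.h>
#include <stdint.h>

#define PF_VECTOR 14	/* architectural vector number of #PF */

/* Hypothetical stand-in for the exception part of kvm_vcpu_events. */
struct sketch_exception {
	uint8_t  nr;
	bool     injected;
	bool     has_payload;
	uint64_t payload;
};

/* A #PF whose faulting linear address travels as the payload, so CR2 is
 * left untouched until the exception is actually delivered. */
struct sketch_exception make_pf(uint64_t fault_addr)
{
	struct sketch_exception e = {
		.nr = PF_VECTOR,
		.injected = true,
		.has_payload = true,
		.payload = fault_addr,
	};
	return e;
}

/* A #PF restored from an old checkpoint: no payload is available because
 * CR2 had already been written when the checkpoint was taken. */
struct sketch_exception make_legacy_pf(void)
{
	struct sketch_exception e = {
		.nr = PF_VECTOR,
		.injected = true,
		.has_payload = false,
		.payload = 0,
	};
	return e;
}
```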

Note that bit 16 of the "new DR6 bits" is set to indicate that a debug
exception (#DB) or a breakpoint exception (#BP) occurred inside an RTM
region while advanced debugging of RTM transactional regions was
enabled. This is the reverse of DR6.RTM, which is cleared in this
scenario.
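
The inverted sense of bit 16 can be made concrete with a small sketch of how a "new DR6 bits" payload might be folded into DR6 at delivery time. This is an illustration under stated assumptions, not necessarily the code in this series: apply_db_payload is a hypothetical name, and clearing of stale condition bits before delivery is deliberately omitted.

```c
#include <stdint.h>

#define DR6_BS   (1ULL << 14)	/* single-step condition */
#define DR6_RTM  (1ULL << 16)	/* active-low: 0 means "inside an RTM region" */
#define DR6_INIT 0xFFFF0FF0ULL	/* architectural reset value of DR6 */

/* Fold the payload into DR6. Ordinary condition bits are simply ORed in.
 * Bit 16 is the exception: it is SET in the payload to request that
 * DR6.RTM be CLEARED, so it is ORed in and then flipped back off. */
uint64_t apply_db_payload(uint64_t dr6, uint64_t payload)
{
	dr6 |= payload;
	dr6 ^= payload & DR6_RTM;
	return dr6;
}
```

With this encoding a payload of zero leaves DR6 unchanged, which is why the payload uses the opposite polarity from the register.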

Reported-by: Jim Mattson <jmattson@google.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
---
 Documentation/virtual/kvm/api.txt | 22 ++++++++++++++++++++++
 arch/x86/kvm/x86.c                |  5 +++++
 include/uapi/linux/kvm.h          |  1 +
 3 files changed, 28 insertions(+)

Liran Alon Oct. 9, 2018, 12:33 p.m. UTC | #1

> On 8 Oct 2018, at 21:29, Jim Mattson <jmattson@google.com> wrote:
> 
> This is a per-VM capability which can be enabled by userspace so that
> the faulting linear address will be included with the information
> about a pending #PF in L2, and the "new DR6 bits" will be included
> with the information about a pending #DB in L2. With this capability
> enabled, the L1 hypervisor can now intercept #PF before CR2 is
> modified. Under VMX, the L1 hypervisor can now intercept #DB before
> DR6 and DR7 are modified.
> 
> When userspace has enabled KVM_CAP_EXCEPTION_PAYLOAD, it should
> generally provide an appropriate payload when injecting a #PF or #DB
> exception via KVM_SET_VCPU_EVENTS. However, to support restoring old
> checkpoints, this payload is not required.
> 
> Note that bit 16 of the "new DR6 bits" is set to indicate that a debug
> exception (#DB) or a breakpoint exception (#BP) occurred inside an RTM
> region while advanced debugging of RTM transactional regions was
> enabled. This is the reverse of DR6.RTM, which is cleared in this
> scenario.
> 
> Reported-by: Jim Mattson <jmattson@google.com>
> Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Jim Mattson <jmattson@google.com>
> Reviewed-by: Peter Shier <pshier@google.com>
> ---
> Documentation/virtual/kvm/api.txt | 22 ++++++++++++++++++++++
> arch/x86/kvm/x86.c                |  5 +++++
> include/uapi/linux/kvm.h          |  1 +
> 3 files changed, 28 insertions(+)
> 
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 2df2cca81cf5..bb2b8bc0ffe0 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -4540,6 +4540,28 @@ With this capability, a guest may read the MSR_PLATFORM_INFO MSR. Otherwise,
> a #GP would be raised when the guest tries to access. Currently, this
> capability does not enable write permissions of this MSR for the guest.
> 
> +7.15 KVM_CAP_EXCEPTION_PAYLOAD
> +
> +Architectures: x86
> +Parameters: args[0] whether feature should be enabled or not
> +
> +With this capability enabled, CR2 will not be modified prior to the
> +emulated VM-exit when L1 intercepts a #PF exception that occurs in
> +L2. Similarly, for kvm-intel only, DR6 will not be modified prior to
> +the emulated VM-exit when L1 intercepts a #DB exception that occurs in
> +L2. As a result, when KVM_GET_VCPU_EVENTS reports a pending #PF (or
> +#DB) exception for L2, exception.has_payload will be set and the
> +faulting address (or the new DR6 bits*) will be reported in the
> +exception_payload field. Similarly, when userspace injects a #PF (or
> +#DB) into L2 using KVM_SET_VCPU_EVENTS, it is expected to set
> +exception.has_payload and to put the faulting address (or the new DR6
> +bits*) in the exception_payload field.
> +
> +There is no change in behavior for exceptions that occur in L1.
> +
> +* For the new DR6 bits, note that bit 16 is set iff the #DB exception
> +  will clear DR6.RTM.
> +
> 8. Other capabilities.
> ----------------------
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 33e171e6d067..bcfcfa813c90 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2994,6 +2994,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> 	case KVM_CAP_IMMEDIATE_EXIT:
> 	case KVM_CAP_GET_MSR_FEATURES:
> 	case KVM_CAP_MSR_PLATFORM_INFO:
> +	case KVM_CAP_EXCEPTION_PAYLOAD:
> 		r = 1;
> 		break;
> 	case KVM_CAP_SYNC_REGS:
> @@ -4443,6 +4444,10 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> 		kvm->arch.guest_can_read_msr_platform_info = cap->args[0];
> 		r = 0;
> 		break;
> +	case KVM_CAP_EXCEPTION_PAYLOAD:
> +		kvm->arch.exception_payload_enabled = cap->args[0];
> +		r = 0;
> +		break;
> 	default:
> 		r = -EINVAL;
> 		break;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 251be353f950..531da3d1fd55 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -953,6 +953,7 @@ struct kvm_ppc_resize_hpt {
> #define KVM_CAP_NESTED_STATE 157
> #define KVM_CAP_ARM_INJECT_SERROR_ESR 158
> #define KVM_CAP_MSR_PLATFORM_INFO 159
> +#define KVM_CAP_EXCEPTION_PAYLOAD 160
> 
> #ifdef KVM_CAP_IRQ_ROUTING
> 
> -- 
> 2.19.0.605.g01d371f741-goog
> 

Patch itself looks fine:
Reviewed-by: Liran Alon <liran.alon@oracle.com>

A couple of general notes regarding the series:
1) I saw that kvm-unit-test 414bd9d5ebd7 ("x86: nVMX: Basic test of #DB intercept in L1")
verifies that the #DB intercept is indeed delivered before DR6 is modified.
It would be nice to also have a kvm-unit-test that similarly verifies that
the #PF intercept is delivered before CR2 is modified.
2) Similar kvm-unit-tests should also be written for nSVM.
3) I think we now have the framework needed to also fix
kvm_vcpu_ioctl_x86_get_vcpu_events() and kvm_vcpu_ioctl_x86_set_vcpu_events()
to pass exception.pending and exception.injected separately.
Do you think this work should be done at the end of this patch series, or in a
separate one once this is applied?

-Liran

Jim Mattson Oct. 9, 2018, 2:01 p.m. UTC | #2

I'm happy to do the kvm-unit-tests for (1) and (2). The subtlety of
exception.pending and exception.injected is lost on me. We do need to
handle pending debug exceptions in a MOV-SS shadow, but I don't think
that's what you're talking about. Can you explain?


Liran Alon Oct. 9, 2018, 5:04 p.m. UTC | #3

> On 9 Oct 2018, at 17:01, Jim Mattson <jmattson@google.com> wrote:
> 
> I'm happy to do the kvm-unit-tests for (1) and (2). The subtlety of
> exception.pending and exception.injected is lost on me. We do need to
> handle pending debug exceptions in a MOV-SS shadow, but I don't think
> that's what you're talking about. Can you explain?

Today, the KVM_GET_VCPU_EVENTS ioctl returns in events->exception.injected
the value of (vcpu->arch.exception.pending || vcpu->arch.exception.injected).
This means userspace has no way of knowing whether an exception was re-injected
(and thus cannot be intercepted by L1) or is still pending (and therefore can
be intercepted).

Similarly, the KVM_SET_VCPU_EVENTS ioctl sets vcpu->arch.exception.pending
to the value of events->exception.injected. This means userspace can only
inject exceptions that cannot be intercepted by an L1 guest.

Before this series, userspace had to assume that an exception could not be
intercepted, because the exception's side effects (the payload) were assumed to
have already been applied by userspace. Now, however, userspace can set a
pending exception via the ioctl together with its payload, so that it can be
correctly intercepted by L1.
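
The loss of information can be shown with a toy model of the two flags. This is only a sketch of the behaviour described above, with hypothetical names; it is not kernel code and models nothing beyond the pending/injected pair.

```c
#include <stdbool.h>

/* Simplified model of the two kernel-side flags named above. */
struct model_vcpu_exception {
	bool pending;	/* not yet delivered */
	bool injected;	/* already delivered, awaiting re-injection */
};

/* Models what KVM_GET_VCPU_EVENTS reports as exception.injected. */
bool model_get_injected(const struct model_vcpu_exception *e)
{
	return e->pending || e->injected;
}

/* Models KVM_SET_VCPU_EVENTS: the reported value lands in "pending". */
void model_set_injected(struct model_vcpu_exception *e, bool reported)
{
	e->pending = reported;
}

/* Save on one side and restore on the other, as a migration would. */
struct model_vcpu_exception model_round_trip(struct model_vcpu_exception src)
{
	struct model_vcpu_exception dst = { false, false };

	model_set_injected(&dst, model_get_injected(&src));
	return dst;
}
```

Both {pending} and {injected} on the source collapse to the same state on the destination, so the distinction cannot survive a save/restore cycle through the current interface.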

-Liran


Jim Mattson Oct. 9, 2018, 5:54 p.m. UTC | #4

On Tue, Oct 9, 2018 at 10:04 AM, Liran Alon <liran.alon@oracle.com> wrote:

> Today, the KVM_GET_VCPU_EVENTS ioctl returns in events->exception.injected
> the value of (vcpu->arch.exception.pending || vcpu->arch.exception.injected).
> This means userspace has no way of knowing whether an exception was re-injected
> (and thus cannot be intercepted by L1) or is still pending (and therefore can
> be intercepted).
>
> Similarly, the KVM_SET_VCPU_EVENTS ioctl sets vcpu->arch.exception.pending
> to the value of events->exception.injected. This means userspace can only
> inject exceptions that cannot be intercepted by an L1 guest.
>
> Before this series, userspace had to assume that an exception could not be
> intercepted, because the exception's side effects (the payload) were assumed to
> have already been applied by userspace. Now, however, userspace can set a
> pending exception via the ioctl together with its payload, so that it can be
> correctly intercepted by L1.

Yes, that seems like a mess that should be cleaned up and tied to the
new capability.

Liran Alon Oct. 9, 2018, 6:12 p.m. UTC | #5

> On 9 Oct 2018, at 20:54, Jim Mattson <jmattson@google.com> wrote:
> 
> On Tue, Oct 9, 2018 at 10:04 AM, Liran Alon <liran.alon@oracle.com> wrote:
> 
>> Today, the KVM_GET_VCPU_EVENTS ioctl returns in events->exception.injected
>> the value of (vcpu->arch.exception.pending || vcpu->arch.exception.injected).
>> This means userspace has no way of knowing whether an exception was re-injected
>> (and thus cannot be intercepted by L1) or is still pending (and therefore can
>> be intercepted).
>>
>> Similarly, the KVM_SET_VCPU_EVENTS ioctl sets vcpu->arch.exception.pending
>> to the value of events->exception.injected. This means userspace can only
>> inject exceptions that cannot be intercepted by an L1 guest.
>>
>> Before this series, userspace had to assume that an exception could not be
>> intercepted, because the exception's side effects (the payload) were assumed to
>> have already been applied by userspace. Now, however, userspace can set a
>> pending exception via the ioctl together with its payload, so that it can be
>> correctly intercepted by L1.
> 
> Yes, that seems like a mess that should be cleaned up and tied to the
> new capability.

That's why I asked whether you plan to clean up this mess as well in v2 of this
series, or whether I should do it in a separate one. :P

-Liran

Jim Mattson Oct. 9, 2018, 6:17 p.m. UTC | #6

On Tue, Oct 9, 2018 at 11:12 AM, Liran Alon <liran.alon@oracle.com> wrote:

> That's why I asked whether you plan to clean up this mess as well in v2 of this
> series, or whether I should do it in a separate one. :P

I'll take a look in v2.

Paolo Bonzini Oct. 15, 2018, 5:05 p.m. UTC | #7

On 09/10/2018 19:54, Jim Mattson wrote:
>> Today, the KVM_GET_VCPU_EVENTS ioctl returns in events->exception.injected
>> the value of (vcpu->arch.exception.pending || vcpu->arch.exception.injected).
>> This means userspace has no way of knowing whether an exception was re-injected
>> (and thus cannot be intercepted by L1) or is still pending (and therefore can
>> be intercepted).
>>
>> Similarly, the KVM_SET_VCPU_EVENTS ioctl sets vcpu->arch.exception.pending
>> to the value of events->exception.injected. This means userspace can only
>> inject exceptions that cannot be intercepted by an L1 guest.
>>
>> Before this series, userspace had to assume that an exception could not be
>> intercepted, because the exception's side effects (the payload) were assumed to
>> have already been applied by userspace. Now, however, userspace can set a
>> pending exception via the ioctl together with its payload, so that it can be
>> correctly intercepted by L1.
> Yes, that seems like a mess that should be cleaned up and tied to the
> new capability.

Or perhaps it _is_ actually better to have a KVM_GET/SET_VCPU_EVENTS2
(or even just the "set") that is unrelated to the capability.  Maybe
that would make the code cleaner.

Paolo
