[v6,0/7] Allow user space to restrict and augment MSR emulation

Message ID 20200901201517.29086-1-graf@amazon.com (mailing list archive)

Alexander Graf Sept. 1, 2020, 8:15 p.m. UTC
While trying to add support for the MSR_CORE_THREAD_COUNT MSR in KVM,
I realized that we were still in a world where user space has no control
over what happens with MSR emulation in KVM.

That is bad for multiple reasons. In my case, I wanted to emulate the
MSR in user space, because it's a CPU specific register that does not
exist on older CPUs and that really only contains informational data that
is on the package level, so it's a natural fit for user space to provide
it.

However, it is also bad on a platform compatibility level. Currently,
KVM has no way to expose different MSRs based on the selected target CPU
type.

This patch set introduces a way for user space to indicate to KVM which
MSRs should be handled in kernel space. With that, we can solve part of
the platform compatibility story, or at least avoid handling AMD specific
MSRs on an Intel platform and vice versa.
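
As a rough illustration of the filtering idea, here is a minimal sketch of a
per-range bitmap lookup with a fallback for MSRs that no range covers. It is
not the code from this series: struct msr_filter, struct msr_filter_range and
msr_allowed() are hypothetical names, and the bit layout is an assumption
made only for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical filter range: one bit per MSR, starting at 'base'. */
struct msr_filter_range {
	uint32_t base;         /* first MSR index covered by this range */
	uint32_t nmsrs;        /* number of MSRs covered */
	const uint8_t *bitmap; /* bit set = handle the MSR in kernel space */
};

/* Hypothetical filter: a set of ranges plus a fallback decision. */
struct msr_filter {
	bool default_allow;    /* used for MSRs outside every range */
	size_t nranges;
	const struct msr_filter_range *ranges;
};

/* Return true if the kernel may handle an access to MSR 'index'. */
static bool msr_allowed(const struct msr_filter *f, uint32_t index)
{
	for (size_t i = 0; i < f->nranges; i++) {
		const struct msr_filter_range *r = &f->ranges[i];

		/* Subtract instead of adding so base + nmsrs cannot wrap. */
		if (index < r->base || index - r->base >= r->nmsrs)
			continue;

		uint32_t bit = index - r->base;
		return (r->bitmap[bit / 8] >> (bit % 8)) & 1;
	}

	return f->default_allow;
}
```

With a range that clears the bits for vendor specific MSRs, a VMM could keep
those accesses away from the in-kernel emulation while everything outside the
range follows the fallback.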

In addition, it introduces a way for user space to get into the loop
when an MSR access would generate a #GP fault, such as when KVM finds an
MSR that is not handled by the in-kernel MSR emulation or when the guest
is trying to access reserved registers.
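
The decision described above can be sketched as a small mask check: an MSR
access that would otherwise end in a #GP is handed to user space only if user
space asked for that class of failure. The enum values and
msr_failure_outcome() are hypothetical and exist only for this example; they
are not the names or values used by the series.

```c
#include <stdint.h>

/* Hypothetical reasons why the in-kernel MSR access failed. */
enum msr_fail_reason {
	MSR_FAIL_INVAL   = 1u << 0, /* reserved register or invalid value */
	MSR_FAIL_UNKNOWN = 1u << 1, /* no in-kernel emulation for this MSR */
	MSR_FAIL_FILTER  = 1u << 2, /* access denied by the user space filter */
};

enum msr_outcome {
	MSR_INJECT_GP,    /* deliver the #GP to the guest as before */
	MSR_EXIT_TO_USER, /* let user space handle the access */
};

/*
 * 'user_mask' is the set of failure reasons user space opted in to.
 * Anything outside that set keeps the old behaviour and raises #GP.
 */
static enum msr_outcome msr_failure_outcome(uint32_t user_mask,
					    enum msr_fail_reason reason)
{
	return (user_mask & reason) ? MSR_EXIT_TO_USER : MSR_INJECT_GP;
}
```

Keeping the reasons as separate bits means a VMM that only cares about
unknown MSRs does not also have to handle every write of an invalid value.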

In combination with filtering, user space trapping allows us to emulate
arbitrary MSRs in user space, paving the way for target CPU specific MSR
implementations from user space.
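
To make the use case concrete, here is a sketch of what a user space handler
for a deflected read of MSR_CORE_THREAD_COUNT might look like. struct
user_msr_read and handle_user_rdmsr() are hypothetical and do not mirror the
exit structure from this series. The MSR index 0x35 and the layout (thread
count in bits 15:0, core count in bits 31:16) are assumptions taken from my
reading of the Intel SDM and should be verified there.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed index of MSR_CORE_THREAD_COUNT; verify against the Intel SDM. */
#define MSR_CORE_THREAD_COUNT 0x35u

/* Hypothetical view of a deflected MSR read as seen by user space. */
struct user_msr_read {
	uint32_t index; /* MSR the guest tried to read */
	uint64_t data;  /* value to hand back to the guest */
	bool error;     /* true = ask the kernel to inject #GP */
};

/*
 * Fill in the reply for a deflected RDMSR. 'threads' and 'cores' are the
 * package level counts the VMM wants the guest to see.
 */
static void handle_user_rdmsr(struct user_msr_read *req,
			      uint16_t threads, uint16_t cores)
{
	switch (req->index) {
	case MSR_CORE_THREAD_COUNT:
		/* Assumed layout: cores in bits 31:16, threads in bits 15:0. */
		req->data = ((uint64_t)cores << 16) | threads;
		req->error = false;
		break;
	default:
		/* Not emulated here: fall back to a #GP in the guest. */
		req->data = 0;
		req->error = true;
		break;
	}
}
```

The values live entirely in the VMM, so nothing about the host CPU leaks into
the guest unless user space chooses to forward it.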

v1 -> v2:

  - s/ETRAP_TO_USER_SPACE/ENOENT/g
  - deflect all #GP injection events to user space, not just unknown MSRs.
    That way we can also deflect allowlist errors later
  - fix emulator case
  - new patch: KVM: x86: Introduce allow list for MSR emulation
  - new patch: KVM: selftests: Add test for user space MSR handling

v2 -> v3:

  - return r if r == X86EMUL_IO_NEEDED
  - s/KVM_EXIT_RDMSR/KVM_EXIT_X86_RDMSR/g
  - s/KVM_EXIT_WRMSR/KVM_EXIT_X86_WRMSR/g
  - Use complete_userspace_io logic instead of reply field
  - Simplify trapping code
  - document flags for KVM_X86_ADD_MSR_ALLOWLIST
  - generalize exit path, always unlock when returning
  - s/KVM_CAP_ADD_MSR_ALLOWLIST/KVM_CAP_X86_MSR_ALLOWLIST/g
  - Add KVM_X86_CLEAR_MSR_ALLOWLIST
  - Add test to clear whitelist
  - Adjust to reply-less API
  - Fix asserts
  - Actually trap on MSR_IA32_POWER_CTL writes

v3 -> v4:

  - Mention exit reasons in re-enter mandatory section of API documentation
  - Clear padding bytes
  - Generalize get/set deflect functions
  - Remove redundant pending_user_msr field
  - lock allow check and clearing
  - free bitmaps on clear

v4 -> v5:

  - use srcu 

v5 -> v6:

  - Switch from allow list to filtering API with explicit fallback option
  - Support and test passthrough MSR filtering
  - Check for filter exit reason
  - Add .gitignore
  - send filter change notification
  - change to atomic set_msr_filter ioctl with fallback flag
  - use EPERM for filter blocks
  - add bit for MSR user space deflection
  - check for overflow of BITS_TO_LONGS (thanks Dan Carpenter!)
  - s/int i;/u32 i;/
  - remove overlap check
  - Introduce exit reason mask to allow for future expansion and filtering
  - s/emul_to_vcpu(ctxt)/vcpu/
  - imported patch: KVM: x86: Prepare MSR bitmaps for userspace tracked MSRs
  - new patch: KVM: x86: Add infrastructure for MSR filtering
  - new patch: KVM: x86: SVM: Prevent MSR passthrough when MSR access is denied
  - new patch: KVM: x86: VMX: Prevent MSR passthrough when MSR access is denied
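
The BITS_TO_LONGS item above concerns a size computation on a count supplied
by user space. The sketch below shows the general hazard and one way to avoid
it; it is not the check from the series. msr_bitmap_bytes() and
MAX_BITMAP_BYTES are hypothetical, and BITS_PER_LONG is redefined locally so
the example builds outside the kernel.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_LONG    (sizeof(long) * CHAR_BIT)
#define MAX_BITMAP_BYTES 0x600u /* hypothetical upper bound */

/*
 * Return the bitmap size in bytes for 'nmsrs' bits, or 0 if the request
 * is empty or too large. Rounding up in 32 bits would wrap for counts
 * close to UINT32_MAX and yield a tiny size, so widen first.
 */
static size_t msr_bitmap_bytes(uint32_t nmsrs)
{
	uint64_t longs = ((uint64_t)nmsrs + BITS_PER_LONG - 1) / BITS_PER_LONG;
	uint64_t bytes = longs * sizeof(long);

	if (nmsrs == 0 || bytes > MAX_BITMAP_BYTES)
		return 0;

	return (size_t)bytes;
}
```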

Aaron Lewis (1):
  KVM: x86: Prepare MSR bitmaps for userspace tracked MSRs

Alexander Graf (6):
  KVM: x86: Deflect unknown MSR accesses to user space
  KVM: x86: Add infrastructure for MSR filtering
  KVM: x86: SVM: Prevent MSR passthrough when MSR access is denied
  KVM: x86: VMX: Prevent MSR passthrough when MSR access is denied
  KVM: x86: Introduce MSR filtering
  KVM: selftests: Add test for user space MSR handling

 Documentation/virt/kvm/api.rst                | 176 +++++++++-
 arch/x86/include/asm/kvm_host.h               |  18 ++
 arch/x86/include/uapi/asm/kvm.h               |  19 ++
 arch/x86/kvm/emulate.c                        |  18 +-
 arch/x86/kvm/svm/svm.c                        | 122 +++++--
 arch/x86/kvm/svm/svm.h                        |   7 +
 arch/x86/kvm/vmx/nested.c                     |   2 +-
 arch/x86/kvm/vmx/vmx.c                        | 303 ++++++++++++------
 arch/x86/kvm/vmx/vmx.h                        |   9 +-
 arch/x86/kvm/x86.c                            | 267 ++++++++++++++-
 arch/x86/kvm/x86.h                            |   1 +
 include/trace/events/kvm.h                    |   2 +-
 include/uapi/linux/kvm.h                      |  17 +
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/x86_64/user_msr_test.c      | 224 +++++++++++++
 16 files changed, 1055 insertions(+), 132 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86_64/user_msr_test.c

Comments

Aaron Lewis Sept. 1, 2020, 10:32 p.m. UTC | #1
On Tue, Sep 1, 2020 at 1:15 PM Alexander Graf <graf@amazon.com> wrote:
> Aaron Lewis (1):
>   KVM: x86: Prepare MSR bitmaps for userspace tracked MSRs
>
> Alexander Graf (6):
>   KVM: x86: Deflect unknown MSR accesses to user space
>   KVM: x86: Add infrastructure for MSR filtering
>   KVM: x86: SVM: Prevent MSR passthrough when MSR access is denied
>   KVM: x86: VMX: Prevent MSR passthrough when MSR access is denied
>   KVM: x86: Introduce MSR filtering
>   KVM: selftests: Add test for user space MSR handling

Hi Alex,

I'm only seeing 4 commits.  Are you planning on sending the remaining 3?

Thanks,
Aaron

Alexander Graf Sept. 2, 2020, 1:01 p.m. UTC | #2
On 02.09.20 00:32, Aaron Lewis wrote:
> Hi Alex,
> 
> I'm only seeing 4 commits.  Are you planning on sending the remaining 3?

Hmm, looks like a combination of bad email server and buggy git broke my 
neck here. I found a small bug in the series, so I've just sent the full 
set out again, hopefully completely this time :).


Alex



Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879