[v1,00/13] KVM: arm64: Fixed features for protected VMs

Message ID 20210608141141.997398-1-tabba@google.com (mailing list archive)

Message

Fuad Tabba June 8, 2021, 2:11 p.m. UTC
Hi,

This patch series adds support for restricting CPU features for protected VMs
in KVM [1].

Various feature configurations are allowed in KVM/arm64. Supporting all
these features in pKVM is difficult: it involves either moving much of the
handling code to EL2, which adds bloat and results in a less verifiable
trusted code base, or leaving the handling code at EL1, which risks having
an untrusted host kernel feed wrong information to EL2 and to the protected
guests.

This series attempts to mitigate this by reducing the configuration space:
EL2 supports a reduced set of features, chosen to compromise protected
guests' capabilities as little as possible.

This is done by restricting CPU features exposed to protected guests through
feature registers. These restrictions are enforced by trapping register
accesses as well as instructions associated with these features, and injecting
an undefined exception into the guest if it attempts to use a restricted
feature.
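
A minimal sketch of this mechanism, not code from the series. The function
names and the trap_result enum are hypothetical; the SVE and AMU field
positions in ID_AA64PFR0_EL1 follow the Arm ARM. The guest reads a copy of
the feature register with the restricted fields cleared, and a trapped use
of a feature whose field reads as zero results in an undefined exception:

```c
#include <stdint.h>

/* ID_AA64PFR0_EL1 field positions (4-bit fields), per the Arm ARM. */
#define PFR0_SVE_SHIFT 32
#define PFR0_AMU_SHIFT 44
#define FIELD_MASK(shift) (UINT64_C(0xf) << (shift))

/* Hypothetical outcome of handling a trapped guest access. */
enum trap_result { TRAP_EMULATED, TRAP_INJECT_UNDEF };

/*
 * Hypothetical: the value a protected guest sees when it reads
 * ID_AA64PFR0_EL1. SVE and AMU are hidden by clearing their fields.
 */
static uint64_t pvm_read_id_aa64pfr0(uint64_t hw_val)
{
    return hw_val & ~(FIELD_MASK(PFR0_SVE_SHIFT) | FIELD_MASK(PFR0_AMU_SHIFT));
}

/*
 * Hypothetical: decide how to handle a trapped access that belongs to
 * the feature described by the 4-bit field at 'shift'. If the guest's
 * view of the register says the feature is absent, inject an undefined
 * exception; otherwise emulate the access.
 */
static enum trap_result pvm_handle_feature_trap(uint64_t guest_pfr0,
                                                unsigned int shift)
{
    if (((guest_pfr0 >> shift) & 0xf) == 0)
        return TRAP_INJECT_UNDEF;
    return TRAP_EMULATED;
}
```

With hardware that reports SVE and AMU, the guest-visible value has both
fields cleared, so a trapped SVE access yields TRAP_INJECT_UNDEF.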

The features being restricted (only for protected VMs in protected mode) are
the following:
- Debug, Trace, and DoubleLock
- Performance Monitoring (PMU)
- Statistical Profiling (SPE)
- Scalable Vector Extension (SVE)
- Memory Partitioning and Monitoring (MPAM)
- Activity Monitoring (AMU)
- Memory Tagging (MTE)
- Limited Ordering Regions (LOR)
- AArch32 State
- Generic Interrupt Controller (GIC) (depending on rVIC support)
- Nested Virtualization (NV)
- Reliability, Availability, and Serviceability (RAS) above V1
- Implementation-defined Features

This series is based on kvmarm/next and Will's patches for an Initial pKVM user
ABI [1]. You can find the applied series here [2].

Cheers,
/fuad

[1] https://lore.kernel.org/kvmarm/20210603183347.1695-1-will@kernel.org/

For more details about pKVM, please refer to Will's talk at KVM Forum 2020:
https://www.youtube.com/watch?v=edqJSzsDRxk

[2] https://android-kvm.googlesource.com/linux/+/refs/heads/tabba/el2_fixed_feature_v1

To: kvmarm@lists.cs.columbia.edu
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Christoffer Dall <christoffer.dall@arm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Quentin Perret <qperret@google.com>
Cc: kvm@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: kernel-team@android.com

Fuad Tabba (13):
  KVM: arm64: Remove trailing whitespace in comments
  KVM: arm64: MDCR_EL2 is a 64-bit register
  KVM: arm64: Fix name of HCR_TACR to match the spec
  KVM: arm64: Refactor sys_regs.h,c for nVHE reuse
  KVM: arm64: Restore mdcr_el2 from vcpu
  KVM: arm64: Add feature register flag definitions
  KVM: arm64: Add config register bit definitions
  KVM: arm64: Guest exit handlers for nVHE hyp
  KVM: arm64: Add trap handlers for protected VMs
  KVM: arm64: Move sanitized copies of CPU features
  KVM: arm64: Trap access to pVM restricted features
  KVM: arm64: Handle protected guests at 32 bits
  KVM: arm64: Check vcpu features at pVM creation

 arch/arm64/include/asm/kvm_arm.h        |  34 +-
 arch/arm64/include/asm/kvm_asm.h        |   2 +-
 arch/arm64/include/asm/kvm_host.h       |   2 +-
 arch/arm64/include/asm/kvm_hyp.h        |   4 +
 arch/arm64/include/asm/sysreg.h         |   6 +
 arch/arm64/kvm/arm.c                    |   4 +
 arch/arm64/kvm/debug.c                  |   5 +-
 arch/arm64/kvm/hyp/include/hyp/switch.h |  42 ++
 arch/arm64/kvm/hyp/nvhe/Makefile        |   2 +-
 arch/arm64/kvm/hyp/nvhe/debug-sr.c      |   2 +-
 arch/arm64/kvm/hyp/nvhe/mem_protect.c   |   6 -
 arch/arm64/kvm/hyp/nvhe/switch.c        | 114 +++++-
 arch/arm64/kvm/hyp/nvhe/sys_regs.c      | 501 ++++++++++++++++++++++++
 arch/arm64/kvm/hyp/vhe/debug-sr.c       |   2 +-
 arch/arm64/kvm/pkvm.c                   |  31 ++
 arch/arm64/kvm/sys_regs.c               |  62 +--
 arch/arm64/kvm/sys_regs.h               |  35 ++
 17 files changed, 782 insertions(+), 72 deletions(-)
 create mode 100644 arch/arm64/kvm/hyp/nvhe/sys_regs.c


base-commit: 35b256a5eebe3ac715b4ea6234aa4236a10d1a88

Comments

Andrew Jones June 8, 2021, 3:07 p.m. UTC | #1
On Tue, Jun 08, 2021 at 03:11:28PM +0100, Fuad Tabba wrote:
> Hi,
> 
> This patch series adds support for restricting CPU features for protected VMs
> in KVM [1].
> 
> Various feature configurations are allowed in KVM/arm64. Supporting all
> these features in pKVM is difficult, as it either involves moving much of
> the handling code to EL2, which adds bloat and results in a less verifiable
> trusted code base. Or it involves leaving the code handling at EL1, which
> risks having an untrusted host kernel feeding wrong information to the EL2
> and to the protected guests.
> 
> This series attempts to mitigate this by reducing the configuration space,
> providing a reduced amount of feature support at EL2 with the least amount of
> compromise of protected guests' capabilities.
> 
> This is done by restricting CPU features exposed to protected guests through
> feature registers. These restrictions are enforced by trapping register
> accesses as well as instructions associated with these features, and injecting
> an undefined exception into the guest if it attempts to use a restricted
> feature.
> 
> The features being restricted (only for protected VMs in protected mode) are
> the following:
> - Debug, Trace, and DoubleLock
> - Performance Monitoring (PMU)
> - Statistical Profiling (SPE)
> - Scalable Vector Extension (SVE)
> - Memory Partitioning and Monitoring (MPAM)
> - Activity Monitoring (AMU)
> - Memory Tagging (MTE)
> - Limited Ordering Regions (LOR)
> - AArch32 State
> - Generic Interrupt Controller (GIC) (depending on rVIC support)
> - Nested Virtualization (NV)
> - Reliability, Availability, and Serviceability (RAS) above V1
> - Implementation-defined Features

Hi Fuad,

I see this series takes the approach we currently have in KVM of masking
features we don't want to expose to the guest. This approach adds yet
another "reject list" to be maintained as hardware evolves. I'd rather see
that we first change KVM to using an accept list, i.e. mask everything and
then only set what we want to enable. Mimicking that new accept list in
pKVM, where much less would be enabled, would reduce the amount of
maintenance needed.

Thanks,
drew

Fuad Tabba June 9, 2021, 3:22 p.m. UTC | #2
Hi Drew,

> I see this series takes the approach we currently have in KVM of masking
> features we don't want to expose to the guest. This approach adds yet
> another "reject list" to be maintained as hardware evolves. I'd rather see
> that we first change KVM to using an accept list, i.e. mask everything and
> then only set what we want to enable. Mimicking that new accept list in
> pKVM, where much less would be enabled, would reduce the amount of
> maintenance needed.

Good point. I agree that having an allow list is preferable to having
a block list. The way this patch series handles system register
accesses is actually an allow list. However, as it is now, features
being exposed to protected guests via the feature registers take the
block list approach. I will fix that so that the series exposes a list
of allowed features instead of hiding restricted ones. As you suggest,
this would reduce the amount of maintenance as hardware evolves and is
better for security as well.
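
To illustrate the difference, here is a minimal sketch (hypothetical
helper names, not code from the series). It treats a feature register as
a set of 4-bit fields and applies plain masks; real feature fields may
need per-field capping rather than a simple mask:

```c
#include <stdint.h>

/* A 4-bit feature field at the given bit position. */
#define FIELD(shift) (UINT64_C(0xf) << (shift))

/*
 * Block list: hide only the fields known to be restricted. Any field
 * that later hardware adds stays visible until the list is updated.
 */
static uint64_t apply_block_list(uint64_t hw_val, uint64_t restricted)
{
    return hw_val & ~restricted;
}

/*
 * Allow list: expose only the fields known to be supported. Any field
 * that later hardware adds is hidden by default.
 */
static uint64_t apply_allow_list(uint64_t hw_val, uint64_t allowed)
{
    return hw_val & allowed;
}
```

If new hardware sets a field that neither list knows about, the block
list passes it through to the guest, while the allow list hides it.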

As for changing KVM first, I think that that's orthogonal to what this
series is trying to accomplish. Features in pKVM are not controlled or
negotiable by userspace, as it is a fixed virtual platform. When KVM
changes to use allow lists instead, it shouldn't conflict with how
this series works, especially if I fix it to use an allow list
instead.

Thanks for your feedback.

Cheers,
/fuad


Alexandru Elisei June 11, 2021, 12:44 p.m. UTC | #3
Hi,

On 6/8/21 3:11 PM, Fuad Tabba wrote:
> Hi,
>
> This patch series adds support for restricting CPU features for protected VMs
> in KVM [1].
>
> Various feature configurations are allowed in KVM/arm64. Supporting all
> these features in pKVM is difficult, as it either involves moving much of
> the handling code to EL2, which adds bloat and results in a less verifiable
> trusted code base. Or it involves leaving the code handling at EL1, which
> risks having an untrusted host kernel feeding wrong information to the EL2
> and to the protected guests.
>
> This series attempts to mitigate this by reducing the configuration space,
> providing a reduced amount of feature support at EL2 with the least amount of
> compromise of protected guests' capabilities.
>
> This is done by restricting CPU features exposed to protected guests through
> feature registers. These restrictions are enforced by trapping register
> accesses as well as instructions associated with these features, and injecting
> an undefined exception into the guest if it attempts to use a restricted
> feature.
>
> The features being restricted (only for protected VMs in protected mode) are
> the following:
> - Debug, Trace, and DoubleLock
> - Performance Monitoring (PMU)
> - Statistical Profiling (SPE)
> - Scalable Vector Extension (SVE)
> - Memory Partitioning and Monitoring (MPAM)
> - Activity Monitoring (AMU)
> - Memory Tagging (MTE)
> - Limited Ordering Regions (LOR)
> - AArch32 State
> - Generic Interrupt Controller (GIC) (depending on rVIC support)
> - Nested Virtualization (NV)
> - Reliability, Availability, and Serviceability (RAS) above V1
> - Implementation-defined Features
>
> This series is based on kvmarm/next and Will's patches for an Initial pKVM user
> ABI [1]. You can find the applied series here [2].

Since this is implementing the kernel side of an RFC userspace ABI, I'm going to
treat the series as an RFC also and not go into the individual patches.

What strikes me as odd is the fact that, as far as I can tell, you're duplicating
part of the kvm/sys_regs.c and kvm/handle_exit.c functionality in the nvhe code.
Why was this approach chosen instead of reusing the existing functions and adding
extra code to handle the protected VM case?

Another example of this is detecting when a host dropped to 32bit EL0, the comment
says that you don't trust the host to make the check. What exactly do you trust
the host to do at what point? I don't see this explained anywhere, it's possible
I've missed it.

I also think that registers that mostly don't change during the lifetime of the VM
(HCR_EL2, CPTR_EL2, MDCR_EL2) can be set by host when the VM becomes protected,
instead of fiddling with them at each world switch. Was there a particular reason
for changing them in __activate_traps_pvm() or was this just an implementation choice?

Thanks,

Alex

Fuad Tabba June 13, 2021, 4:12 p.m. UTC | #4
Hi Alex,

On Fri, Jun 11, 2021 at 1:43 PM Alexandru Elisei
<alexandru.elisei@arm.com> wrote:
>
> Hi,
>
> On 6/8/21 3:11 PM, Fuad Tabba wrote:
> > Hi,
> >
> > This patch series adds support for restricting CPU features for protected VMs
> > in KVM [1].
> >
> > Various feature configurations are allowed in KVM/arm64. Supporting all
> > these features in pKVM is difficult, as it either involves moving much of
> > the handling code to EL2, which adds bloat and results in a less verifiable
> > trusted code base. Or it involves leaving the code handling at EL1, which
> > risks having an untrusted host kernel feeding wrong information to the EL2
> > and to the protected guests.
> >
> > This series attempts to mitigate this by reducing the configuration space,
> > providing a reduced amount of feature support at EL2 with the least amount of
> > compromise of protected guests' capabilities.
> >
> > This is done by restricting CPU features exposed to protected guests through
> > feature registers. These restrictions are enforced by trapping register
> > accesses as well as instructions associated with these features, and injecting
> > an undefined exception into the guest if it attempts to use a restricted
> > feature.
> >
> > The features being restricted (only for protected VMs in protected mode) are
> > the following:
> > - Debug, Trace, and DoubleLock
> > - Performance Monitoring (PMU)
> > - Statistical Profiling (SPE)
> > - Scalable Vector Extension (SVE)
> > - Memory Partitioning and Monitoring (MPAM)
> > - Activity Monitoring (AMU)
> > - Memory Tagging (MTE)
> > - Limited Ordering Regions (LOR)
> > - AArch32 State
> > - Generic Interrupt Controller (GIC) (depending on rVIC support)
> > - Nested Virtualization (NV)
> > - Reliability, Availability, and Serviceability (RAS) above V1
> > - Implementation-defined Features
> >
> > This series is based on kvmarm/next and Will's patches for an Initial pKVM user
> > ABI [1]. You can find the applied series here [2].
>
> Since this is implementing the kernel side of an RFC userspace ABI, I'm going to
> treat the series as an RFC also and not go into the individual patches.
>
> What strikes me as odd is the fact that, as far as I can tell, you're duplicating
> part of the kvm/sys_regs.c and kvm/handle_exit.c functionality in the nvhe code.
> Why was this approach chosen instead of reusing the existing functions and adding
> extra code to handle the protected VM case?
>
> Another example of this is detecting when a host dropped to 32bit EL0, the comment
> says that you don't trust the host to make the check. What exactly do you trust
> the host to do at what point? I don't see this explained anywhere, it's possible
> I've missed it.

You're right. I haven't explained this in the patch series or provided
enough context other than a link to Will's presentation [1].

The idea is that protected VMs are protected from the host Linux
kernel (and from other VMs), where the host does not have access to
guest memory even if compromised. This patch series does not cover
that part yet. It is a part of, and builds on, other concurrent work
in order to get us there eventually [2].

Once everything falls into place, the host should not even have access
to a protected guest's state or to anything that would enable it to
manipulate that state (e.g., vcpu register context and EL2 system
registers); only hyp would have that access. If the host could access
that state, then it might be able to get around the protection provided.
Therefore, anything that is sensitive and that would require such
access needs to happen at hyp, hence the code in nvhe running only at
hyp.

> I also think that registers that mostly don't change during the lifetime of the VM
> (HCR_EL2, CPTR_EL2, MDCR_EL2) can be set by host when the VM becomes protected,
> instead of fiddling with them at each world switch. Was there a particular reason
> for changing them in __activate_traps_pvm() or was this just an implementation choice?

You're right that they don't change during the lifetime of a VM, but
protected VMs can coexist with non-protected VMs. The values of these
registers differ between the two (different trapping to control
protection, as well as different enabled features). Hence the new
__activate_traps_pvm(), which activates traps specifically for
protected VMs.
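
As a rough sketch of the idea (this is not the series' actual
__activate_traps_pvm(); the struct and function names are hypothetical,
and the two trap bits are only illustrative picks, with bit positions
as in the Arm ARM):

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative trap bits, positions per the Arm ARM. */
#define CPTR_EL2_TZ  (UINT64_C(1) << 8) /* trap SVE accesses (nVHE layout) */
#define MDCR_EL2_TPM (UINT64_C(1) << 6) /* trap PMU register accesses */

/* Hypothetical container for the trap configuration of one guest. */
struct trap_cfg {
    uint64_t cptr_el2;
    uint64_t mdcr_el2;
};

/*
 * Hypothetical: pick the trap configuration to install when switching
 * to a guest. A non-protected guest runs with the configuration it was
 * given. A protected guest always gets the restricted-feature traps
 * forced on, whatever the base configuration says.
 */
static struct trap_cfg traps_for_guest(bool is_protected, struct trap_cfg base)
{
    struct trap_cfg cfg = base;

    if (is_protected) {
        cfg.cptr_el2 |= CPTR_EL2_TZ;
        cfg.mdcr_el2 |= MDCR_EL2_TPM;
    }
    return cfg;
}
```

Because both kinds of VM can run on the same CPU, the values have to be
selected at each switch rather than written once.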

Thank you,
/fuad

[1]
https://mirrors.edge.kernel.org/pub/linux/kernel/people/will/slides/kvmforum-2020-edited.pdf

[2] Some of the work on protected KVM:
https://lore.kernel.org/kvmarm/20201202184122.26046-1-dbrazdil@google.com/
https://lore.kernel.org/kvmarm/20210602094347.3730846-1-qperret@google.com/
https://lore.kernel.org/kvmarm/20210608114518.748712-1-qperret@google.com/
https://lore.kernel.org/kvmarm/20210322175639.801566-1-maz@kernel.org/
https://lore.kernel.org/kvmarm/20210603183347.1695-1-will@kernel.org/





