
[RFC,0/7] KVM: optee: Introduce OP-TEE Mediator for exposing secure world to KVM guests

Message ID 20250401170527.344092-1-yuvraj.kernel@gmail.com (mailing list archive)

Message

Yuvraj Sakshith April 1, 2025, 5:05 p.m. UTC
A KVM guest running on an arm64 machine cannot interact with a trusted execution environment
(one that supports non-secure guests) such as OP-TEE in the secure world. This is because instructions provided
by the architecture (such as SMC) that switch control to the firmware are trapped to EL2 when the guest
executes them.

This series adds a feature to the kernel called the TEE mediator abstraction layer, which lets
a guest interact with the secure world. Additionally, an OP-TEE-specific mediator is implemented, which
hooks itself into the TEE mediator layer and intercepts guest SMCs targeted at OP-TEE.

Overview
=========

Essentially, when the kernel wants to interact with OP-TEE, it loads the arguments into CPU registers and
executes an "smc" (Secure Monitor Call) instruction. What these arguments consist of, and how the two entities
communicate, can vary. A guest, however, cannot establish such a connection with the secure world, because an
"smc" executed by the guest is trapped by the hypervisor in EL2. This trap is enabled by setting
the HCR_EL2.TSC bit before entering the guest.
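To decide which trapped SMCs are meant for OP-TEE, the mediator has to look at the SMCCC function ID in x0. As a rough illustration (not the macros added by this series), the sketch below decodes that ID: bit 31 marks a fast call, bit 30 the SMC64 convention, and bits 29:24 the owning entity, where 48-49 are reserved for Trusted Applications and 50-63 for Trusted OSes. All macro and function names here are hypothetical; the kernel's own constants (ARM_SMCCC_OWNER_TRUSTED_OS and related) live in include/linux/arm-smccc.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field layout of an SMCCC function ID (hypothetical names). */
#define SMCCC_FAST_CALL_BIT (1U << 31) /* 1 = fast call, 0 = yielding call */
#define SMCCC_SMC64_BIT     (1U << 30) /* 1 = SMC64 calling convention */
#define SMCCC_OWNER_SHIFT   24
#define SMCCC_OWNER_MASK    0x3FU      /* bits 29:24 */

/* Owning-entity ranges reserved by the SMC Calling Convention. */
#define SMCCC_OWNER_TRUSTED_APP     48
#define SMCCC_OWNER_TRUSTED_APP_END 49
#define SMCCC_OWNER_TRUSTED_OS      50
#define SMCCC_OWNER_TRUSTED_OS_END  63

/* Extract the owning entity number from a function ID. */
static inline uint32_t smccc_owner(uint32_t func_id)
{
	return (func_id >> SMCCC_OWNER_SHIFT) & SMCCC_OWNER_MASK;
}

/* True if the call targets a Trusted OS (e.g. OP-TEE). */
static inline bool smccc_is_trusted_os(uint32_t func_id)
{
	uint32_t owner = smccc_owner(func_id);

	return owner >= SMCCC_OWNER_TRUSTED_OS &&
	       owner <= SMCCC_OWNER_TRUSTED_OS_END;
}

/* True if the call targets a Trusted Application. */
static inline bool smccc_is_trusted_app(uint32_t func_id)
{
	uint32_t owner = smccc_owner(func_id);

	return owner >= SMCCC_OWNER_TRUSTED_APP &&
	       owner <= SMCCC_OWNER_TRUSTED_APP_END;
}
```

For example, 0xB200000D decodes to a fast SMC32 call with owner 50 (a Trusted OS call), while PSCI's 0x84000000 has owner 4 (Standard Secure Service) and would not be routed to the mediator.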

Hence, this feature, which we may call the TEE mediator, acts as an intermediary between the guest and OP-TEE.
Instead of denying the guest's SMC and jumping back into the guest, the mediator forwards the request to
OP-TEE.

OP-TEE supports virtualization in the normal world and expects six things from the NS hypervisor:

1. Notify OP-TEE when a VM is created.
2. Notify OP-TEE when a VM is destroyed.
3. Any SMC to OP-TEE has to contain the VMID in x7. If it's the hypervisor sending, then the VMID is 0.
4. Hypervisor has to perform IPA->PA translations of the memory addresses sent by guest.
5. Memory shared by the VM to OP-TEE has to remain pinned.
6. The hypervisor has to follow the OP-TEE protocol, so the guest thinks it is directly speaking to OP-TEE.
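Items 1 to 3 can be made concrete with the two notification calls. The sketch below builds the register frame for OP-TEE's VM-created and VM-destroyed fast calls as we understand them from OP-TEE's optee_smc.h: function numbers 13 and 14 in the Trusted OS owner range, the new VM's client ID in a1, and 0 in a7 because only the hypervisor may issue these calls. struct smc_frame and optee_build_vm_event() are hypothetical names, and the constants should be checked against the optee_smc.h shipped with the OP-TEE version in use.

```c
#include <stdint.h>
#include <string.h>

/* Fast call (bit 31), SMC32, owner = Trusted OS (50): 0xB2000000. */
#define OPTEE_FAST_CALL_BASE      ((1U << 31) | (50U << 24))
#define OPTEE_FUNCID_VM_CREATED   13U
#define OPTEE_FUNCID_VM_DESTROYED 14U
#define OPTEE_SMC_VM_CREATED      (OPTEE_FAST_CALL_BASE | OPTEE_FUNCID_VM_CREATED)
#define OPTEE_SMC_VM_DESTROYED    (OPTEE_FAST_CALL_BASE | OPTEE_FUNCID_VM_DESTROYED)

/* Client ID that identifies the hypervisor itself in a7. */
#define OPTEE_HYP_CLIENT_ID 0U

/* Hypothetical register frame: a[0..7] are loaded into x0..x7 before the SMC. */
struct smc_frame {
	uint64_t a[8];
};

/* Fill in a VM lifecycle notification issued by the hypervisor. */
static void optee_build_vm_event(struct smc_frame *f, uint32_t func_id,
				 uint16_t vmid)
{
	memset(f, 0, sizeof(*f));
	f->a[0] = func_id;             /* VM_CREATED or VM_DESTROYED */
	f->a[1] = vmid;                /* client ID of the VM concerned */
	f->a[7] = OPTEE_HYP_CLIENT_ID; /* only the hypervisor may send these */
}
```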

It's important to note that if OP-TEE is built with NS-virtualization support, it can only function if there is
a hypervisor with a mediator in the normal world.

This implementation has been heavily inspired by Xen's OP-TEE mediator.

Design
======

The unique design of KVM makes it quite challenging to implement such a mediator. OP-TEE is not aware of the host-guest
paradigm, so the mediator treats the host as a VM with VMID 1. Guests are assigned VMIDs starting from 2 (note that
these are not the VMIDs tagged in the TLB; the mediator implements its own simple indexing mechanism).

When the host's OP-TEE driver is initialised or released, OP-TEE is notified about VM 1 being created/destroyed.

When a VMM (such as QEMU) creates a guest through KVM ioctls, a call is made to the TEE mediator layer, which in turn
calls the OP-TEE mediator, which assigns a VM context, VMID, etc. and notifies OP-TEE about guest creation. The
opposite happens on guest destruction.
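The "simple indexing mechanism" could be as small as a fixed-size pool in which index 0 stays reserved for the hypervisor, index 1 for the host, and guests receive the lowest free index from 2 upwards. The sketch below assumes a hypothetical limit of 64 VMs and omits the locking a real mediator would need; on guest creation the mediator would allocate an ID and then send VM_CREATED, and on destruction send VM_DESTROYED and then free the ID.

```c
#include <stdbool.h>
#include <stdint.h>

#define MEDIATOR_MAX_VMS          64 /* hypothetical limit */
#define MEDIATOR_HOST_VMID        1
#define MEDIATOR_FIRST_GUEST_VMID 2

/* Index == VMID. No locking: callers must serialize in this sketch. */
struct vmid_pool {
	bool used[MEDIATOR_MAX_VMS];
};

static void vmid_pool_init(struct vmid_pool *p)
{
	for (int i = 0; i < MEDIATOR_MAX_VMS; i++)
		p->used[i] = false;
	p->used[0] = true;                  /* hypervisor client ID */
	p->used[MEDIATOR_HOST_VMID] = true; /* the host's own OP-TEE driver */
}

/* Return the lowest free guest VMID, or 0 if the pool is exhausted. */
static uint16_t vmid_alloc(struct vmid_pool *p)
{
	for (int id = MEDIATOR_FIRST_GUEST_VMID; id < MEDIATOR_MAX_VMS; id++) {
		if (!p->used[id]) {
			p->used[id] = true;
			return (uint16_t)id;
		}
	}
	return 0; /* 0 is never a valid guest ID, so it doubles as "failed" */
}

/* Release a guest VMID; the reserved hypervisor/host slots are ignored. */
static void vmid_free(struct vmid_pool *p, uint16_t id)
{
	if (id >= MEDIATOR_FIRST_GUEST_VMID && id < MEDIATOR_MAX_VMS)
		p->used[id] = false;
}
```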

When the guest makes an SMC targeting OP-TEE, it is trapped by the hypervisor and the register state (kvm_vcpu) is passed to
the OP-TEE mediator through the TEE layer. There are two possibilities here.

The guest may make an SMC whose arguments are simple numeric values (exchanging UUIDs, version information, etc.).
In this case the mediator has much less work: it attaches the VMID in x7 and passes the register state to OP-TEE.
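For this simple case, the mediator's job reduces to copying the trapped registers and stamping the caller's VMID into x7. A minimal sketch, assuming the trapped guest state has already been read into an array of eight 64-bit values; the names are hypothetical and not the series' API. Note that whatever the guest placed in x7 is overwritten rather than trusted.

```c
#include <stdint.h>

/* Hypothetical register frame: a[0..7] are loaded into x0..x7 before the SMC. */
struct smc_frame {
	uint64_t a[8];
};

/* Copy the trapped guest registers and stamp the guest's client ID into a7. */
static void mediator_prepare_simple_call(struct smc_frame *out,
					 const uint64_t guest_x[8],
					 uint16_t vmid)
{
	for (int i = 0; i < 7; i++)
		out->a[i] = guest_x[i];

	/*
	 * Never forward the guest's own x7: otherwise a guest could
	 * impersonate another VM, or the hypervisor (client ID 0).
	 */
	out->a[7] = vmid;
}
```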

But when the guest passes memory addresses as arguments, the mediator has to translate them from
intermediate physical addresses (IPAs) into physical addresses. According to the OP-TEE protocol (as documented in optee_smc.h and optee_msg.h),
the guest's OP-TEE driver shares a buffer filled with pointers, which the mediator translates.
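To our understanding, the Linux OP-TEE driver describes a non-contiguous buffer as a chain of 4 KiB list pages, each holding 511 64-bit page addresses followed by a link to the next list page. The sketch below translates one such list page into a shadow copy that is safe to hand to OP-TEE. The ipa_to_pa_fn callback and the linear_map translator are hypothetical stand-ins; a real mediator would consult the guest's memory map, pin each page, and point next_page_data at the next shadow page.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NONCONTIG_PAGE_SIZE 4096
#define PAGE_DATA_ENTRIES ((NONCONTIG_PAGE_SIZE / sizeof(uint64_t)) - 1) /* 511 */

/* One 4 KiB page of a non-contiguous buffer description. */
struct page_data {
	uint64_t pages_list[PAGE_DATA_ENTRIES];
	uint64_t next_page_data;
};

/* Hypothetical translator: false if the IPA is not backed by guest RAM. */
typedef bool (*ipa_to_pa_fn)(void *ctx, uint64_t ipa, uint64_t *pa);

/* Toy translator: guest IPA [ipa_base, ipa_base + size) maps linearly to pa_base. */
struct linear_map {
	uint64_t ipa_base, size, pa_base;
};

static bool linear_ipa_to_pa(void *ctx, uint64_t ipa, uint64_t *pa)
{
	const struct linear_map *m = ctx;

	if (ipa < m->ipa_base || ipa - m->ipa_base >= m->size)
		return false;
	*pa = m->pa_base + (ipa - m->ipa_base);
	return true;
}

/* Translate 'count' guest entries into a shadow list page. */
static bool translate_pages_list(const struct page_data *guest,
				 struct page_data *shadow, size_t count,
				 ipa_to_pa_fn xlate, void *ctx)
{
	if (count > PAGE_DATA_ENTRIES)
		return false;
	for (size_t i = 0; i < count; i++) {
		/* Reject the whole request if any pointer escapes guest memory. */
		if (!xlate(ctx, guest->pages_list[i], &shadow->pages_list[i]))
			return false;
	}
	shadow->next_page_data = 0; /* caller links the next shadow page */
	return true;
}
```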

The OP-TEE mediator also keeps track of active calls between each guest and OP-TEE, and pins pages that are shared.
This prevents the host from swapping out shared pages under memory pressure. These pages are unpinned as soon as the guest's
transaction with OP-TEE completes.
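The per-call bookkeeping can be sketched as a record that remembers every page pinned on behalf of an in-flight call and releases them all when OP-TEE reports completion. In the kernel the pinning itself would typically use the pin_user_pages() family; here pin_page() and unpin_page() are hypothetical stand-ins that only count outstanding pins, and the 16-page limit is an arbitrary assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PINNED_PER_CALL 16 /* hypothetical per-call limit */

/* Hypothetical record for one in-flight guest call to OP-TEE. */
struct active_call {
	bool in_use;
	uint64_t pinned_pfns[MAX_PINNED_PER_CALL];
	size_t nr_pinned;
};

/* Stand-ins for real pin/unpin primitives: they only count pins. */
static long pin_count;
static void pin_page(uint64_t pfn) { (void)pfn; pin_count++; }
static void unpin_page(uint64_t pfn) { (void)pfn; pin_count--; }

/* Pin a page for this call and remember it for later release. */
static bool call_pin(struct active_call *c, uint64_t pfn)
{
	if (c->nr_pinned == MAX_PINNED_PER_CALL)
		return false;
	pin_page(pfn);
	c->pinned_pfns[c->nr_pinned++] = pfn;
	return true;
}

/* Release every page pinned for this call once OP-TEE reports completion. */
static void call_complete(struct active_call *c)
{
	while (c->nr_pinned > 0)
		unpin_page(c->pinned_pfns[--c->nr_pinned]);
	c->in_use = false;
}
```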

Testing
=======

The feature has been tested on the QEMU virt platform using "xtest" as the test suite. As of now, all 35000+ tests pass.
The mediator has also been stressed under memory pressure, and all tests pass there too. Any suggestions for further testing
the feature are welcome.

Call for review
===============
Any insights/suggestions regarding the implementation are appreciated.

Yuvraj Sakshith (7):
  firmware: smccc: Add macros for Trusted OS/App owner check on SMC
    value
  tee: Add TEE Mediator module which aims to expose TEE to a KVM guest.
  KVM: Notify TEE Mediator when KVM creates and destroys guests
  KVM: arm64: Forward guest CPU state to TEE mediator on SMC trap
  tee: optee: Add OPTEE_SMC_VM_CREATED and OPTEE_SMC_VM_DESTROYED
  tee: optee: Add OP-TEE Mediator
  tee: optee: Notify TEE Mediator on OP-TEE driver initialization and
    release

 arch/arm64/kvm/hypercalls.c        |   15 +-
 drivers/tee/Kconfig                |    5 +
 drivers/tee/Makefile               |    1 +
 drivers/tee/optee/Kconfig          |    7 +
 drivers/tee/optee/Makefile         |    1 +
 drivers/tee/optee/core.c           |   13 +-
 drivers/tee/optee/optee_mediator.c | 1319 ++++++++++++++++++++++++++++
 drivers/tee/optee/optee_mediator.h |  103 +++
 drivers/tee/optee/optee_smc.h      |   53 ++
 drivers/tee/optee/smc_abi.c        |    6 +
 drivers/tee/tee_mediator.c         |  145 +++
 include/linux/arm-smccc.h          |    8 +
 include/linux/tee_mediator.h       |   39 +
 virt/kvm/kvm_main.c                |   11 +-
 14 files changed, 1721 insertions(+), 5 deletions(-)
 create mode 100644 drivers/tee/optee/optee_mediator.c
 create mode 100644 drivers/tee/optee/optee_mediator.h
 create mode 100644 drivers/tee/tee_mediator.c
 create mode 100644 include/linux/tee_mediator.h

Comments

Marc Zyngier April 1, 2025, 6:13 p.m. UTC | #1
On Tue, 01 Apr 2025 18:05:20 +0100,
Yuvraj Sakshith <yuvraj.kernel@gmail.com> wrote:
> 
> This implementation has been heavily inspired by Xen's OP-TEE
> mediator.

[...]

And I think this inspiration is the source of most of the problems in
this series.

Routing Secure Calls from the guest to whatever is on the secure side
should not be the kernel's job at all. It should be the VMM's job. All
you need to do is to route the SMCs from the guest to userspace, and
we already have all the required infrastructure for that.

It is the VMM that should:

- signal the TEE of VM creation/teardown

- translate between IPAs and host VAs without involving KVM

- let the host TEE driver translate between VAs and PAs and deal with
  the pinning as required, just like it would do for any userspace
  (without ever using the KVM memslot interface)

- proxy requests from the guest to the TEE

- in general, bear the complexity of anything related to the TEE

In short, the VMM is just another piece of userspace using the TEE to
do whatever it wants. The TEE driver on the host must obviously know
about VMs, but that's about it.
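In the model described above, the first step for the VMM is cheap: it already knows where it mapped each guest RAM region, so turning an IPA supplied by the guest into a host virtual address needs no help from KVM. A minimal sketch with a hypothetical ram_region table that the VMM would keep for its own bookkeeping; the resulting HVA could then be handed to the host TEE driver through its existing userspace interface (for instance the shared-memory registration ioctl, TEE_IOC_SHM_REGISTER), which would take care of pinning.

```c
#include <stddef.h>
#include <stdint.h>

/* One guest RAM region as the VMM set it up (hypothetical bookkeeping). */
struct ram_region {
	uint64_t ipa;  /* guest physical base address */
	uint64_t size; /* region size in bytes */
	uint8_t *hva;  /* where the VMM mapped the region in its own address space */
};

/*
 * Return a host pointer for [ipa, ipa + len), or NULL if the range is
 * not entirely contained in a single region.
 */
static void *vmm_ipa_to_hva(const struct ram_region *regions, size_t n,
			    uint64_t ipa, uint64_t len)
{
	for (size_t i = 0; i < n; i++) {
		const struct ram_region *r = &regions[i];
		uint64_t off;

		if (ipa < r->ipa)
			continue;
		off = ipa - r->ipa;
		/* Written to avoid overflow when adding ipa + len. */
		if (off < r->size && len <= r->size - off)
			return r->hva + off;
	}
	return NULL;
}
```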

Crucially, KVM should:

- be completely TEE agnostic and never call into something that is
  TEE-specific

- allow a TEE implementation entirely in userspace, especially for
  machines that do not have EL3

As it stands, your design looks completely upside-down. Most of this
code should be userspace code and live in (or close to) the VMM, with
the host kernel only providing the basic primitives, most of which
should already be there.

Thanks,

	M.
Yuvraj Sakshith April 2, 2025, 2:58 a.m. UTC | #2
On Tue, Apr 01, 2025 at 07:13:26PM +0100, Marc Zyngier wrote:
> On Tue, 01 Apr 2025 18:05:20 +0100,
> Yuvraj Sakshith <yuvraj.kernel@gmail.com> wrote:
> > 
> > This implementation has been heavily inspired by Xen's OP-TEE
> > mediator.
> 
> [...]
> 
> And I think this inspiration is the source of most of the problems in
> this series.
> 
> Routing Secure Calls from the guest to whatever is on the secure side
> should not be the kernel's job at all. It should be the VMM's job. All
> you need to do is to route the SMCs from the guest to userspace, and
> we already have all the required infrastructure for that.
>
Yes, this was an argument at the time of designing this solution.

> It is the VMM that should:
> 
> - signal the TEE of VM creation/teardown
> 
> - translate between IPAs and host VAs without involving KVM
> 
> - let the host TEE driver translate between VAs and PAs and deal with
>   the pinning as required, just like it would do for any userspace
>   (without ever using the KVM memslot interface)
> 
> - proxy requests from the guest to the TEE
> 
> - in general, bear the complexity of anything related to the TEE
>

The major reasons I placed the implementation inside the kernel are:
	- The OP-TEE userspace library (client) does not support sending SMCs for VM events
	  and would need modification.
	- QEMU (and every other VMM) would have to be modified.
	- The OP-TEE driver is already in the kernel. A mediator would just be an addition
	  and not a completely new entity.
	- (Potential) issues if we want to mediate requests from a VM that has
	  private memory.
	- Heavy VM exits if the guest makes frequent Trusted OS (TOS) calls.

Hence, having to change too many entities (libteec, the VMM, etc.) was a
strong reason, although an arguable one.

> In short, the VMM is just another piece of userspace using the TEE to
> do whatever it wants. The TEE driver on the host must obviously know
> about VMs, but that's about it.
> 
> Crucially, KVM should:
> 
> - be completely TEE agnostic and never call into something that is
>   TEE-specific
> 
> - allow a TEE implementation entirely in userspace, specially for the
>   machines that do not have EL3
>

Yes, you're right. However, I believe some changes still need to be made
to KVM to facilitate this. For example, kvm_smccc_get_action() would deny a TOS call.

So, an implementation entirely in the VMM without any change to KVM might be challenging;
any potential solutions are welcome.
 
> As it stands, your design looks completely upside-down. Most of this
> code should be userspace code and live in (or close to) the VMM, with
> the host kernel only providing the basic primitives, most of which
> should already be there.
> 
> Thanks,
> 
> 	M.
> 
> -- 
> Jazz isn't dead. It just smells funny.
Marc Zyngier April 2, 2025, 8:42 a.m. UTC | #3
On Wed, 02 Apr 2025 03:58:48 +0100,
Yuvraj Sakshith <yuvraj.kernel@gmail.com> wrote:
> 
> On Tue, Apr 01, 2025 at 07:13:26PM +0100, Marc Zyngier wrote:
> > On Tue, 01 Apr 2025 18:05:20 +0100,
> > Yuvraj Sakshith <yuvraj.kernel@gmail.com> wrote:
> > >

[...]

> > > This implementation has been heavily inspired by Xen's OP-TEE
> > > mediator.
> > 
> > [...]
> > 
> > And I think this inspiration is the source of most of the problems in
> > this series.
> > 
> > Routing Secure Calls from the guest to whatever is on the secure side
> > should not be the kernel's job at all. It should be the VMM's job. All
> > you need to do is to route the SMCs from the guest to userspace, and
> > we already have all the required infrastructure for that.
> >
> Yes, this was an argument at the time of designing this solution.
>
> > It is the VMM that should:
> > 
> > - signal the TEE of VM creation/teardown
> > 
> > - translate between IPAs and host VAs without involving KVM
> > 
> > - let the host TEE driver translate between VAs and PAs and deal with
> >   the pinning as required, just like it would do for any userspace
> >   (without ever using the KVM memslot interface)
> > 
> > - proxy requests from the guest to the TEE
> > 
> > - in general, bear the complexity of anything related to the TEE
> >
> 
> Major reason why I went with placing the implementation inside the kernel is,
> 	- OP-TEE userspace lib (client) does not support sending SMCs for VM events
> 	  and needs modification.
> 	- QEMU (or every other VMM)  will have to be modified.

Sure. And what? New feature, new API, new code. And what will happen
once someone wants to use something other than OP-TEE? Or one of the
many forks of OP-TEE that have a completely different ABI (cue the
Android forks -- yes, plural)?

> 	- OP-TEE driver is anyways in the kernel. A mediator will just be an addition
> 		and not a completely new entity.

Of course not. The TEE can be anywhere I want. On another machine if I
decide so. Just because OP-TEE has a very simplistic model doesn't
mean we have to be constrained by it.

> 	- (Potential) issues if we would want to mediate requests from VM which has
> 	  private mem.

Private memory means that not even the host has access to it, as is
the case with pKVM. How would that be an issue?

> 	- Heavy VM exits if guest makes frequent TOS calls.

Sorry, I have to completely dismiss the argument here. I'm not even
remotely considering performance for something that is essentially a
full context switch of the whole machine. By definition, calling into
EL3, and then S-EL1/S-EL2 is going to be as fast as a dying snail, and
an additional exit to userspace will hardly register for anything
other than a pointless latency benchmark.

> 
> Hence, the thought of making changes to too many entities (libteec,
> VMM, etc.) was a strong reason, although arguable.

It is a *terrible* reason. By this reasoning, we would have subsumed
the whole VMM into the kernel (just like Xen), because "we don't want
to change userspace".

Furthermore, you are not even considering basic things such as
permissions. Your approach completely circumvents any form of access
control, meaning that any user who can create a VM can talk to the
TEE, even if they don't have access to the TEE driver.

Yes, you could replicate access permission, SE-Linux, seccomp (and the
rest of the security theater) at the KVM/TEE boundary, making the
whole thing even more of a twisted mess.

Or you could simply do the right thing and let the kernel do its job
the way it was intended by using the syscall interface from userspace.

> 
> > In short, the VMM is just another piece of userspace using the TEE to
> > do whatever it wants. The TEE driver on the host must obviously know
> > about VMs, but that's about it.
> > 
> > Crucially, KVM should:
> > 
> > - be completely TEE agnostic and never call into something that is
> >   TEE-specific
> > 
> > - allow a TEE implementation entirely in userspace, specially for the
> >   machines that do not have EL3
> >
> 
> Yes, you're right. Although I believe there still are some changes
> that need to be made to KVM for facilitating this. For example,
> kvm_smccc_get_action() would deny TOS call.

If something is missing in KVM to allow routing of SMCs to userspace,
I'm more than happy to entertain the change.

> So, having an implementation completely in VMM without any change in
> KVM might be challenging, any potential solutions are welcome.

I've said what I have to say already, and pointed you in a direction
that I see as both correct and maintainable.

Thanks,

	M.
Yuvraj Sakshith April 2, 2025, 11:19 a.m. UTC | #4
On Wed, Apr 02, 2025 at 09:42:39AM +0100, Marc Zyngier wrote:
> On Wed, 02 Apr 2025 03:58:48 +0100,
> Yuvraj Sakshith <yuvraj.kernel@gmail.com> wrote:
> > 
> > On Tue, Apr 01, 2025 at 07:13:26PM +0100, Marc Zyngier wrote:
> > > On Tue, 01 Apr 2025 18:05:20 +0100,
> > > Yuvraj Sakshith <yuvraj.kernel@gmail.com> wrote:
> > > >
> 
> [...]
> 
> > > > This implementation has been heavily inspired by Xen's OP-TEE
> > > > mediator.
> > > 
> > > [...]
> > > 
> > > And I think this inspiration is the source of most of the problems in
> > > this series.
> > > 
> > > Routing Secure Calls from the guest to whatever is on the secure side
> > > should not be the kernel's job at all. It should be the VMM's job. All
> > > you need to do is to route the SMCs from the guest to userspace, and
> > > we already have all the required infrastructure for that.
> > >
> > Yes, this was an argument at the time of designing this solution.
> >
> > > It is the VMM that should:
> > > 
> > > - signal the TEE of VM creation/teardown
> > > 
> > > - translate between IPAs and host VAs without involving KVM
> > > 
> > > - let the host TEE driver translate between VAs and PAs and deal with
> > >   the pinning as required, just like it would do for any userspace
> > >   (without ever using the KVM memslot interface)
> > > 
> > > - proxy requests from the guest to the TEE
> > > 
> > > - in general, bear the complexity of anything related to the TEE
> > >
> > 
> > Major reason why I went with placing the implementation inside the kernel is,
> > 	- OP-TEE userspace lib (client) does not support sending SMCs for VM events
> > 	  and needs modification.
> > 	- QEMU (or every other VMM)  will have to be modified.
> 
> Sure. And what? New feature, new API, new code. And what will happen
> once someone wants to use something other than OP-TEE? Or one of the
> many forks of OP-TEE that have a completely different ABI (cue the
> Android forks -- yes, plural)?

If something other than OP-TEE has to be supported, a specific mediator
(such as drivers/tee/optee/optee_mediator.c) has to be written, with its
handlers hooked in via tee_mediator_register_ops().

But yes, the ABI might differ, and the implementer is free to
mediate it as required.

> > 	- OP-TEE driver is anyways in the kernel. A mediator will just be an addition
> > 		and not a completely new entity.
> 
> Of course not. The TEE can be anywhere I want. On another machine if I
> decide so. Just because OP-TEE has a very simplistic model doesn't
> mean we have to be constrained by it.
> 
> > 	- (Potential) issues if we would want to mediate requests from VM which has
> > 	  private mem.
> 
> Private memory means that not even the host has access to it, as it is
> the case with pKVM. How would that be an issue?
>

The guest shares memory with OP-TEE through a buffer filled with pointers, which
the mediator has to read to perform IPA->PA translation of each of these pointers.
The VMM won't be able to read them if the memory is private.

But this is only a "potential" solution, and if the mediator is moved to the VMM,
it is completely ruled out.
 
> > 	- Heavy VM exits if guest makes frequent TOS calls.
> 
> Sorry, I have to completely dismiss the argument here. I'm not even
> remotely considering performance for something that is essentially a
> full context switch of the whole machine. By definition, calling into
> EL3, and then S-EL1/S-EL2 is going to be as fast as a dying snail, and
> an additional exit to userspace will hardly register for anything
> other than a pointless latency benchmark.
> 
Okay, makes sense.
> > 
> > Hence, the thought of making changes to too many entities (libteec,
> > VMM, etc.) was a strong reason, although arguable.
> 
> It is a *terrible* reason. By this reasoning, we would have subsumed
> the whole VMM into the kernel (just like Xen), because "we don't want
> to change userspace".
> 
> Furthermore, you are not even considering basic things such as
> permissions. Your approach completely circumvents any form of access
> control, meaning that if any user that can create a VM can talk to the
> TEE, even if they don't have access to the TEE driver.

Well, this is a good point. OP-TEE built for NS-Virt handles calls
from different VMs in different MMU partitions (explaining this would go
off track). But each VM's state and data remain isolated internally
in S-EL1.

> Yes, you could replicate access permission, SE-Linux, seccomp (and the
> rest of the security theater) at the KVM/TEE boundary, making the
> whole thing even more of a twisted mess.
> 
> Or you could simply do the right thing and let the kernel do its job
> the way it was intended by using the syscall interface from userspace.
> 
> > 
> > > In short, the VMM is just another piece of userspace using the TEE to
> > > do whatever it wants. The TEE driver on the host must obviously know
> > > about VMs, but that's about it.
> > > 
> > > Crucially, KVM should:
> > > 
> > > - be completely TEE agnostic and never call into something that is
> > >   TEE-specific
> > > 
> > > - allow a TEE implementation entirely in userspace, specially for the
> > >   machines that do not have EL3
> > >
> > 
> > Yes, you're right. Although I believe there still are some changes
> > that need to be made to KVM for facilitating this. For example,
> > kvm_smccc_get_action() would deny TOS call.
> 
> If something is missing in KVM to allow routing of SMCs to userspace,
> I'm more than happy to entertain the change.

Okay.

> > So, having an implementation completely in VMM without any change in
> > KVM might be challenging, any potential solutions are welcome.
> 
> I've said what I have to say already, and pointed you in a direction
> that I see as both correct and maintainable.
> 

Yes, I get your point on placing the mediator in the VMM. And now that I think of it,
I believe I can make an improvement.

But yes, since so many entities are involved, designing this solution has been
a nightmare. Good to have been pushed in this direction.

> Thanks,
> 
> 	M.

Thanks,
Yuvraj Sakshith