Message ID: 20250401170527.344092-1-yuvraj.kernel@gmail.com
Series: KVM: optee: Introduce OP-TEE Mediator for exposing secure world to KVM guests
On Tue, 01 Apr 2025 18:05:20 +0100, Yuvraj Sakshith <yuvraj.kernel@gmail.com> wrote:
>
> A KVM guest running on an arm64 machine will not be able to interact with a trusted execution environment (which supports non-secure guests) like OP-TEE in the secure world. This is because instructions provided by the architecture (such as SMC), which switch control to the firmware, are trapped in EL2 when the guest executes them.
>
> This series adds a feature into the kernel called the TEE mediator abstraction layer, which lets a guest interact with the secure world. Additionally, an OP-TEE-specific mediator is also implemented, which hooks itself into the TEE mediator layer and intercepts guest SMCs targeted at OP-TEE.
>
> Overview
> =========
>
> Essentially, if the kernel wants to interact with OP-TEE, it issues an SMC (secure monitor call) instruction after loading arguments into CPU registers. What these arguments consist of and how both entities communicate can vary. A guest, however, cannot establish a connection with the secure world, because SMCs issued by the guest are trapped by the hypervisor in EL2. This is done by setting the HCR_EL2.TSC bit before entering the guest.
>
> Hence this feature, which we may call the TEE mediator, acts as an intermediary between the guest and OP-TEE. Instead of denying the guest SMC and jumping back into the guest, the mediator forwards the request to OP-TEE.
>
> OP-TEE supports virtualization in the normal world and expects six things from the NS-hypervisor:
>
> 1. Notify OP-TEE when a VM is created.
> 2. Notify OP-TEE when a VM is destroyed.
> 3. Any SMC to OP-TEE has to contain the VMID in x7. If the hypervisor is the sender, the VMID is 0.
> 4. The hypervisor has to perform IPA->PA translations of the memory addresses sent by the guest.
> 5. Memory shared by the VM with OP-TEE has to remain pinned.
> 6. The hypervisor has to follow the OP-TEE protocol, so the guest thinks it is speaking directly to OP-TEE.
>
> It's important to note that if OP-TEE is built with NS-virtualization support, it can only function if there is a hypervisor with a mediator in the normal world.
>
> This implementation has been heavily inspired by Xen's OP-TEE mediator.

[...]

And I think this inspiration is the source of most of the problems in this series.

Routing Secure Calls from the guest to whatever is on the secure side should not be the kernel's job at all. It should be the VMM's job. All you need to do is to route the SMCs from the guest to userspace, and we already have all the required infrastructure for that.

It is the VMM that should:

- signal the TEE of VM creation/teardown

- translate between IPAs and host VAs without involving KVM

- let the host TEE driver translate between VAs and PAs and deal with the pinning as required, just like it would do for any userspace (without ever using the KVM memslot interface)

- proxy requests from the guest to the TEE

- in general, bear the complexity of anything related to the TEE

In short, the VMM is just another piece of userspace using the TEE to do whatever it wants. The TEE driver on the host must obviously know about VMs, but that's about it.

Crucially, KVM should:

- be completely TEE agnostic and never call into something that is TEE-specific

- allow a TEE implementation entirely in userspace, especially for the machines that do not have EL3

As it stands, your design looks completely upside-down. Most of this code should be userspace code and live in (or close to) the VMM, with the host kernel only providing the basic primitives, most of which should already be there.

Thanks,

M.
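To make the six-point OP-TEE virtualization contract quoted in the cover letter above a little more concrete, here is a minimal sketch of what the normal-world side has to emit, regardless of whether that side lives in a kernel mediator (as in this series) or behind the host TEE driver on behalf of a VMM. The function-ID values mirror OP-TEE's optee_smc.h, but they and the helper names (notify_optee_vm_created() and friends) should be read as assumptions made for illustration, not code from the series.

```c
/*
 * Illustrative only: the normal-world side of the OP-TEE virtualization
 * contract described in the cover letter. Function IDs and helper names
 * are assumptions for this sketch.
 */
#include <linux/arm-smccc.h>
#include <linux/errno.h>
#include <linux/types.h>

#define OPTEE_SMC_RETURN_OK	0x0
/* Fast calls, Trusted OS owner: function IDs 13 and 14. */
#define OPTEE_SMC_VM_CREATED	0xB200000D
#define OPTEE_SMC_VM_DESTROYED	0xB200000E

/* Requirements 1 and 2: tell OP-TEE about VM lifecycle, VM ID in a1. */
static int notify_optee_vm_created(u64 vm_id)
{
	struct arm_smccc_res res;

	arm_smccc_smc(OPTEE_SMC_VM_CREATED, vm_id, 0, 0, 0, 0, 0, 0, &res);
	return res.a0 == OPTEE_SMC_RETURN_OK ? 0 : -ENODEV;
}

static void notify_optee_vm_destroyed(u64 vm_id)
{
	struct arm_smccc_res res;

	arm_smccc_smc(OPTEE_SMC_VM_DESTROYED, vm_id, 0, 0, 0, 0, 0, 0, &res);
}

/*
 * Requirement 3: every call forwarded on behalf of a guest carries that
 * guest's VM ID in x7 (the SMCCC client-ID register); 0 means "hypervisor".
 * args[0..6] are the guest's x0..x6, with any guest IPAs already translated
 * to PAs (requirement 4) and the backing memory pinned (requirement 5).
 */
static void forward_guest_call(u64 vm_id, const u64 args[7],
			       struct arm_smccc_res *res)
{
	arm_smccc_smc(args[0], args[1], args[2], args[3],
		      args[4], args[5], args[6], vm_id, res);
}
```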
On Tue, Apr 01, 2025 at 07:13:26PM +0100, Marc Zyngier wrote:
> On Tue, 01 Apr 2025 18:05:20 +0100, Yuvraj Sakshith <yuvraj.kernel@gmail.com> wrote:
> >
> > A KVM guest running on an arm64 machine will not be able to interact with a trusted execution environment (which supports non-secure guests) like OP-TEE in the secure world. This is because instructions provided by the architecture (such as SMC), which switch control to the firmware, are trapped in EL2 when the guest executes them.
> >
> > This series adds a feature into the kernel called the TEE mediator abstraction layer, which lets a guest interact with the secure world. Additionally, an OP-TEE-specific mediator is also implemented, which hooks itself into the TEE mediator layer and intercepts guest SMCs targeted at OP-TEE.
> >
> > Overview
> > =========
> >
> > Essentially, if the kernel wants to interact with OP-TEE, it issues an SMC (secure monitor call) instruction after loading arguments into CPU registers. What these arguments consist of and how both entities communicate can vary. A guest, however, cannot establish a connection with the secure world, because SMCs issued by the guest are trapped by the hypervisor in EL2. This is done by setting the HCR_EL2.TSC bit before entering the guest.
> >
> > Hence this feature, which we may call the TEE mediator, acts as an intermediary between the guest and OP-TEE. Instead of denying the guest SMC and jumping back into the guest, the mediator forwards the request to OP-TEE.
> >
> > OP-TEE supports virtualization in the normal world and expects six things from the NS-hypervisor:
> >
> > 1. Notify OP-TEE when a VM is created.
> > 2. Notify OP-TEE when a VM is destroyed.
> > 3. Any SMC to OP-TEE has to contain the VMID in x7. If the hypervisor is the sender, the VMID is 0.
> > 4. The hypervisor has to perform IPA->PA translations of the memory addresses sent by the guest.
> > 5. Memory shared by the VM with OP-TEE has to remain pinned.
> > 6. The hypervisor has to follow the OP-TEE protocol, so the guest thinks it is speaking directly to OP-TEE.
> >
> > It's important to note that if OP-TEE is built with NS-virtualization support, it can only function if there is a hypervisor with a mediator in the normal world.
> >
> > This implementation has been heavily inspired by Xen's OP-TEE mediator.
>
> [...]
>
> And I think this inspiration is the source of most of the problems in this series.
>
> Routing Secure Calls from the guest to whatever is on the secure side should not be the kernel's job at all. It should be the VMM's job. All you need to do is to route the SMCs from the guest to userspace, and we already have all the required infrastructure for that.
>
Yes, this was an argument at the time of designing this solution.

> It is the VMM that should:
>
> - signal the TEE of VM creation/teardown
>
> - translate between IPAs and host VAs without involving KVM
>
> - let the host TEE driver translate between VAs and PAs and deal with the pinning as required, just like it would do for any userspace (without ever using the KVM memslot interface)
>
> - proxy requests from the guest to the TEE
>
> - in general, bear the complexity of anything related to the TEE
>
The major reasons why I went with placing the implementation inside the kernel are:
- The OP-TEE userspace library (the client) does not support sending SMCs for VM events and needs modification.
- QEMU (or every other VMM) would have to be modified.
- The OP-TEE driver is in the kernel anyway. A mediator would just be an addition, not a completely new entity.
- (Potential) issues if we want to mediate requests from a VM which has private memory.
- Heavy VM exits if the guest makes frequent TOS calls.

Hence, the thought of making changes to too many entities (libteec, VMM, etc.) was a strong reason, although arguable.

> In short, the VMM is just another piece of userspace using the TEE to do whatever it wants. The TEE driver on the host must obviously know about VMs, but that's about it.
>
> Crucially, KVM should:
>
> - be completely TEE agnostic and never call into something that is TEE-specific
>
> - allow a TEE implementation entirely in userspace, especially for the machines that do not have EL3
>
Yes, you're right. Although I believe there still are some changes that need to be made to KVM to facilitate this. For example, kvm_smccc_get_action() would deny a TOS call. So, having an implementation completely in the VMM without any change in KVM might be challenging; any potential solutions are welcome.

> As it stands, your design looks completely upside-down. Most of this code should be userspace code and live in (or close to) the VMM, with the host kernel only providing the basic primitives, most of which should already be there.
>
> Thanks,
>
> M.
>
> --
> Jazz isn't dead. It just smells funny.
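On the kvm_smccc_get_action() point: if I am reading the current UAPI right, KVM already lets the VMM override that default with the arm64 SMCCC filter, so no KVM change may be needed just to get the call out to userspace. A minimal sketch, assuming the KVM_ARM_VM_SMCCC_CTRL / KVM_ARM_VM_SMCCC_FILTER device attribute available in recent kernels; the OP-TEE function-ID ranges shown in the usage comment are illustrative.

```c
/*
 * Hedged sketch: a VMM asking KVM to forward a range of SMCCC function IDs
 * to userspace instead of applying the in-kernel default action. Assumes
 * the arm64 SMCCC filter UAPI; the ranges used by the caller below are
 * illustrative.
 */
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int fwd_smccc_range_to_user(int vm_fd, uint32_t base, uint32_t nr)
{
	struct kvm_smccc_filter filter = {
		.base		= base,
		.nr_functions	= nr,
		.action		= KVM_SMCCC_FILTER_FWD_TO_USER,
	};
	struct kvm_device_attr attr = {
		.group	= KVM_ARM_VM_SMCCC_CTRL,
		.attr	= KVM_ARM_VM_SMCCC_FILTER,
		.addr	= (uint64_t)(uintptr_t)&filter,
	};

	/* Matching guest SMCs now exit to the VMM as KVM_EXIT_HYPERCALL. */
	return ioctl(vm_fd, KVM_SET_DEVICE_ATTR, &attr);
}

/*
 * Usage (illustrative ranges: SMC32 fast and yielding calls owned by the
 * Trusted OS service, which is where OP-TEE's ABI lives):
 *
 *	fwd_smccc_range_to_user(vm_fd, 0xB2000000, 0x10000);
 *	fwd_smccc_range_to_user(vm_fd, 0x32000000, 0x10000);
 */
```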
On Wed, 02 Apr 2025 03:58:48 +0100, Yuvraj Sakshith <yuvraj.kernel@gmail.com> wrote:
>
> On Tue, Apr 01, 2025 at 07:13:26PM +0100, Marc Zyngier wrote:
> > On Tue, 01 Apr 2025 18:05:20 +0100, Yuvraj Sakshith <yuvraj.kernel@gmail.com> wrote:
> > >
> > > [...]
> > >
> > > This implementation has been heavily inspired by Xen's OP-TEE mediator.
> >
> > [...]
> >
> > And I think this inspiration is the source of most of the problems in this series.
> >
> > Routing Secure Calls from the guest to whatever is on the secure side should not be the kernel's job at all. It should be the VMM's job. All you need to do is to route the SMCs from the guest to userspace, and we already have all the required infrastructure for that.
> >
> Yes, this was an argument at the time of designing this solution.
>
> > It is the VMM that should:
> >
> > - signal the TEE of VM creation/teardown
> >
> > - translate between IPAs and host VAs without involving KVM
> >
> > - let the host TEE driver translate between VAs and PAs and deal with the pinning as required, just like it would do for any userspace (without ever using the KVM memslot interface)
> >
> > - proxy requests from the guest to the TEE
> >
> > - in general, bear the complexity of anything related to the TEE
> >
>
> The major reasons why I went with placing the implementation inside the kernel are:
> - The OP-TEE userspace library (the client) does not support sending SMCs for VM events and needs modification.
> - QEMU (or every other VMM) would have to be modified.

Sure. And what? New feature, new API, new code. And what will happen once someone wants to use something other than OP-TEE? Or one of the many forks of OP-TEE that have a completely different ABI (cue the Android forks -- yes, plural)?

> - The OP-TEE driver is in the kernel anyway. A mediator would just be an addition, not a completely new entity.

Of course not. The TEE can be anywhere I want. On another machine if I decide so. Just because OP-TEE has a very simplistic model doesn't mean we have to be constrained by it.

> - (Potential) issues if we want to mediate requests from a VM which has private memory.

Private memory means that not even the host has access to it, as is the case with pKVM. How would that be an issue?

> - Heavy VM exits if the guest makes frequent TOS calls.

Sorry, I have to completely dismiss the argument here. I'm not even remotely considering performance for something that is essentially a full context switch of the whole machine. By definition, calling into EL3, and then S-EL1/S-EL2, is going to be as fast as a dying snail, and an additional exit to userspace will hardly register for anything other than a pointless latency benchmark.

>
> Hence, the thought of making changes to too many entities (libteec, VMM, etc.) was a strong reason, although arguable.

It is a *terrible* reason. By this reasoning, we would have subsumed the whole VMM into the kernel (just like Xen), because "we don't want to change userspace".

Furthermore, you are not even considering basic things such as permissions. Your approach completely circumvents any form of access control, meaning that any user who can create a VM can talk to the TEE, even if they don't have access to the TEE driver.

Yes, you could replicate access permissions, SELinux, seccomp (and the rest of the security theater) at the KVM/TEE boundary, making the whole thing even more of a twisted mess.

Or you could simply do the right thing and let the kernel do its job the way it was intended by using the syscall interface from userspace.

> > In short, the VMM is just another piece of userspace using the TEE to do whatever it wants. The TEE driver on the host must obviously know about VMs, but that's about it.
> >
> > Crucially, KVM should:
> >
> > - be completely TEE agnostic and never call into something that is TEE-specific
> >
> > - allow a TEE implementation entirely in userspace, especially for the machines that do not have EL3
> >
>
> Yes, you're right. Although I believe there still are some changes that need to be made to KVM to facilitate this. For example, kvm_smccc_get_action() would deny a TOS call.

If something is missing in KVM to allow routing of SMCs to userspace, I'm more than happy to entertain the change.

> So, having an implementation completely in the VMM without any change in KVM might be challenging; any potential solutions are welcome.

I've said what I have to say already, and pointed you in a direction that I see as both correct and maintainable.

Thanks,

M.
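To illustrate the split being argued for here: once a filtered SMC reaches the VMM as a KVM_EXIT_HYPERCALL, the VMM can pull the guest's arguments out of the vCPU registers, do the OP-TEE work through the ordinary /dev/tee* interface (where normal file permissions and LSM policy apply), and write the SMCCC result back before resuming. A rough sketch, assuming the ONE_REG register accessors; handle_optee_call() is a stand-in for whatever proxy logic the VMM implements.

```c
/*
 * Hedged sketch of the VMM run-loop side: a forwarded SMC shows up as
 * KVM_EXIT_HYPERCALL, the guest's x1..x7 are read with KVM_GET_ONE_REG,
 * the TEE work happens in plain userspace (e.g. via /dev/tee* ioctls,
 * subject to normal access control), and the SMCCC results are written
 * back to x0..x3 before KVM_RUN is called again.
 */
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Hypothetical proxy into the host TEE driver. */
void handle_optee_call(const uint64_t args[8], uint64_t ret[4]);

/* Encode the ONE_REG id of general-purpose register Xn. */
#define ARM64_CORE_REG_X(n)						\
	(KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE |		\
	 ((offsetof(struct kvm_regs, regs.regs[0]) +			\
	   (n) * sizeof(uint64_t)) / sizeof(uint32_t)))

static uint64_t get_xreg(int vcpu_fd, unsigned int n)
{
	uint64_t val = 0;
	struct kvm_one_reg reg = {
		.id = ARM64_CORE_REG_X(n),
		.addr = (uint64_t)(uintptr_t)&val,
	};

	ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
	return val;
}

static void set_xreg(int vcpu_fd, unsigned int n, uint64_t val)
{
	struct kvm_one_reg reg = {
		.id = ARM64_CORE_REG_X(n),
		.addr = (uint64_t)(uintptr_t)&val,
	};

	ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);
}

/* Called when run->exit_reason == KVM_EXIT_HYPERCALL. */
static void handle_forwarded_smc(int vcpu_fd, struct kvm_run *run)
{
	uint64_t args[8], ret[4] = { 0 };
	unsigned int i;

	args[0] = run->hypercall.nr;		/* function ID (guest x0) */
	for (i = 1; i < 8; i++)
		args[i] = get_xreg(vcpu_fd, i);	/* x1..x7 */

	handle_optee_call(args, ret);		/* proxy to the TEE in userspace */

	for (i = 0; i < 4; i++)			/* SMCCC results back in x0..x3 */
		set_xreg(vcpu_fd, i, ret[i]);
}
```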
On Wed, Apr 02, 2025 at 09:42:39AM +0100, Marc Zyngier wrote:
> On Wed, 02 Apr 2025 03:58:48 +0100, Yuvraj Sakshith <yuvraj.kernel@gmail.com> wrote:
> >
> > On Tue, Apr 01, 2025 at 07:13:26PM +0100, Marc Zyngier wrote:
> > > On Tue, 01 Apr 2025 18:05:20 +0100, Yuvraj Sakshith <yuvraj.kernel@gmail.com> wrote:
> > > >
> > > > [...]
> > > >
> > > > This implementation has been heavily inspired by Xen's OP-TEE mediator.
> > >
> > > [...]
> > >
> > > And I think this inspiration is the source of most of the problems in this series.
> > >
> > > Routing Secure Calls from the guest to whatever is on the secure side should not be the kernel's job at all. It should be the VMM's job. All you need to do is to route the SMCs from the guest to userspace, and we already have all the required infrastructure for that.
> > >
> > Yes, this was an argument at the time of designing this solution.
> >
> > > It is the VMM that should:
> > >
> > > - signal the TEE of VM creation/teardown
> > >
> > > - translate between IPAs and host VAs without involving KVM
> > >
> > > - let the host TEE driver translate between VAs and PAs and deal with the pinning as required, just like it would do for any userspace (without ever using the KVM memslot interface)
> > >
> > > - proxy requests from the guest to the TEE
> > >
> > > - in general, bear the complexity of anything related to the TEE
> > >
> >
> > The major reasons why I went with placing the implementation inside the kernel are:
> > - The OP-TEE userspace library (the client) does not support sending SMCs for VM events and needs modification.
> > - QEMU (or every other VMM) would have to be modified.
>
> Sure. And what? New feature, new API, new code. And what will happen once someone wants to use something other than OP-TEE? Or one of the many forks of OP-TEE that have a completely different ABI (cue the Android forks -- yes, plural)?

If something other than OP-TEE has to be supported, a specific mediator (such as drivers/tee/optee/optee_mediator.c) has to be constructed with handlers hooked via tee_mediator_register_ops(). But yes, the ABI might change, and the implementor has the freedom to mediate it as required.

> > - The OP-TEE driver is in the kernel anyway. A mediator would just be an addition, not a completely new entity.
>
> Of course not. The TEE can be anywhere I want. On another machine if I decide so. Just because OP-TEE has a very simplistic model doesn't mean we have to be constrained by it.
>
> > - (Potential) issues if we want to mediate requests from a VM which has private memory.
>
> Private memory means that not even the host has access to it, as is the case with pKVM. How would that be an issue?
>
The guest shares memory with OP-TEE through a buffer filled with pointers, which the mediator has to read in order to perform IPA->PA translations of all those pointers. The VMM won't be able to read them if the memory is private. But this is only a "potential" issue, and if the mediator is moved into the VMM, it is completely ruled out.

> > - Heavy VM exits if the guest makes frequent TOS calls.
>
> Sorry, I have to completely dismiss the argument here. I'm not even remotely considering performance for something that is essentially a full context switch of the whole machine. By definition, calling into EL3, and then S-EL1/S-EL2, is going to be as fast as a dying snail, and an additional exit to userspace will hardly register for anything other than a pointless latency benchmark.
>
Okay, makes sense.

> >
> > Hence, the thought of making changes to too many entities (libteec, VMM, etc.) was a strong reason, although arguable.
>
> It is a *terrible* reason. By this reasoning, we would have subsumed the whole VMM into the kernel (just like Xen), because "we don't want to change userspace".
>
> Furthermore, you are not even considering basic things such as permissions. Your approach completely circumvents any form of access control, meaning that any user who can create a VM can talk to the TEE, even if they don't have access to the TEE driver.

Well, this is a good point. OP-TEE built for NS-virtualization handles calls from different VMs under different MMU partitions (I would need to go off track to explain this), but each VM's state and data remain isolated internally in S-EL1.

> Yes, you could replicate access permissions, SELinux, seccomp (and the rest of the security theater) at the KVM/TEE boundary, making the whole thing even more of a twisted mess.
>
> Or you could simply do the right thing and let the kernel do its job the way it was intended by using the syscall interface from userspace.
>
> > > In short, the VMM is just another piece of userspace using the TEE to do whatever it wants. The TEE driver on the host must obviously know about VMs, but that's about it.
> > >
> > > Crucially, KVM should:
> > >
> > > - be completely TEE agnostic and never call into something that is TEE-specific
> > >
> > > - allow a TEE implementation entirely in userspace, especially for the machines that do not have EL3
> > >
> >
> > Yes, you're right. Although I believe there still are some changes that need to be made to KVM to facilitate this. For example, kvm_smccc_get_action() would deny a TOS call.
>
> If something is missing in KVM to allow routing of SMCs to userspace, I'm more than happy to entertain the change.

Okay.

> > So, having an implementation completely in the VMM without any change in KVM might be challenging; any potential solutions are welcome.
>
> I've said what I have to say already, and pointed you in a direction that I see as both correct and maintainable.
>
Yes, I get your point on placing the mediator in the VMM, and now that I think of it, I believe I can make an improvement. But yes, since too many entities are involved, the design of this solution has been a nightmare. Good to have been pushed this way.

> Thanks,
>
> M.
>
> --
> Jazz isn't dead. It just smells funny.

Thanks,
Yuvraj Sakshith
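On the buffer-of-pointers concern raised in the message above: in the VMM-based model, the only translation the VMM performs is from guest IPA to its own virtual addresses, using the RAM layout it established when it registered the memslots; VA-to-PA conversion and pinning then stay with the host TEE driver, exactly as for any other userspace client. A sketch under that assumption, with made-up type and helper names.

```c
/*
 * Hedged sketch: IPA -> host-VA translation done purely in the VMM, using
 * the guest-RAM layout the VMM itself set up via KVM_SET_USER_MEMORY_REGION.
 * For a shared-memory descriptor full of guest pointers, the VMM would
 * translate each entry this way before handing the request to the host TEE
 * driver, which takes care of VA -> PA translation and pinning.
 */
#include <stdint.h>
#include <stddef.h>

struct guest_ram_region {
	uint64_t gpa;		/* guest IPA base of the region */
	uint64_t size;
	void	*hva;		/* where the VMM mmap()ed it */
};

/* Walk the VMM's own region table; no KVM involvement needed. */
static void *ipa_to_hva(const struct guest_ram_region *regions, size_t nr,
			uint64_t ipa)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (ipa >= regions[i].gpa &&
		    ipa < regions[i].gpa + regions[i].size)
			return (uint8_t *)regions[i].hva + (ipa - regions[i].gpa);
	}
	return NULL;	/* not guest RAM: reject the request */
}
```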