Message ID | 20250107142719.179636-9-yilun.xu@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | Private MMIO support for private assigned dev |
On Mon, Jan 13, 2025 at 12:49:35PM -0400, Jason Gunthorpe wrote: > On Sat, Jan 11, 2025 at 11:48:06AM +0800, Xu Yilun wrote: > > > > > > can be sure what is the correct UAPI. In other words, make the > > > > > VFIO device into a CC device should also prevent mmaping it and so on. > > > > > > > > My idea is prevent mmaping first, then allow VFIO device into CC dev (TDI). > > > > > > I think you need to start the TDI process much earlier. Some arches > > > are going to need work to prepare the TDI before the VM is started. > > > > Could you elaborate more on that? AFAICS Intel & AMD are all good on > > "late bind", but not sure for other architectures. > > I'm not sure about this, the topic has been confused a bit, and people > often seem to misunderstand what the full scenario actually is. :\ Yes, it is in early stage and open to discuss. > > What I'm talking abou there is that you will tell the secure world to > create vPCI function that has the potential to be secure "TDI run" > down the road. The VM will decide when it reaches the run state. This Yes. > is needed so the secure world can prepare anything it needs prior to > starting the VM. OK. From Dan's patchset there are some touch point for vendor tsm drivers to do secure world preparation. e.g. pci_tsm_ops::probe(). Maybe we could move to Dan's thread for discussion. https://lore.kernel.org/linux-coco/173343739517.1074769.13134786548545925484.stgit@dwillia2-xfh.jf.intel.com/ > Setting up secure vIOMMU emulation, for instance. I I think this could be done at VM late bind time. > expect ARM will need this, I'd be surprised if AMD actually doesn't in > the full scenario with secure viommu. AFAICS, AMD needs secure viommu. > > It should not be a surprise to the secure world after the VM has > started that suddenly it learns about a vPCI function that wants to be With some pre-VM stage touch point, it wouldn't be all of a sudden. > secure. 
This should all be pre-arranged as possible before starting But our current implementation is not to prepare as much as possible, but only necessary, so most of the secure work for vPCI function is done at late bind time. Thank, Yilun > the VM, even if alot of steps happen after the VM starts running (or > maybe don't happen at all). > > Jason
On Fri, Jan 17, 2025 at 09:25:23AM -0400, Jason Gunthorpe wrote:
> On Fri, Jan 17, 2025 at 09:57:40AM +0800, Baolu Lu wrote:
> > On 1/15/25 21:01, Jason Gunthorpe wrote:
> > > On Wed, Jan 15, 2025 at 11:57:05PM +1100, Alexey Kardashevskiy wrote:
> > > > On 15/1/25 00:35, Jason Gunthorpe wrote:
> > > > > On Tue, Jun 18, 2024 at 07:28:43AM +0800, Xu Yilun wrote:
> > > > >
> > > > > > > is needed so the secure world can prepare anything it needs prior to
> > > > > > > starting the VM.
> > > > > > OK. From Dan's patchset there are some touch point for vendor tsm
> > > > > > drivers to do secure world preparation. e.g. pci_tsm_ops::probe().
> > > > > >
> > > > > > Maybe we could move to Dan's thread for discussion.
> > > > > >
> > > > > > https://lore.kernel.org/linux-
> > > > > > coco/173343739517.1074769.13134786548545925484.stgit@dwillia2-
> > > > > > xfh.jf.intel.com/
> > > > > I think Dan's series is different, any uapi from that series should
> > > > > not be used in the VMM case. We need proper vfio APIs for the VMM to
> > > > > use. I would expect VFIO to be calling some of that infrastructure.
> > > > Something like this experiment?
> > > >
> > > > https://github.com/aik/linux/commit/
> > > > ce052512fb8784e19745d4cb222e23cabc57792e
> > > Yeah, maybe, though I don't know which of vfio/iommufd/kvm should be
> > > hosting those APIs, the above does seem to be a reasonable direction.
> > >
> > > When the various fds are closed I would expect the kernel to unbind
> > > and restore the device back.
> >
> > I am curious about the value of tsm binding against an iomnufd_vdevice
> > instead of the physical iommufd_device.
>
> Interesting question
>
> > It is likely that the kvm pointer should be passed to iommufd during the
> > creation of a viommu object.
>
> Yes, I fully expect this
>
> > If my recollection is correct, the arm
> > smmu-v3 needs it to obtain the vmid to setup the userspace event queue:
>
> Right now it will use a VMID unrelated to KVM. BTM support on ARM will
> require syncing the VMID with KVM.
>
> AMD and Intel may require the KVM for some reason as well.
>
> For CC I'm expecting the KVM fd to be the handle for the cVM, so any
> RPCs that want to call into the secure world need the KVM FD to get
> the cVM's identifier. Ie a "bind to cVM" RPC will need the PCI
> information and the cVM's handle.

I also expect this.

>
> From that perspective it does make sense that any cVM related APIs,
> like "bind to cVM" would be against the VDEVICE where we have a link
> to the VIOMMU which has the KVM. On the iommufd side the VIOMMU is
> part of the object hierarchy, but does not necessarily have to force a
> vIOMMU to appear in the cVM.
>
> But it also seems to me that VFIO should be able to support putting
> the device into the RUN state

Firstly I think VFIO should support putting device into *LOCKED* state.
From LOCKED to RUN, there are many evidence fetching and attestation
things that only guest cares. I don't think VFIO needs to opt-in.

But that doesn't impact this concern. I actually think VFIO should
provide 'bind' uAPI to support these device side configuration things
rather than iommufd uAPI. IIUC iommufd should only do the setup on IOMMU
side.

The switching of TDISP state to LOCKED involves device side
differences that should be awared by the device owner, VFIO driver.

E.g. as we previously mentioned, to check if all MMIOs are never mapped.

Another E.g. invalidate MMIOs when device is to be LOCKED, some Pseudo
Code:

@@ -1494,7 +1494,15 @@ static int vfio_pci_ioctl_tsm_bind(struct vfio_pci_core_device *vdev,
        if (!kvm)
                return -ENOENT;

+       down_write(&vdev->memory_lock);
+       vfio_pci_dma_buf_move(vdev, true);
+
        ret = pci_tsm_dev_bind(pdev, kvm, &bind.intf_id);
+
+       if (__vfio_pci_memory_enabled(vdev))
+               vfio_pci_dma_buf_move(vdev, false);
+       up_write(&vdev->memory_lock);

BTW, we may still need viommu/vdevice APIs during 'bind', if some IOMMU
side configurations are required by secure world. TDX does have some.

> without involving KVM or cVMs.

It may not be feasible for all vendors.

I believe AMD would have one firmware call that requires cVM handle
*AND* move device into LOCKED state. It really depends on firmware
implementation.

So I'm expecting a coarse TSM verb pci_tsm_dev_bind() for vendors to do
any host side preparation and put device into LOCKED state.

>
> > Intel TDX connect implementation also needs a reference to the kvm
> > pointer to obtain the secure EPT information. This is crucial because
> > the CPU's page table must be shared with the iommu.
>
> I thought kvm folks were NAKing this sharing entirely? Or is the

I believe this is still Based on the general EPT sharing idea, is it?

There are several major reasons for the objection. In general, KVM now
has many "page non-present" tricks in EPT, which are not applicable to
IOPT. If shared, KVM has to take IOPT concerns into account, which is
quite a burden for KVM maintaining.

> secure EPT in the secure world and not directly managed by Linux?

Yes, the secure EPT is in the secure world and managed by TDX firmware.
Now a SW Mirror Secure EPT is introduced in KVM and managed by KVM
directly, and KVM will finally use firmware calls to propagate Mirror
Secure EPT changes to secure EPT.

Secure EPT are controlled by TDX module, basically KVM cannot play any
of the tricks. And TDX firmware should ensure any SEPT setting would be
applicable for Secure IOPT. I hope this could remove most of the
concerns.

I remember we've talked about SEPT sharing architechture for TDX TIO
before, but didn't get information back from KVM folks. Not sure how
things will go. Maybe will find out when we have some patches posted.

Thanks,
Yilun

>
> AFAIK AMD is going to mirror the iommu page table like today.
>
> ARM, I suspect, will not have an "EPT" under Linux control, so
> whatever happens will be hidden in their secure world.
>
> Jason
On Mon, Jan 20, 2025 at 09:25:25AM -0400, Jason Gunthorpe wrote: > On Mon, Jun 24, 2024 at 03:59:53AM +0800, Xu Yilun wrote: > > > But it also seems to me that VFIO should be able to support putting > > > the device into the RUN state > > > > Firstly I think VFIO should support putting device into *LOCKED* state. > > From LOCKED to RUN, there are many evidence fetching and attestation > > things that only guest cares. I don't think VFIO needs to opt-in. > > VFIO is not just about running VMs. If someone wants to run DPDK on > VFIO they should be able to get the device into a RUN state and work > with secure memory without requiring a KVM. Yes there are many steps > to this, but we should imagine how it can work. Interesting Question. I've never thought about native TIO before. And you are also thinking about VFIO usage in CoCo-VM. So I believe VFIO could be able to support putting the device into the RUN state, but no need a uAPI for that, this happens when VFIO works as a TEE attester. In different cases, VFIO plays different roles: 1. TEE helper, but itself is out of TEE. 2. TEE attester, it is within the TEE. 3. TEE user, it is within the TEE. As a TEE helper, it works on a untrusted device and help put the device in LOCKED state, waiting for attestation. For VM use case, VM acts as the attester to do attestation and move device into trusted/RUN state (lets say 'accept'). The attestation and accept could be direct talks between attester and device (maybe via TSM sysfs node), because from LOCKED -> RUN VFIO doesn't change its way of handling device so seems no need to introduce extra uAPIs and complexity just for passing the talks. That's my expectation of VFIO's responsibility as a TEE helper - serve until LOCKED, no care about the rest, UNLOCK rollbacks everything. I imagine in bare metal, if DPDK works as an attester (within TEE) and VFIO still as a TEE helper (out of TEE), this model seems still work. 
When VFIO works as a TEE user in VM, it means an attester (e.g. PCI subsystem) has already moved the device to RUN state. So VFIO & DPDK are all TEE users, no need to manipulate TDISP state between them. AFAICS, this is the most preferred TIO usage in CoCo-VM. When VFIO works as a TEE attester in VM, it means the VM's PCI subsystem leaves the attestation work to device drivers. VFIO should do the attestation and accept before pass through to DPDK, again no need to manipulate TDISP state between them. I image the possibility TIO happens on bare metal, that a device is configured as waiting for attestation by whatever kernel module, then PCI subsystem or VFIO try to attest, accept and use it, just the same as in CoCo VM. > > > > without involving KVM or cVMs. > > > > It may not be feasible for all vendors. > > It must be. A CC guest with an in kernel driver can definately get the > PCI device into RUN, so VFIO running in the guest should be able as > well. You are talking about VFIO in CoCo-VM as an attester, then definiately yes. > > > I believe AMD would have one firmware call that requires cVM handle > > *AND* move device into LOCKED state. It really depends on firmware > > implementation. > > IMHO, you would not use the secure firmware if you are not using VMs. > > > Yes, the secure EPT is in the secure world and managed by TDX firmware. > > Now a SW Mirror Secure EPT is introduced in KVM and managed by KVM > > directly, and KVM will finally use firmware calls to propagate Mirror > > Secure EPT changes to secure EPT. > > If the secure world managed it then the secure world can have rules > that work with the IOMMU as well.. Yes. Thanks, Yilun > > Jason
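The role split described in the message above can be modelled with a small state machine. What follows is a user-space sketch only, not kernel code and not any posted uAPI. The state names follow the TDISP interface states (CONFIG_UNLOCKED, CONFIG_LOCKED, RUN, ERROR); `struct tdi_model`, `enum tdi_actor` and the three helpers are hypothetical names introduced purely for illustration. It assumes, as proposed in the thread, that the helper outside the TEE only drives the device to LOCKED and that the attester inside the TEE performs LOCKED -> RUN.

```c
#include <assert.h>
#include <errno.h>

/* TDISP interface states, named after the TDISP state machine. */
enum tdi_state { TDI_CONFIG_UNLOCKED, TDI_CONFIG_LOCKED, TDI_RUN, TDI_ERROR };

/* Hypothetical actors mirroring the roles discussed in the thread. */
enum tdi_actor { TDI_ACTOR_HELPER, TDI_ACTOR_ATTESTER };

/* Hypothetical model of one TDI; only tracks the interface state. */
struct tdi_model {
    enum tdi_state state;
};

/* Helper (outside the TEE): CONFIG_UNLOCKED -> CONFIG_LOCKED. */
static int tdi_lock(struct tdi_model *tdi, enum tdi_actor who)
{
    assert(tdi);
    if (who != TDI_ACTOR_HELPER)
        return -EPERM;
    if (tdi->state != TDI_CONFIG_UNLOCKED)
        return -EINVAL;
    tdi->state = TDI_CONFIG_LOCKED;
    return 0;
}

/* Attester (inside the TEE): CONFIG_LOCKED -> RUN once evidence checks out. */
static int tdi_accept(struct tdi_model *tdi, enum tdi_actor who, int evidence_ok)
{
    assert(tdi);
    if (who != TDI_ACTOR_ATTESTER)
        return -EPERM;
    if (tdi->state != TDI_CONFIG_LOCKED)
        return -EINVAL;
    if (!evidence_ok)
        return -EACCES;    /* device stays LOCKED; the helper may unlock it */
    tdi->state = TDI_RUN;
    return 0;
}

/* Helper: unlock rolls everything back, whatever the current state is. */
static void tdi_unlock(struct tdi_model *tdi)
{
    assert(tdi);
    tdi->state = TDI_CONFIG_UNLOCKED;
}
```

In this model the helper never needs an entry point for LOCKED -> RUN, which is the argument made above for not adding a VFIO uAPI for that step.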
On Tue, Jan 07, 2025 at 10:27:15PM +0800, Xu Yilun wrote:
> Add a flag for ioctl(VFIO_DEVICE_BIND_IOMMUFD) to mark a device as
> for private assignment. For these private assigned devices, disallow
> host accessing their MMIO resources.

Why? Shouldn't the VMM simply not call mmap? Why does the kernel have
to enforce this?

Jason
On Wed, Jan 08, 2025 at 09:30:26AM -0400, Jason Gunthorpe wrote:
> On Tue, Jan 07, 2025 at 10:27:15PM +0800, Xu Yilun wrote:
> > Add a flag for ioctl(VFIO_DEVICE_BIND_IOMMUFD) to mark a device as
> > for private assignment. For these private assigned devices, disallow
> > host accessing their MMIO resources.
>
> Why? Shouldn't the VMM simply not call mmap? Why does the kernel have
> to enforce this?

MM.. maybe I should not say 'host', instead 'userspace'.

I think the kernel part VMM (KVM) has the responsibility to enforce the
correct behavior of the userspace part VMM (QEMU). QEMU has no way to
touch private memory/MMIO intentionally or accidently. IIUC that's one
of the initiative guest_memfd is introduced for private memory. Private
MMIO follows.

Thanks,
Yilun

>
> Jason
On Thu, Jan 09, 2025 at 12:57:58AM +0800, Xu Yilun wrote:
> On Wed, Jan 08, 2025 at 09:30:26AM -0400, Jason Gunthorpe wrote:
> > On Tue, Jan 07, 2025 at 10:27:15PM +0800, Xu Yilun wrote:
> > > Add a flag for ioctl(VFIO_DEVICE_BIND_IOMMUFD) to mark a device as
> > > for private assignment. For these private assigned devices, disallow
> > > host accessing their MMIO resources.
> >
> > Why? Shouldn't the VMM simply not call mmap? Why does the kernel have
> > to enforce this?
>
> MM.. maybe I should not say 'host', instead 'userspace'.
>
> I think the kernel part VMM (KVM) has the responsibility to enforce the
> correct behavior of the userspace part VMM (QEMU). QEMU has no way to
> touch private memory/MMIO intentionally or accidently. IIUC that's one
> of the initiative guest_memfd is introduced for private memory. Private
> MMIO follows.

Okay, but then why is it a flag like that? I'm expecting a much
broader system here to make the VFIO device into a confidential device
(like setup the TDI) where we'd have to enforce the private things,
communicate with some secure world to assign it, and so on.

I want to see a fuller solution to the CC problem in VFIO before we
can be sure what is the correct UAPI. In other words, make the
VFIO device into a CC device should also prevent mmaping it and so on.

So, I would take this out and defer VFIO enforcment to a series which
does fuller CC enablement of VFIO.

The precursor work should just be avoiding requiring a VMA when
installing VFIO MMIO into the KVM and IOMMU stage 2 mappings. Ie by
using a FD to get the CPU pfns into iommufd and kvm as you are
showing.

This works just fine for non-CC devices anyhow and is the necessary
building block for making a TDI interface in VFIO.

Jason
On Thu, Jan 09, 2025 at 10:40:51AM -0400, Jason Gunthorpe wrote: > On Thu, Jan 09, 2025 at 12:57:58AM +0800, Xu Yilun wrote: > > On Wed, Jan 08, 2025 at 09:30:26AM -0400, Jason Gunthorpe wrote: > > > On Tue, Jan 07, 2025 at 10:27:15PM +0800, Xu Yilun wrote: > > > > Add a flag for ioctl(VFIO_DEVICE_BIND_IOMMUFD) to mark a device as > > > > for private assignment. For these private assigned devices, disallow > > > > host accessing their MMIO resources. > > > > > > Why? Shouldn't the VMM simply not call mmap? Why does the kernel have > > > to enforce this? > > > > MM.. maybe I should not say 'host', instead 'userspace'. > > > > I think the kernel part VMM (KVM) has the responsibility to enforce the > > correct behavior of the userspace part VMM (QEMU). QEMU has no way to > > touch private memory/MMIO intentionally or accidently. IIUC that's one > > of the initiative guest_memfd is introduced for private memory. Private > > MMIO follows. > > Okay, but then why is it a flag like that? I'm expecting a much This flag is a prerequisite for setting up TDI, or part of the requirement to make a "TDI capable" assigned device. It prevents the userspace mapping at the first place, even as a shared device. We want the device firstly appear as a shared device in CoCo-VM, then do TDI setup (via a tsm verb "bind"). This late bind approach avoids changing the CoCo VM startup routine. In contrast, early bind would easily be broken, especially if bios is not aware of the TDI rule. So then we face with the shared <-> private device conversion in CoCo VM, and in turn shared <-> private MMIO conversion. MMIO region has only one physical backend so it is a bit like in-place conversion which is complicated. I wanna simply the MMIO conversion routine based on the fact that VMM never needs to access assigned MMIO for feature emulation, so always disallow userspace MMIO mapping during the whole lifecycle. That's why the flag is introduced. Patch 6 has similar discription. 
> broader system here to make the VFIO device into a confidential device > (like setup the TDI) where we'd have to enforce the private things, I plan to introduce a new VFIO ioctl to setup the TDI. > communicate with some secure world to assign it, and so on. Yes, the new VFIO ioctl will communicate with PCI TSM. > > I want to see a fuller solution to the CC problem in VFIO before we MM.. I have something but need more preparation. Whether send out or make a public repo, I'll discuss with internal. > can be sure what is the correct UAPI. In other words, make the > VFIO device into a CC device should also prevent mmaping it and so on. My idea is prevent mmaping first, then allow VFIO device into CC dev (TDI). > > So, I would take this out and defer VFIO enforcment to a series which > does fuller CC enablement of VFIO. > > The precursor work should just be avoiding requiring a VMA when > installing VFIO MMIO into the KVM and IOMMU stage 2 mappings. Ie by > using a FD to get the CPU pfns into iommufd and kvm as you are > showing. > > This works just fine for non-CC devices anyhow and is the necessary Yes. It carries out the idea of "KVM maps MMIO resources without firstly mapping into the host" even for normal VM. That's why I think it could be an independent patchset. Thanks, Yilun > building block for making a TDI interface in VFIO. > > Jason
On Fri, Jan 10, 2025 at 12:40:28AM +0800, Xu Yilun wrote:

> So then we face with the shared <-> private device conversion in CoCo VM,
> and in turn shared <-> private MMIO conversion. MMIO region has only one
> physical backend so it is a bit like in-place conversion which is
> complicated. I wanna simply the MMIO conversion routine based on the fact
> that VMM never needs to access assigned MMIO for feature emulation, so
> always disallow userspace MMIO mapping during the whole lifecycle. That's
> why the flag is introduced.

The VMM can simply not map it if for these cases. As part of the TDI
flow the kernel can validate it is not mapped.

> > can be sure what is the correct UAPI. In other words, make the
> > VFIO device into a CC device should also prevent mmaping it and so on.
>
> My idea is prevent mmaping first, then allow VFIO device into CC dev (TDI).

I think you need to start the TDI process much earlier. Some arches
are going to need work to prepare the TDI before the VM is started.

The other issue here is that Intel is somewhat different from others
and when we build uapi for TDI it has to accommodate everyone.

> Yes. It carries out the idea of "KVM maps MMIO resources without firstly
> mapping into the host" even for normal VM. That's why I think it could
> be an independent patchset.

Yes, just remove this patch and other TDI focused stuff. Just
infrastructure to move to FD based mapping instead of VMA.

Jason
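The suggestion above, that the kernel validates the device is not mapped as part of the TDI flow rather than forbidding mmap up front with a flag, can be sketched as a small user-space model. This is an illustration under stated assumptions, not the vfio-pci implementation: `struct dev_model` and the `dev_*` helpers are hypothetical, and the locking a real driver would need around these checks is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-device bookkeeping; not the real vfio-pci structures. */
struct dev_model {
    unsigned int user_mmaps;   /* live userspace mappings of MMIO regions */
    bool tdi_bound;            /* set once the TDI flow has started */
};

/* Userspace mmap of a BAR: refused while the device is bound as a TDI. */
static int dev_mmap(struct dev_model *d)
{
    assert(d);
    if (d->tdi_bound)
        return -EPERM;
    d->user_mmaps++;
    return 0;
}

/* Drop one userspace mapping. */
static void dev_munmap(struct dev_model *d)
{
    assert(d && d->user_mmaps > 0);
    d->user_mmaps--;
}

/* Start of the TDI flow: validate that nothing is mapped, then bind. */
static int dev_tdi_bind(struct dev_model *d)
{
    assert(d);
    if (d->tdi_bound || d->user_mmaps)
        return -EBUSY;
    d->tdi_bound = true;
    return 0;
}

/* Unbind returns the device to the shared state; mmap is allowed again. */
static void dev_tdi_unbind(struct dev_model *d)
{
    assert(d);
    d->tdi_bound = false;
}
```

With this shape a well-behaved VMM simply never maps the region, and a misbehaving one sees the bind fail instead of needing a separate "private" flag at device open time.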
On Fri, Jan 10, 2025 at 09:31:16AM -0400, Jason Gunthorpe wrote: > On Fri, Jan 10, 2025 at 12:40:28AM +0800, Xu Yilun wrote: > > > So then we face with the shared <-> private device conversion in CoCo VM, > > and in turn shared <-> private MMIO conversion. MMIO region has only one > > physical backend so it is a bit like in-place conversion which is > > complicated. I wanna simply the MMIO conversion routine based on the fact > > that VMM never needs to access assigned MMIO for feature emulation, so > > always disallow userspace MMIO mapping during the whole lifecycle. That's > > why the flag is introduced. > > The VMM can simply not map it if for these cases. As part of the TDI > flow the kernel can validate it is not mapped. That's a good point. I can try on that. > > > > can be sure what is the correct UAPI. In other words, make the > > > VFIO device into a CC device should also prevent mmaping it and so on. > > > > My idea is prevent mmaping first, then allow VFIO device into CC dev (TDI). > > I think you need to start the TDI process much earlier. Some arches > are going to need work to prepare the TDI before the VM is started. Could you elaborate more on that? AFAICS Intel & AMD are all good on "late bind", but not sure for other architectures. This relates to the definition of TSM verbs and is the right time we collect the needs for Dan's series. > > The other issue here is that Intel is somewhat different from others > and when we build uapi for TDI it has to accommodate everyone. Sure, this is the aim for PCI TSM core, and VFIO as a PCI TSM user should not be TDX awared. > > > Yes. It carries out the idea of "KVM maps MMIO resources without firstly > > mapping into the host" even for normal VM. That's why I think it could > > be an independent patchset. > > Yes, just remove this patch and other TDI focused stuff. Just > infrastructure to move to FD based mapping instead of VMA. Yes. Thanks, Yilun > > Jason
On Sat, Jan 11, 2025 at 11:48:06AM +0800, Xu Yilun wrote:
> > > > can be sure what is the correct UAPI. In other words, make the
> > > > VFIO device into a CC device should also prevent mmaping it and so on.
> > >
> > > My idea is prevent mmaping first, then allow VFIO device into CC dev (TDI).
> >
> > I think you need to start the TDI process much earlier. Some arches
> > are going to need work to prepare the TDI before the VM is started.
>
> Could you elaborate more on that? AFAICS Intel & AMD are all good on
> "late bind", but not sure for other architectures.

I'm not sure about this, the topic has been confused a bit, and people
often seem to misunderstand what the full scenario actually is. :\

What I'm talking abou there is that you will tell the secure world to
create vPCI function that has the potential to be secure "TDI run"
down the road. The VM will decide when it reaches the run state. This
is needed so the secure world can prepare anything it needs prior to
starting the VM. Setting up secure vIOMMU emulation, for instance. I
expect ARM will need this, I'd be surprised if AMD actually doesn't in
the full scenario with secure viommu.

It should not be a surprise to the secure world after the VM has
started that suddenly it learns about a vPCI function that wants to be
secure. This should all be pre-arranged as possible before starting
the VM, even if alot of steps happen after the VM starts running (or
maybe don't happen at all).

Jason
On Tue, Jun 18, 2024 at 07:28:43AM +0800, Xu Yilun wrote: > > is needed so the secure world can prepare anything it needs prior to > > starting the VM. > > OK. From Dan's patchset there are some touch point for vendor tsm > drivers to do secure world preparation. e.g. pci_tsm_ops::probe(). > > Maybe we could move to Dan's thread for discussion. > > https://lore.kernel.org/linux-coco/173343739517.1074769.13134786548545925484.stgit@dwillia2-xfh.jf.intel.com/ I think Dan's series is different, any uapi from that series should not be used in the VMM case. We need proper vfio APIs for the VMM to use. I would expect VFIO to be calling some of that infrastructure. Really, I don't see a clear sense of how this will look yet. AMD provided some patches along these lines, I have not seem ARM and Intel proposals yet, not do I sense there is alignment. > > Setting up secure vIOMMU emulation, for instance. I > > I think this could be done at VM late bind time. The vIOMMU needs to be setup before the VM boots > > secure. This should all be pre-arranged as possible before starting > > But our current implementation is not to prepare as much as possible, > but only necessary, so most of the secure work for vPCI function is done > at late bind time. That's fine too, but both options need to be valid. Jason
On 15/1/25 00:35, Jason Gunthorpe wrote: > On Tue, Jun 18, 2024 at 07:28:43AM +0800, Xu Yilun wrote: > >>> is needed so the secure world can prepare anything it needs prior to >>> starting the VM. >> >> OK. From Dan's patchset there are some touch point for vendor tsm >> drivers to do secure world preparation. e.g. pci_tsm_ops::probe(). >> >> Maybe we could move to Dan's thread for discussion. >> >> https://lore.kernel.org/linux-coco/173343739517.1074769.13134786548545925484.stgit@dwillia2-xfh.jf.intel.com/ > > I think Dan's series is different, any uapi from that series should > not be used in the VMM case. We need proper vfio APIs for the VMM to > use. I would expect VFIO to be calling some of that infrastructure. Something like this experiment? https://github.com/aik/linux/commit/ce052512fb8784e19745d4cb222e23cabc57792e Thanks, > > Really, I don't see a clear sense of how this will look yet. AMD > provided some patches along these lines, I have not seem ARM and Intel > proposals yet, not do I sense there is alignment. > >>> Setting up secure vIOMMU emulation, for instance. I >> >> I think this could be done at VM late bind time. > > The vIOMMU needs to be setup before the VM boots > >>> secure. This should all be pre-arranged as possible before starting >> >> But our current implementation is not to prepare as much as possible, >> but only necessary, so most of the secure work for vPCI function is done >> at late bind time. > > That's fine too, but both options need to be valid. > > Jason
On Wed, Jan 15, 2025 at 11:57:05PM +1100, Alexey Kardashevskiy wrote:
> On 15/1/25 00:35, Jason Gunthorpe wrote:
> > On Tue, Jun 18, 2024 at 07:28:43AM +0800, Xu Yilun wrote:
> >
> > > > is needed so the secure world can prepare anything it needs prior to
> > > > starting the VM.
> > >
> > > OK. From Dan's patchset there are some touch point for vendor tsm
> > > drivers to do secure world preparation. e.g. pci_tsm_ops::probe().
> > >
> > > Maybe we could move to Dan's thread for discussion.
> > >
> > > https://lore.kernel.org/linux-coco/173343739517.1074769.13134786548545925484.stgit@dwillia2-xfh.jf.intel.com/
> >
> > I think Dan's series is different, any uapi from that series should
> > not be used in the VMM case. We need proper vfio APIs for the VMM to
> > use. I would expect VFIO to be calling some of that infrastructure.
>
> Something like this experiment?
>
> https://github.com/aik/linux/commit/ce052512fb8784e19745d4cb222e23cabc57792e

Yeah, maybe, though I don't know which of vfio/iommufd/kvm should be
hosting those APIs, the above does seem to be a reasonable direction.

When the various fds are closed I would expect the kernel to unbind
and restore the device back.

Jason
On 1/15/25 21:01, Jason Gunthorpe wrote:
> On Wed, Jan 15, 2025 at 11:57:05PM +1100, Alexey Kardashevskiy wrote:
>> On 15/1/25 00:35, Jason Gunthorpe wrote:
>>> On Tue, Jun 18, 2024 at 07:28:43AM +0800, Xu Yilun wrote:
>>>
>>>>> is needed so the secure world can prepare anything it needs prior to
>>>>> starting the VM.
>>>> OK. From Dan's patchset there are some touch point for vendor tsm
>>>> drivers to do secure world preparation. e.g. pci_tsm_ops::probe().
>>>>
>>>> Maybe we could move to Dan's thread for discussion.
>>>>
>>>> https://lore.kernel.org/linux-
>>>> coco/173343739517.1074769.13134786548545925484.stgit@dwillia2-
>>>> xfh.jf.intel.com/
>>> I think Dan's series is different, any uapi from that series should
>>> not be used in the VMM case. We need proper vfio APIs for the VMM to
>>> use. I would expect VFIO to be calling some of that infrastructure.
>> Something like this experiment?
>>
>> https://github.com/aik/linux/commit/
>> ce052512fb8784e19745d4cb222e23cabc57792e
> Yeah, maybe, though I don't know which of vfio/iommufd/kvm should be
> hosting those APIs, the above does seem to be a reasonable direction.
>
> When the various fds are closed I would expect the kernel to unbind
> and restore the device back.

I am curious about the value of tsm binding against an iomnufd_vdevice
instead of the physical iommufd_device.

It is likely that the kvm pointer should be passed to iommufd during the
creation of a viommu object. If my recollection is correct, the arm
smmu-v3 needs it to obtain the vmid to setup the userspace event queue:

struct iommufd_viommu *arm_vsmmu_alloc(struct device *dev,
                                       struct iommu_domain *parent,
                                       struct iommufd_ctx *ictx,
                                       unsigned int viommu_type)
{
[...]
        /* FIXME Move VMID allocation from the S2 domain allocation to here */
        vsmmu->vmid = s2_parent->s2_cfg.vmid;

        return &vsmmu->core;
}

Intel TDX connect implementation also needs a reference to the kvm
pointer to obtain the secure EPT information. This is crucial because
the CPU's page table must be shared with the iommu. I am not sure
whether the amd architecture has a similar requirement.

---
baolu
On Fri, Jan 17, 2025 at 09:57:40AM +0800, Baolu Lu wrote: > On 1/15/25 21:01, Jason Gunthorpe wrote: > > On Wed, Jan 15, 2025 at 11:57:05PM +1100, Alexey Kardashevskiy wrote: > > > On 15/1/25 00:35, Jason Gunthorpe wrote: > > > > On Tue, Jun 18, 2024 at 07:28:43AM +0800, Xu Yilun wrote: > > > > > > > > > > is needed so the secure world can prepare anything it needs prior to > > > > > > starting the VM. > > > > > OK. From Dan's patchset there are some touch point for vendor tsm > > > > > drivers to do secure world preparation. e.g. pci_tsm_ops::probe(). > > > > > > > > > > Maybe we could move to Dan's thread for discussion. > > > > > > > > > > https://lore.kernel.org/linux- > > > > > coco/173343739517.1074769.13134786548545925484.stgit@dwillia2- > > > > > xfh.jf.intel.com/ > > > > I think Dan's series is different, any uapi from that series should > > > > not be used in the VMM case. We need proper vfio APIs for the VMM to > > > > use. I would expect VFIO to be calling some of that infrastructure. > > > Something like this experiment? > > > > > > https://github.com/aik/linux/commit/ > > > ce052512fb8784e19745d4cb222e23cabc57792e > > Yeah, maybe, though I don't know which of vfio/iommufd/kvm should be > > hosting those APIs, the above does seem to be a reasonable direction. > > > > When the various fds are closed I would expect the kernel to unbind > > and restore the device back. > > I am curious about the value of tsm binding against an iomnufd_vdevice > instead of the physical iommufd_device. Interesting question > It is likely that the kvm pointer should be passed to iommufd during the > creation of a viommu object. Yes, I fully expect this > If my recollection is correct, the arm > smmu-v3 needs it to obtain the vmid to setup the userspace event queue: Right now it will use a VMID unrelated to KVM. BTM support on ARM will require syncing the VMID with KVM. AMD and Intel may require the KVM for some reason as well. 
For CC I'm expecting the KVM fd to be the handle for the cVM, so any RPCs that want to call into the secure world need the KVM FD to get the cVM's identifier. Ie a "bind to cVM" RPC will need the PCI information and the cVM's handle. From that perspective it does make sense that any cVM related APIs, like "bind to cVM" would be against the VDEVICE where we have a link to the VIOMMU which has the KVM. On the iommufd side the VIOMMU is part of the object hierarchy, but does not necessarily have to force a vIOMMU to appear in the cVM. But it also seems to me that VFIO should be able to support putting the device into the RUN state without involving KVM or cVMs. > Intel TDX connect implementation also needs a reference to the kvm > pointer to obtain the secure EPT information. This is crucial because > the CPU's page table must be shared with the iommu. I thought kvm folks were NAKing this sharing entirely? Or is the secure EPT in the secure world and not directly managed by Linux? AFAIK AMD is going to mirror the iommu page table like today. ARM, I suspect, will not have an "EPT" under Linux control, so whatever happens will be hidden in their secure world. Jason
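The object chain described above (a vDEVICE links to a vIOMMU, which holds the KVM reference that identifies the cVM) can be pictured with a short user-space sketch. Everything here is hypothetical and only illustrates the lookup: the `*_model` structures, `struct bind_rpc` and `vdevice_build_bind_rpc()` are invented names, not iommufd or KVM APIs, and the real "bind to cVM" call would be vendor specific.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel objects named in the thread. */
struct kvm_model     { uint32_t cvm_id; };             /* handle for the cVM */
struct viommu_model  { struct kvm_model *kvm; };       /* vIOMMU holds the KVM */
struct vdevice_model { struct viommu_model *viommu;    /* vDEVICE links to vIOMMU */
                       uint16_t pci_bdf; };            /* PCI identity of the TDI */

/* Hypothetical payload of a "bind to cVM" request to the secure world. */
struct bind_rpc { uint32_t cvm_id; uint16_t pci_bdf; };

/*
 * Walk vDEVICE -> vIOMMU -> KVM to collect what the RPC needs:
 * the PCI information and the cVM's identifier.
 */
static int vdevice_build_bind_rpc(const struct vdevice_model *vdev,
                                  struct bind_rpc *out)
{
    if (vdev == NULL || out == NULL)
        return -EINVAL;
    if (vdev->viommu == NULL || vdev->viommu->kvm == NULL)
        return -ENOENT;    /* no cVM is reachable from this vDEVICE */

    out->cvm_id = vdev->viommu->kvm->cvm_id;
    out->pci_bdf = vdev->pci_bdf;
    return 0;
}
```

The point of the sketch is only that a vDEVICE-based API can reach the cVM handle without the caller passing a KVM fd again; whether the bind uAPI itself lives in VFIO or iommufd is the open question in the thread.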
On 1/17/25 21:25, Jason Gunthorpe wrote:
>> If my recollection is correct, the arm
>> smmu-v3 needs it to obtain the vmid to setup the userspace event queue:
> Right now it will use a VMID unrelated to KVM. BTM support on ARM will
> require syncing the VMID with KVM.
>
> AMD and Intel may require the KVM for some reason as well.
>
> For CC I'm expecting the KVM fd to be the handle for the cVM, so any
> RPCs that want to call into the secure world need the KVM FD to get
> the cVM's identifier. Ie a "bind to cVM" RPC will need the PCI
> information and the cVM's handle.
>
> From that perspective it does make sense that any cVM related APIs,
> like "bind to cVM" would be against the VDEVICE where we have a link
> to the VIOMMU which has the KVM. On the iommufd side the VIOMMU is
> part of the object hierarchy, but does not necessarily have to force a
> vIOMMU to appear in the cVM.

Yea, from that perspective, treating the vDEVICE object as the primary
focus for the uAPIs of cVMs is more reasonable. This simplifies the
iommu drivers by eliminating the need to verify hardware capabilities
and compatibilities within each callback. Everything could be done in
one shot when allocating the vDEVICE object.

>
> But it also seems to me that VFIO should be able to support putting
> the device into the RUN state without involving KVM or cVMs.

Then, it appears that BIND ioctl should be part of VFIO uAPI.

>> Intel TDX connect implementation also needs a reference to the kvm
>> pointer to obtain the secure EPT information. This is crucial because
>> the CPU's page table must be shared with the iommu.
> I thought kvm folks were NAKing this sharing entirely? Or is the

Yes, the previous idea of *generic* EPT sharing was objected to by the
kvm folks. The primary concern, as I understand it, is that KVM has
many "page non-present" tricks in EPT, which are not applicable to
IOPT. Consequently, KVM would have to consider IOPT requirements when
sharing the EPT with the IOMMU, which presents a significant
maintenance burden for the KVM folks.

> secure EPT in the secure world and not directly managed by Linux?

But Secure EPT is managed by the TDX module within the secure world.
Crucially, KVM does not involve any such mechanisms. The firmware
guarantees that any Secure EPT configuration will be applicable to
Secure IOPT. This approach may alleviate concerns raised by the KVM
community.

> AFAIK AMD is going to mirror the iommu page table like today.
>
> ARM, I suspect, will not have an "EPT" under Linux control, so
> whatever happens will be hidden in their secure world.

Intel also does not have an EPT under Linux control. The KVM has a
mirrored page table and syncs it with the secure EPT managed by
firmware every time it is updated through the ABIs defined by the
firmware.

Thanks,
baolu
On 18/1/25 00:25, Jason Gunthorpe wrote:
> On Fri, Jan 17, 2025 at 09:57:40AM +0800, Baolu Lu wrote:
>> On 1/15/25 21:01, Jason Gunthorpe wrote:
>>> On Wed, Jan 15, 2025 at 11:57:05PM +1100, Alexey Kardashevskiy wrote:
>>>> On 15/1/25 00:35, Jason Gunthorpe wrote:
>>>>> On Tue, Jun 18, 2024 at 07:28:43AM +0800, Xu Yilun wrote:
>>>>>
>>>>>>> is needed so the secure world can prepare anything it needs prior to
>>>>>>> starting the VM.
>>>>>> OK. From Dan's patchset there are some touch point for vendor tsm
>>>>>> drivers to do secure world preparation. e.g. pci_tsm_ops::probe().
>>>>>>
>>>>>> Maybe we could move to Dan's thread for discussion.
>>>>>>
>>>>>> https://lore.kernel.org/linux-coco/173343739517.1074769.13134786548545925484.stgit@dwillia2-xfh.jf.intel.com/
>>>>> I think Dan's series is different, any uapi from that series should
>>>>> not be used in the VMM case. We need proper vfio APIs for the VMM to
>>>>> use. I would expect VFIO to be calling some of that infrastructure.
>>>> Something like this experiment?
>>>>
>>>> https://github.com/aik/linux/commit/ce052512fb8784e19745d4cb222e23cabc57792e
>>> Yeah, maybe, though I don't know which of vfio/iommufd/kvm should be
>>> hosting those APIs, the above does seem to be a reasonable direction.
>>>
>>> When the various fds are closed I would expect the kernel to unbind
>>> and restore the device back.
>>
>> I am curious about the value of tsm binding against an iommufd_vdevice
>> instead of the physical iommufd_device.
>
> Interesting question
>
>> It is likely that the kvm pointer should be passed to iommufd during the
>> creation of a viommu object.
>
> Yes, I fully expect this
>
>> If my recollection is correct, the arm
>> smmu-v3 needs it to obtain the vmid to setup the userspace event queue:
>
> Right now it will use a VMID unrelated to KVM. BTM support on ARM will
> require syncing the VMID with KVM.
>
> AMD and Intel may require the KVM for some reason as well.
>
> For CC I'm expecting the KVM fd to be the handle for the cVM, so any
> RPCs that want to call into the secure world need the KVM FD to get
> the cVM's identifier. Ie a "bind to cVM" RPC will need the PCI
> information and the cVM's handle.

And keep KVM fd open until unbind? Or just for the short time to call
the PSP?

> From that perspective it does make sense that any cVM related APIs,
> like "bind to cVM" would be against the VDEVICE where we have a link
> to the VIOMMU which has the KVM. On the iommufd side the VIOMMU is
> part of the object hierarchy, but does not necessarily have to force a
> vIOMMU to appear in the cVM.

Well, in my sketch it "appears" as an ability to make GUEST TIO
REQUEST calls (guest <-> secure FW protocol).

> But it also seems to me that VFIO should be able to support putting
> the device into the RUN state without involving KVM or cVMs.

AMD's TDI bind handler in the PSP wants a guest handle ("GCTX") and a
guest device BDFn, and VFIO has no desire to dive into this KVM
business beyond IOMMUFD. And then this GUEST TIO REQUEST which is used
for 1) enabling secure part of IOMMU (so it relates to IOMMUFD)
2) enabling secure MMIO (which is more VFIO business).

We can do all sorts of things but the lifetime of these entangled
objects is tricky sometimes. Thanks,

>> Intel TDX connect implementation also needs a reference to the kvm
>> pointer to obtain the secure EPT information. This is crucial because
>> the CPU's page table must be shared with the iommu.
>
> I thought kvm folks were NAKing this sharing entirely? Or is the
> secure EPT in the secure world and not directly managed by Linux?
>
> AFAIK AMD is going to mirror the iommu page table like today.
>
> ARM, I suspect, will not have an "EPT" under Linux control, so
> whatever happens will be hidden in their secure world.
>
> Jason
On Mon, Jun 24, 2024 at 03:59:53AM +0800, Xu Yilun wrote:
> > But it also seems to me that VFIO should be able to support putting
> > the device into the RUN state
>
> Firstly I think VFIO should support putting device into *LOCKED* state.
> From LOCKED to RUN, there are many evidence fetching and attestation
> things that only guest cares. I don't think VFIO needs to opt-in.

VFIO is not just about running VMs. If someone wants to run DPDK on
VFIO they should be able to get the device into a RUN state and work
with secure memory without requiring a KVM. Yes there are many steps
to this, but we should imagine how it can work.

> > without involving KVM or cVMs.
>
> It may not be feasible for all vendors.

It must be. A CC guest with an in kernel driver can definitely get the
PCI device into RUN, so VFIO running in the guest should be able as
well.

> I believe AMD would have one firmware call that requires cVM handle
> *AND* move device into LOCKED state. It really depends on firmware
> implementation.

IMHO, you would not use the secure firmware if you are not using VMs.

> Yes, the secure EPT is in the secure world and managed by TDX firmware.
> Now a SW Mirror Secure EPT is introduced in KVM and managed by KVM
> directly, and KVM will finally use firmware calls to propagate Mirror
> Secure EPT changes to secure EPT.

If the secure world managed it then the secure world can have rules
that work with the IOMMU as well..

Jason
On Mon, Jan 20, 2025 at 08:45:51PM +1100, Alexey Kardashevskiy wrote:
> > For CC I'm expecting the KVM fd to be the handle for the cVM, so any
> > RPCs that want to call into the secure world need the KVM FD to get
> > the cVM's identifier. Ie a "bind to cVM" RPC will need the PCI
> > information and the cVM's handle.
>
> And keep KVM fd open until unbind? Or just for the short time to call the
> PSP?

iommufd will keep the KVM fd alive so long as the vIOMMU object
exists. Other uses for kvm require it to work like this.

> > But it also seems to me that VFIO should be able to support putting
> > the device into the RUN state without involving KVM or cVMs.
>
> AMD's TDI bind handler in the PSP wants a guest handle ("GCTX") and a guest
> device BDFn, and VFIO has no desire to dive into this KVM business beyond
> IOMMUFD.

As in my other email, VFIO is not restricted to running VMs, useful
things should be available to apps like DPDK.

There is a use case for using TDISP and getting devices up into an
encrypted/attested state on pure bare metal without any KVM, VFIO
should work in that use case too.

Jason
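
The lifetime rule described above (the vIOMMU object pins the KVM for as long as it exists) can be pictured with a toy reference count. This is only a sketch under stated assumptions: every `sketch_*` name is hypothetical, nothing here is the iommufd or KVM API, and a real implementation would hold a reference on the kernel's KVM object rather than a plain counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the KVM instance behind a KVM fd. */
struct sketch_kvm {
	int refs;       /* number of holders, userspace fd included */
	bool released;  /* set once the last holder goes away */
};

/* Hypothetical stand-in for a vIOMMU object that links to a KVM. */
struct sketch_viommu {
	struct sketch_kvm *kvm;
};

static void sketch_kvm_get(struct sketch_kvm *kvm)
{
	kvm->refs++;
}

static void sketch_kvm_put(struct sketch_kvm *kvm)
{
	assert(kvm->refs > 0);
	if (--kvm->refs == 0)
		kvm->released = true;
}

/* Creating the vIOMMU takes its own reference on the KVM. */
static void sketch_viommu_create(struct sketch_viommu *v, struct sketch_kvm *kvm)
{
	sketch_kvm_get(kvm);
	v->kvm = kvm;
}

/* Destroying the vIOMMU drops that reference again. */
static void sketch_viommu_destroy(struct sketch_viommu *v)
{
	sketch_kvm_put(v->kvm);
	v->kvm = NULL;
}
```

In this model a "bind to cVM" call made through the vDEVICE can always reach a live KVM via the vIOMMU, even if userspace has already closed its own KVM fd.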
On Tue, Jun 25, 2024 at 05:12:10AM +0800, Xu Yilun wrote:
> When VFIO works as a TEE user in VM, it means an attester (e.g. PCI
> subsystem) has already moved the device to RUN state. So VFIO & DPDK
> are all TEE users, no need to manipulate TDISP state between them.
> AFAICS, this is the most preferred TIO usage in CoCo-VM.

No, unfortunately. Part of the motivation to have the devices be
unlocked when the VM starts is because there is an expectation that a
driver in the VM will need to do untrusted operations to boot up the
device before it can be switched to the run state.

So any vfio use case needs to imagine that VFIO starts with an
untrusted device, does stuff to it, then pushes everything through to
run. The exact mirror as what a kernel driver should be able to do.

How exactly all this very complex stuff works, I have no idea, but
this is what I've understood is the target. :\

Jason
On Tue, Jan 21, 2025 at 01:43:03PM -0400, Jason Gunthorpe wrote:
> On Tue, Jun 25, 2024 at 05:12:10AM +0800, Xu Yilun wrote:
>
> > When VFIO works as a TEE user in VM, it means an attester (e.g. PCI
> > subsystem) has already moved the device to RUN state. So VFIO & DPDK
> > are all TEE users, no need to manipulate TDISP state between them.
> > AFAICS, this is the most preferred TIO usage in CoCo-VM.
>
> No, unfortunately. Part of the motivation to have the devices be
> unlocked when the VM starts is because there is an expectation that a
> driver in the VM will need to do untrusted operations to boot up the

I assume these operations are device specific.

> device before it can be switched to the run state.
>
> So any vfio use case needs to imagine that VFIO starts with an
> untrusted device, does stuff to it, then pushes everything through to

I have concern that VFIO has to do device specific stuff. Our current
expectation is a specific device driver deals with the untrusted
operations, then user writes a 'bind' device sysfs node which detaches
the driver for untrusted, do the attestation and accept, and try match
the driver for trusted (e.g. VFIO).

Thanks,
Yilun

> run. The exact mirror as what a kernel driver should be able to do.
>
> How exactly all this very complex stuff works, I have no idea, but
> this is what I've understood is the target. :\
>
> Jason
On Wed, Jan 22, 2025 at 12:32:56PM +0800, Xu Yilun wrote:
> On Tue, Jan 21, 2025 at 01:43:03PM -0400, Jason Gunthorpe wrote:
> > On Tue, Jun 25, 2024 at 05:12:10AM +0800, Xu Yilun wrote:
> >
> > > When VFIO works as a TEE user in VM, it means an attester (e.g. PCI
> > > subsystem) has already moved the device to RUN state. So VFIO & DPDK
> > > are all TEE users, no need to manipulate TDISP state between them.
> > > AFAICS, this is the most preferred TIO usage in CoCo-VM.
> >
> > No, unfortunately. Part of the motivation to have the devices be
> > unlocked when the VM starts is because there is an expectation that a
> > driver in the VM will need to do untrusted operations to boot up the
>
> I assume these operations are device specific.

Yes

> > device before it can be switched to the run state.
> >
> > So any vfio use case needs to imagine that VFIO starts with an
> > untrusted device, does stuff to it, then pushes everything through to
>
> I have concern that VFIO has to do device specific stuff. Our current
> expectation is a specific device driver deals with the untrusted
> operations, then user writes a 'bind' device sysfs node which detaches
> the driver for untrusted, do the attestation and accept, and try match
> the driver for trusted (e.g. VFIO).

I don't see this as working, VFIO will FLR the device which will
destroy anything that was done prior.

VFIO itself has to do the sequence and the VFIO userspace has to
contain the device specific stuff.

The bind/unbind dance for untrusted->trusted would need to be
internalized in VFIO without unbinding. The main motivation for the
bind/unbind flow was to manage the DMA API, which VFIO does not use.

Jason
On Wed, Jan 22, 2025 at 08:55:12AM -0400, Jason Gunthorpe wrote:
> On Wed, Jan 22, 2025 at 12:32:56PM +0800, Xu Yilun wrote:
> > On Tue, Jan 21, 2025 at 01:43:03PM -0400, Jason Gunthorpe wrote:
> > > On Tue, Jun 25, 2024 at 05:12:10AM +0800, Xu Yilun wrote:
> > >
> > > > When VFIO works as a TEE user in VM, it means an attester (e.g. PCI
> > > > subsystem) has already moved the device to RUN state. So VFIO & DPDK
> > > > are all TEE users, no need to manipulate TDISP state between them.
> > > > AFAICS, this is the most preferred TIO usage in CoCo-VM.
> > >
> > > No, unfortunately. Part of the motivation to have the devices be
> > > unlocked when the VM starts is because there is an expectation that a
> > > driver in the VM will need to do untrusted operations to boot up the
> >
> > I assume these operations are device specific.
>
> Yes
>
> > > device before it can be switched to the run state.
> > >
> > > So any vfio use case needs to imagine that VFIO starts with an
> > > untrusted device, does stuff to it, then pushes everything through to
> >
> > I have concern that VFIO has to do device specific stuff. Our current
> > expectation is a specific device driver deals with the untrusted
> > operations, then user writes a 'bind' device sysfs node which detaches
> > the driver for untrusted, do the attestation and accept, and try match
> > the driver for trusted (e.g. VFIO).
>
> I don't see this as working, VFIO will FLR the device which will
> destroy anything that was done prior.
>
> VFIO itself has to do the sequence and the VFIO userspace has to
> contain the device specific stuff.

I don't have a complete idea yet. But the goal is not to make any
existing driver seamlessly work with secure device. It is to provide a
generic way for bind/attestation/accept, and may save driver's effort
if they don't care about this startup process. There are plenty of
operations that a driver can't do to a secure device, FLR is one of
them. The TDISP SPEC has described some general rules but some are even
device specific.

So I think a driver (including VFIO) expects change to support trusted
device, but may not have to cover bind/attestation/accept flow.

Thanks,
Yilun

>
> The bind/unbind dance for untrusted->trusted would need to be
> internalized in VFIO without unbinding. The main motivation for the
> bind/unbind flow was to manage the DMA API, which VFIO does not use.
>
> Jason
On Thu, Jan 23, 2025 at 03:41:58PM +0800, Xu Yilun wrote:
> I don't have a complete idea yet. But the goal is not to make any
> existing driver seamlessly work with secure device. It is to provide a
> generic way for bind/attestation/accept, and may save driver's effort
> if they don't care about this startup process. There are plenty of
> operations that a driver can't do to a secure device, FLR is one of
> them. The TDISP SPEC has described some general rules but some are even
> device specific.

You can FLR a secure device, it just has to be re-secured and
re-attested after. Otherwise no VFIO for you.

> So I think a driver (including VFIO) expects change to support trusted
> device, but may not have to cover bind/attestation/accept flow.

I expect changes, but not fundamental ones. VFIO will still have to
FLR devices as part of its security architecture.

The entire flow needs to have options for drivers to be involved in
the flow, somehow.

Jason
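
The "FLR, then re-secure and re-attest" point can be illustrated with a deliberately simplified state model. It is a sketch only: the three states follow the unlocked/LOCKED/RUN wording used in this thread, every `sketch_*` name is hypothetical, and treating FLR as a plain drop back to the unlocked state is an assumption made for illustration. The real TDISP state machine defines more states and rules than this.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified device-interface states, named after the thread's wording. */
enum sketch_tdi_state {
	SKETCH_TDI_UNLOCKED,
	SKETCH_TDI_LOCKED,
	SKETCH_TDI_RUN,
};

struct sketch_tdi {
	enum sketch_tdi_state state;
	bool attested;  /* evidence was verified while the device was locked */
};

/* Lock the configuration; only possible from the unlocked state. */
static bool sketch_tdi_lock(struct sketch_tdi *t)
{
	if (t->state != SKETCH_TDI_UNLOCKED)
		return false;
	t->state = SKETCH_TDI_LOCKED;
	t->attested = false;
	return true;
}

/* Record a successful attestation; only meaningful while locked. */
static bool sketch_tdi_attest(struct sketch_tdi *t)
{
	if (t->state != SKETCH_TDI_LOCKED)
		return false;
	t->attested = true;
	return true;
}

/* Enter RUN; requires a locked and attested device. */
static bool sketch_tdi_start(struct sketch_tdi *t)
{
	if (t->state != SKETCH_TDI_LOCKED || !t->attested)
		return false;
	t->state = SKETCH_TDI_RUN;
	return true;
}

/* Assumption: a reset discards both the secure state and the attestation. */
static void sketch_tdi_flr(struct sketch_tdi *t)
{
	t->state = SKETCH_TDI_UNLOCKED;
	t->attested = false;
}
```

Under these assumptions a driver such as VFIO that resets the device as part of its own flow has to run the lock, attest and start steps again afterwards, which is why the sequence would need to be driven from within VFIO rather than completed before VFIO binds.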
diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
index bb1817bd4ff3..919285c1cd7a 100644
--- a/drivers/vfio/device_cdev.c
+++ b/drivers/vfio/device_cdev.c
@@ -75,7 +75,10 @@ long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df,
 	if (copy_from_user(&bind, arg, minsz))
 		return -EFAULT;
 
-	if (bind.argsz < minsz || bind.flags || bind.iommufd < 0)
+	if (bind.argsz < minsz || bind.iommufd < 0)
+		return -EINVAL;
+
+	if (bind.flags & ~(VFIO_DEVICE_BIND_IOMMUFD_PRIVATE))
 		return -EINVAL;
 
 	/* BIND_IOMMUFD only allowed for cdev fds */
@@ -118,6 +121,9 @@ long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df,
 		goto out_close_device;
 
 	device->cdev_opened = true;
+	if (bind.flags & VFIO_DEVICE_BIND_IOMMUFD_PRIVATE)
+		device->is_private = true;
+
 	/*
 	 * Paired with smp_load_acquire() in vfio_device_fops::ioctl/
 	 * read/write/mmap
@@ -151,6 +157,7 @@ void vfio_df_unbind_iommufd(struct vfio_device_file *df)
 		return;
 
 	mutex_lock(&device->dev_set->lock);
+	device->is_private = false;
 	vfio_df_close(df);
 	vfio_device_put_kvm(device);
 	iommufd_ctx_put(df->iommufd);
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index f69eda5956ad..11c735dfe1f7 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1005,6 +1005,12 @@ static int vfio_pci_ioctl_get_info(struct vfio_pci_core_device *vdev,
 	return copy_to_user(arg, &info, minsz) ? -EFAULT : 0;
 }
 
+bool is_vfio_pci_bar_private(struct vfio_pci_core_device *vdev, int bar)
+{
+	/* Any mmap supported bar can be used as vfio dmabuf */
+	return vdev->bar_mmap_supported[bar] && vdev->vdev.is_private;
+}
+
 static int vfio_pci_ioctl_get_region_info(struct vfio_pci_core_device *vdev,
 					  struct vfio_region_info __user *arg)
 {
@@ -1035,6 +1041,11 @@ static int vfio_pci_ioctl_get_region_info(struct vfio_pci_core_device *vdev,
 			break;
 		}
 
+		if (is_vfio_pci_bar_private(vdev, info.index)) {
+			info.flags = VFIO_REGION_INFO_FLAG_PRIVATE;
+			break;
+		}
+
 		info.flags = VFIO_REGION_INFO_FLAG_READ |
 			     VFIO_REGION_INFO_FLAG_WRITE;
 		if (vdev->bar_mmap_supported[info.index]) {
@@ -1735,6 +1746,9 @@ int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma
 	u64 phys_len, req_len, pgoff, req_start;
 	int ret;
 
+	if (vdev->vdev.is_private)
+		return -EINVAL;
+
 	index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
 
 	if (index >= VFIO_PCI_NUM_REGIONS + vdev->num_regions)
diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
index d27f383f3931..2b61e35145fd 100644
--- a/drivers/vfio/pci/vfio_pci_priv.h
+++ b/drivers/vfio/pci/vfio_pci_priv.h
@@ -126,4 +126,6 @@ static inline void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev,
 }
 #endif
 
+bool is_vfio_pci_bar_private(struct vfio_pci_core_device *vdev, int bar);
+
 #endif
diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index 66b72c289284..e385f7f63414 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -242,6 +242,9 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
 	struct resource *res = &vdev->pdev->resource[bar];
 	ssize_t done;
 
+	if (is_vfio_pci_bar_private(vdev, bar))
+		return -EINVAL;
+
 	if (pci_resource_start(pdev, bar))
 		end = pci_resource_len(pdev, bar);
 	else if (bar == PCI_ROM_RESOURCE &&
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 2258b0585330..e99d856c6cd8 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -69,6 +69,7 @@ struct vfio_device {
 	struct iommufd_device *iommufd_device;
 	u8 iommufd_attached:1;
 #endif
+	u8 is_private:1;
 	u8 cdev_opened:1;
 #ifdef CONFIG_DEBUG_FS
 	/*
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index f43dfbde7352..6a1c703e3185 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -275,6 +275,7 @@ struct vfio_region_info {
 #define VFIO_REGION_INFO_FLAG_WRITE	(1 << 1) /* Region supports write */
 #define VFIO_REGION_INFO_FLAG_MMAP	(1 << 2) /* Region supports mmap */
 #define VFIO_REGION_INFO_FLAG_CAPS	(1 << 3) /* Info supports caps */
+#define VFIO_REGION_INFO_FLAG_PRIVATE	(1 << 4) /* Region supports private MMIO */
 	__u32	index;		/* Region index */
 	__u32	cap_offset;	/* Offset within info struct of first cap */
 	__aligned_u64	size;	/* Region size (bytes) */
@@ -904,7 +905,8 @@ struct vfio_device_feature {
  * VFIO_DEVICE_BIND_IOMMUFD - _IOR(VFIO_TYPE, VFIO_BASE + 18,
  *				   struct vfio_device_bind_iommufd)
  * @argsz:	 User filled size of this data.
- * @flags:	 Must be 0.
+ * @flags:	 Optional device initialization flags:
+ *		 VFIO_DEVICE_BIND_IOMMUFD_PRIVATE: for private assignment
  * @iommufd:	 iommufd to bind.
  * @out_devid:	 The device id generated by this bind. devid is a handle for
  *		 this device/iommufd bond and can be used in IOMMUFD commands.
@@ -921,6 +923,7 @@ struct vfio_device_feature {
 struct vfio_device_bind_iommufd {
 	__u32		argsz;
 	__u32		flags;
+#define VFIO_DEVICE_BIND_IOMMUFD_PRIVATE	(1 << 0)
 	__s32		iommufd;
 	__u32		out_devid;
 };
Add a flag for ioctl(VFIO_DEVICE_BIND_IOMMUFD) to mark a device as for
private assignment. For these private assigned devices, disallow host
accessing their MMIO resources.

Since the MMIO regions for private assignment are not accessible from
host, remove the VFIO_REGION_INFO_FLAG_MMAP/READ/WRITE for these
regions, instead add a new VFIO_REGION_INFO_FLAG_PRIVATE flag to
indicate users should create dma-buf for MMIO mapping in KVM MMU.

Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
---
 drivers/vfio/device_cdev.c       |  9 ++++++++-
 drivers/vfio/pci/vfio_pci_core.c | 14 ++++++++++++++
 drivers/vfio/pci/vfio_pci_priv.h |  2 ++
 drivers/vfio/pci/vfio_pci_rdwr.c |  3 +++
 include/linux/vfio.h             |  1 +
 include/uapi/linux/vfio.h        |  5 ++++-
 6 files changed, 32 insertions(+), 2 deletions(-)
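
The behaviour described in the commit message can be modelled outside the kernel as a small, self-contained sketch. The `SKETCH_*` constants and `sketch_*` types are hypothetical stand-ins, not the uAPI headers; the bit values are copied from the patch purely for illustration, and a single BAR is modelled instead of the per-BAR `bar_mmap_supported[]` array.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins mirroring the bit positions used in the patch. */
#define SKETCH_BIND_PRIVATE    (1u << 0)
#define SKETCH_REGION_READ     (1u << 0)
#define SKETCH_REGION_WRITE    (1u << 1)
#define SKETCH_REGION_MMAP     (1u << 2)
#define SKETCH_REGION_PRIVATE  (1u << 4)

/* Same field order as struct vfio_device_bind_iommufd in the patch. */
struct sketch_bind {
	uint32_t argsz;
	uint32_t flags;
	int32_t  iommufd;
	uint32_t out_devid;
};

/* Only the device state this sketch needs. */
struct sketch_device {
	bool is_private;
	bool bar_mmap_supported;
};

/* Reject bad sizes/fds and unknown flag bits, then latch the private mark. */
static int sketch_bind_check(const struct sketch_bind *bind, uint32_t minsz,
			     struct sketch_device *dev)
{
	if (bind->argsz < minsz || bind->iommufd < 0)
		return -EINVAL;
	if (bind->flags & ~SKETCH_BIND_PRIVATE)
		return -EINVAL;
	if (bind->flags & SKETCH_BIND_PRIVATE)
		dev->is_private = true;
	return 0;
}

/* A private, mmap-capable BAR reports only PRIVATE; otherwise READ/WRITE(/MMAP). */
static uint32_t sketch_bar_region_flags(const struct sketch_device *dev)
{
	uint32_t flags;

	if (dev->bar_mmap_supported && dev->is_private)
		return SKETCH_REGION_PRIVATE;

	flags = SKETCH_REGION_READ | SKETCH_REGION_WRITE;
	if (dev->bar_mmap_supported)
		flags |= SKETCH_REGION_MMAP;
	return flags;
}
```

One consequence visible in this model: an older userspace that only understands READ/WRITE/MMAP sees none of those bits on a private BAR, so it has no advertised way to touch the region, which matches the intent that the host must not access private MMIO.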