Message ID | 1591877734-66527-1-git-send-email-yi.l.liu@intel.com |
---|---|
Series | vfio: expose virtual Shared Virtual Addressing to VMs |
On Thu, Jun 11, 2020 at 05:15:19AM -0700, Liu Yi L wrote:
> Shared Virtual Addressing (SVA), a.k.a, Shared Virtual Memory (SVM) on
> Intel platforms allows address space sharing between device DMA and
> applications. SVA can reduce programming complexity and enhance security.
>
> This VFIO series is intended to expose SVA usage to VMs. i.e. Sharing
> guest application address space with passthru devices. This is called
> vSVA in this series. The whole vSVA enabling requires QEMU/VFIO/IOMMU
> changes. For IOMMU and QEMU changes, they are in separate series (listed
> in the "Related series").
>
> The high-level architecture for SVA virtualization is as below, the key
> design of vSVA support is to utilize the dual-stage IOMMU translation (
> also known as IOMMU nesting translation) capability in host IOMMU.
>
>
>      .-------------.  .---------------------------.
>      |   vIOMMU    |  | Guest process CR3, FL only|
>      |             |  '---------------------------'
>      .----------------/
>      | PASID Entry |--- PASID cache flush -
>      '-------------'                      |
>      |             |                      V
>      |             |                 CR3 in GPA
>      '-------------'
> Guest
> ------| Shadow |--------------------------|--------
>       v        v                          v
> Host
>      .-------------.  .----------------------.
>      |   pIOMMU    |  | Bind FL for GVA-GPA  |
>      |             |  '----------------------'
>      .----------------/  |
>      | PASID Entry |     V (Nested xlate)
>      '----------------\.------------------------------.
>      |             |   |SL for GPA-HPA, default domain|
>      |             |   '------------------------------'
>      '-------------'
> Where:
>  - FL = First level/stage one page tables
>  - SL = Second level/stage two page tables

Hi,
Looks like an interesting feature!

To check I understand this feature: can applications now pass virtual
addresses to devices instead of translating to IOVAs?

If yes, can guest applications restrict the vSVA address space so the
device only has access to certain regions?

On one hand replacing IOVA translation with virtual addresses simplifies
the application programming model, but does it give up isolation if the
device can now access all application memory?

Thanks,
Stefan
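The dual-stage (nested) translation in the cover letter's diagram can be sketched as two table walks in sequence. The guest-owned first-level (FL) table maps GVA to GPA, and the host-owned second-level (SL) table maps GPA to HPA. This is a toy model under stated assumptions. The single-level dictionaries, the 4 KiB page size and the function names are illustrative only. Real nested translation also walks the FL table pages themselves through SL, because they live in guest physical memory, and this sketch skips that step.

```python
PAGE_SHIFT = 12  # assumed 4 KiB pages
PAGE_SIZE = 1 << PAGE_SHIFT


class TranslationFault(Exception):
    """Raised when a stage has no mapping for the requested page."""


def walk(table, addr, stage):
    """Translate addr through a single-level toy table {page number: page number}."""
    page, offset = addr >> PAGE_SHIFT, addr & (PAGE_SIZE - 1)
    if page not in table:
        raise TranslationFault(f"{stage}: no mapping for {addr:#x}")
    return (table[page] << PAGE_SHIFT) | offset


def nested_translate(fl_table, sl_table, gva):
    """GVA -> GPA via the guest's FL table, then GPA -> HPA via the host's SL table."""
    gpa = walk(fl_table, gva, "FL")
    return walk(sl_table, gpa, "SL")


# Guest process page table (FL), managed by the guest: GVA page -> GPA page.
fl = {0x400: 0x10}
# Host second-level table for the VM (SL): GPA page -> HPA page.
sl = {0x10: 0x9A}

print(hex(nested_translate(fl, sl, 0x400123)))  # 0x9a123
```

A miss in either stage faults. This matches the split in the diagram: the guest keeps managing its process page tables (FL), and the host keeps SL as the VM's default domain.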
> From: Stefan Hajnoczi <stefanha@gmail.com>
> Sent: Monday, June 15, 2020 6:02 PM
>
> On Thu, Jun 11, 2020 at 05:15:19AM -0700, Liu Yi L wrote:
> > Shared Virtual Addressing (SVA), a.k.a, Shared Virtual Memory (SVM) on
> > [...]
>
> Hi,
> Looks like an interesting feature!

Thanks for the interest, Stefan. :-)

> To check I understand this feature: can applications now pass virtual
> addresses to devices instead of translating to IOVAs?

Yes, the application can pass virtual addresses to the device directly.
As long as the virtual address is mapped in the CPU page table, the IOMMU
translates it to a physical address.

> If yes, can guest applications restrict the vSVA address space so the
> device only has access to certain regions?

Do you mean restricting access to certain virtual address regions of the
guest application, or to certain guest memory? :-)

> On one hand replacing IOVA translation with virtual addresses simplifies
> the application programming model, but does it give up isolation if the
> device can now access all application memory?

Yeah, you are right: SVA simplifies the application programming model.
And today, we do allow the device to access all application memory with
SVA. This is another benefit of SVA. For example, say an accelerator
fetches a copy of data from a buffer written by the CPU. If that data
contains a pointer (a virtual address) to some other data, the
accelerator can do another DMA to fetch it without the CPU's
involvement.

Regards,
Yi Liu

> Thanks,
> Stefan
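Liu's accelerator example can be sketched as a toy model. The device and the process share one address space, so the device can dereference a pointer found inside fetched data with a second DMA and no CPU involvement. The `memory` dict stands in for the shared virtual address space. The `cpu_write`, `device_dma_read` and `device_gather` helpers are hypothetical and model addressing only, not a real DMA engine.

```python
# Toy shared virtual address space: virtual address -> stored record.
# Under SVA the device uses the same virtual addresses as the process.
memory = {}


def cpu_write(vaddr, value):
    memory[vaddr] = value


def device_dma_read(vaddr):
    """Hypothetical device read; the IOMMU resolves vaddr via the process page table."""
    if vaddr not in memory:
        raise LookupError(f"IOMMU fault: {vaddr:#x} not mapped")
    return memory[vaddr]


def device_gather(head_vaddr):
    """Follow a chain of (payload, next_vaddr) records using only device reads."""
    payloads = []
    vaddr = head_vaddr
    while vaddr is not None:
        payload, next_vaddr = device_dma_read(vaddr)
        payloads.append(payload)
        vaddr = next_vaddr  # a raw virtual address, usable as-is by the device
    return payloads


# The CPU builds a small linked structure and hands the device only its head pointer.
cpu_write(0x7F0000001000, ("header", 0x7F0000002000))
cpu_write(0x7F0000002000, ("body", None))
print(device_gather(0x7F0000001000))  # ['header', 'body']
```

With a classic IOVA mapping, the embedded `next_vaddr` would mean nothing to the device. Software would first have to map that buffer and rewrite the pointer as an IOVA.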
> From: Stefan Hajnoczi <stefanha@gmail.com>
> Sent: Monday, June 15, 2020 6:02 PM
>
> On Thu, Jun 11, 2020 at 05:15:19AM -0700, Liu Yi L wrote:
> > Shared Virtual Addressing (SVA), a.k.a, Shared Virtual Memory (SVM) on
> > [...]
>
> Hi,
> Looks like an interesting feature!
>
> To check I understand this feature: can applications now pass virtual
> addresses to devices instead of translating to IOVAs?
>
> If yes, can guest applications restrict the vSVA address space so the
> device only has access to certain regions?
>
> On one hand replacing IOVA translation with virtual addresses simplifies
> the application programming model, but does it give up isolation if the
> device can now access all application memory?
>

With SVA, each application is allocated a unique PASID to tag its
virtual address space. The device that claims SVA support must guarantee
that one application can only program the device to access its own
virtual address space (i.e. all DMAs triggered by this application are
tagged with the application's PASID and are translated by the IOMMU's
PASID-granular page table). So, isolation is not sacrificed in SVA.

Thanks
Kevin
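Kevin's point about PASID tagging can be modelled as a table indexed by PASID, where each entry points to one application's page table. The IOMMU translates a request only through the table its PASID selects. The same virtual address therefore resolves differently per application, and an unbound PASID faults. The `ToyIommu` class and the flat dictionaries are illustrative assumptions, not the real multi-level PASID table layout.

```python
PAGE_SHIFT = 12


class DmaFault(Exception):
    pass


class ToyIommu:
    """Per-device PASID table: PASID -> that process's page table (page -> page)."""

    def __init__(self):
        self.pasid_table = {}

    def bind(self, pasid, page_table):
        self.pasid_table[pasid] = page_table

    def translate(self, pasid, vaddr):
        # The PASID carried by the DMA request selects which page table is walked.
        table = self.pasid_table.get(pasid)
        if table is None:
            raise DmaFault(f"PASID {pasid} not bound")
        page = vaddr >> PAGE_SHIFT
        if page not in table:
            raise DmaFault(f"PASID {pasid}: {vaddr:#x} not mapped")
        return (table[page] << PAGE_SHIFT) | (vaddr & ((1 << PAGE_SHIFT) - 1))


iommu = ToyIommu()
iommu.bind(pasid=1, page_table={0x400: 0x111})  # application A
iommu.bind(pasid=2, page_table={0x400: 0x222})  # application B

# Same virtual address, different PASID tags, different physical pages.
print(hex(iommu.translate(1, 0x400010)))  # 0x111010
print(hex(iommu.translate(2, 0x400010)))  # 0x222010
```

The model only isolates applications if the device stamps each application's DMAs with the right PASID, which is the device-side guarantee Kevin describes.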
On Mon, Jun 15, 2020 at 12:39:40PM +0000, Liu, Yi L wrote:
> > From: Stefan Hajnoczi <stefanha@gmail.com>
> > Sent: Monday, June 15, 2020 6:02 PM
> >
> > On Thu, Jun 11, 2020 at 05:15:19AM -0700, Liu Yi L wrote:
> > > Shared Virtual Addressing (SVA), a.k.a, Shared Virtual Memory (SVM) on
> > > [...]
> >
> > Hi,
> > Looks like an interesting feature!
>
> Thanks for the interest, Stefan. :-)
>
> > To check I understand this feature: can applications now pass virtual
> > addresses to devices instead of translating to IOVAs?
>
> Yes, the application can pass virtual addresses to the device directly.
> As long as the virtual address is mapped in the CPU page table, the IOMMU
> translates it to a physical address.
>
> > If yes, can guest applications restrict the vSVA address space so the
> > device only has access to certain regions?
>
> Do you mean restricting access to certain virtual address regions of the
> guest application, or to certain guest memory? :-)

Your reply below answered my question. I was wondering if applications
can protect parts of their virtual memory space that should not be
accessed by the device.

It makes sense that there is a trade-off to simplify the programming
model, and performance might also be better if the application doesn't
need to DMA map/unmap buffers frequently.

Stefan
On Tue, Jun 16, 2020 at 02:26:38AM +0000, Tian, Kevin wrote:
> > From: Stefan Hajnoczi <stefanha@gmail.com>
> > Sent: Monday, June 15, 2020 6:02 PM
> >
> > On Thu, Jun 11, 2020 at 05:15:19AM -0700, Liu Yi L wrote:
> > > Shared Virtual Addressing (SVA), a.k.a, Shared Virtual Memory (SVM) on
> > > [...]
> >
> > On one hand replacing IOVA translation with virtual addresses simplifies
> > the application programming model, but does it give up isolation if the
> > device can now access all application memory?
> >
>
> With SVA, each application is allocated a unique PASID to tag its
> virtual address space. The device that claims SVA support must guarantee
> that one application can only program the device to access its own
> virtual address space (i.e. all DMAs triggered by this application are
> tagged with the application's PASID and are translated by the IOMMU's
> PASID-granular page table). So, isolation is not sacrificed in SVA.

Isolation between applications is preserved but there is no isolation
between the device and the application itself. The application needs to
trust the device.

Examples:

1. The device can snoop secret data from readable pages in the
   application's virtual memory space.

2. The device can gain arbitrary execution on the CPU by overwriting
   control flow addresses (e.g. function pointers, stack return
   addresses) in writable pages.

Stefan
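Stefan's second example can be illustrated with a toy model. The application keeps a function pointer in a writable heap object. Because SVA gives the device the whole address space, a hypothetical malicious DMA write can replace that pointer. Python callables stand in for code addresses and a dict stands in for the heap. The address and helper names are made up, and the sketch shows only the trust relationship, not a real exploit.

```python
heap = {}  # toy writable heap: virtual address -> stored value


def safe_handler():
    return "normal completion"


def attacker_code():
    return "attacker-controlled code ran"


# The application stores a callback (a control-flow address) in a writable heap page.
CALLBACK_VADDR = 0x5555_0000_1000  # hypothetical address
heap[CALLBACK_VADDR] = safe_handler


def device_dma_write(vaddr, value):
    """Hypothetical device write: under SVA any writable page of the process is reachable."""
    heap[vaddr] = value


def run_completion():
    # The CPU later calls through whatever is stored at the callback's address.
    return heap[CALLBACK_VADDR]()


print(run_completion())  # normal completion
device_dma_write(CALLBACK_VADDR, attacker_code)  # an untrusted device overwrites it
print(run_completion())  # attacker-controlled code ran
```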
On Tue, Jun 16, 2020 at 04:49:28PM +0100, Stefan Hajnoczi wrote:
> Isolation between applications is preserved but there is no isolation
> between the device and the application itself. The application needs to
> trust the device.
>
> Examples:
>
> 1. The device can snoop secret data from readable pages in the
>    application's virtual memory space.
>
> 2. The device can gain arbitrary execution on the CPU by overwriting
>    control flow addresses (e.g. function pointers, stack return
>    addresses) in writable pages.

To me, SVA seems to be a "middle layer" of security: it's not as safe as
VFIO_IOMMU_MAP_DMA, which has buffer-level granularity of control (though
of course we pay overhead for buffer setup and on-the-fly translation),
but it's far better than DMA with no IOMMU, which can ruin the whole
host/guest, because after all we do a lot of isolation on a per-process
basis.

IMHO it's the same as when we see a VM (or the QEMU process) as a whole
along with the guest code. In some cases we don't care if the guest does
bad things that mess up its own QEMU process. It would still be ideal if
we could stop the guest from doing so, but when that's not easy to do
the ideal way, we just lower the requirement to not letting the damage
spread to the host and other VMs.

Thanks,
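Peter's contrast can be sketched by modelling the two policies as access checks. `VFIO_IOMMU_MAP_DMA` maps one range of process virtual memory at a caller-chosen IOVA, so with per-buffer mappings the device reaches only the registered windows. An SVA binding exposes every mapped page of the process. The classes, addresses and ranges below are hypothetical simulations and do not call the real VFIO ioctl.

```python
class IovaFault(Exception):
    pass


class BufferGranularDomain:
    """Per-buffer mappings: only explicitly mapped IOVA windows are reachable."""

    def __init__(self):
        self.windows = []  # list of (iova, size, vaddr)

    def map_dma(self, iova, vaddr, size):
        self.windows.append((iova, size, vaddr))

    def unmap_dma(self, iova):
        # Drop the window that starts at this IOVA.
        self.windows = [w for w in self.windows if w[0] != iova]

    def device_access(self, iova):
        for base, size, vaddr in self.windows:
            if base <= iova < base + size:
                return vaddr + (iova - base)  # process virtual address reached
        raise IovaFault(f"IOVA {iova:#x} not mapped")


class SvaDomain:
    """SVA binding: the device uses process virtual addresses directly."""

    def __init__(self, process_mapped_ranges):
        self.ranges = process_mapped_ranges  # list of (vaddr, size)

    def device_access(self, vaddr):
        for base, size in self.ranges:
            if base <= vaddr < base + size:
                return vaddr
        raise IovaFault(f"VA {vaddr:#x} not mapped in the process")


process_ranges = [(0x1000_0000, 0x10000), (0x7FFF_0000, 0x1000)]  # "heap", "stack"
secret_vaddr = 0x7FFF_0800  # e.g. a key kept on the stack

buf = BufferGranularDomain()
buf.map_dma(iova=0x0, vaddr=0x1000_0000, size=0x1000)  # one I/O buffer only
sva = SvaDomain(process_ranges)

print(hex(buf.device_access(0x10)))         # 0x10000010
print(hex(sva.device_access(secret_vaddr)))  # 0x7fff0800
```

The buffer-granular domain has no IOVA that leads to `secret_vaddr`, while the SVA domain reaches it directly. The cost Peter mentions shows up as the `map_dma`/`unmap_dma` calls needed for every buffer.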
On Tue, Jun 16, 2020 at 04:49:28PM +0100, Stefan Hajnoczi wrote:
> On Tue, Jun 16, 2020 at 02:26:38AM +0000, Tian, Kevin wrote:
> > [...]
> > So, isolation is not sacrificed in SVA.
>
> Isolation between applications is preserved but there is no isolation
> between the device and the application itself. The application needs to
> trust the device.

Right. With all convenience comes security trust. With SVA there is an
expectation that the device has the required security boundaries properly
implemented. FWIW, what guarantee do we have today that VFs are secure
from one another, or even from their own PF? They can also generate
transactions with any of their peers' IDs, and there is nothing an IOMMU
can do about it today, other than relying on ACS. Even Bus Master Enable
can be ignored, and devices (malicious or otherwise) can still generate
transactions after BM=0.

With SVM you get the benefits of:

* Not having to register regions
* Not needing to pin application memory for DMA

>
> Examples:
>
> 1. The device can snoop secret data from readable pages in the
>    application's virtual memory space.

Aren't there other security technologies that can address this?

>
> 2. The device can gain arbitrary execution on the CPU by overwriting
>    control flow addresses (e.g. function pointers, stack return
>    addresses) in writable pages.

I suppose technology like CET might be able to guard against this. The
general expectation is that code pages, and anything else that needs to
be protected, should be mapped non-writable.

Cheers,
Ashok
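Mapping protected memory non-writable only helps against the device if the IOMMU enforces the same permission bits as the CPU. This toy model assumes the IOMMU's first-level walk checks the write bit on DMA writes, much as the MMU does for CPU stores. The `Pte` type, `iommu_translate` function and page numbers are hypothetical.

```python
from dataclasses import dataclass

PAGE_SHIFT = 12


@dataclass(frozen=True)
class Pte:
    frame: int
    writable: bool


class PermissionFault(Exception):
    pass


def iommu_translate(page_table, vaddr, is_write):
    """Toy first-level walk that enforces the write bit shared with the CPU page table."""
    pte = page_table.get(vaddr >> PAGE_SHIFT)
    if pte is None:
        raise PermissionFault(f"{vaddr:#x} not present")
    if is_write and not pte.writable:
        raise PermissionFault(f"write to read-only page at {vaddr:#x}")
    return (pte.frame << PAGE_SHIFT) | (vaddr & ((1 << PAGE_SHIFT) - 1))


page_table = {
    0x400: Pte(frame=0x80, writable=False),  # code page, mapped non-writable
    0x600: Pte(frame=0x81, writable=True),   # heap page
}

print(hex(iommu_translate(page_table, 0x600010, is_write=True)))  # 0x81010
try:
    iommu_translate(page_table, 0x400010, is_write=True)
except PermissionFault as err:
    print(err)  # write to read-only page at 0x400010
```

This check protects only pages that can actually be read-only. Data that must stay writable, such as heap-resident function pointers, is still exposed.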
On Tue, Jun 16, 2020 at 10:00:16AM -0700, Raj, Ashok wrote:
> On Tue, Jun 16, 2020 at 04:49:28PM +0100, Stefan Hajnoczi wrote:
> > On Tue, Jun 16, 2020 at 02:26:38AM +0000, Tian, Kevin wrote:
> > > [...]
> > > So, isolation is not sacrificed in SVA.
> >
> > Isolation between applications is preserved but there is no isolation
> > between the device and the application itself. The application needs to
> > trust the device.
>
> Right. With all convenience comes security trust. With SVA there is an
> expectation that the device has the required security boundaries properly
> implemented. FWIW, what guarantee do we have today that VFs are secure
> from one another, or even from their own PF? They can also generate
> transactions with any of their peers' IDs, and there is nothing an IOMMU
> can do about it today, other than relying on ACS. Even Bus Master Enable
> can be ignored, and devices (malicious or otherwise) can still generate
> transactions after BM=0.
>
> With SVM you get the benefits of:
>
> * Not having to register regions
> * Not needing to pin application memory for DMA

As long as the security model is clearly documented, users can decide
whether or not SVA meets their requirements. I just wanted to clarify
what the security model is.

> >
> > Examples:
> >
> > 1. The device can snoop secret data from readable pages in the
> >    application's virtual memory space.
>
> Aren't there other security technologies that can address this?

Maybe the IOMMU could enforce Memory Protection Keys? Imagine each device
is assigned a subset of memory protection keys and the IOMMU checks them
on each device access. This would allow the application to mark certain
pages off-limits to the device, while the IOMMU could still walk the full
process page table (no need to construct a special device page table for
the IOMMU).

> >
> > 2. The device can gain arbitrary execution on the CPU by overwriting
> >    control flow addresses (e.g. function pointers, stack return
> >    addresses) in writable pages.
>
> I suppose technology like CET might be able to guard against this. The
> general expectation is that code pages, and anything else that needs to
> be protected, should be mapped non-writable.

Function pointers are a common exception to this. They are often located
in writable heap or stack pages. There might also be dynamic linker
memory structures that are easy to hijack.

Stefan
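The Memory Protection Keys idea is only a proposal in this thread, and no existing IOMMU feature or interface is implied. The following hypothetical model shows how it might work. Each page-table entry carries a protection key, following x86 MPK, which uses a 4-bit key per page (16 keys). Each device is granted a bitmask of keys. The IOMMU walks the full process page table and then rejects accesses to pages whose key the device lacks. The key assignments and function names are illustrative.

```python
PAGE_SHIFT = 12
NUM_PKEYS = 16  # x86 MPK: 4-bit key per page


class KeyFault(Exception):
    pass


def make_key_mask(*keys):
    """Build the set of protection keys a device is allowed to use, as a bitmask."""
    mask = 0
    for key in keys:
        if not 0 <= key < NUM_PKEYS:
            raise ValueError(f"invalid protection key {key}")
        mask |= 1 << key
    return mask


def iommu_access(page_table, device_key_mask, vaddr):
    """Hypothetical check: walk the process page table, then enforce the PTE's key."""
    entry = page_table.get(vaddr >> PAGE_SHIFT)
    if entry is None:
        raise KeyFault(f"{vaddr:#x} not mapped")
    frame, pkey = entry
    if not device_key_mask & (1 << pkey):
        raise KeyFault(f"device lacks pkey {pkey} for {vaddr:#x}")
    return (frame << PAGE_SHIFT) | (vaddr & ((1 << PAGE_SHIFT) - 1))


# The application tags pages: key 1 for DMA buffers, key 2 for secrets.
page_table = {
    0x100: (0x900, 1),  # I/O buffer
    0x200: (0x901, 2),  # private data, off-limits to the device
}
device_mask = make_key_mask(1)

print(hex(iommu_access(page_table, device_mask, 0x100040)))  # 0x900040
```

The application marks pages off-limits by retagging them, and no separate device page table is built, which matches the benefit Stefan describes.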
On Tue, Jun 16, 2020 at 12:09:16PM -0400, Peter Xu wrote:
> On Tue, Jun 16, 2020 at 04:49:28PM +0100, Stefan Hajnoczi wrote:
> > Isolation between applications is preserved but there is no isolation
> > between the device and the application itself. The application needs to
> > trust the device.
> > [...]
>
> To me, SVA seems to be a "middle layer" of security: it's not as safe as
> VFIO_IOMMU_MAP_DMA, which has buffer-level granularity of control (though
> of course we pay overhead for buffer setup and on-the-fly translation),
> but it's far better than DMA with no IOMMU, which can ruin the whole
> host/guest, because after all we do a lot of isolation on a per-process
> basis.
>
> IMHO it's the same as when we see a VM (or the QEMU process) as a whole
> along with the guest code. In some cases we don't care if the guest does
> bad things that mess up its own QEMU process. It would still be ideal if
> we could stop the guest from doing so, but when that's not easy to do
> the ideal way, we just lower the requirement to not letting the damage
> spread to the host and other VMs.

Makes sense.

Stefan