[v2,00/15] vfio: expose virtual Shared Virtual Addressing to VMs

Message ID	1591877734-66527-1-git-send-email-yi.l.liu@intel.com (mailing list archive)
Headers	show Return-Path: <SRS0=3MuQ=7Y=vger.kernel.org=kvm-owner@kernel.org> IronPort-SDR: Qzh73Zpvtv4Q+8OeusGrmhzSWPyzLaF8qoEk+crYkCIic8dYaup/zGynwFoibennyF5QTcifRo AIiQDSj/nzIA== IronPort-SDR: slupFsEoGCu3+aZxDJFA/0zL5b17ZqrOsg0F2camZjWpwi38u1nWMEY+JyM7+b0EFAF+80FeiR uSzRmt7F1iIw== From: Liu Yi L <yi.l.liu@intel.com> To: alex.williamson@redhat.com, eric.auger@redhat.com, baolu.lu@linux.intel.com, joro@8bytes.org Cc: kevin.tian@intel.com, jacob.jun.pan@linux.intel.com, ashok.raj@intel.com, yi.l.liu@intel.com, jun.j.tian@intel.com, yi.y.sun@intel.com, jean-philippe@linaro.org, peterx@redhat.com, hao.wu@intel.com, iommu@lists.linux-foundation.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 00/15] vfio: expose virtual Shared Virtual Addressing to VMs Date: Thu, 11 Jun 2020 05:15:19 -0700 Message-Id: <1591877734-66527-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk
Series	vfio: expose virtual Shared Virtual Addressing to VMs \| expand [v2,00/15] vfio: expose virtual Shared Virtual Addressing to VMs [v2,01/15] vfio/type1: Refactor vfio_iommu_type1_ioctl() [v2,02/15] iommu: Report domain nesting info [v2,03/15] vfio/type1: Report iommu nesting info to userspace [v2,04/15] vfio: Add PASID allocation/free support [v2,05/15] iommu/vt-d: Support setting ioasid set to domain [v2,06/15] vfio/type1: Add VFIO_IOMMU_PASID_REQUEST (alloc/free) [v2,07/15] iommu/uapi: Add iommu_gpasid_unbind_data [v2,08/15] iommu: Pass domain and unbind_data to sva_unbind_gpasid() [v2,09/15] iommu/vt-d: Check ownership for PASIDs from user-space [v2,10/15] vfio/type1: Support binding guest page tables to PASID [v2,11/15] vfio/type1: Allow invalidating first-level/stage IOMMU cache [v2,12/15] vfio/type1: Add vSVA support for IOMMU-backed mdevs [v2,13/15] vfio/pci: Expose PCIe PASID capability to guest [v2,14/15] vfio: Document dual stage control [v2,15/15] iommu/vt-d: Support reporting nesting capability info

Yi Liu June 11, 2020, 12:15 p.m. UTC

Shared Virtual Addressing (SVA), a.k.a, Shared Virtual Memory (SVM) on
Intel platforms allows address space sharing between device DMA and
applications. SVA can reduce programming complexity and enhance security.

This VFIO series is intended to expose SVA usage to VMs. i.e. Sharing
guest application address space with passthru devices. This is called
vSVA in this series. The whole vSVA enabling requires QEMU/VFIO/IOMMU
changes. For IOMMU and QEMU changes, they are in separate series (listed
in the "Related series").

The high-level architecture for SVA virtualization is as below, the key
design of vSVA support is to utilize the dual-stage IOMMU translation (
also known as IOMMU nesting translation) capability in host IOMMU.


    .-------------.  .---------------------------.
    |   vIOMMU    |  | Guest process CR3, FL only|
    |             |  '---------------------------'
    .----------------/
    | PASID Entry |--- PASID cache flush -
    '-------------'                       |
    |             |                       V
    |             |                CR3 in GPA
    '-------------'
Guest
------| Shadow |--------------------------|--------
      v        v                          v
Host
    .-------------.  .----------------------.
    |   pIOMMU    |  | Bind FL for GVA-GPA  |
    |             |  '----------------------'
    .----------------/  |
    | PASID Entry |     V (Nested xlate)
    '----------------\.------------------------------.
    |             |   |SL for GPA-HPA, default domain|
    |             |   '------------------------------'
    '-------------'
Where:
 - FL = First level/stage one page tables
 - SL = Second level/stage two page tables

Patch Overview:
 1. a refactor to vfio_iommu_type1 ioctl (patch 0001)
 2. reports IOMMU nesting info to userspace ( patch 0002, 0003 and 0015)
 3. vfio support for PASID allocation and free for VMs (patch 0004, 0005, 0006)
 4. vfio support for binding guest page table to host (patch 0007, 0008, 0009, 0010)
 5. vfio support for IOMMU cache invalidation from VMs (patch 0011)
 6. vfio support for vSVA usage on IOMMU-backed mdevs (patch 0012)
 7. expose PASID capability to VM (patch 0013)
 8. add doc for VFIO dual stage control (patch 0014)

The complete vSVA kernel upstream patches are divided into three phases:
    1. Common APIs and PCI device direct assignment
    2. IOMMU-backed Mediated Device assignment
    3. Page Request Services (PRS) support

This patchset is aiming for the phase 1 and phase 2, and based on Jacob's
below series.
[PATCH v13 0/8] Nested Shared Virtual Address (SVA) VT-d support - merged
https://lkml.org/lkml/2020/5/13/1582

[PATCH v2 0/3] IOMMU user API enhancement - wip
https://lkml.org/lkml/2020/6/11/5

[PATCH 00/10] IOASID extensions for guest SVA - wip
https://lkml.org/lkml/2020/3/25/874

The latest IOASID code added below new interface for itertate all PASIDs of an
ioasid_set. The implementation is not sent out yet as Jacob needs some cleanup,
it can be found in branch vsva-linux-5.7-rc4-v2.
 int ioasid_set_for_each_ioasid(int sid, void (*fn)(ioasid_t id, void *data), void *data);

Complete set for current vSVA can be found in below branch.
This branch also includes some extra modifications to IOASID core code and
vt-d iommu driver cleanup patches.
https://github.com/luxis1999/linux-vsva.git:vsva-linux-5.7-rc4-v2

The corresponding QEMU patch series is included in below branch:
https://github.com/luxis1999/qemu.git:vsva_5.7_rc4_qemu_rfcv6


Regards,
Yi Liu

Changelog:
	- Patch v1 -> Patch v2:
	  a) Refactor vfio_iommu_type1_ioctl() per suggestion from Christoph
	     Hellwig.
	  b) Re-sequence the patch series for better bisect support.
	  c) Report IOMMU nesting cap info in detail instead of a format in
	     v1.
	  d) Enforce one group per nesting type container for vfio iommu type1
	     driver.
	  e) Build the vfio_mm related code from vfio.c to be a separate
	     vfio_pasid.ko.
	  f) Add PASID ownership check in IOMMU driver.
	  g) Adopted to latest IOMMU UAPI design. Removed IOMMU UAPI version
	     check. Added iommu_gpasid_unbind_data for unbind requests from
	     userspace.
	  h) Define a single ioctl:VFIO_IOMMU_NESTING_OP for bind/unbind_gtbl
	     and cahce_invld.
	  i) Document dual stage control in vfio.rst.
	  Patch v1: https://lore.kernel.org/linux-iommu/1584880325-10561-1-git-send-email-yi.l.liu@intel.com/

	- RFC v3 -> Patch v1:
	  a) Address comments to the PASID request(alloc/free) path
	  b) Report PASID alloc/free availabitiy to user-space
	  c) Add a vfio_iommu_type1 parameter to support pasid quota tuning
	  d) Adjusted to latest ioasid code implementation. e.g. remove the
	     code for tracking the allocated PASIDs as latest ioasid code
	     will track it, VFIO could use ioasid_free_set() to free all
	     PASIDs.
	  RFC v3: https://lore.kernel.org/linux-iommu/1580299912-86084-1-git-send-email-yi.l.liu@intel.com/

	- RFC v2 -> v3:
	  a) Refine the whole patchset to fit the roughly parts in this series
	  b) Adds complete vfio PASID management framework. e.g. pasid alloc,
	  free, reclaim in VM crash/down and per-VM PASID quota to prevent
	  PASID abuse.
	  c) Adds IOMMU uAPI version check and page table format check to ensure
	  version compatibility and hardware compatibility.
	  d) Adds vSVA vfio support for IOMMU-backed mdevs.
	  RFC v2: https://lore.kernel.org/linux-iommu/1571919983-3231-1-git-send-email-yi.l.liu@intel.com/

	- RFC v1 -> v2:
	  Dropped vfio: VFIO_IOMMU_ATTACH/DETACH_PASID_TABLE.
	  RFC v1: https://lore.kernel.org/linux-iommu/1562324772-3084-1-git-send-email-yi.l.liu@intel.com/


Eric Auger (1):
  vfio: Document dual stage control

Liu Yi L (13):
  vfio/type1: Refactor vfio_iommu_type1_ioctl()
  iommu: Report domain nesting info
  vfio/type1: Report iommu nesting info to userspace
  vfio: Add PASID allocation/free support
  iommu/vt-d: Support setting ioasid set to domain
  vfio/type1: Add VFIO_IOMMU_PASID_REQUEST (alloc/free)
  iommu/uapi: Add iommu_gpasid_unbind_data
  iommu/vt-d: Check ownership for PASIDs from user-space
  vfio/type1: Support binding guest page tables to PASID
  vfio/type1: Allow invalidating first-level/stage IOMMU cache
  vfio/type1: Add vSVA support for IOMMU-backed mdevs
  vfio/pci: Expose PCIe PASID capability to guest
  iommu/vt-d: Support reporting nesting capability info

Yi Sun (1):
  iommu: Pass domain and unbind_data to sva_unbind_gpasid()

 Documentation/driver-api/vfio.rst  |  64 ++++
 drivers/iommu/intel-iommu.c        | 107 ++++++-
 drivers/iommu/intel-svm.c          |  20 +-
 drivers/iommu/iommu.c              |   4 +-
 drivers/vfio/Kconfig               |   6 +
 drivers/vfio/Makefile              |   1 +
 drivers/vfio/pci/vfio_pci_config.c |   2 +-
 drivers/vfio/vfio_iommu_type1.c    | 614 ++++++++++++++++++++++++++++++++-----
 drivers/vfio/vfio_pasid.c          | 191 ++++++++++++
 include/linux/intel-iommu.h        |  23 +-
 include/linux/iommu.h              |  10 +-
 include/linux/vfio.h               |  54 ++++
 include/uapi/linux/iommu.h         |  47 +++
 include/uapi/linux/vfio.h          |  78 +++++
 14 files changed, 1134 insertions(+), 87 deletions(-)
 create mode 100644 drivers/vfio/vfio_pasid.c

Stefan Hajnoczi June 15, 2020, 10:02 a.m. UTC | #1

On Thu, Jun 11, 2020 at 05:15:19AM -0700, Liu Yi L wrote:
> Shared Virtual Addressing (SVA), a.k.a, Shared Virtual Memory (SVM) on
> Intel platforms allows address space sharing between device DMA and
> applications. SVA can reduce programming complexity and enhance security.
> 
> This VFIO series is intended to expose SVA usage to VMs. i.e. Sharing
> guest application address space with passthru devices. This is called
> vSVA in this series. The whole vSVA enabling requires QEMU/VFIO/IOMMU
> changes. For IOMMU and QEMU changes, they are in separate series (listed
> in the "Related series").
> 
> The high-level architecture for SVA virtualization is as below, the key
> design of vSVA support is to utilize the dual-stage IOMMU translation (
> also known as IOMMU nesting translation) capability in host IOMMU.
> 
> 
>     .-------------.  .---------------------------.
>     |   vIOMMU    |  | Guest process CR3, FL only|
>     |             |  '---------------------------'
>     .----------------/
>     | PASID Entry |--- PASID cache flush -
>     '-------------'                       |
>     |             |                       V
>     |             |                CR3 in GPA
>     '-------------'
> Guest
> ------| Shadow |--------------------------|--------
>       v        v                          v
> Host
>     .-------------.  .----------------------.
>     |   pIOMMU    |  | Bind FL for GVA-GPA  |
>     |             |  '----------------------'
>     .----------------/  |
>     | PASID Entry |     V (Nested xlate)
>     '----------------\.------------------------------.
>     |             |   |SL for GPA-HPA, default domain|
>     |             |   '------------------------------'
>     '-------------'
> Where:
>  - FL = First level/stage one page tables
>  - SL = Second level/stage two page tables

Hi,
Looks like an interesting feature!

To check I understand this feature: can applications now pass virtual
addresses to devices instead of translating to IOVAs?

If yes, can guest applications restrict the vSVA address space so the
device only has access to certain regions?

On one hand replacing IOVA translation with virtual addresses simplifies
the application programming model, but does it give up isolation if the
device can now access all application memory?

Thanks,
Stefan

Yi Liu June 15, 2020, 12:39 p.m. UTC | #2

> From: Stefan Hajnoczi <stefanha@gmail.com>
> Sent: Monday, June 15, 2020 6:02 PM
> 
> On Thu, Jun 11, 2020 at 05:15:19AM -0700, Liu Yi L wrote:
> > Shared Virtual Addressing (SVA), a.k.a, Shared Virtual Memory (SVM) on
> > Intel platforms allows address space sharing between device DMA and
> > applications. SVA can reduce programming complexity and enhance security.
> >
> > This VFIO series is intended to expose SVA usage to VMs. i.e. Sharing
> > guest application address space with passthru devices. This is called
> > vSVA in this series. The whole vSVA enabling requires QEMU/VFIO/IOMMU
> > changes. For IOMMU and QEMU changes, they are in separate series (listed
> > in the "Related series").
> >
> > The high-level architecture for SVA virtualization is as below, the key
> > design of vSVA support is to utilize the dual-stage IOMMU translation (
> > also known as IOMMU nesting translation) capability in host IOMMU.
> >
> >
> >     .-------------.  .---------------------------.
> >     |   vIOMMU    |  | Guest process CR3, FL only|
> >     |             |  '---------------------------'
> >     .----------------/
> >     | PASID Entry |--- PASID cache flush -
> >     '-------------'                       |
> >     |             |                       V
> >     |             |                CR3 in GPA
> >     '-------------'
> > Guest
> > ------| Shadow |--------------------------|--------
> >       v        v                          v
> > Host
> >     .-------------.  .----------------------.
> >     |   pIOMMU    |  | Bind FL for GVA-GPA  |
> >     |             |  '----------------------'
> >     .----------------/  |
> >     | PASID Entry |     V (Nested xlate)
> >     '----------------\.------------------------------.
> >     |             |   |SL for GPA-HPA, default domain|
> >     |             |   '------------------------------'
> >     '-------------'
> > Where:
> >  - FL = First level/stage one page tables
> >  - SL = Second level/stage two page tables
> 
> Hi,
> Looks like an interesting feature!

thanks for the interest. Stefan :-)

> To check I understand this feature: can applications now pass virtual
> addresses to devices instead of translating to IOVAs?

yes, application could pass virtual addresses to device directly. As
long as the virtual address is mapped in cpu page table, then IOMMU
would get it translated to physical address.

> If yes, can guest applications restrict the vSVA address space so the
> device only has access to certain regions?

do you mean restrict the access of certain virtual address regions of
guest application ? or certain guest memory? :-)

> On one hand replacing IOVA translation with virtual addresses simplifies
> the application programming model, but does it give up isolation if the
> device can now access all application memory?

yeah, you are right, SVA simplifies application programming model. And
today, we do allow access all application memory by SVA. this is also
another benefit of SVA. e.g. say an accelerator gets a copy of data from
a buffer written by cpu. If there is some other data which is directed
by a pointer (a virtual address) within the data got from memory, accelerator
could do another DMA to fetch it without cpu's involvement.

Regards,
Yi Liu

> Thanks,
> Stefan

Tian, Kevin June 16, 2020, 2:26 a.m. UTC | #3

> From: Stefan Hajnoczi <stefanha@gmail.com>
> Sent: Monday, June 15, 2020 6:02 PM
> 
> On Thu, Jun 11, 2020 at 05:15:19AM -0700, Liu Yi L wrote:
> > Shared Virtual Addressing (SVA), a.k.a, Shared Virtual Memory (SVM) on
> > Intel platforms allows address space sharing between device DMA and
> > applications. SVA can reduce programming complexity and enhance
> security.
> >
> > This VFIO series is intended to expose SVA usage to VMs. i.e. Sharing
> > guest application address space with passthru devices. This is called
> > vSVA in this series. The whole vSVA enabling requires QEMU/VFIO/IOMMU
> > changes. For IOMMU and QEMU changes, they are in separate series (listed
> > in the "Related series").
> >
> > The high-level architecture for SVA virtualization is as below, the key
> > design of vSVA support is to utilize the dual-stage IOMMU translation (
> > also known as IOMMU nesting translation) capability in host IOMMU.
> >
> >
> >     .-------------.  .---------------------------.
> >     |   vIOMMU    |  | Guest process CR3, FL only|
> >     |             |  '---------------------------'
> >     .----------------/
> >     | PASID Entry |--- PASID cache flush -
> >     '-------------'                       |
> >     |             |                       V
> >     |             |                CR3 in GPA
> >     '-------------'
> > Guest
> > ------| Shadow |--------------------------|--------
> >       v        v                          v
> > Host
> >     .-------------.  .----------------------.
> >     |   pIOMMU    |  | Bind FL for GVA-GPA  |
> >     |             |  '----------------------'
> >     .----------------/  |
> >     | PASID Entry |     V (Nested xlate)
> >     '----------------\.------------------------------.
> >     |             |   |SL for GPA-HPA, default domain|
> >     |             |   '------------------------------'
> >     '-------------'
> > Where:
> >  - FL = First level/stage one page tables
> >  - SL = Second level/stage two page tables
> 
> Hi,
> Looks like an interesting feature!
> 
> To check I understand this feature: can applications now pass virtual
> addresses to devices instead of translating to IOVAs?
> 
> If yes, can guest applications restrict the vSVA address space so the
> device only has access to certain regions?
> 
> On one hand replacing IOVA translation with virtual addresses simplifies
> the application programming model, but does it give up isolation if the
> device can now access all application memory?
> 

with SVA each application is allocated with a unique PASID to tag its
virtual address space. The device that claims SVA support must guarantee 
that one application can only program the device to access its own virtual
address space (i.e. all DMAs triggered by this application are tagged with
the application's PASID, and are translated by IOMMU's PASID-granular
page table). So, isolation is not sacrificed in SVA.

Thanks
Kevin

Stefan Hajnoczi June 16, 2020, 3:34 p.m. UTC | #4

On Mon, Jun 15, 2020 at 12:39:40PM +0000, Liu, Yi L wrote:
> > From: Stefan Hajnoczi <stefanha@gmail.com>
> > Sent: Monday, June 15, 2020 6:02 PM
> > 
> > On Thu, Jun 11, 2020 at 05:15:19AM -0700, Liu Yi L wrote:
> > > Shared Virtual Addressing (SVA), a.k.a, Shared Virtual Memory (SVM) on
> > > Intel platforms allows address space sharing between device DMA and
> > > applications. SVA can reduce programming complexity and enhance security.
> > >
> > > This VFIO series is intended to expose SVA usage to VMs. i.e. Sharing
> > > guest application address space with passthru devices. This is called
> > > vSVA in this series. The whole vSVA enabling requires QEMU/VFIO/IOMMU
> > > changes. For IOMMU and QEMU changes, they are in separate series (listed
> > > in the "Related series").
> > >
> > > The high-level architecture for SVA virtualization is as below, the key
> > > design of vSVA support is to utilize the dual-stage IOMMU translation (
> > > also known as IOMMU nesting translation) capability in host IOMMU.
> > >
> > >
> > >     .-------------.  .---------------------------.
> > >     |   vIOMMU    |  | Guest process CR3, FL only|
> > >     |             |  '---------------------------'
> > >     .----------------/
> > >     | PASID Entry |--- PASID cache flush -
> > >     '-------------'                       |
> > >     |             |                       V
> > >     |             |                CR3 in GPA
> > >     '-------------'
> > > Guest
> > > ------| Shadow |--------------------------|--------
> > >       v        v                          v
> > > Host
> > >     .-------------.  .----------------------.
> > >     |   pIOMMU    |  | Bind FL for GVA-GPA  |
> > >     |             |  '----------------------'
> > >     .----------------/  |
> > >     | PASID Entry |     V (Nested xlate)
> > >     '----------------\.------------------------------.
> > >     |             |   |SL for GPA-HPA, default domain|
> > >     |             |   '------------------------------'
> > >     '-------------'
> > > Where:
> > >  - FL = First level/stage one page tables
> > >  - SL = Second level/stage two page tables
> > 
> > Hi,
> > Looks like an interesting feature!
> 
> thanks for the interest. Stefan :-)
> 
> > To check I understand this feature: can applications now pass virtual
> > addresses to devices instead of translating to IOVAs?
> 
> yes, application could pass virtual addresses to device directly. As
> long as the virtual address is mapped in cpu page table, then IOMMU
> would get it translated to physical address.
> 
> > If yes, can guest applications restrict the vSVA address space so the
> > device only has access to certain regions?
> 
> do you mean restrict the access of certain virtual address regions of
> guest application ? or certain guest memory? :-)

Your reply below answered my question. I was wondering if applications
can protect parts of their virtual memory space that should not be
accessed by the device. It makes sense that there is a trade-off to
simplify the programming model and performance might also be better if
the application doesn't need to DMA map/unmap buffers frequently.

Stefan

Stefan Hajnoczi June 16, 2020, 3:49 p.m. UTC | #5

On Tue, Jun 16, 2020 at 02:26:38AM +0000, Tian, Kevin wrote:
> > From: Stefan Hajnoczi <stefanha@gmail.com>
> > Sent: Monday, June 15, 2020 6:02 PM
> > 
> > On Thu, Jun 11, 2020 at 05:15:19AM -0700, Liu Yi L wrote:
> > > Shared Virtual Addressing (SVA), a.k.a, Shared Virtual Memory (SVM) on
> > > Intel platforms allows address space sharing between device DMA and
> > > applications. SVA can reduce programming complexity and enhance
> > security.
> > >
> > > This VFIO series is intended to expose SVA usage to VMs. i.e. Sharing
> > > guest application address space with passthru devices. This is called
> > > vSVA in this series. The whole vSVA enabling requires QEMU/VFIO/IOMMU
> > > changes. For IOMMU and QEMU changes, they are in separate series (listed
> > > in the "Related series").
> > >
> > > The high-level architecture for SVA virtualization is as below, the key
> > > design of vSVA support is to utilize the dual-stage IOMMU translation (
> > > also known as IOMMU nesting translation) capability in host IOMMU.
> > >
> > >
> > >     .-------------.  .---------------------------.
> > >     |   vIOMMU    |  | Guest process CR3, FL only|
> > >     |             |  '---------------------------'
> > >     .----------------/
> > >     | PASID Entry |--- PASID cache flush -
> > >     '-------------'                       |
> > >     |             |                       V
> > >     |             |                CR3 in GPA
> > >     '-------------'
> > > Guest
> > > ------| Shadow |--------------------------|--------
> > >       v        v                          v
> > > Host
> > >     .-------------.  .----------------------.
> > >     |   pIOMMU    |  | Bind FL for GVA-GPA  |
> > >     |             |  '----------------------'
> > >     .----------------/  |
> > >     | PASID Entry |     V (Nested xlate)
> > >     '----------------\.------------------------------.
> > >     |             |   |SL for GPA-HPA, default domain|
> > >     |             |   '------------------------------'
> > >     '-------------'
> > > Where:
> > >  - FL = First level/stage one page tables
> > >  - SL = Second level/stage two page tables
> > 
> > Hi,
> > Looks like an interesting feature!
> > 
> > To check I understand this feature: can applications now pass virtual
> > addresses to devices instead of translating to IOVAs?
> > 
> > If yes, can guest applications restrict the vSVA address space so the
> > device only has access to certain regions?
> > 
> > On one hand replacing IOVA translation with virtual addresses simplifies
> > the application programming model, but does it give up isolation if the
> > device can now access all application memory?
> > 
> 
> with SVA each application is allocated with a unique PASID to tag its
> virtual address space. The device that claims SVA support must guarantee 
> that one application can only program the device to access its own virtual
> address space (i.e. all DMAs triggered by this application are tagged with
> the application's PASID, and are translated by IOMMU's PASID-granular
> page table). So, isolation is not sacrificed in SVA.

Isolation between applications is preserved but there is no isolation
between the device and the application itself. The application needs to
trust the device.

Examples:

1. The device can snoop secret data from readable pages in the
   application's virtual memory space.

2. The device can gain arbitrary execution on the CPU by overwriting
   control flow addresses (e.g. function pointers, stack return
   addresses) in writable pages.

Stefan

Peter Xu June 16, 2020, 4:09 p.m. UTC | #6

On Tue, Jun 16, 2020 at 04:49:28PM +0100, Stefan Hajnoczi wrote:
> Isolation between applications is preserved but there is no isolation
> between the device and the application itself. The application needs to
> trust the device.
> 
> Examples:
> 
> 1. The device can snoop secret data from readable pages in the
>    application's virtual memory space.
> 
> 2. The device can gain arbitrary execution on the CPU by overwriting
>    control flow addresses (e.g. function pointers, stack return
>    addresses) in writable pages.

To me, SVA seems to be that "middle layer" of secure where it's not as safe as
VFIO_IOMMU_MAP_DMA which has buffer level granularity of control (but of course
we pay overhead on buffer setups and on-the-fly translations), however it's far
better than DMA with no IOMMU which can ruin the whole host/guest, because
after all we do a lot of isolations as process based.

IMHO it's the same as when we see a VM (or the QEMU process) as a whole along
with the guest code.  In some cases we don't care if the guest did some bad
things to mess up with its own QEMU process.  It is still ideal if we can even
stop the guest from doing so, but when it's not easy to do it the ideal way, we
just lower the requirement to not spread the influence to the host and other
VMs.

Thanks,

Ashok Raj June 16, 2020, 5 p.m. UTC | #7

On Tue, Jun 16, 2020 at 04:49:28PM +0100, Stefan Hajnoczi wrote:
> On Tue, Jun 16, 2020 at 02:26:38AM +0000, Tian, Kevin wrote:
> > > From: Stefan Hajnoczi <stefanha@gmail.com>
> > > Sent: Monday, June 15, 2020 6:02 PM
> > > 
> > > On Thu, Jun 11, 2020 at 05:15:19AM -0700, Liu Yi L wrote:
> > > > Shared Virtual Addressing (SVA), a.k.a, Shared Virtual Memory (SVM) on
> > > > Intel platforms allows address space sharing between device DMA and
> > > > applications. SVA can reduce programming complexity and enhance
> > > security.
> > > >
> > > > This VFIO series is intended to expose SVA usage to VMs. i.e. Sharing
> > > > guest application address space with passthru devices. This is called
> > > > vSVA in this series. The whole vSVA enabling requires QEMU/VFIO/IOMMU
> > > > changes. For IOMMU and QEMU changes, they are in separate series (listed
> > > > in the "Related series").
> > > >
> > > > The high-level architecture for SVA virtualization is as below, the key
> > > > design of vSVA support is to utilize the dual-stage IOMMU translation (
> > > > also known as IOMMU nesting translation) capability in host IOMMU.
> > > >
> > > >
> > > >     .-------------.  .---------------------------.
> > > >     |   vIOMMU    |  | Guest process CR3, FL only|
> > > >     |             |  '---------------------------'
> > > >     .----------------/
> > > >     | PASID Entry |--- PASID cache flush -
> > > >     '-------------'                       |
> > > >     |             |                       V
> > > >     |             |                CR3 in GPA
> > > >     '-------------'
> > > > Guest
> > > > ------| Shadow |--------------------------|--------
> > > >       v        v                          v
> > > > Host
> > > >     .-------------.  .----------------------.
> > > >     |   pIOMMU    |  | Bind FL for GVA-GPA  |
> > > >     |             |  '----------------------'
> > > >     .----------------/  |
> > > >     | PASID Entry |     V (Nested xlate)
> > > >     '----------------\.------------------------------.
> > > >     |             |   |SL for GPA-HPA, default domain|
> > > >     |             |   '------------------------------'
> > > >     '-------------'
> > > > Where:
> > > >  - FL = First level/stage one page tables
> > > >  - SL = Second level/stage two page tables
> > > 
> > > Hi,
> > > Looks like an interesting feature!
> > > 
> > > To check I understand this feature: can applications now pass virtual
> > > addresses to devices instead of translating to IOVAs?
> > > 
> > > If yes, can guest applications restrict the vSVA address space so the
> > > device only has access to certain regions?
> > > 
> > > On one hand replacing IOVA translation with virtual addresses simplifies
> > > the application programming model, but does it give up isolation if the
> > > device can now access all application memory?
> > > 
> > 
> > with SVA each application is allocated with a unique PASID to tag its
> > virtual address space. The device that claims SVA support must guarantee 
> > that one application can only program the device to access its own virtual
> > address space (i.e. all DMAs triggered by this application are tagged with
> > the application's PASID, and are translated by IOMMU's PASID-granular
> > page table). So, isolation is not sacrificed in SVA.
> 
> Isolation between applications is preserved but there is no isolation
> between the device and the application itself. The application needs to
> trust the device.

Right. With all convenience comes security trust. With SVA there is an
expectation that the device has the required security boundaries properly
implemented. FWIW, what is our guarantee today that VF's are secure from
one another or even its own PF? They can also generate transactions with
any of its peer id's and there is nothing an IOMMU can do today. Other than
rely on ACS. Even BusMaster enable can be ignored and devices (malicious
or otherwise) can generate after the BM=0. With SVM you get the benefits of

* Not having to register regions
* Don't need to pin application space for DMA.

> 
> Examples:
> 
> 1. The device can snoop secret data from readable pages in the
>    application's virtual memory space.

Aren't there other security technologies that can address this?

> 
> 2. The device can gain arbitrary execution on the CPU by overwriting
>    control flow addresses (e.g. function pointers, stack return
>    addresses) in writable pages.

I suppose technology like CET might be able to guard. The general
expectation is code pages and anything that needs to be protected should be
mapped nor writable.

Cheers,
Ashok

Stefan Hajnoczi June 22, 2020, 12:49 p.m. UTC | #8

On Tue, Jun 16, 2020 at 10:00:16AM -0700, Raj, Ashok wrote:
> On Tue, Jun 16, 2020 at 04:49:28PM +0100, Stefan Hajnoczi wrote:
> > On Tue, Jun 16, 2020 at 02:26:38AM +0000, Tian, Kevin wrote:
> > > > From: Stefan Hajnoczi <stefanha@gmail.com>
> > > > Sent: Monday, June 15, 2020 6:02 PM
> > > > 
> > > > On Thu, Jun 11, 2020 at 05:15:19AM -0700, Liu Yi L wrote:
> > > > > Shared Virtual Addressing (SVA), a.k.a, Shared Virtual Memory (SVM) on
> > > > > Intel platforms allows address space sharing between device DMA and
> > > > > applications. SVA can reduce programming complexity and enhance
> > > > security.
> > > > >
> > > > > This VFIO series is intended to expose SVA usage to VMs. i.e. Sharing
> > > > > guest application address space with passthru devices. This is called
> > > > > vSVA in this series. The whole vSVA enabling requires QEMU/VFIO/IOMMU
> > > > > changes. For IOMMU and QEMU changes, they are in separate series (listed
> > > > > in the "Related series").
> > > > >
> > > > > The high-level architecture for SVA virtualization is as below, the key
> > > > > design of vSVA support is to utilize the dual-stage IOMMU translation (
> > > > > also known as IOMMU nesting translation) capability in host IOMMU.
> > > > >
> > > > >
> > > > >     .-------------.  .---------------------------.
> > > > >     |   vIOMMU    |  | Guest process CR3, FL only|
> > > > >     |             |  '---------------------------'
> > > > >     .----------------/
> > > > >     | PASID Entry |--- PASID cache flush -
> > > > >     '-------------'                       |
> > > > >     |             |                       V
> > > > >     |             |                CR3 in GPA
> > > > >     '-------------'
> > > > > Guest
> > > > > ------| Shadow |--------------------------|--------
> > > > >       v        v                          v
> > > > > Host
> > > > >     .-------------.  .----------------------.
> > > > >     |   pIOMMU    |  | Bind FL for GVA-GPA  |
> > > > >     |             |  '----------------------'
> > > > >     .----------------/  |
> > > > >     | PASID Entry |     V (Nested xlate)
> > > > >     '----------------\.------------------------------.
> > > > >     |             |   |SL for GPA-HPA, default domain|
> > > > >     |             |   '------------------------------'
> > > > >     '-------------'
> > > > > Where:
> > > > >  - FL = First level/stage one page tables
> > > > >  - SL = Second level/stage two page tables
> > > > 
> > > > Hi,
> > > > Looks like an interesting feature!
> > > > 
> > > > To check I understand this feature: can applications now pass virtual
> > > > addresses to devices instead of translating to IOVAs?
> > > > 
> > > > If yes, can guest applications restrict the vSVA address space so the
> > > > device only has access to certain regions?
> > > > 
> > > > On one hand replacing IOVA translation with virtual addresses simplifies
> > > > the application programming model, but does it give up isolation if the
> > > > device can now access all application memory?
> > > > 
> > > 
> > > with SVA each application is allocated with a unique PASID to tag its
> > > virtual address space. The device that claims SVA support must guarantee 
> > > that one application can only program the device to access its own virtual
> > > address space (i.e. all DMAs triggered by this application are tagged with
> > > the application's PASID, and are translated by IOMMU's PASID-granular
> > > page table). So, isolation is not sacrificed in SVA.
> > 
> > Isolation between applications is preserved but there is no isolation
> > between the device and the application itself. The application needs to
> > trust the device.
> 
> Right. With all convenience comes security trust. With SVA there is an
> expectation that the device has the required security boundaries properly
> implemented. FWIW, what is our guarantee today that VF's are secure from
> one another or even its own PF? They can also generate transactions with
> any of its peer id's and there is nothing an IOMMU can do today. Other than
> rely on ACS. Even BusMaster enable can be ignored and devices (malicious
> or otherwise) can generate after the BM=0. With SVM you get the benefits of
> 
> * Not having to register regions
> * Don't need to pin application space for DMA.

As along as the security model is clearly documented users can decide
whether or not SVA meets their requirements. I just wanted to clarify
what the security model is.

> 
> > 
> > Examples:
> > 
> > 1. The device can snoop secret data from readable pages in the
> >    application's virtual memory space.
> 
> Aren't there other security technologies that can address this?

Maybe the IOMMU could enforce Memory Protection Keys? Imagine each
device is assigned a subset of memory protection keys and the IOMMU
checks them on each device access. This would allow the application to
mark certain pages off-limits to the device but the IOMMU could still
walk the full process page table (no need to construct a special device
page table for the IOMMU).

> > 
> > 2. The device can gain arbitrary execution on the CPU by overwriting
> >    control flow addresses (e.g. function pointers, stack return
> >    addresses) in writable pages.
> 
> I suppose technology like CET might be able to guard. The general
> expectation is code pages and anything that needs to be protected should be
> mapped nor writable.

Function pointers are a common exception to this. They are often located
in writable heap or stack pages.

There might also be dynamic linker memory structures that are easy to
hijack.

Stefan

Stefan Hajnoczi June 22, 2020, 12:49 p.m. UTC | #9

On Tue, Jun 16, 2020 at 12:09:16PM -0400, Peter Xu wrote:
> On Tue, Jun 16, 2020 at 04:49:28PM +0100, Stefan Hajnoczi wrote:
> > Isolation between applications is preserved but there is no isolation
> > between the device and the application itself. The application needs to
> > trust the device.
> > 
> > Examples:
> > 
> > 1. The device can snoop secret data from readable pages in the
> >    application's virtual memory space.
> > 
> > 2. The device can gain arbitrary execution on the CPU by overwriting
> >    control flow addresses (e.g. function pointers, stack return
> >    addresses) in writable pages.
> 
> To me, SVA seems to be that "middle layer" of secure where it's not as safe as
> VFIO_IOMMU_MAP_DMA which has buffer level granularity of control (but of course
> we pay overhead on buffer setups and on-the-fly translations), however it's far
> better than DMA with no IOMMU which can ruin the whole host/guest, because
> after all we do a lot of isolations as process based.
> 
> IMHO it's the same as when we see a VM (or the QEMU process) as a whole along
> with the guest code.  In some cases we don't care if the guest did some bad
> things to mess up with its own QEMU process.  It is still ideal if we can even
> stop the guest from doing so, but when it's not easy to do it the ideal way, we
> just lower the requirement to not spread the influence to the host and other
> VMs.

Makes sense.

Stefan

[v2,00/15] vfio: expose virtual Shared Virtual Addressing to VMs

Message

Comments