[v6,0/9] vfio/mdev: IOMMU aware mediated device

Message ID 20190213040301.23021-1-baolu.lu@linux.intel.com (mailing list archive)

Message

Baolu Lu Feb. 13, 2019, 4:02 a.m. UTC
Hi,

The Mediated Device (mdev) framework provides fine-grained sharing
of a physical device across isolated domains. Currently the mdev
framework is designed to be independent of platform IOMMU support.
As a result, DMA isolation relies on the mdev parent device in a
vendor-specific way.

There are several cases where a mediated device could be protected
and isolated by the platform IOMMU. For example, Intel VT-d rev3.0
[1] introduces a new translation mode called 'scalable mode', which
enables PASID-granular translations. The VT-d scalable mode is the
key ingredient for Scalable I/O Virtualization [2] [3], which allows
sharing a device at the smallest possible granularity (ADI -
Assignable Device Interface).

A mediated device backed by an ADI could be protected and isolated
by the IOMMU since 1) the parent device supports tagging all DMA
traffic out of the mediated device with a unique PASID; and 2) the
DMA translation unit (IOMMU) supports PASID-granular translation.
We can apply IOMMU protection and isolation to this kind of device
just as we do with an assignable PCI device.

In order to distinguish the IOMMU-capable mediated devices from those
which still need to rely on parent devices, this patch set adds one
new member in struct mdev_device.

* iommu_device
  - This, if set, indicates that the mediated device could
    be fully isolated and protected by the IOMMU by attaching
    an iommu domain to this device. If it is not set, vendor
    defined isolation is used.

The following helpers are added to the mdev core implementation to
set and get the above iommu device.

* mdev_set/get_iommu_device(dev, iommu_device)
  - Set or get the iommu device which represents this mdev
    in the IOMMU's device scope. A driver doesn't need to set
    the iommu device if it uses vendor defined isolation.

The mdev parent device driver can opt in to having an mdev fully
isolated and protected by the IOMMU by invoking
mdev_set_iommu_device() in its @create() callback when the mdev is
being created.
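
A minimal userspace sketch of that opt-in, for illustration only. The
struct definitions, the model_* helpers, sample_parent_create() and
its parent_has_aux flag are all assumptions made up for this example.
They only mirror the (dev, iommu_device) shape described above and
are not the kernel definitions from this series.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; NOT the real definitions. */
struct device {
    const char *name;
};

struct mdev_device {
    struct device dev;
    struct device *iommu_device;  /* NULL: vendor defined isolation */
};

/* Model of the "set" helper: record which device represents this mdev
 * in the IOMMU's device scope. */
static int model_mdev_set_iommu_device(struct mdev_device *mdev,
                                       struct device *iommu_device)
{
    assert(mdev != NULL);
    mdev->iommu_device = iommu_device;
    return 0;
}

/* Model of the "get" helper. */
static struct device *model_mdev_get_iommu_device(struct mdev_device *mdev)
{
    assert(mdev != NULL);
    return mdev->iommu_device;
}

/* Hypothetical parent driver @create() callback. The parent opts in
 * only when it can rely on the IOMMU (parent_has_aux is an assumed
 * flag); otherwise the member stays NULL and the parent keeps doing
 * its own isolation. */
static int sample_parent_create(struct mdev_device *mdev,
                                struct device *parent,
                                int parent_has_aux)
{
    if (parent_has_aux)
        return model_mdev_set_iommu_device(mdev, parent);
    return 0;
}
```

In this model an mdev created without the opt-in simply leaves
iommu_device as NULL, which is what later code would test for.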

In vfio_iommu_type1_attach_group(), a domain allocated through
iommu_domain_alloc() will be attached to the mdev iommu device if
an iommu device has been set. Otherwise, the dummy external domain
will be used and, as a result, all DMA isolation and protection is
left to the parent driver.
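
The decision described above can be sketched as a small userspace
model. This is not the vfio type1 code. The enum, the struct layouts
and choose_isolation() are hypothetical names used only to show the
branch on whether an iommu device has been set.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins; NOT the kernel definitions. */
struct device {
    const char *name;
};

struct mdev_device {
    struct device *iommu_device;  /* NULL if the parent did not opt in */
};

enum isolation_path {
    ISOLATION_VENDOR_DEFINED,  /* dummy external domain, parent isolates DMA */
    ISOLATION_IOMMU_DOMAIN,    /* allocated domain attached to iommu device */
};

/* Pick the isolation path for one mdev at group attach time. On return,
 * *attach_target is the device a domain would be attached to, or NULL
 * when isolation is left to the parent driver. */
static enum isolation_path choose_isolation(const struct mdev_device *mdev,
                                            struct device **attach_target)
{
    assert(mdev != NULL && attach_target != NULL);
    *attach_target = mdev->iommu_device;
    return mdev->iommu_device ? ISOLATION_IOMMU_DOMAIN
                              : ISOLATION_VENDOR_DEFINED;
}
```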

On the IOMMU side, the basic requirement is to allow multiple
domains to be attached to a PCI device if the device advertises the
capability and the IOMMU hardware supports translation at a finer
granularity than the normal PCI Source ID based translation.

As a result, a PCI device could work in two modes: normal mode
and auxiliary mode. In normal mode, a PCI device is isolated at
Source ID granularity; the PCI device itself could be assigned to
a user application by attaching a single domain to it. In auxiliary
mode, a PCI device could be isolated at a finer granularity, hence
subsets of the device could be assigned to different user level
applications by attaching a different domain to each subset.

The following APIs are introduced in the generic iommu layer for
aux-domain support:

* iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)
  - Check whether both the IOMMU and the device support the
    IOMMU aux domain feature. The aux-domain specific
    interfaces below are available only after this returns true.

* iommu_dev_enable/disable_feature(dev, IOMMU_DEV_FEAT_AUX)
  - Enable/disable device specific aux-domain feature.

* iommu_dev_feature_enabled(dev, IOMMU_DEV_FEAT_AUX)
  - Check whether the aux domain specific feature is enabled
    or not.

* iommu_aux_attach_device(domain, dev)
  - Attach @domain to @dev in the auxiliary mode. Multiple
    domains could be attached to a single device in the
    auxiliary mode with each domain representing an isolated
    address space for an assignable subset of the device.

* iommu_aux_detach_device(domain, dev)
  - Detach @domain which has been attached to @dev in the
    auxiliary mode.

* iommu_aux_get_pasid(domain, dev)
  - Return the ID used for finer-granularity DMA translation.
    For the Intel Scalable IOV usage model, this will be
    a PASID. A device which supports Scalable IOV needs
    to write this ID to the device register so that DMA
    requests are tagged with the right PASID prefix.
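
The intended calling order of these interfaces can be sketched with a
small userspace model. Everything here is an assumption for
illustration: the model_* names, the struct layouts, the
MODEL_MAX_AUX limit and the way an ID is derived from the slot index
are invented for this example and do not reflect how the vt-d driver
allocates PASIDs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_AUX 4  /* arbitrary limit, for this model only */

/* Userspace stand-ins; NOT the kernel definitions. */
struct iommu_domain {
    int id;
};

struct device {
    bool aux_capable;  /* IOMMU and device both support aux domains */
    bool aux_enabled;
    struct iommu_domain *aux[MODEL_MAX_AUX];  /* attached aux domains */
};

static bool model_dev_has_aux(const struct device *dev)
{
    return dev->aux_capable;
}

static int model_dev_enable_aux(struct device *dev)
{
    if (!dev->aux_capable)
        return -1;
    dev->aux_enabled = true;
    return 0;
}

static bool model_dev_aux_enabled(const struct device *dev)
{
    return dev->aux_enabled;
}

/* Attach @domain to the first free slot; fails unless aux is enabled. */
static int model_aux_attach_device(struct iommu_domain *domain,
                                   struct device *dev)
{
    assert(domain != NULL && dev != NULL);
    if (!dev->aux_enabled)
        return -1;
    for (size_t i = 0; i < MODEL_MAX_AUX; i++) {
        if (dev->aux[i] == NULL) {
            dev->aux[i] = domain;
            return 0;
        }
    }
    return -1;  /* no free slot left in this model */
}

/* Return the model's ID (slot index + 1) for an attached domain, or -1. */
static int model_aux_get_pasid(const struct iommu_domain *domain,
                               const struct device *dev)
{
    for (size_t i = 0; i < MODEL_MAX_AUX; i++)
        if (dev->aux[i] == domain)
            return (int)i + 1;
    return -1;
}

static void model_aux_detach_device(struct iommu_domain *domain,
                                    struct device *dev)
{
    for (size_t i = 0; i < MODEL_MAX_AUX; i++)
        if (dev->aux[i] == domain)
            dev->aux[i] = NULL;
}
```

A caller in this model would check the capability, enable the
feature, attach one domain per assignable subset, and then read back
the ID that the device has to be programmed with.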

For ease of discussion, we sometimes say 'a domain in auxiliary
mode' or simply 'an auxiliary domain' when a domain is attached to
a device for finer granularity translations. But we need to keep in
mind that this doesn't mean there is a different domain type. The
same domain could be bound to one device for Source ID based
translation, and bound to another device for finer granularity
translation at the same time.

This patch series extends both the IOMMU and vfio components to
support mdev device passthrough when the mdev can be isolated and
protected by the IOMMU. The first part of this series (PATCH
1/9~6/9) adds the interfaces and implementation of multiple domains
per device. The second part (PATCH 7/9~9/9) adds the iommu device
attribute to each mdev, determines the isolation type according to
the existence of an iommu device when attaching a group in the vfio
type1 iommu module, and attaches the domain to iommu aware mediated
devices.

References:
[1] https://software.intel.com/en-us/download/intel-virtualization-technology-for-directed-io-architecture-specification
[2] https://software.intel.com/en-us/download/intel-scalable-io-virtualization-technical-specification
[3] https://schd.ws/hosted_files/lc32018/00/LC3-SIOV-final.pdf

Best regards,
Lu Baolu

Change log:
  v5->v6:
  - Add a new API iommu_dev_feature_enabled() to check whether
    an IOMMU specific feature is enabled.
  - Rework the vt-d specific per device feature ops according
    to Joerg's comments. [https://lkml.org/lkml/2019/1/11/302].
  - PATCH 2/9 is added to move intel_iommu_enable_pasid() out
    of the scope of CONFIG_INTEL_IOMMU_SVM without functional
    changes.
  - All patches are rebased on top of vt-d branch of Joerg's
    iommu tree.

  v4->v5:
  - The iommu APIs have been updated with Joerg's proposal posted
    here https://www.spinics.net/lists/iommu/msg31874.html.
  - Some typos in commit message and comments have been fixed.
  - PATCH 3/8 was split from 4/8 to ease code review.
  - mdev->domain was removed and could be brought back when there's
    a real consumer.
  - Addressed the other code review comments received during the v4
    review period, except the EXPORT_SYMBOL vs. EXPORT_SYMBOL_GPL
    one in PATCH 6/8.
  - Rebase all patches to 5.0-rc1.

  v3->v4:
  - Use aux domain specific interfaces for domain attach and detach.
  - Rebase all patches to 4.20-rc1.

  v2->v3:
  - Remove domain type enum and use a pointer on mdev_device instead.
  - Add a generic interface for getting/setting per device iommu
    attributes, and use it to query the aux domain capability and to
    enable and disable aux domain.
  - Reuse iommu_domain_get_attr() to retrieve the id in an aux domain.
  - We discussed the impact of the default domain implementation
    on reusing iommu_at(de)tach_device() interfaces. We agreed
    that reusing iommu_at(de)tach_device() interfaces is the right
    direction and we could tweak the code to remove the impact.
    https://www.spinics.net/lists/kvm/msg175285.html
  - Removed the RFC tag since no objections were received.
  - This patch has been submitted separately.
    https://www.spinics.net/lists/kvm/msg173936.html

  v1->v2:
  - Rewrite the patches with the concept of auxiliary domains.

Lu Baolu (9):
  iommu: Add APIs for multiple domains per device
  iommu/vt-d: Move enable pasid out of CONFIG_INTEL_IOMMU_SVM
  iommu/vt-d: Add per-device IOMMU feature ops entries
  iommu/vt-d: Move common code out of iommu_attch_device()
  iommu/vt-d: Aux-domain specific domain attach/detach
  iommu/vt-d: Return ID associated with an auxiliary domain
  vfio/mdev: Add iommu related member in mdev_device
  vfio/type1: Add domain at(de)taching group helpers
  vfio/type1: Handle different mdev isolation type

 drivers/iommu/intel-iommu.c      | 399 ++++++++++++++++++++++++++++---
 drivers/iommu/intel-svm.c        |  19 +-
 drivers/iommu/iommu.c            |  91 +++++++
 drivers/vfio/mdev/mdev_core.c    |  18 ++
 drivers/vfio/mdev/mdev_private.h |   1 +
 drivers/vfio/vfio_iommu_type1.c  | 132 ++++++++--
 include/linux/intel-iommu.h      |  13 +-
 include/linux/iommu.h            |  70 ++++++
 include/linux/mdev.h             |  14 ++
 9 files changed, 704 insertions(+), 53 deletions(-)

Comments

Alex Williamson Feb. 14, 2019, 8:14 p.m. UTC | #1
On Wed, 13 Feb 2019 12:02:52 +0800
Lu Baolu <baolu.lu@linux.intel.com> wrote:

This looks pretty reasonable with Jean-Philippe's nit fixups.  Where do
we go from here?  I think we need an ack from Kirti since they have an
interest here.  Presumably this looks ok to the ARM folks.  Do we have
any consumers of this code yet?  Theoretically I think a vfio-pci-like
meta driver could be written as an mdev vendor driver with this support
(restricted to type1 iommu use cases).  Thanks,

Alex
Jean-Philippe Brucker Feb. 15, 2019, 6:46 p.m. UTC | #2
On 14/02/2019 20:14, Alex Williamson wrote:
> This looks pretty reasonable with Jean-Philippe's nit fixups.  Where do
> we go from here?  I think we need an ack from Kirti since they have an
> interest here.  Presumably this looks ok to the ARM folks.

Looks great from my point of view. I focused on patch 1 since I'm
planning to reuse iommu_dev_features for SVA. I don't have time to test
auxd and mdev on SMMUv3 at the moment but I had a better look and, if it
helps, for patches 1 and 7-9:

Reviewed-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>

That said, are you planning to add back the mdev_get_iommu_domain()
function, in a separate patch? Because I think the parent driver still
needs a way to retrieve the PASID for an mdev?

Thanks,
Jean

> Do we have
> any consumers of this code yet?  Theoretically I think a vfio-pci-like
> meta driver could be written as an mdev vendor driver with this support
> (restricted to type1 iommu use cases).  Thanks,
> 
> Alex
Baolu Lu Feb. 19, 2019, 2:46 a.m. UTC | #3
Hi Jean,

On 2/16/19 2:46 AM, Jean-Philippe Brucker wrote:
> On 14/02/2019 20:14, Alex Williamson wrote:
>> This looks pretty reasonable with Jean-Philippe's nit fixups.  Where do
>> we go from here?  I think we need an ack from Kirti since they have an
>> interest here.  Presumably this looks ok to the ARM folks.
> 
> Looks great from my point of view. I focused on patch 1 since I'm
> planning to reuse iommu_dev_features for SVA. I don't have time to test
> auxd and mdev on SMMUv3 at the moment but I had a better look and, if it
> helps, for patches 1 and 7-9:
> 
> Reviewed-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>

Thank you! I will add this in the next version.

> 
> That said, are you planning to add back the mdev_get_iommu_domain()
> function, in a separate patch? Because I think the parent driver still
> needs a way to retrieve the PASID for an mdev?

Yes.

At Kirti's suggestion, we removed this since there is currently no
consumer. We will bring it back together with the real consumer.

https://lkml.org/lkml/2018/11/16/124

Best regards,
Lu Baolu

> 
> Thanks,
> Jean
> 
>> Do we have
>> any consumers of this code yet?  Theoretically I think a vfio-pci-like
>> meta driver could be written as an mdev vendor driver with this support
>> (restricted to type1 iommu use cases).  Thanks,
>>
>> Alex
> 
>
Baolu Lu Feb. 19, 2019, 2:49 a.m. UTC | #4
Hi Kirti,

On 2/15/19 4:14 AM, Alex Williamson wrote:
> This looks pretty reasonable with Jean-Philippe's nit fixups.  Where do
> we go from here?  I think we need an ack from Kirti since they have an
> interest here.  Presumably this looks ok to the ARM folks.  Do we have

Any comments?

Best regards,
Lu Baolu

> any consumers of this code yet?  Theoretically I think a vfio-pci-like
> meta driver could be written as an mdev vendor driver with this support
> (restricted to type1 iommu use cases).  Thanks,
> 
> Alex
>
Yi Liu Feb. 20, 2019, 5:09 a.m. UTC | #5
> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Friday, February 15, 2019 4:15 AM
> To: Lu Baolu <baolu.lu@linux.intel.com>
> Subject: Re: [PATCH v6 0/9] vfio/mdev: IOMMU aware mediated device
> 
> On Wed, 13 Feb 2019 12:02:52 +0800
> Lu Baolu <baolu.lu@linux.intel.com> wrote:
> 
> > Hi,
> >
> > The Mediate Device is a framework for fine-grained physical device
> > sharing across the isolated domains. Currently the mdev framework is
> > designed to be independent of the platform IOMMU support. As the
> > result, the DMA isolation relies on the mdev parent device in a vendor
> > specific way.
> >
> > There are several cases where a mediated device could be protected and
> > isolated by the platform IOMMU. For example, Intel vt-d rev3.0 [1]
> > introduces a new translation mode called 'scalable mode', which
> > enables PASID-granular translations. The vt-d scalable mode is the key
> > ingredient for Scalable I/O Virtualization [2] [3] which allows
> > sharing a device in minimal possible granularity (ADI - Assignable
> > Device Interface).
> >
> > A mediated device backed by an ADI could be protected and isolated by
> > the IOMMU since 1) the parent device supports tagging an unique PASID
> > to all DMA traffic out of the mediated device; and 2) the DMA
> > translation unit (IOMMU) supports the PASID granular translation.
> > We can apply IOMMU protection and isolation to this kind of devices
> > just as what we are doing with an assignable PCI device.
> >
> > In order to distinguish the IOMMU-capable mediated devices from those
> > which still need to rely on parent devices, this patch set adds one
> > new member in struct mdev_device.
> >
> > * iommu_device
> >   - This, if set, indicates that the mediated device could
> >     be fully isolated and protected by IOMMU via attaching
> >     an iommu domain to this device. If empty, it indicates
> >     using vendor defined isolation.
> >
> > Below helpers are added to set and get above iommu device in mdev core
> > implementation.
> >
> > * mdev_set/get_iommu_device(dev, iommu_device)
> >   - Set or get the iommu device which represents this mdev
> >     in IOMMU's device scope. Drivers don't need to set the
> >     iommu device if it uses vendor defined isolation.
> >
> > The mdev parent device driver could opt-in that the mdev could be
> > fully isolated and protected by the IOMMU when the mdev is being
> > created by invoking mdev_set_iommu_device() in its @create().
> >
> > In the vfio_iommu_type1_attach_group(), a domain allocated through
> > iommu_domain_alloc() will be attached to the mdev iommu device if an
> > iommu device has been set. Otherwise, the dummy external domain will
> > be used and all the DMA isolation and protection are routed to parent
> > driver as the result.
> >
> > On IOMMU side, a basic requirement is allowing to attach multiple
> > domains to a PCI device if the device advertises the capability and
> > the IOMMU hardware supports finer granularity translations than the
> > normal PCI Source ID based translation.
> >
> > As the result, a PCI device could work in two modes: normal mode and
> > auxiliary mode. In the normal mode, a pci device could be isolated in
> > the Source ID granularity; the pci device itself could be assigned to
> > a user application by attaching a single domain to it. In the
> > auxiliary mode, a pci device could be isolated in finer granularity,
> > hence subsets of the device could be assigned to different user level
> > application by attaching a different domain to each subset.
> >
> > Below APIs are introduced in iommu generic layer for aux-domain
> > purpose:
> >
> > * iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)
> >   - Check whether both IOMMU and device support IOMMU aux
> >     domain feature. Below aux-domain specific interfaces
> >     are available only after this returns true.
> >
> > * iommu_dev_enable/disable_feature(dev, IOMMU_DEV_FEAT_AUX)
> >   - Enable/disable device specific aux-domain feature.
> >
> > * iommu_dev_feature_enabled(dev, IOMMU_DEV_FEAT_AUX)
> >   - Check whether the aux domain specific feature enabled or
> >     not.
> >
> > * iommu_aux_attach_device(domain, dev)
> >   - Attaches @domain to @dev in the auxiliary mode. Multiple
> >     domains could be attached to a single device in the
> >     auxiliary mode with each domain representing an isolated
> >     address space for an assignable subset of the device.
> >
> > * iommu_aux_detach_device(domain, dev)
> >   - Detach @domain which has been attached to @dev in the
> >     auxiliary mode.
> >
> > * iommu_aux_get_pasid(domain, dev)
> >   - Return ID used for finer-granularity DMA translation.
> >     For the Intel Scalable IOV usage model, this will be
> >     a PASID. The device which supports Scalable IOV needs
> >     to write this ID to the device register so that DMA
> >     requests could be tagged with a right PASID prefix.
> >
> > In order for the ease of discussion, sometimes we call "a domain in
> > auxiliary mode' or simply 'an auxiliary domain' when a domain is
> > attached to a device for finer granularity translations. But we need
> > to keep in mind that this doesn't mean there is a differnt domain
> > type. A same domain could be bound to a device for Source ID based
> > translation, and bound to another device for finer granularity
> > translation at the same time.
> >
> > This patch series extends both the IOMMU and vfio components to
> > support mdev device passthrough when the mdev can be isolated and
> > protected by the IOMMU. The first part of this series (PATCH
> > 1/09~6/09) adds the interfaces and implementation of multiple
> > domains per device. The second part (PATCH 7/09~9/09) adds the iommu
> > device attribute to each mdev, determines the isolation type
> > according to the existence of an iommu device when attaching a group
> > in the vfio type1 iommu module, and attaches the domain to iommu
> > aware mediated devices.
> >
> > References:
> > [1] https://software.intel.com/en-us/download/intel-virtualization-technology-for-directed-io-architecture-specification
> > [2] https://software.intel.com/en-us/download/intel-scalable-io-virtualization-technical-specification
> > [3] https://schd.ws/hosted_files/lc32018/00/LC3-SIOV-final.pdf
> >
> > Best regards,
> > Lu Baolu
> >
> > Change log:
> >   v5->v6:
> 
> This looks pretty reasonable with Jean-Philippe's nit fixups.  Where do we go from
> here?  I think we need an ack from Kirti since they have an interest here.  Presumably
> this looks ok to the ARM folks.  Do we have any consumers of this code yet?
> Theoretically I think a vfio-pci-like meta driver could be written as an mdev vendor
> driver with this support (restricted to type1 iommu use cases).  Thanks,

Hi Alex,

I'll do some work on the vfio-pci-like meta driver to be a sample consumer of
the vfio mdev extensions in this patchset. Will send it later.

Thanks,
Yi Liu
Baolu Lu Feb. 22, 2019, 12:56 a.m. UTC | #6
Hi,

On 2/15/19 4:14 AM, Alex Williamson wrote:
> On Wed, 13 Feb 2019 12:02:52 +0800
> Lu Baolu <baolu.lu@linux.intel.com> wrote:
> 
>> Hi,
>>
>> The Mediated Device (mdev) is a framework for fine-grained physical
>> device sharing across isolated domains. Currently the mdev framework
>> is designed to be independent of the platform IOMMU support. As a
>> result, DMA isolation relies on the mdev parent device in a
>> vendor specific way.
>>
>> There are several cases where a mediated device could be protected
>> and isolated by the platform IOMMU. For example, Intel VT-d rev3.0
>> [1] introduces a new translation mode called 'scalable mode', which
>> enables PASID-granular translations. The VT-d scalable mode is the
>> key ingredient for Scalable I/O Virtualization [2] [3], which allows
>> sharing a device at the minimal possible granularity (ADI - Assignable
>> Device Interface).
>>
>> A mediated device backed by an ADI could be protected and isolated
>> by the IOMMU since 1) the parent device supports tagging all DMA
>> traffic out of the mediated device with a unique PASID; and 2) the
>> DMA translation unit (IOMMU) supports PASID-granular translation.
>> We can apply IOMMU protection and isolation to this kind of device
>> just as we do with an assignable PCI device.
>>
>> In order to distinguish the IOMMU-capable mediated devices from those
>> which still need to rely on parent devices, this patch set adds one
>> new member in struct mdev_device.
>>
>> * iommu_device
>>    - This, if set, indicates that the mediated device could
>>      be fully isolated and protected by IOMMU via attaching
>>      an iommu domain to this device. If unset, vendor
>>      defined isolation is used.
>>
>> The helpers below are added to the mdev core implementation to set
>> and get this iommu device.
>>
>> * mdev_set/get_iommu_device(dev, iommu_device)
>>    - Set or get the iommu device which represents this mdev
>>      in IOMMU's device scope. A driver doesn't need to set the
>>      iommu device if it uses vendor defined isolation.
>>
>> The mdev parent device driver can opt in to having the mdev fully
>> isolated and protected by the IOMMU by invoking
>> mdev_set_iommu_device() in its @create() callback when the mdev is
>> being created.
>>
>> In vfio_iommu_type1_attach_group(), a domain allocated through
>> iommu_domain_alloc() will be attached to the mdev iommu device if
>> an iommu device has been set. Otherwise, the dummy external domain
>> will be used, and all DMA isolation and protection are left to the
>> parent driver.
>>
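The opt-in and the attach-time decision described above can be sketched
in plain C as follows. This is a userspace model only, not code from the
series: the struct layouts, the sample_parent_create() callback, the
pick_isolation() helper and the isolation_type enum are hypothetical,
and the return types are assumptions. Only the names
mdev_set_iommu_device() and mdev_get_iommu_device() come from the cover
letter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel structures. */
struct device {
	const char *name;
};

struct mdev_device {
	struct device dev;
	struct device *iommu_device;	/* NULL means vendor defined isolation */
};

#define to_mdev_device(d) \
	((struct mdev_device *)((char *)(d) - offsetof(struct mdev_device, dev)))

/* Simplified models of the helpers named in the cover letter. */
int mdev_set_iommu_device(struct device *dev, struct device *iommu_device)
{
	to_mdev_device(dev)->iommu_device = iommu_device;
	return 0;
}

struct device *mdev_get_iommu_device(struct device *dev)
{
	return to_mdev_device(dev)->iommu_device;
}

/* Hypothetical parent driver @create() callback that opts in. */
int sample_parent_create(struct mdev_device *mdev, struct device *parent)
{
	return mdev_set_iommu_device(&mdev->dev, parent);
}

enum isolation_type { ISOLATION_VENDOR, ISOLATION_IOMMU };

/* Hypothetical model of the choice made when a group is attached. */
enum isolation_type pick_isolation(struct device *mdev_dev)
{
	return mdev_get_iommu_device(mdev_dev) ? ISOLATION_IOMMU
					       : ISOLATION_VENDOR;
}
```

In this model an mdev whose parent never calls mdev_set_iommu_device()
keeps a NULL iommu_device, so the vendor defined path is the default and
existing parent drivers need no change.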
>> On the IOMMU side, a basic requirement is to allow attaching multiple
>> domains to a PCI device if the device advertises the capability
>> and the IOMMU hardware supports finer granularity translations than
>> the normal PCI Source ID based translation.
>>
>> As a result, a PCI device could work in two modes: normal mode
>> and auxiliary mode. In the normal mode, a PCI device could be
>> isolated at Source ID granularity; the PCI device itself could
>> be assigned to a user application by attaching a single domain
>> to it. In the auxiliary mode, a PCI device could be isolated at
>> finer granularity, hence subsets of the device could be assigned
>> to different user level applications by attaching a different domain
>> to each subset.
>>
>> The APIs below are introduced in the iommu generic layer for the
>> aux-domain purpose:
>>
>> * iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)
>>    - Check whether both the IOMMU and the device support the
>>      IOMMU aux-domain feature. The aux-domain specific
>>      interfaces below are available only if this returns true.
>>
>> * iommu_dev_enable/disable_feature(dev, IOMMU_DEV_FEAT_AUX)
>>    - Enable/disable device specific aux-domain feature.
>>
>> * iommu_dev_feature_enabled(dev, IOMMU_DEV_FEAT_AUX)
>>    - Check whether the aux-domain specific feature is
>>      enabled.
>>
>> * iommu_aux_attach_device(domain, dev)
>>    - Attaches @domain to @dev in the auxiliary mode. Multiple
>>      domains could be attached to a single device in the
>>      auxiliary mode with each domain representing an isolated
>>      address space for an assignable subset of the device.
>>
>> * iommu_aux_detach_device(domain, dev)
>>    - Detach @domain which has been attached to @dev in the
>>      auxiliary mode.
>>
>> * iommu_aux_get_pasid(domain, dev)
>>    - Return the ID used for finer-granularity DMA translation.
>>      For the Intel Scalable IOV usage model, this will be
>>      a PASID. A device which supports Scalable IOV needs
>>      to write this ID to a device register so that DMA
>>      requests are tagged with the right PASID prefix.
>>
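The call order implied by the list above (check, enable, attach, read
the PASID) can be sketched as a small userspace model. Everything below
is an assumption made for illustration: the struct fields, the toy PASID
allocator, the return types, the error codes and the sample_setup_adi()
helper are hypothetical and are not taken from the patches. Only the
function and feature names come from the cover letter.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum iommu_dev_features { IOMMU_DEV_FEAT_AUX };

/* Hypothetical userspace stand-ins; the kernel structures differ. */
struct device {
	bool aux_capable;	/* both IOMMU and device support aux domains */
	bool aux_enabled;
	int next_pasid;		/* toy allocator state */
};

struct iommu_domain {
	int pasid;		/* negative while not attached in aux mode */
};

bool iommu_dev_has_feature(struct device *dev, enum iommu_dev_features feat)
{
	return feat == IOMMU_DEV_FEAT_AUX && dev->aux_capable;
}

int iommu_dev_enable_feature(struct device *dev, enum iommu_dev_features feat)
{
	if (!iommu_dev_has_feature(dev, feat))
		return -ENODEV;
	dev->aux_enabled = true;
	return 0;
}

bool iommu_dev_feature_enabled(struct device *dev, enum iommu_dev_features feat)
{
	return feat == IOMMU_DEV_FEAT_AUX && dev->aux_enabled;
}

int iommu_aux_attach_device(struct iommu_domain *domain, struct device *dev)
{
	if (!iommu_dev_feature_enabled(dev, IOMMU_DEV_FEAT_AUX))
		return -ENODEV;
	domain->pasid = dev->next_pasid++;
	return 0;
}

void iommu_aux_detach_device(struct iommu_domain *domain, struct device *dev)
{
	(void)dev;
	domain->pasid = -1;
}

int iommu_aux_get_pasid(struct iommu_domain *domain, struct device *dev)
{
	(void)dev;
	return domain->pasid >= 0 ? domain->pasid : -ENODEV;
}

/* Hypothetical helper: bring up one assignable subset, return its PASID. */
int sample_setup_adi(struct device *dev, struct iommu_domain *domain)
{
	int ret;

	if (!iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX))
		return -ENODEV;
	if (!iommu_dev_feature_enabled(dev, IOMMU_DEV_FEAT_AUX)) {
		ret = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_AUX);
		if (ret)
			return ret;
	}
	ret = iommu_aux_attach_device(domain, dev);
	if (ret)
		return ret;
	/* A real driver would program this ID into a device register. */
	return iommu_aux_get_pasid(domain, dev);
}
```

Calling sample_setup_adi() twice on the same capable device with two
different domains yields two distinct IDs in this model, which mirrors
the "multiple domains per device" idea of the auxiliary mode.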
>> For ease of discussion, we sometimes say 'a domain in auxiliary
>> mode' or simply 'an auxiliary domain' when a domain is attached to
>> a device for finer granularity translations. But we need to keep in
>> mind that this doesn't mean there is a different domain type. The
>> same domain could be bound to one device for Source ID based
>> translation, and bound to another device for finer granularity
>> translation at the same time.
>>
>> This patch series extends both the IOMMU and vfio components to
>> support mdev device passthrough when the mdev can be isolated and
>> protected by the IOMMU. The first part of this series (PATCH
>> 1/09~6/09) adds the interfaces and implementation of multiple
>> domains per device. The second part (PATCH 7/09~9/09) adds the iommu
>> device attribute to each mdev, determines the isolation type
>> according to the existence of an iommu device when attaching a group
>> in the vfio type1 iommu module, and attaches the domain to iommu
>> aware mediated devices.
>>
>> References:
>> [1] https://software.intel.com/en-us/download/intel-virtualization-technology-for-directed-io-architecture-specification
>> [2] https://software.intel.com/en-us/download/intel-scalable-io-virtualization-technical-specification
>> [3] https://schd.ws/hosted_files/lc32018/00/LC3-SIOV-final.pdf
>>
>> Best regards,
>> Lu Baolu
>>
>> Change log:
>>    v5->v6:
> 
> This looks pretty reasonable with Jean-Philippe's nit fixups.  Where do
> we go from here?  I think we need an ack from Kirti since they have an
> interest here.  Presumably this looks ok to the ARM folks.  Do we have
> any consumers of this code yet?  Theoretically I think a vfio-pci-like
> meta driver could be written as an mdev vendor driver with this support
> (restricted to type1 iommu use cases).  Thanks,
> 

I saw Jean's device driver API for SVA, which is based on the work in
this patch series. Let me update this patch series with Jean's comments
so that people can always base their work on the latest code.

Best regards,
Lu Baolu