
[3/3] Documentation: fpga: dfl: Add description for VFIO Mdev support

Message ID 1599549212-24253-4-git-send-email-yilun.xu@intel.com
State New
Series: add VFIO mdev support for DFL devices

Commit Message

Xu Yilun Sept. 8, 2020, 7:13 a.m. UTC
This patch adds a description of VFIO Mdev support for dfl devices on the
dfl bus.

Signed-off-by: Xu Yilun <yilun.xu@intel.com>
Signed-off-by: Matthew Gerlach <matthew.gerlach@linux.intel.com>
---
 Documentation/fpga/dfl.rst | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

Comments

Alex Williamson Sept. 8, 2020, 9:10 p.m. UTC | #1
On Tue,  8 Sep 2020 15:13:32 +0800
Xu Yilun <yilun.xu@intel.com> wrote:

> This patch adds description for VFIO Mdev support for dfl devices on
> dfl bus.
> 
> Signed-off-by: Xu Yilun <yilun.xu@intel.com>
> Signed-off-by: Matthew Gerlach <matthew.gerlach@linux.intel.com>
> ---
>  Documentation/fpga/dfl.rst | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
> 
> diff --git a/Documentation/fpga/dfl.rst b/Documentation/fpga/dfl.rst
> index 0404fe6..f077754 100644
> --- a/Documentation/fpga/dfl.rst
> +++ b/Documentation/fpga/dfl.rst
> @@ -502,6 +502,26 @@ FME Partial Reconfiguration Sub Feature driver (see drivers/fpga/dfl-fme-pr.c)
>  could be a reference.
>  
>  
> +VFIO Mdev support for DFL devices
> +=================================
> +As we introduced a dfl bus for private features, they could be added to dfl bus
> +as independent dfl devices. There is a requirement to handle these devices
> +either by kernel drivers or by direct access from userspace. Usually we bind
> +the kernel drivers to devices which provide board management functions, and
> +gives user direct access to devices which cooperate closely with user
> +controlled Accelerated Function Unit (AFU). We realize this with a VFIO Mdev
> +implementation. When we bind the vfio-mdev-dfl driver to a dfl device, it
> +realizes a group of callbacks and registers to the Mdev framework as a
> +parent (physical) device. It could then create one (available_instances == 1)
> +mdev device.
> +Since dfl devices are sub devices of FPGA DFL physical devices (e.g. PCIE
> +device), which provide no DMA isolation for each sub device, this may leads to
> +DMA isolation problem if a private feature is designed to be capable of DMA.
> +The AFU user could potentially access the whole device addressing space and
> +impact the private feature. So now the general HW design rule is, no DMA
> +capability for private features. It eliminates the DMA isolation problem.

What's the advantage of entangling mdev/vfio in this approach versus
simply exposing the MMIO region of the device via sysfs (similar to a
resource file in pci-sysfs)?  This implementation doesn't support
interrupts, it doesn't support multiplexing of a device, it doesn't
perform any degree of mediation, it seems to simply say "please don't
do DMA".  I don't think that's acceptable for an mdev driver.  If you
want to play loose with isolation, do it somewhere else.  Thanks,

Alex
Xu Yilun Sept. 10, 2020, 8:32 a.m. UTC | #2
Hi Alex:

Thanks for your quick response; it is very helpful to me. I did some more
investigation and have some comments inline.

On Tue, Sep 08, 2020 at 03:10:02PM -0600, Alex Williamson wrote:
> On Tue,  8 Sep 2020 15:13:32 +0800
> Xu Yilun <yilun.xu@intel.com> wrote:
> 
> > This patch adds description for VFIO Mdev support for dfl devices on
> > dfl bus.
> > 
> > Signed-off-by: Xu Yilun <yilun.xu@intel.com>
> > Signed-off-by: Matthew Gerlach <matthew.gerlach@linux.intel.com>
> > ---
> >  Documentation/fpga/dfl.rst | 20 ++++++++++++++++++++
> >  1 file changed, 20 insertions(+)
> > 
> > diff --git a/Documentation/fpga/dfl.rst b/Documentation/fpga/dfl.rst
> > index 0404fe6..f077754 100644
> > --- a/Documentation/fpga/dfl.rst
> > +++ b/Documentation/fpga/dfl.rst
> > @@ -502,6 +502,26 @@ FME Partial Reconfiguration Sub Feature driver (see drivers/fpga/dfl-fme-pr.c)
> >  could be a reference.
> >  
> >  
> > +VFIO Mdev support for DFL devices
> > +=================================
> > +As we introduced a dfl bus for private features, they could be added to dfl bus
> > +as independent dfl devices. There is a requirement to handle these devices
> > +either by kernel drivers or by direct access from userspace. Usually we bind
> > +the kernel drivers to devices which provide board management functions, and
> > +gives user direct access to devices which cooperate closely with user
> > +controlled Accelerated Function Unit (AFU). We realize this with a VFIO Mdev
> > +implementation. When we bind the vfio-mdev-dfl driver to a dfl device, it
> > +realizes a group of callbacks and registers to the Mdev framework as a
> > +parent (physical) device. It could then create one (available_instances == 1)
> > +mdev device.
> > +Since dfl devices are sub devices of FPGA DFL physical devices (e.g. PCIE
> > +device), which provide no DMA isolation for each sub device, this may leads to
> > +DMA isolation problem if a private feature is designed to be capable of DMA.
> > +The AFU user could potentially access the whole device addressing space and
> > +impact the private feature. So now the general HW design rule is, no DMA
> > +capability for private features. It eliminates the DMA isolation problem.
> 
> What's the advantage of entangling mdev/vfio in this approach versus
> simply exposing the MMIO region of the device via sysfs (similar to a
> resource file in pci-sysfs)?  This implementation doesn't support
> interrupts, it doesn't support multiplexing of a device, it doesn't
> perform any degree of mediation, it seems to simply say "please don't
> do DMA".  I don't think that's acceptable for an mdev driver.  If you
> want to play loose with isolation, do it somewhere else.  Thanks,

The intention of the patchset is to enable userspace drivers for dfl
devices. The dfl devices are actually different IP blocks integrated in the
FPGA to support different board functionalities. They are sub devices of the
FPGA PCIe device, and their mmio blocks are in the PCIe BAR regions. We want
some of the dfl devices to be handled by userspace drivers.

Some dfl devices are capable of interrupts. I didn't add interrupt code in
this patch because the IRQ-capable dfl devices are currently all handled by
kernel drivers. But as a generic FPGA platform, IRQ handling for userspace
drivers should be supported.

I can see several ways to enable userspace drivers:

1. Device-specific sysfs files, as PCI does. But this seems not to be the
common way for userspace drivers, it doesn't support interrupts, and users
could potentially operate on the same mmio region as a kernel driver at the
same time.

2. A VFIO driver with NOIOMMU enabled. I think it meets our needs. Do you
think it is good for us to implement a VFIO driver for dfl devices?

3. VFIO mdev. I implemented it because it doesn't block us on the lack of a
valid iommu group. But since the driver doesn't perform any mediation, I
should give it up.

4. A UIO driver. It should work. I'm wondering whether option 2 covers the
functionality of UIO with more enhancements, so option 2 may be better?

Thanks again for your time; I would really appreciate some guidance on
this.

Yilun.

> 
> Alex
Alex Williamson Sept. 10, 2020, 3:49 p.m. UTC | #3
On Thu, 10 Sep 2020 16:32:30 +0800
Xu Yilun <yilun.xu@intel.com> wrote:

> Hi Alex:
> 
> Thanks for your quick response and is helpful to me. I did some more
> investigation and some comments inline.
> 
> On Tue, Sep 08, 2020 at 03:10:02PM -0600, Alex Williamson wrote:
> > On Tue,  8 Sep 2020 15:13:32 +0800
> > Xu Yilun <yilun.xu@intel.com> wrote:
> >   
> > > This patch adds description for VFIO Mdev support for dfl devices on
> > > dfl bus.
> > > 
> > > Signed-off-by: Xu Yilun <yilun.xu@intel.com>
> > > Signed-off-by: Matthew Gerlach <matthew.gerlach@linux.intel.com>
> > > ---
> > >  Documentation/fpga/dfl.rst | 20 ++++++++++++++++++++
> > >  1 file changed, 20 insertions(+)
> > > 
> > > diff --git a/Documentation/fpga/dfl.rst b/Documentation/fpga/dfl.rst
> > > index 0404fe6..f077754 100644
> > > --- a/Documentation/fpga/dfl.rst
> > > +++ b/Documentation/fpga/dfl.rst
> > > @@ -502,6 +502,26 @@ FME Partial Reconfiguration Sub Feature driver (see drivers/fpga/dfl-fme-pr.c)
> > >  could be a reference.
> > >  
> > >  
> > > +VFIO Mdev support for DFL devices
> > > +=================================
> > > +As we introduced a dfl bus for private features, they could be added to dfl bus
> > > +as independent dfl devices. There is a requirement to handle these devices
> > > +either by kernel drivers or by direct access from userspace. Usually we bind
> > > +the kernel drivers to devices which provide board management functions, and
> > > +gives user direct access to devices which cooperate closely with user
> > > +controlled Accelerated Function Unit (AFU). We realize this with a VFIO Mdev
> > > +implementation. When we bind the vfio-mdev-dfl driver to a dfl device, it
> > > +realizes a group of callbacks and registers to the Mdev framework as a
> > > +parent (physical) device. It could then create one (available_instances == 1)
> > > +mdev device.
> > > +Since dfl devices are sub devices of FPGA DFL physical devices (e.g. PCIE
> > > +device), which provide no DMA isolation for each sub device, this may leads to
> > > +DMA isolation problem if a private feature is designed to be capable of DMA.
> > > +The AFU user could potentially access the whole device addressing space and
> > > +impact the private feature. So now the general HW design rule is, no DMA
> > > +capability for private features. It eliminates the DMA isolation problem.  
> > 
> > What's the advantage of entangling mdev/vfio in this approach versus
> > simply exposing the MMIO region of the device via sysfs (similar to a
> > resource file in pci-sysfs)?  This implementation doesn't support
> > interrupts, it doesn't support multiplexing of a device, it doesn't
> > perform any degree of mediation, it seems to simply say "please don't
> > do DMA".  I don't think that's acceptable for an mdev driver.  If you
> > want to play loose with isolation, do it somewhere else.  Thanks,  
> 
> The intention of the patchset is to enable the userspace drivers for dfl
> devices. The dfl devices are actually different IP blocks integrated in
> FPGA to support different board functionalities. They are sub devices of
> the FPGA PCIe device. Their mmio blocks are in PCIE bar regions. And we
> want some of the dfl devices handled by the userspace drivers.
> 
> Some dfl devices are capable of interrupt. I didn't add interrupt code
> in this patch cause now the IRQ capable dfl devices are all handled by
> kernel drivers. But as a generic FPGA platform, IRQ handling for userspace
> drivers should be supported.
> 
> And I can see there are several ways to enable the userspace driver.
> 
> 1. Some specific sysfs like pci do. But seems it is not the common way for
> userspace driver. It does't support interrupt. And potentially users
> operate on the same mmio region together with kernel driver at the same
> time.
> 
> 2. VFIO driver with NOIOMMU enabled. I think it meets our needs. Do you
> think it is good we implement an VFIO driver for dfl devices?
> 
> 3. VFIO mdev. I implemented it because it will not block us from lacking
> of valid iommu group. And since the driver didn't perform any mediation,
> I should give up.
> 
> 4. UIO driver. It should work. I'm wondering if option 2 covers the
> functionalities of UIO and has more enhancement. So option 2 may be
> better?
> 
> Thanks again for your time, and I really appreciate you would give some
> guide on it.


VFIO no-iommu was intended as a transition helper for platforms that do
not support an IOMMU, particularly running within a VM where we use
regular, IOMMU protected devices within the host, but allow no-iommu
within the guest such that the host is still protected from the guest
sandbox.  There should be no new use cases of no-iommu, it's unsafe, it
taints the kernel where it's used (guest in the above intended use
case).  If you intend long term distribution support of a solution,
VFIO no-iommu should not be considered an option.

VFIO mdev requires that the mdev vendor driver mediates access to the
device in order to provide isolation.  In the initial vGPU use cases,
we expect that isolation to be provided via devices specific means, ex.
GPU GART, but we've since included system level components like IOMMU
backing devices and auxiliary domains, the latter to make use of IOMMU
PASID support.

As implemented in this proposal, mdev is being used only to subvert the
IOMMU grouping requirements of VFIO in order to order to expose a
device that is potentially fully capable of DMA to userspace with no
isolation whatsoever.  If not for the IOMMU grouping, this driver could
simply be a VFIO bus driver making use of the vfio-platform interface.
Either way, without isolation, this does not belong in the realm of
VFIO.

Given your architecture, the only potentially valid mdev use case I can
see would be if the mdev vendor driver binds to the PCIe device,
allowing multiplexing of the parent device by carving out fpga
functional blocks from MMIO BAR space, and providing isolation by
enforcing that the parent device never enables bus master, assuming
that would prevent any of the fpga sub-components from performing DMA.

Are there worthwhile use cases of these fpga devices without DMA?

If you need DMA (or the device is potentially capable of DMA and
cannot be audited to prevent it) and cannot provide isolation then
please don't use VFIO or mdev, doing so would violate the notion of
secure userspace device access that we've worked to achieve in this
ecosystem.

If you choose another route, pci-sysfs already provides full BAR access
via the resource files in sysfs, but you could also expose individual
sysfs files with the same capabilities per fpga functional unit to
resolve the conflict between kernel and userspace "ownership".  UIO
might also be a solution.  This proposal to restrict userspace usage to
devices that don't perform DMA is akin to uio_pci_generic, where the
user is not expected to enable bus master, but nothing prevents them
from doing so and as a result it's a gateway for all sorts of
unsupportable drivers.  mdev should not be used to follow that example.
Thanks,

Alex
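
For reference, the pci-sysfs route described above can be sketched from
userspace roughly as follows. This is a minimal illustration, not part of
the patch: the resource path and the feature offset in the usage comment are
hypothetical, and real use requires the actual device plus root privileges.

```python
import mmap
import struct

def page_align(offset, page_size=mmap.PAGESIZE):
    """mmap() offsets must be page aligned; split an MMIO offset into an
    aligned base to map plus the remaining delta into the mapping."""
    base = offset & ~(page_size - 1)
    return base, offset - base

def read_dfh(resource_path, feature_offset):
    """Read the 64-bit Device Feature Header of a private feature through
    a pci-sysfs BAR resource file."""
    base, delta = page_align(feature_offset)
    with open(resource_path, "r+b", buffering=0) as f:
        regs = mmap.mmap(f.fileno(), mmap.PAGESIZE, offset=base)
        try:
            return struct.unpack_from("<Q", regs, delta)[0]
        finally:
            regs.close()

# e.g. read_dfh("/sys/bus/pci/devices/0000:3b:00.0/resource0", 0x2000)
# (BDF and 0x2000 offset are illustrative placeholders)
```

Note that nothing in this path stops the user from also poking DMA-related
registers if the feature has any, which is exactly the uio_pci_generic-style
weakness Alex points out below the quoted text.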
Xu Yilun Sept. 11, 2020, 6:32 a.m. UTC | #4
On Thu, Sep 10, 2020 at 09:49:03AM -0600, Alex Williamson wrote:
> On Thu, 10 Sep 2020 16:32:30 +0800
> Xu Yilun <yilun.xu@intel.com> wrote:
> 
> > Hi Alex:
> > 
> > Thanks for your quick response and is helpful to me. I did some more
> > investigation and some comments inline.
> > 
> > On Tue, Sep 08, 2020 at 03:10:02PM -0600, Alex Williamson wrote:
> > > On Tue,  8 Sep 2020 15:13:32 +0800
> > > Xu Yilun <yilun.xu@intel.com> wrote:
> > >   
> > > > This patch adds description for VFIO Mdev support for dfl devices on
> > > > dfl bus.
> > > > 
> > > > Signed-off-by: Xu Yilun <yilun.xu@intel.com>
> > > > Signed-off-by: Matthew Gerlach <matthew.gerlach@linux.intel.com>
> > > > ---
> > > >  Documentation/fpga/dfl.rst | 20 ++++++++++++++++++++
> > > >  1 file changed, 20 insertions(+)
> > > > 
> > > > diff --git a/Documentation/fpga/dfl.rst b/Documentation/fpga/dfl.rst
> > > > index 0404fe6..f077754 100644
> > > > --- a/Documentation/fpga/dfl.rst
> > > > +++ b/Documentation/fpga/dfl.rst
> > > > @@ -502,6 +502,26 @@ FME Partial Reconfiguration Sub Feature driver (see drivers/fpga/dfl-fme-pr.c)
> > > >  could be a reference.
> > > >  
> > > >  
> > > > +VFIO Mdev support for DFL devices
> > > > +=================================
> > > > +As we introduced a dfl bus for private features, they could be added to dfl bus
> > > > +as independent dfl devices. There is a requirement to handle these devices
> > > > +either by kernel drivers or by direct access from userspace. Usually we bind
> > > > +the kernel drivers to devices which provide board management functions, and
> > > > +gives user direct access to devices which cooperate closely with user
> > > > +controlled Accelerated Function Unit (AFU). We realize this with a VFIO Mdev
> > > > +implementation. When we bind the vfio-mdev-dfl driver to a dfl device, it
> > > > +realizes a group of callbacks and registers to the Mdev framework as a
> > > > +parent (physical) device. It could then create one (available_instances == 1)
> > > > +mdev device.
> > > > +Since dfl devices are sub devices of FPGA DFL physical devices (e.g. PCIE
> > > > +device), which provide no DMA isolation for each sub device, this may leads to
> > > > +DMA isolation problem if a private feature is designed to be capable of DMA.
> > > > +The AFU user could potentially access the whole device addressing space and
> > > > +impact the private feature. So now the general HW design rule is, no DMA
> > > > +capability for private features. It eliminates the DMA isolation problem.  
> > > 
> > > What's the advantage of entangling mdev/vfio in this approach versus
> > > simply exposing the MMIO region of the device via sysfs (similar to a
> > > resource file in pci-sysfs)?  This implementation doesn't support
> > > interrupts, it doesn't support multiplexing of a device, it doesn't
> > > perform any degree of mediation, it seems to simply say "please don't
> > > do DMA".  I don't think that's acceptable for an mdev driver.  If you
> > > want to play loose with isolation, do it somewhere else.  Thanks,  
> > 
> > The intention of the patchset is to enable the userspace drivers for dfl
> > devices. The dfl devices are actually different IP blocks integrated in
> > FPGA to support different board functionalities. They are sub devices of
> > the FPGA PCIe device. Their mmio blocks are in PCIE bar regions. And we
> > want some of the dfl devices handled by the userspace drivers.
> > 
> > Some dfl devices are capable of interrupt. I didn't add interrupt code
> > in this patch cause now the IRQ capable dfl devices are all handled by
> > kernel drivers. But as a generic FPGA platform, IRQ handling for userspace
> > drivers should be supported.
> > 
> > And I can see there are several ways to enable the userspace driver.
> > 
> > 1. Some specific sysfs like pci do. But seems it is not the common way for
> > userspace driver. It does't support interrupt. And potentially users
> > operate on the same mmio region together with kernel driver at the same
> > time.
> > 
> > 2. VFIO driver with NOIOMMU enabled. I think it meets our needs. Do you
> > think it is good we implement an VFIO driver for dfl devices?
> > 
> > 3. VFIO mdev. I implemented it because it will not block us from lacking
> > of valid iommu group. And since the driver didn't perform any mediation,
> > I should give up.
> > 
> > 4. UIO driver. It should work. I'm wondering if option 2 covers the
> > functionalities of UIO and has more enhancement. So option 2 may be
> > better?
> > 
> > Thanks again for your time, and I really appreciate you would give some
> > guide on it.
> 
> 
> VFIO no-iommu was intended as a transition helper for platforms that do
> not support an IOMMU, particularly running within a VM where we use
> regular, IOMMU protected devices within the host, but allow no-iommu
> within the guest such that the host is still protected from the guest
> sandbox.  There should be no new use cases of no-iommu, it's unsafe, it
> taints the kernel where it's used (guest in the above intended use
> case).  If you intend long term distribution support of a solution,
> VFIO no-iommu should not be considered an option.
> 
> VFIO mdev requires that the mdev vendor driver mediates access to the
> device in order to provide isolation.  In the initial vGPU use cases,
> we expect that isolation to be provided via devices specific means, ex.
> GPU GART, but we've since included system level components like IOMMU
> backing devices and auxiliary domains, the latter to make use of IOMMU
> PASID support.
> 
> As implemented in this proposal, mdev is being used only to subvert the
> IOMMU grouping requirements of VFIO in order to order to expose a
> device that is potentially fully capable of DMA to userspace with no
> isolation whatsoever.  If not for the IOMMU grouping, this driver could
> simply be a VFIO bus driver making use of the vfio-platform interface.
> Either way, without isolation, this does not belong in the realm of
> VFIO.
> 
> Given your architecture, the only potentially valid mdev use case I can
> see would be if the mdev vendor driver binds to the PCIe device,
> allowing multiplexing of the parent device by carving out fpga
> functional blocks from MMIO BAR space, and providing isolation by
> enforcing that the parent device never enables bus master, assuming
> that would prevent any of the fpga sub-components from performing DMA.
> 
> Are there worthwhile use cases of these fpga devices without DMA?

The board (Intel PAC N3000) we want to support is a SmartNIC. The FPGA part
is responsible for various MAC layer offloads. The software for the FPGA
usually doesn't touch the network data stream (that is all handled by the
FPGA logic); it does some configuration and link status reading, so no DMA
is required. These configurations are sometimes very specific to the dynamic
RTL logic developed by the user, which is why we handle them in userspace.

There are some other use cases which use the FPGA for dedicated algorithms,
e.g. deep learning. They need to perform memory-in and memory-out transfers.
For these cases, it seems possible to carve out the related part as the
mdev, leaving the management part in the parent device driver.

> 
> If you need DMA (or the device is potentially capable of DMA and
> cannot be audited to prevent it) and cannot provide isolation then
> please don't use VFIO or mdev, doing so would violate the notion of
> secure userspace device access that we've worked to achieve in this
> ecosystem.
> 
> If you choose another route, pci-sysfs already provides full BAR access
> via the resource files in sysfs, but you could also expose individual
> sysfs files with the same capabilities per fpga functional unit to
> resolve the conflict between kernel and userspace "ownership".  UIO
> might also be a solution.  This proposal to restrict userspace usage to
> devices that don't perform DMA is akin to uio_pci_generic, where the
> user is not expected to enable bus master, but nothing prevents them

We are going to expose these devices, which are not capable of DMA, so it
seems UIO is the right way to go.
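
As a sketch of what that could look like: a UIO driver exposes the feature's
registers as mappable regions of its character device, and a blocking read()
on the device waits for interrupts (the read returns a 32-bit count of
interrupts seen so far). Everything below is hypothetical — no dfl UIO
driver exists yet, and the device node, map size, and 0x8 link-status offset
are illustrative only.

```python
import mmap
import os
import struct

def parse_irq_count(buf):
    """Reads from /dev/uioX return a 32-bit, host-endian interrupt count;
    blocking on read() is how UIO applications wait for an IRQ."""
    return struct.unpack("=i", buf)[0]

def run(dev="/dev/uio0", map_size=4096, link_status_off=0x8):
    fd = os.open(dev, os.O_RDWR)
    # UIO selects map N via mmap offset N * PAGE_SIZE; map 0 is offset 0.
    regs = mmap.mmap(fd, map_size, offset=0)
    try:
        # Wait for one interrupt, then read a (hypothetical) status reg.
        count = parse_irq_count(os.read(fd, 4))
        status = struct.unpack_from("<Q", regs, link_status_off)[0]
        print("irq #%d, link status 0x%x" % (count, status))
    finally:
        regs.close()
        os.close(fd)
```

Unlike the sysfs route, this gives userspace an interrupt path, which
addresses the earlier objection to option 1.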

> from doing so and as a result it's a gateway for all sorts of
> unsupportable drivers.  mdev should not be used to follow that example.



I should thank you again for the detailed explanation.

Yilun


> Thanks,
> 
> Alex

Patch

diff --git a/Documentation/fpga/dfl.rst b/Documentation/fpga/dfl.rst
index 0404fe6..f077754 100644
--- a/Documentation/fpga/dfl.rst
+++ b/Documentation/fpga/dfl.rst
@@ -502,6 +502,26 @@  FME Partial Reconfiguration Sub Feature driver (see drivers/fpga/dfl-fme-pr.c)
 could be a reference.
 
 
+VFIO Mdev support for DFL devices
+=================================
+Since we introduced a dfl bus for private features, they can be added to the
+dfl bus as independent dfl devices. These devices need to be handled either
+by kernel drivers or by direct access from userspace. Usually we bind kernel
+drivers to devices which provide board management functions, and give users
+direct access to devices which cooperate closely with the user-controlled
+Accelerated Function Unit (AFU). We realize this with a VFIO Mdev
+implementation. When the vfio-mdev-dfl driver is bound to a dfl device, it
+implements a group of callbacks and registers to the Mdev framework as a
+parent (physical) device. It can then create one (available_instances == 1)
+mdev device.
+Since dfl devices are sub devices of an FPGA DFL physical device (e.g. a
+PCIe device), which provides no DMA isolation between its sub devices, this
+may lead to a DMA isolation problem if a private feature is DMA capable. The
+AFU user could potentially access the whole device address space and impact
+the private feature. So the general HW design rule is: no DMA capability for
+private features. This eliminates the DMA isolation problem.
+
+
 Open discussion
 ===============
 FME driver exports one ioctl (DFL_FPGA_FME_PORT_PR) for partial reconfiguration