Message ID | 1599549212-24253-4-git-send-email-yilun.xu@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | add VFIO mdev support for DFL devices | expand |
On Tue, 8 Sep 2020 15:13:32 +0800 Xu Yilun <yilun.xu@intel.com> wrote: > This patch adds description for VFIO Mdev support for dfl devices on > dfl bus. > > Signed-off-by: Xu Yilun <yilun.xu@intel.com> > Signed-off-by: Matthew Gerlach <matthew.gerlach@linux.intel.com> > --- > Documentation/fpga/dfl.rst | 20 ++++++++++++++++++++ > 1 file changed, 20 insertions(+) > > diff --git a/Documentation/fpga/dfl.rst b/Documentation/fpga/dfl.rst > index 0404fe6..f077754 100644 > --- a/Documentation/fpga/dfl.rst > +++ b/Documentation/fpga/dfl.rst > @@ -502,6 +502,26 @@ FME Partial Reconfiguration Sub Feature driver (see drivers/fpga/dfl-fme-pr.c) > could be a reference. > > > +VFIO Mdev support for DFL devices > +================================= > +As we introduced a dfl bus for private features, they could be added to dfl bus > +as independent dfl devices. There is a requirement to handle these devices > +either by kernel drivers or by direct access from userspace. Usually we bind > +the kernel drivers to devices which provide board management functions, and > +gives user direct access to devices which cooperate closely with user > +controlled Accelerated Function Unit (AFU). We realize this with a VFIO Mdev > +implementation. When we bind the vfio-mdev-dfl driver to a dfl device, it > +realizes a group of callbacks and registers to the Mdev framework as a > +parent (physical) device. It could then create one (available_instances == 1) > +mdev device. > +Since dfl devices are sub devices of FPGA DFL physical devices (e.g. PCIE > +device), which provide no DMA isolation for each sub device, this may leads to > +DMA isolation problem if a private feature is designed to be capable of DMA. > +The AFU user could potentially access the whole device addressing space and > +impact the private feature. So now the general HW design rule is, no DMA > +capability for private features. It eliminates the DMA isolation problem. What's the advantage of entangling mdev/vfio in this approach versus simply exposing the MMIO region of the device via sysfs (similar to a resource file in pci-sysfs)? This implementation doesn't support interrupts, it doesn't support multiplexing of a device, it doesn't perform any degree of mediation, it seems to simply say "please don't do DMA". I don't think that's acceptable for an mdev driver. If you want to play loose with isolation, do it somewhere else. Thanks, Alex
Hi Alex: Thanks for your quick response and is helpful to me. I did some more investigation and some comments inline. On Tue, Sep 08, 2020 at 03:10:02PM -0600, Alex Williamson wrote: > On Tue, 8 Sep 2020 15:13:32 +0800 > Xu Yilun <yilun.xu@intel.com> wrote: > > > This patch adds description for VFIO Mdev support for dfl devices on > > dfl bus. > > > > Signed-off-by: Xu Yilun <yilun.xu@intel.com> > > Signed-off-by: Matthew Gerlach <matthew.gerlach@linux.intel.com> > > --- > > Documentation/fpga/dfl.rst | 20 ++++++++++++++++++++ > > 1 file changed, 20 insertions(+) > > > > diff --git a/Documentation/fpga/dfl.rst b/Documentation/fpga/dfl.rst > > index 0404fe6..f077754 100644 > > --- a/Documentation/fpga/dfl.rst > > +++ b/Documentation/fpga/dfl.rst > > @@ -502,6 +502,26 @@ FME Partial Reconfiguration Sub Feature driver (see drivers/fpga/dfl-fme-pr.c) > > could be a reference. > > > > > > +VFIO Mdev support for DFL devices > > +================================= > > +As we introduced a dfl bus for private features, they could be added to dfl bus > > +as independent dfl devices. There is a requirement to handle these devices > > +either by kernel drivers or by direct access from userspace. Usually we bind > > +the kernel drivers to devices which provide board management functions, and > > +gives user direct access to devices which cooperate closely with user > > +controlled Accelerated Function Unit (AFU). We realize this with a VFIO Mdev > > +implementation. When we bind the vfio-mdev-dfl driver to a dfl device, it > > +realizes a group of callbacks and registers to the Mdev framework as a > > +parent (physical) device. It could then create one (available_instances == 1) > > +mdev device. > > +Since dfl devices are sub devices of FPGA DFL physical devices (e.g. PCIE > > +device), which provide no DMA isolation for each sub device, this may leads to > > +DMA isolation problem if a private feature is designed to be capable of DMA. > > +The AFU user could potentially access the whole device addressing space and > > +impact the private feature. So now the general HW design rule is, no DMA > > +capability for private features. It eliminates the DMA isolation problem. > > What's the advantage of entangling mdev/vfio in this approach versus > simply exposing the MMIO region of the device via sysfs (similar to a > resource file in pci-sysfs)? This implementation doesn't support > interrupts, it doesn't support multiplexing of a device, it doesn't > perform any degree of mediation, it seems to simply say "please don't > do DMA". I don't think that's acceptable for an mdev driver. If you > want to play loose with isolation, do it somewhere else. Thanks, The intention of the patchset is to enable the userspace drivers for dfl devices. The dfl devices are actually different IP blocks integrated in FPGA to support different board functionalities. They are sub devices of the FPGA PCIe device. Their mmio blocks are in PCIE bar regions. And we want some of the dfl devices handled by the userspace drivers. Some dfl devices are capable of interrupt. I didn't add interrupt code in this patch cause now the IRQ capable dfl devices are all handled by kernel drivers. But as a generic FPGA platform, IRQ handling for userspace drivers should be supported. And I can see there are several ways to enable the userspace driver. 1. Some specific sysfs like pci do. But seems it is not the common way for userspace driver. It does't support interrupt. And potentially users operate on the same mmio region together with kernel driver at the same time. 2. VFIO driver with NOIOMMU enabled. I think it meets our needs. Do you think it is good we implement an VFIO driver for dfl devices? 3. VFIO mdev. I implemented it because it will not block us from lacking of valid iommu group. And since the driver didn't perform any mediation, I should give up. 4. UIO driver. It should work. I'm wondering if option 2 covers the functionalities of UIO and has more enhancement. So option 2 may be better? Thanks again for your time, and I really appreciate you would give some guide on it. Yilun. > > Alex
On Thu, 10 Sep 2020 16:32:30 +0800 Xu Yilun <yilun.xu@intel.com> wrote: > Hi Alex: > > Thanks for your quick response and is helpful to me. I did some more > investigation and some comments inline. > > On Tue, Sep 08, 2020 at 03:10:02PM -0600, Alex Williamson wrote: > > On Tue, 8 Sep 2020 15:13:32 +0800 > > Xu Yilun <yilun.xu@intel.com> wrote: > > > > > This patch adds description for VFIO Mdev support for dfl devices on > > > dfl bus. > > > > > > Signed-off-by: Xu Yilun <yilun.xu@intel.com> > > > Signed-off-by: Matthew Gerlach <matthew.gerlach@linux.intel.com> > > > --- > > > Documentation/fpga/dfl.rst | 20 ++++++++++++++++++++ > > > 1 file changed, 20 insertions(+) > > > > > > diff --git a/Documentation/fpga/dfl.rst b/Documentation/fpga/dfl.rst > > > index 0404fe6..f077754 100644 > > > --- a/Documentation/fpga/dfl.rst > > > +++ b/Documentation/fpga/dfl.rst > > > @@ -502,6 +502,26 @@ FME Partial Reconfiguration Sub Feature driver (see drivers/fpga/dfl-fme-pr.c) > > > could be a reference. > > > > > > > > > +VFIO Mdev support for DFL devices > > > +================================= > > > +As we introduced a dfl bus for private features, they could be added to dfl bus > > > +as independent dfl devices. There is a requirement to handle these devices > > > +either by kernel drivers or by direct access from userspace. Usually we bind > > > +the kernel drivers to devices which provide board management functions, and > > > +gives user direct access to devices which cooperate closely with user > > > +controlled Accelerated Function Unit (AFU). We realize this with a VFIO Mdev > > > +implementation. When we bind the vfio-mdev-dfl driver to a dfl device, it > > > +realizes a group of callbacks and registers to the Mdev framework as a > > > +parent (physical) device. It could then create one (available_instances == 1) > > > +mdev device. > > > +Since dfl devices are sub devices of FPGA DFL physical devices (e.g. PCIE > > > +device), which provide no DMA isolation for each sub device, this may leads to > > > +DMA isolation problem if a private feature is designed to be capable of DMA. > > > +The AFU user could potentially access the whole device addressing space and > > > +impact the private feature. So now the general HW design rule is, no DMA > > > +capability for private features. It eliminates the DMA isolation problem. > > > > What's the advantage of entangling mdev/vfio in this approach versus > > simply exposing the MMIO region of the device via sysfs (similar to a > > resource file in pci-sysfs)? This implementation doesn't support > > interrupts, it doesn't support multiplexing of a device, it doesn't > > perform any degree of mediation, it seems to simply say "please don't > > do DMA". I don't think that's acceptable for an mdev driver. If you > > want to play loose with isolation, do it somewhere else. Thanks, > > The intention of the patchset is to enable the userspace drivers for dfl > devices. The dfl devices are actually different IP blocks integrated in > FPGA to support different board functionalities. They are sub devices of > the FPGA PCIe device. Their mmio blocks are in PCIE bar regions. And we > want some of the dfl devices handled by the userspace drivers. > > Some dfl devices are capable of interrupt. I didn't add interrupt code > in this patch cause now the IRQ capable dfl devices are all handled by > kernel drivers. But as a generic FPGA platform, IRQ handling for userspace > drivers should be supported. > > And I can see there are several ways to enable the userspace driver. > > 1. Some specific sysfs like pci do. But seems it is not the common way for > userspace driver. It does't support interrupt. And potentially users > operate on the same mmio region together with kernel driver at the same > time. > > 2. VFIO driver with NOIOMMU enabled. I think it meets our needs. Do you > think it is good we implement an VFIO driver for dfl devices? > > 3. VFIO mdev. I implemented it because it will not block us from lacking > of valid iommu group. And since the driver didn't perform any mediation, > I should give up. > > 4. UIO driver. It should work. I'm wondering if option 2 covers the > functionalities of UIO and has more enhancement. So option 2 may be > better? > > Thanks again for your time, and I really appreciate you would give some > guide on it. VFIO no-iommu was intended as a transition helper for platforms that do not support an IOMMU, particularly running within a VM where we use regular, IOMMU protected devices within the host, but allow no-iommu within the guest such that the host is still protected from the guest sandbox. There should be no new use cases of no-iommu, it's unsafe, it taints the kernel where it's used (guest in the above intended use case). If you intend long term distribution support of a solution, VFIO no-iommu should not be considered an option. VFIO mdev requires that the mdev vendor driver mediates access to the device in order to provide isolation. In the initial vGPU use cases, we expect that isolation to be provided via devices specific means, ex. GPU GART, but we've since included system level components like IOMMU backing devices and auxiliary domains, the latter to make use of IOMMU PASID support. As implemented in this proposal, mdev is being used only to subvert the IOMMU grouping requirements of VFIO in order to order to expose a device that is potentially fully capable of DMA to userspace with no isolation whatsoever. If not for the IOMMU grouping, this driver could simply be a VFIO bus driver making use of the vfio-platform interface. Either way, without isolation, this does not belong in the realm of VFIO. Given your architecture, the only potentially valid mdev use case I can see would be if the mdev vendor driver binds to the PCIe device, allowing multiplexing of the parent device by carving out fpga functional blocks from MMIO BAR space, and providing isolation by enforcing that the parent device never enables bus master, assuming that would prevent any of the fpga sub-components from performing DMA. Are there worthwhile use cases of these fpga devices without DMA? If you need DMA (or the device is potentially capable of DMA and cannot be audited to prevent it) and cannot provide isolation then please don't use VFIO or mdev, doing so would violate the notion of secure userspace device access that we've worked to achieve in this ecosystem. If you choose another route, pci-sysfs already provides full BAR access via the resource files in sysfs, but you could also expose individual sysfs files with the same capabilities per fpga functional unit to resolve the conflict between kernel and userspace "ownership". UIO might also be a solution. This proposal to restrict userspace usage to devices that don't perform DMA is akin to uio_pci_generic, where the user is not expected to enable bus master, but nothing prevents them from doing so and as a result it's a gateway for all sorts of unsupportable drivers. mdev should not be used to follow that example. Thanks, Alex
On Thu, Sep 10, 2020 at 09:49:03AM -0600, Alex Williamson wrote: > On Thu, 10 Sep 2020 16:32:30 +0800 > Xu Yilun <yilun.xu@intel.com> wrote: > > > Hi Alex: > > > > Thanks for your quick response and is helpful to me. I did some more > > investigation and some comments inline. > > > > On Tue, Sep 08, 2020 at 03:10:02PM -0600, Alex Williamson wrote: > > > On Tue, 8 Sep 2020 15:13:32 +0800 > > > Xu Yilun <yilun.xu@intel.com> wrote: > > > > > > > This patch adds description for VFIO Mdev support for dfl devices on > > > > dfl bus. > > > > > > > > Signed-off-by: Xu Yilun <yilun.xu@intel.com> > > > > Signed-off-by: Matthew Gerlach <matthew.gerlach@linux.intel.com> > > > > --- > > > > Documentation/fpga/dfl.rst | 20 ++++++++++++++++++++ > > > > 1 file changed, 20 insertions(+) > > > > > > > > diff --git a/Documentation/fpga/dfl.rst b/Documentation/fpga/dfl.rst > > > > index 0404fe6..f077754 100644 > > > > --- a/Documentation/fpga/dfl.rst > > > > +++ b/Documentation/fpga/dfl.rst > > > > @@ -502,6 +502,26 @@ FME Partial Reconfiguration Sub Feature driver (see drivers/fpga/dfl-fme-pr.c) > > > > could be a reference. > > > > > > > > > > > > +VFIO Mdev support for DFL devices > > > > +================================= > > > > +As we introduced a dfl bus for private features, they could be added to dfl bus > > > > +as independent dfl devices. There is a requirement to handle these devices > > > > +either by kernel drivers or by direct access from userspace. Usually we bind > > > > +the kernel drivers to devices which provide board management functions, and > > > > +gives user direct access to devices which cooperate closely with user > > > > +controlled Accelerated Function Unit (AFU). We realize this with a VFIO Mdev > > > > +implementation. When we bind the vfio-mdev-dfl driver to a dfl device, it > > > > +realizes a group of callbacks and registers to the Mdev framework as a > > > > +parent (physical) device. It could then create one (available_instances == 1) > > > > +mdev device. > > > > +Since dfl devices are sub devices of FPGA DFL physical devices (e.g. PCIE > > > > +device), which provide no DMA isolation for each sub device, this may leads to > > > > +DMA isolation problem if a private feature is designed to be capable of DMA. > > > > +The AFU user could potentially access the whole device addressing space and > > > > +impact the private feature. So now the general HW design rule is, no DMA > > > > +capability for private features. It eliminates the DMA isolation problem. > > > > > > What's the advantage of entangling mdev/vfio in this approach versus > > > simply exposing the MMIO region of the device via sysfs (similar to a > > > resource file in pci-sysfs)? This implementation doesn't support > > > interrupts, it doesn't support multiplexing of a device, it doesn't > > > perform any degree of mediation, it seems to simply say "please don't > > > do DMA". I don't think that's acceptable for an mdev driver. If you > > > want to play loose with isolation, do it somewhere else. Thanks, > > > > The intention of the patchset is to enable the userspace drivers for dfl > > devices. The dfl devices are actually different IP blocks integrated in > > FPGA to support different board functionalities. They are sub devices of > > the FPGA PCIe device. Their mmio blocks are in PCIE bar regions. And we > > want some of the dfl devices handled by the userspace drivers. > > > > Some dfl devices are capable of interrupt. I didn't add interrupt code > > in this patch cause now the IRQ capable dfl devices are all handled by > > kernel drivers. But as a generic FPGA platform, IRQ handling for userspace > > drivers should be supported. > > > > And I can see there are several ways to enable the userspace driver. > > > > 1. Some specific sysfs like pci do. But seems it is not the common way for > > userspace driver. It does't support interrupt. And potentially users > > operate on the same mmio region together with kernel driver at the same > > time. > > > > 2. VFIO driver with NOIOMMU enabled. I think it meets our needs. Do you > > think it is good we implement an VFIO driver for dfl devices? > > > > 3. VFIO mdev. I implemented it because it will not block us from lacking > > of valid iommu group. And since the driver didn't perform any mediation, > > I should give up. > > > > 4. UIO driver. It should work. I'm wondering if option 2 covers the > > functionalities of UIO and has more enhancement. So option 2 may be > > better? > > > > Thanks again for your time, and I really appreciate you would give some > > guide on it. > > > VFIO no-iommu was intended as a transition helper for platforms that do > not support an IOMMU, particularly running within a VM where we use > regular, IOMMU protected devices within the host, but allow no-iommu > within the guest such that the host is still protected from the guest > sandbox. There should be no new use cases of no-iommu, it's unsafe, it > taints the kernel where it's used (guest in the above intended use > case). If you intend long term distribution support of a solution, > VFIO no-iommu should not be considered an option. > > VFIO mdev requires that the mdev vendor driver mediates access to the > device in order to provide isolation. In the initial vGPU use cases, > we expect that isolation to be provided via devices specific means, ex. > GPU GART, but we've since included system level components like IOMMU > backing devices and auxiliary domains, the latter to make use of IOMMU > PASID support. > > As implemented in this proposal, mdev is being used only to subvert the > IOMMU grouping requirements of VFIO in order to order to expose a > device that is potentially fully capable of DMA to userspace with no > isolation whatsoever. If not for the IOMMU grouping, this driver could > simply be a VFIO bus driver making use of the vfio-platform interface. > Either way, without isolation, this does not belong in the realm of > VFIO. > > Given your architecture, the only potentially valid mdev use case I can > see would be if the mdev vendor driver binds to the PCIe device, > allowing multiplexing of the parent device by carving out fpga > functional blocks from MMIO BAR space, and providing isolation by > enforcing that the parent device never enables bus master, assuming > that would prevent any of the fpga sub-components from performing DMA. > > Are there worthwhile use cases of these fpga devices without DMA? The board (Intel PAC N3000) we want to support is a Smart NIC. The FPGA part is responsible for the various MAC layer offloading. The software for FPGA usually doesn't touch the network data stream (they are all handled by FPGA logic), it does some configurations and link status reading so no DMA is required. These configurations are sometimes very specific to dynamic RTL logic developed by user, so this is the purpose we handle them in userspace. There are some other usercases, which use FPGA to do some dedicated algorithm like for deep learning. They need to perform memory in and memory out. For these cases, it seems possible we carving out the related part as the mdev, leaving management part in parent device driver. > > If you need DMA (or the device is potentially capable of DMA and > cannot be audited to prevent it) and cannot provide isolation then > please don't use VFIO or mdev, doing so would violate the notion of > secure userspace device access that we've worked to achieve in this > ecosystem. > > If you choose another route, pci-sysfs already provides full BAR access > via the resource files in sysfs, but you could also expose individual > sysfs files with the same capabilities per fpga functional unit to > resolve the conflict between kernel and userspace "ownership". UIO > might also be a solution. This proposal to restrict userspace usage to > devices that don't perform DMA is akin to uio_pci_generic, where the > user is not expected to enable bus master, but nothing prevents them We are going to exposes these devices not capable of DMA, so seems UIO is the right way to go. > from doing so and as a result it's a gateway for all sorts of > unsupportable drivers. mdev should not be used to follow that example. I should thank you again for the detail explanation. Yilun > Thanks, > > Alex
diff --git a/Documentation/fpga/dfl.rst b/Documentation/fpga/dfl.rst index 0404fe6..f077754 100644 --- a/Documentation/fpga/dfl.rst +++ b/Documentation/fpga/dfl.rst @@ -502,6 +502,26 @@ FME Partial Reconfiguration Sub Feature driver (see drivers/fpga/dfl-fme-pr.c) could be a reference. +VFIO Mdev support for DFL devices +================================= +As we introduced a dfl bus for private features, they could be added to dfl bus +as independent dfl devices. There is a requirement to handle these devices +either by kernel drivers or by direct access from userspace. Usually we bind +the kernel drivers to devices which provide board management functions, and +gives user direct access to devices which cooperate closely with user +controlled Accelerated Function Unit (AFU). We realize this with a VFIO Mdev +implementation. When we bind the vfio-mdev-dfl driver to a dfl device, it +realizes a group of callbacks and registers to the Mdev framework as a +parent (physical) device. It could then create one (available_instances == 1) +mdev device. +Since dfl devices are sub devices of FPGA DFL physical devices (e.g. PCIE +device), which provide no DMA isolation for each sub device, this may leads to +DMA isolation problem if a private feature is designed to be capable of DMA. +The AFU user could potentially access the whole device addressing space and +impact the private feature. So now the general HW design rule is, no DMA +capability for private features. It eliminates the DMA isolation problem. + + Open discussion =============== FME driver exports one ioctl (DFL_FPGA_FME_PORT_PR) for partial reconfiguration