
[04/11] PCI: portdrv: Suppress kernel DMA ownership auto-claiming

Message ID: 20211115020552.2378167-5-baolu.lu@linux.intel.com
State: Superseded
Delegated to: Bjorn Helgaas
Series: Fix BUG_ON in vfio_iommu_group_notifier()

Commit Message

Baolu Lu Nov. 15, 2021, 2:05 a.m. UTC
IOMMU grouping on PCI necessitates that if we lack isolation on a bridge
then all of the downstream devices will be part of the same IOMMU group
as the bridge. As long as the bridge kernel driver doesn't map and
access any PCI mmio bar, it's safe to bind it to the device in a USER-
owned group. Hence, safe to suppress the kernel DMA ownership auto-
claiming.

The commit 5f096b14d421b ("vfio: Whitelist PCI bridges") permitted a
class of kernel drivers. This is not always safe. For example, the SHPC
system design requires that it must be integrated into a PCI-to-PCI
bridge or a host bridge. The shpchp_core driver relies on the PCI mmio
bar access for the controller functionality. Binding it to the device
belonging to a USER-owned group will allow the user to change the
controller via p2p transactions which is unknown to the hot-plug driver
and could lead to some unpredictable consequences.

Now that we have driver self-declaration of safety we should rely on that.
This change may cause regression on some platforms, since all bridges were
exempted before, but now they have to be manually audited before doing so.
This is actually the desired outcome anyway.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Suggested-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/pci/pcie/portdrv_pci.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Bjorn Helgaas Nov. 15, 2021, 8:44 p.m. UTC | #1
On Mon, Nov 15, 2021 at 10:05:45AM +0800, Lu Baolu wrote:
> IOMMU grouping on PCI necessitates that if we lack isolation on a bridge
> then all of the downstream devices will be part of the same IOMMU group
> as the bridge.

I think this means something like: "If a PCIe Switch Downstream Port
lacks <a specific set of ACS capabilities>, all downstream devices
will be part of the same IOMMU group as the switch," right?

If so, can you fill in the details to make it specific and concrete?

> As long as the bridge kernel driver doesn't map and
> access any PCI mmio bar, it's safe to bind it to the device in a USER-
> owned group. Hence, safe to suppress the kernel DMA ownership auto-
> claiming.

s/mmio/MMIO/ (also below)
s/bar/BAR/ (also below)

I don't understand what "kernel DMA ownership auto-claiming" means.
Presumably that's explained in previous patches and a code comment
near "suppress_auto_claim_dma_owner".

> The commit 5f096b14d421b ("vfio: Whitelist PCI bridges") permitted a
> class of kernel drivers. 

Permitted them to do what?

> This is not always safe. For example, the SHPC
> system design requires that it must be integrated into a PCI-to-PCI
> bridge or a host bridge.

If this SHPC example is important, it would be nice to have a citation
to the spec section that requires this.

> The shpchp_core driver relies on the PCI mmio
> bar access for the controller functionality. Binding it to the device
> belonging to a USER-owned group will allow the user to change the
> controller via p2p transactions which is unknown to the hot-plug driver
> and could lead to some unpredictable consequences.
> 
> Now that we have driver self-declaration of safety we should rely on that.

Can you spell out what drivers are self-declaring?  Are they declaring
that they don't program their devices to do DMA?

> This change may cause regression on some platforms, since all bridges were
> exempted before, but now they have to be manually audited before doing so.
> This is actually the desired outcome anyway.

Please spell out what regression this may cause and how users would
recognize it.  Also, please give a hint about why that is desirable.

> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Suggested-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/pci/pcie/portdrv_pci.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
> index 35eca6277a96..1285862a9aa8 100644
> --- a/drivers/pci/pcie/portdrv_pci.c
> +++ b/drivers/pci/pcie/portdrv_pci.c
> @@ -203,6 +203,8 @@ static struct pci_driver pcie_portdriver = {
>  	.err_handler	= &pcie_portdrv_err_handler,
>  
>  	.driver.pm	= PCIE_PORTDRV_PM_OPS,
> +
> +	.driver.suppress_auto_claim_dma_owner = true,
>  };
>  
>  static int __init dmi_pcie_pme_disable_msi(const struct dmi_system_id *d)
> -- 
> 2.25.1
>
Baolu Lu Nov. 16, 2021, 7:24 a.m. UTC | #2
Hi Bjorn,

On 2021/11/16 4:44, Bjorn Helgaas wrote:
> On Mon, Nov 15, 2021 at 10:05:45AM +0800, Lu Baolu wrote:
>> IOMMU grouping on PCI necessitates that if we lack isolation on a bridge
>> then all of the downstream devices will be part of the same IOMMU group
>> as the bridge.
> 
> I think this means something like: "If a PCIe Switch Downstream Port
> lacks <a specific set of ACS capabilities>, all downstream devices
> will be part of the same IOMMU group as the switch," right?

For this patch, yes.

> 
> If so, can you fill in the details to make it specific and concrete?

The existing vfio implementation allows a kernel driver to bind to a
PCI bridge while its downstream devices are assigned to user space,
even though the bridge lacks ACS-like isolation.

drivers/vfio/vfio.c:
  540 static bool vfio_dev_driver_allowed(struct device *dev,
  541                                     struct device_driver *drv)
  542 {
  543         if (dev_is_pci(dev)) {
  544                 struct pci_dev *pdev = to_pci_dev(dev);
  545
  546                 if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL)
  547                         return true;
  548         }

We are moving the group viability check to the IOMMU core, and trying
to make it compatible with the current vfio policy. We see three types
of bridges:

#1) PCIe/PCI-to-PCI bridges
     These bridges are configured in the PCI framework; there's no
     dedicated driver for such devices.

#2) Generic PCIe switch downstream port
     The port driver doesn't map or access any MMIO behind the PCI
     BARs. The iommu group is viable to the user even when this driver
     is bound.

#3) Hot Plug Controller
     The controller driver maps and accesses the device MMIO. The iommu
     group is not viable to the user with this driver bound to its
     device.

>> As long as the bridge kernel driver doesn't map and
>> access any PCI mmio bar, it's safe to bind it to the device in a USER-
>> owned group. Hence, safe to suppress the kernel DMA ownership auto-
>> claiming.
> 
> s/mmio/MMIO/ (also below)
> s/bar/BAR/ (also below)

Sure.

> 
> I don't understand what "kernel DMA ownership auto-claiming" means.
> Presumably that's explained in previous patches and a code comment
> near "suppress_auto_claim_dma_owner".

When a device driver is about to bind to a device, the driver core
will automatically claim kernel DMA ownership for the driver.

This implies that
- on success, the kernel driver is controlling the device for DMA. No
   device sitting in the same iommu group should be assigned to user
   space.
- on failure, some devices sitting in the same iommu group have already
   been assigned to user space. The driver binding process should abort.

But there are some exceptions where suppress_auto_claim_dma_owner comes
into play:

#1) vfio-like drivers which will assign the devices to user space after
     driver binding;
#2) (compatible with the existing vfio policy) some drivers are allowed
     to be bound to a device while its siblings in the iommu group are
     assigned to user space. Typically, these drivers include
     - the pci-stub driver
     - PCI bridge drivers

For the above drivers, we use driver.suppress_auto_claim_dma_owner as a
hint to tell the driver core to skip the kernel DMA ownership claiming.
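
The bind-time flow then looks roughly like below (a simplified sketch;
the helper name here is illustrative rather than the exact function
added earlier in this series):

#include <linux/device.h>
#include <linux/iommu.h>

/* Simplified sketch of what the driver core does at bind time. */
static int driver_claim_dma_ownership(struct device *dev,
                                      struct device_driver *drv)
{
        /* vfio-like drivers, pci-stub and PCI bridge drivers opt out. */
        if (drv->suppress_auto_claim_dma_owner)
                return 0;

        /*
         * Claim kernel DMA ownership of dev's iommu group (illustrative
         * helper name).  This fails if any device in the group has
         * already been assigned to user space, and the driver binding
         * is then aborted.
         */
        return iommu_group_claim_kernel_dma_owner(dev);
}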

> 
>> The commit 5f096b14d421b ("vfio: Whitelist PCI bridges") permitted a
>> class of kernel drivers.
> 
> Permitted them to do what?

As I explained above.

> 
>> This is not always safe. For example, the SHPC
>> system design requires that it must be integrated into a PCI-to-PCI
>> bridge or a host bridge.
> 
> If this SHPC example is important, it would be nice to have a citation
> to the spec section that requires this.

I just used it as an example to show that allowing any driver to be
bound to a PCI bridge device in a USER-viable iommu group is too loose.

> 
>> The shpchp_core driver relies on the PCI mmio
>> bar access for the controller functionality. Binding it to the device
>> belonging to a USER-owned group will allow the user to change the
>> controller via p2p transactions which is unknown to the hot-plug driver
>> and could lead to some unpredictable consequences.
>>
>> Now that we have driver self-declaration of safety we should rely on that.
> 
> Can you spell out what drivers are self-declaring?  Are they declaring
> that they don't program their devices to do DMA?

Sure. Those drivers self-declare by setting
driver.suppress_auto_claim_dma_owner = true.

> 
>> This change may cause regression on some platforms, since all bridges were
>> exempted before, but now they have to be manually audited before doing so.
>> This is actually the desired outcome anyway.
> 
> Please spell out what regression this may cause and how users would
> recognize it.  Also, please give a hint about why that is desirable.

Sure.

Before this series, bridge drivers were always allowed to bind to a
PCI/PCIe bridge sitting in an iommu group assigned to user space. After
it, only drivers that have suppress_auto_claim_dma_owner set may do so;
otherwise either the driver binding or the user assignment of the group
will fail.

The criterion for giving a driver this hint is that it doesn't map or
access the MMIO defined by the PCI BARs.
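
For instance, the pci-stub driver meets that criterion; a sketch of how
such a driver declares the hint, using the same field this patch adds
for portdrv (whether any particular driver really qualifies still needs
to be audited):

static struct pci_driver stub_driver = {
        .name           = "pci-stub",
        .id_table       = NULL, /* only dynamic IDs */
        .probe          = pci_stub_probe,
        /*
         * The driver never maps device MMIO or programs DMA, so binding
         * it does not make the iommu group unviable for user space.
         */
        .driver.suppress_auto_claim_dma_owner = true,
};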

> 
>> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
>> Suggested-by: Kevin Tian <kevin.tian@intel.com>
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>> ---
>>   drivers/pci/pcie/portdrv_pci.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
>> index 35eca6277a96..1285862a9aa8 100644
>> --- a/drivers/pci/pcie/portdrv_pci.c
>> +++ b/drivers/pci/pcie/portdrv_pci.c
>> @@ -203,6 +203,8 @@ static struct pci_driver pcie_portdriver = {
>>   	.err_handler	= &pcie_portdrv_err_handler,
>>   
>>   	.driver.pm	= PCIE_PORTDRV_PM_OPS,
>> +
>> +	.driver.suppress_auto_claim_dma_owner = true,
>>   };
>>   
>>   static int __init dmi_pcie_pme_disable_msi(const struct dmi_system_id *d)
>> -- 
>> 2.25.1
>>

Best regards,
baolu
Bjorn Helgaas Nov. 16, 2021, 8:22 p.m. UTC | #3
On Tue, Nov 16, 2021 at 03:24:29PM +0800, Lu Baolu wrote:
> On 2021/11/16 4:44, Bjorn Helgaas wrote:
> > On Mon, Nov 15, 2021 at 10:05:45AM +0800, Lu Baolu wrote:
> > > IOMMU grouping on PCI necessitates that if we lack isolation on a bridge
> > > then all of the downstream devices will be part of the same IOMMU group
> > > as the bridge.
> > 
> > I think this means something like: "If a PCIe Switch Downstream Port
> > lacks <a specific set of ACS capabilities>, all downstream devices
> > will be part of the same IOMMU group as the switch," right?
> 
> For this patch, yes.
> 
> > If so, can you fill in the details to make it specific and concrete?
> 
> The existing vfio implementation allows a kernel driver to bind to a
> PCI bridge while its downstream devices are assigned to user space,
> even though the bridge lacks ACS-like isolation.
> 
> drivers/vfio/vfio.c:
>  540 static bool vfio_dev_driver_allowed(struct device *dev,
>  541                                     struct device_driver *drv)
>  542 {
>  543         if (dev_is_pci(dev)) {
>  544                 struct pci_dev *pdev = to_pci_dev(dev);
>  545
>  546                 if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL)
>  547                         return true;
>  548         }
> 
> We are moving the group viability check to the IOMMU core, and trying
> to make it compatible with the current vfio policy. We see three types
> of bridges:
> 
> #1) PCIe/PCI-to-PCI bridges
>     These bridges are configured in the PCI framework; there's no
>     dedicated driver for such devices.
> 
> #2) Generic PCIe switch downstream port
>     The port driver doesn't map or access any MMIO behind the PCI
>     BARs. The iommu group is viable to the user even when this driver
>     is bound.
> 
> #3) Hot Plug Controller
>     The controller driver maps and accesses the device MMIO. The iommu
>     group is not viable to the user with this driver bound to its
>     device.

I *guess* the question here is whether the bridge can or will do DMA?

I think that's orthogonal to the question of whether it implements
BARs, so I'm not sure why the MMIO BARs are part of this discussion.
I assume it's theoretically possible for a driver to use registers in
config space to program a device to do DMA, even if the device has no
BARs.
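
Something like this, say, on a purely hypothetical device whose DMA
engine is programmed through vendor-specific config space registers
(register offsets and names made up for illustration):

#include <linux/pci.h>

#define HYPOTHETICAL_DMA_ADDR_REG       0x40    /* vendor-specific */
#define HYPOTHETICAL_DMA_CTRL_REG       0x44    /* vendor-specific */

/* Hypothetical: kick off a DMA transfer using only config space. */
static void hypothetical_start_dma(struct pci_dev *pdev, u32 bus_addr)
{
        pci_write_config_dword(pdev, HYPOTHETICAL_DMA_ADDR_REG, bus_addr);
        pci_write_config_dword(pdev, HYPOTHETICAL_DMA_CTRL_REG, 0x1);
}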

Bjorn
Jason Gunthorpe Nov. 16, 2021, 8:48 p.m. UTC | #4
On Tue, Nov 16, 2021 at 02:22:01PM -0600, Bjorn Helgaas wrote:
> On Tue, Nov 16, 2021 at 03:24:29PM +0800, Lu Baolu wrote:
> > On 2021/11/16 4:44, Bjorn Helgaas wrote:
> > > On Mon, Nov 15, 2021 at 10:05:45AM +0800, Lu Baolu wrote:
> > > > IOMMU grouping on PCI necessitates that if we lack isolation on a bridge
> > > > then all of the downstream devices will be part of the same IOMMU group
> > > > as the bridge.
> > > 
> > > I think this means something like: "If a PCIe Switch Downstream Port
> > > lacks <a specific set of ACS capabilities>, all downstream devices
> > > will be part of the same IOMMU group as the switch," right?
> > 
> > For this patch, yes.
> > 
> > > If so, can you fill in the details to make it specific and concrete?
> > 
> > The existing vfio implementation allows a kernel driver to bind to a
> > PCI bridge while its downstream devices are assigned to user space,
> > even though the bridge lacks ACS-like isolation.
> > 
> > drivers/vfio/vfio.c:
> >  540 static bool vfio_dev_driver_allowed(struct device *dev,
> >  541                                     struct device_driver *drv)
> >  542 {
> >  543         if (dev_is_pci(dev)) {
> >  544                 struct pci_dev *pdev = to_pci_dev(dev);
> >  545
> >  546                 if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL)
> >  547                         return true;
> >  548         }
> > 
> > We are moving the group viability check to the IOMMU core, and trying
> > to make it compatible with the current vfio policy. We see three types
> > of bridges:
> > 
> > #1) PCIe/PCI-to-PCI bridges
> >     These bridges are configured in the PCI framework; there's no
> >     dedicated driver for such devices.
> > 
> > #2) Generic PCIe switch downstream port
> >     The port driver doesn't map or access any MMIO behind the PCI
> >     BARs. The iommu group is viable to the user even when this driver
> >     is bound.
> > 
> > #3) Hot Plug Controller
> >     The controller driver maps and accesses the device MMIO. The iommu
> >     group is not viable to the user with this driver bound to its
> >     device.
> 
> I *guess* the question here is whether the bridge can or will do DMA?
> I think that's orthogonal to the question of whether it implements
> BARs, so I'm not sure why the MMIO BARs are part of this discussion.
> I assume it's theoretically possible for a driver to use registers in
> config space to program a device to do DMA, even if the device has no
> BARs.

There are two questions Lu is trying to get at:

 1) Does the bridge driver use DMA? Calling pci_set_master() or
    a dma_map_* API is a sure indication that the driver is doing DMA.

    Kernel DMA doesn't work if the PCI device is attached to a
    non-default iommu domain.

 2) If the bridge driver uses MMIO, is it tolerant to hostile
    userspace also touching the same MMIO registers via P2P DMA
    attacks?

    Conservatively, if the driver maps an MMIO region at all,
    we can say it fails this test.

Unless someone wants to do the audit work, identifying MMIO usage alone
is sufficient to disqualify a driver.
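
For example, either pattern in a driver's probe path would disqualify it
under those criteria (illustrative snippet, not taken from any real
bridge driver):

#include <linux/pci.h>

/* Illustrative only: patterns that indicate DMA or MMIO use. */
static int example_probe(struct pci_dev *pdev,
                         const struct pci_device_id *id)
{
        void __iomem *regs;
        int ret;

        ret = pci_enable_device(pdev);
        if (ret)
                return ret;

        /* Criterion 1: enabling bus mastering (or calling dma_map_*)
         * means the driver programs the device to do DMA. */
        pci_set_master(pdev);

        /* Criterion 2: mapping a BAR means the driver touches MMIO that
         * a hostile user could also reach via P2P transactions. */
        regs = pci_iomap(pdev, 0, 0);
        if (!regs)
                return -ENOMEM;

        pci_set_drvdata(pdev, regs);
        return 0;
}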

Jason

Patch

diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index 35eca6277a96..1285862a9aa8 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -203,6 +203,8 @@  static struct pci_driver pcie_portdriver = {
 	.err_handler	= &pcie_portdrv_err_handler,
 
 	.driver.pm	= PCIE_PORTDRV_PM_OPS,
+
+	.driver.suppress_auto_claim_dma_owner = true,
 };
 
 static int __init dmi_pcie_pme_disable_msi(const struct dmi_system_id *d)