Message ID | 20220224124344.86192-1-roger.pau@citrix.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC] pci/ats: do not allow broken devices to be assigned | expand |
On 24.02.2022 13:43, Roger Pau Monne wrote: > Introduce a new field to mark devices as broken: having it set > prevents the device from being assigned to domains. Use the field in > order to mark ATS devices that have failed a flush as broken, thus > preventing them to be assigned to any guest. > > This allows the device IOMMU context entry to be cleaned up properly, > as calling _pci_hide_device will just change the ownership of the > device, but the IOMMU context entry of the device would be left as-is. > It would also leak a Domain ID, as removing the device from it's > previous owner will allow releasing the DID used by the device without > having cleaned up the context entry. > > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> > --- > RFC: I haven't tested the code path, as I have no ATS devices on the > box I'm currently testing on. In any case, ATS is not supported, and > removing the call to _pci_hide_device in iommu_dev_iotlb_flush_timeout > should allow to remove the dependency on recursive pcidevs lock. No objection in principle. Whether this is the only dependency on recursive pcidevs lock isn't really know though, is it? > TBD: it's unclear whether we still need the pcidevs_lock in > iommu_dev_iotlb_flush_timeout. The caller of > iommu_dev_iotlb_flush_timeout is already bogus as it iterates over a > list of pdevs without holding the pcidevs_lock. Analysis of whether / where recursive uses are needed should imo include cases where the lock ought to be held, but currently isn't (like apparently this case). > @@ -1487,6 +1487,11 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag) > ASSERT(pdev && (pdev->domain == hardware_domain || > pdev->domain == dom_io)); > > + /* Do not allow broken devices to be assigned. */ > + rc = -EBADF; > + if ( pdev->broken ) > + goto done; I think this wants exceptions for Dom0 and DomIO. An admin may be able to fix things in Dom0, e.g. by updating device firmware. Jan
On Thu, Feb 24, 2022 at 01:58:31PM +0100, Jan Beulich wrote: > On 24.02.2022 13:43, Roger Pau Monne wrote: > > Introduce a new field to mark devices as broken: having it set > > prevents the device from being assigned to domains. Use the field in > > order to mark ATS devices that have failed a flush as broken, thus > > preventing them to be assigned to any guest. > > > > This allows the device IOMMU context entry to be cleaned up properly, > > as calling _pci_hide_device will just change the ownership of the > > device, but the IOMMU context entry of the device would be left as-is. > > It would also leak a Domain ID, as removing the device from it's > > previous owner will allow releasing the DID used by the device without > > having cleaned up the context entry. > > > > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> > > --- > > RFC: I haven't tested the code path, as I have no ATS devices on the > > box I'm currently testing on. In any case, ATS is not supported, and > > removing the call to _pci_hide_device in iommu_dev_iotlb_flush_timeout > > should allow to remove the dependency on recursive pcidevs lock. > > No objection in principle. Whether this is the only dependency on > recursive pcidevs lock isn't really know though, is it? Indeed. I didn't word this clearly. The recursive lock was introduced for this specific use case. Whether we have gained more recursive paths in the meantime I haven't assessed. > > TBD: it's unclear whether we still need the pcidevs_lock in > > iommu_dev_iotlb_flush_timeout. The caller of > > iommu_dev_iotlb_flush_timeout is already bogus as it iterates over a > > list of pdevs without holding the pcidevs_lock. > > Analysis of whether / where recursive uses are needed should imo > include cases where the lock ought to be held, but currently isn't > (like apparently this case). Well, I'm not opposed to that. I think just aiming to get the current usages analyzed will already be hard. > > @@ -1487,6 +1487,11 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag) > > ASSERT(pdev && (pdev->domain == hardware_domain || > > pdev->domain == dom_io)); > > > > + /* Do not allow broken devices to be assigned. */ > > + rc = -EBADF; > > + if ( pdev->broken ) > > + goto done; > > I think this wants exceptions for Dom0 and DomIO. An admin may be > able to fix things in Dom0, e.g. by updating device firmware. Doesn't a device get assigned to DomIO (or Dom0 if not using quarantine mode), and then when deassigned from DomIO gets assigned to Dom0? So there's no usage of assign_device in the path that (re)assigns a device used by a guest into Dom0? Or would you rather imply that pdev->broken should get cleared at some point (ie: when the device is assigned back to Dom0)? Thanks, Roger.
On 24.02.2022 15:53, Roger Pau Monné wrote: > On Thu, Feb 24, 2022 at 01:58:31PM +0100, Jan Beulich wrote: >> On 24.02.2022 13:43, Roger Pau Monne wrote: >>> TBD: it's unclear whether we still need the pcidevs_lock in >>> iommu_dev_iotlb_flush_timeout. The caller of >>> iommu_dev_iotlb_flush_timeout is already bogus as it iterates over a >>> list of pdevs without holding the pcidevs_lock. >> >> Analysis of whether / where recursive uses are needed should imo >> include cases where the lock ought to be held, but currently isn't >> (like apparently this case). > > Well, I'm not opposed to that. I think just aiming to get the current > usages analyzed will already be hard. While I fully agree, the decision to drop recursive locking would better not put roadblocks in the way of adding locking where it is currently missing. >>> @@ -1487,6 +1487,11 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag) >>> ASSERT(pdev && (pdev->domain == hardware_domain || >>> pdev->domain == dom_io)); >>> >>> + /* Do not allow broken devices to be assigned. */ >>> + rc = -EBADF; >>> + if ( pdev->broken ) >>> + goto done; >> >> I think this wants exceptions for Dom0 and DomIO. An admin may be >> able to fix things in Dom0, e.g. by updating device firmware. > > Doesn't a device get assigned to DomIO (or Dom0 if not using quarantine > mode), and then when deassigned from DomIO gets assigned to Dom0? > > So there's no usage of assign_device in the path that (re)assigns a > device used by a guest into Dom0? Well, this assumes all tool stacks behave like the xl one presently does. Which I think is the wrong way round: Making a device assignable should be "deassign from Dom0" (implicitly making it land in DomIO), while removing a device from the set of assignable ones should be "assign to Dom0" (implicitly deassigning from DomIO). That way (I think I said so elsewhere earlier on) libxl would also avoid the need to actually use DOMID_IO explicitly. It using DOMID_SELF like done in various other places would seem more clen to me. Paul - I think it was you who decided to make it work the way it currently does. Do you have any insights into your thought process back then which you could share? > Or would you rather imply that pdev->broken should get cleared at some > point (ie: when the device is assigned back to Dom0)? I did ponder this for a while when writing the earlier reply. But I decided against, for it being a functional change: _pci_hide_device() currently is also sticky. But yes, in principle if a misbehaving device _can_ be fixed (e.g. by updating its firmware), then I think there needs to be a way to clear the "broken" flag. Jan
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c index 70b6684981..4b81c1c04a 100644 --- a/xen/drivers/passthrough/pci.c +++ b/xen/drivers/passthrough/pci.c @@ -501,7 +501,7 @@ static void free_pdev(struct pci_seg *pseg, struct pci_dev *pdev) xfree(pdev); } -static void _pci_hide_device(struct pci_dev *pdev) +static void __init _pci_hide_device(struct pci_dev *pdev) { if ( pdev->domain ) return; @@ -1487,6 +1487,11 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag) ASSERT(pdev && (pdev->domain == hardware_domain || pdev->domain == dom_io)); + /* Do not allow broken devices to be assigned. */ + rc = -EBADF; + if ( pdev->broken ) + goto done; + rc = pdev_msix_assign(d, pdev); if ( rc ) goto done; @@ -1585,9 +1590,7 @@ void iommu_dev_iotlb_flush_timeout(struct domain *d, struct pci_dev *pdev) return; } - list_del(&pdev->domain_list); - pdev->domain = NULL; - _pci_hide_device(pdev); + pdev->broken = true; if ( !d->is_shutting_down && printk_ratelimit() ) printk(XENLOG_ERR "dom%d: ATS device %pp flush failed\n", diff --git a/xen/drivers/passthrough/vtd/qinval.c b/xen/drivers/passthrough/vtd/qinval.c index 9f291f47e5..4436c22c05 100644 --- a/xen/drivers/passthrough/vtd/qinval.c +++ b/xen/drivers/passthrough/vtd/qinval.c @@ -227,7 +227,7 @@ static int __must_check dev_invalidate_sync(struct vtd_iommu *iommu, ASSERT(iommu->qinval_maddr); rc = queue_invalidate_wait(iommu, 0, 1, 1, 1); - if ( rc == -ETIMEDOUT ) + if ( rc == -ETIMEDOUT && !pdev->broken ) { struct domain *d = rcu_lock_domain_by_id(did_to_domain_id(iommu, did)); @@ -241,6 +241,12 @@ static int __must_check dev_invalidate_sync(struct vtd_iommu *iommu, iommu_dev_iotlb_flush_timeout(d, pdev); rcu_unlock_domain(d); } + else if ( rc == -ETIMEDOUT ) + /* + * The device is already marked as broken, ignore the error in order to + * allow deassign to succeed. + */ + rc = 0; return rc; } diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h index b6d7e454f8..00b44e8487 100644 --- a/xen/include/xen/pci.h +++ b/xen/include/xen/pci.h @@ -108,6 +108,9 @@ struct pci_dev { /* Device with errata, ignore the BARs. */ bool ignore_bars; + /* Device misbehaving, prevent assigning it. */ + bool broken; + enum pdev_type { DEV_TYPE_PCI_UNKNOWN, DEV_TYPE_PCIe_ENDPOINT,
Introduce a new field to mark devices as broken: having it set prevents the device from being assigned to domains. Use the field in order to mark ATS devices that have failed a flush as broken, thus preventing them to be assigned to any guest. This allows the device IOMMU context entry to be cleaned up properly, as calling _pci_hide_device will just change the ownership of the device, but the IOMMU context entry of the device would be left as-is. It would also leak a Domain ID, as removing the device from it's previous owner will allow releasing the DID used by the device without having cleaned up the context entry. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> --- RFC: I haven't tested the code path, as I have no ATS devices on the box I'm currently testing on. In any case, ATS is not supported, and removing the call to _pci_hide_device in iommu_dev_iotlb_flush_timeout should allow to remove the dependency on recursive pcidevs lock. TBD: it's unclear whether we still need the pcidevs_lock in iommu_dev_iotlb_flush_timeout. The caller of iommu_dev_iotlb_flush_timeout is already bogus as it iterates over a list of pdevs without holding the pcidevs_lock. TBD: getting rid of ATS altogether could also be an option, but it's more work. --- Cc: Oleksandr Andrushchenko <andr2000@gmail.com> --- xen/drivers/passthrough/pci.c | 11 +++++++---- xen/drivers/passthrough/vtd/qinval.c | 8 +++++++- xen/include/xen/pci.h | 3 +++ 3 files changed, 17 insertions(+), 5 deletions(-)