diff mbox series

[v2] pci/ats: do not allow broken devices to be assigned to guests

Message ID 20220224163701.89404-1-roger.pau@citrix.com (mailing list archive)
State New, archived
Headers show
Series [v2] pci/ats: do not allow broken devices to be assigned to guests | expand

Commit Message

Roger Pau Monné Feb. 24, 2022, 4:37 p.m. UTC
Introduce a new field to mark devices as broken: having it set
prevents the device from being assigned to guests. Use the field in
order to mark ATS devices that have failed a flush as broken, thus
preventing them to be assigned to any guest.

This allows the device IOMMU context entry to be cleaned up properly,
as calling _pci_hide_device will just change the ownership of the
device, but the IOMMU context entry of the device would be left as-is.
It would also leak a Domain ID, as removing the device from it's
previous owner will allow releasing the DID used by the device without
having cleaned up the context entry.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Oleksandr Andrushchenko <andr2000@gmail.com>
---
Changes since v1:
 - Allow assigning broken devices to dom_io or the hardware domain.
---
 xen/drivers/passthrough/pci.c        | 11 +++++++----
 xen/drivers/passthrough/vtd/qinval.c |  8 +++++++-
 xen/include/xen/pci.h                |  3 +++
 3 files changed, 17 insertions(+), 5 deletions(-)

Comments

Jan Beulich Feb. 24, 2022, 4:43 p.m. UTC | #1
On 24.02.2022 17:37, Roger Pau Monne wrote:
> Introduce a new field to mark devices as broken: having it set
> prevents the device from being assigned to guests. Use the field in
> order to mark ATS devices that have failed a flush as broken, thus
> preventing them to be assigned to any guest.
> 
> This allows the device IOMMU context entry to be cleaned up properly,
> as calling _pci_hide_device will just change the ownership of the
> device, but the IOMMU context entry of the device would be left as-is.
> It would also leak a Domain ID, as removing the device from it's
> previous owner will allow releasing the DID used by the device without
> having cleaned up the context entry.

This DID aspect is VT-d specific, isn't it? I'd be inclined to ask to
make this explicit (which could be done while committing if no other
need for a v3 arises).

> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan
Roger Pau Monné Feb. 25, 2022, 8:41 a.m. UTC | #2
On Thu, Feb 24, 2022 at 05:43:13PM +0100, Jan Beulich wrote:
> On 24.02.2022 17:37, Roger Pau Monne wrote:
> > Introduce a new field to mark devices as broken: having it set
> > prevents the device from being assigned to guests. Use the field in
> > order to mark ATS devices that have failed a flush as broken, thus
> > preventing them to be assigned to any guest.
> > 
> > This allows the device IOMMU context entry to be cleaned up properly,
> > as calling _pci_hide_device will just change the ownership of the
> > device, but the IOMMU context entry of the device would be left as-is.
> > It would also leak a Domain ID, as removing the device from it's
> > previous owner will allow releasing the DID used by the device without
> > having cleaned up the context entry.
> 
> This DID aspect is VT-d specific, isn't it? I'd be inclined to ask to
> make this explicit (which could be done while committing if no other
> need for a v3 arises).

Indeed. AMD doesn't use iommu_dev_iotlb_flush_timeout so the function
is VT-d specific. What about using:

"Introduce a new field to mark devices as broken: having it set
prevents the device from being assigned to guests. Use the field in
order to mark ATS devices that have failed a flush when using VT-d as
broken, thus preventing them to be assigned to any guest.

This allows the device IOMMU context entry to be cleaned up properly,
as calling _pci_hide_device will just change the ownership of the
device, but the IOMMU context entry of the device would be left as-is.
It would also leak a VT-d Domain ID if using one, as removing the
device from it's previous owner will allow releasing the IOMMU DID
used by the device without having cleaned up the context entry."

Thanks, Roger.
Jan Beulich Feb. 25, 2022, 8:50 a.m. UTC | #3
On 25.02.2022 09:41, Roger Pau Monné wrote:
> On Thu, Feb 24, 2022 at 05:43:13PM +0100, Jan Beulich wrote:
>> On 24.02.2022 17:37, Roger Pau Monne wrote:
>>> Introduce a new field to mark devices as broken: having it set
>>> prevents the device from being assigned to guests. Use the field in
>>> order to mark ATS devices that have failed a flush as broken, thus
>>> preventing them to be assigned to any guest.
>>>
>>> This allows the device IOMMU context entry to be cleaned up properly,
>>> as calling _pci_hide_device will just change the ownership of the
>>> device, but the IOMMU context entry of the device would be left as-is.
>>> It would also leak a Domain ID, as removing the device from it's
>>> previous owner will allow releasing the DID used by the device without
>>> having cleaned up the context entry.
>>
>> This DID aspect is VT-d specific, isn't it? I'd be inclined to ask to
>> make this explicit (which could be done while committing if no other
>> need for a v3 arises).
> 
> Indeed. AMD doesn't use iommu_dev_iotlb_flush_timeout so the function
> is VT-d specific.

But perhaps wrongly so. Which is why I'd prefer to ...

> What about using:
> 
> "Introduce a new field to mark devices as broken: having it set
> prevents the device from being assigned to guests. Use the field in
> order to mark ATS devices that have failed a flush when using VT-d as
> broken, thus preventing them to be assigned to any guest.

... omit VT-d here (i.e. leave this paragraph as you had it before),
but ...

> This allows the device IOMMU context entry to be cleaned up properly,
> as calling _pci_hide_device will just change the ownership of the
> device, but the IOMMU context entry of the device would be left as-is.
> It would also leak a VT-d Domain ID if using one, as removing the
> device from it's previous owner will allow releasing the IOMMU DID
> used by the device without having cleaned up the context entry."

... use this as replacement.

Jan
Roger Pau Monné Feb. 25, 2022, 9 a.m. UTC | #4
On Fri, Feb 25, 2022 at 09:50:03AM +0100, Jan Beulich wrote:
> On 25.02.2022 09:41, Roger Pau Monné wrote:
> > On Thu, Feb 24, 2022 at 05:43:13PM +0100, Jan Beulich wrote:
> >> On 24.02.2022 17:37, Roger Pau Monne wrote:
> >>> Introduce a new field to mark devices as broken: having it set
> >>> prevents the device from being assigned to guests. Use the field in
> >>> order to mark ATS devices that have failed a flush as broken, thus
> >>> preventing them to be assigned to any guest.
> >>>
> >>> This allows the device IOMMU context entry to be cleaned up properly,
> >>> as calling _pci_hide_device will just change the ownership of the
> >>> device, but the IOMMU context entry of the device would be left as-is.
> >>> It would also leak a Domain ID, as removing the device from it's
> >>> previous owner will allow releasing the DID used by the device without
> >>> having cleaned up the context entry.
> >>
> >> This DID aspect is VT-d specific, isn't it? I'd be inclined to ask to
> >> make this explicit (which could be done while committing if no other
> >> need for a v3 arises).
> > 
> > Indeed. AMD doesn't use iommu_dev_iotlb_flush_timeout so the function
> > is VT-d specific.
> 
> But perhaps wrongly so. Which is why I'd prefer to ...

I thought the same, but didn't care enough to try to fix the AMD side.

> > What about using:
> > 
> > "Introduce a new field to mark devices as broken: having it set
> > prevents the device from being assigned to guests. Use the field in
> > order to mark ATS devices that have failed a flush when using VT-d as
> > broken, thus preventing them to be assigned to any guest.
> 
> ... omit VT-d here (i.e. leave this paragraph as you had it before),
> but ...

OK, it wasn't my intention to make it sound like this is not required
for AMD, just not used ATM. Was merely trying to reflect the current
logic in the text.

> > This allows the device IOMMU context entry to be cleaned up properly,
> > as calling _pci_hide_device will just change the ownership of the
> > device, but the IOMMU context entry of the device would be left as-is.
> > It would also leak a VT-d Domain ID if using one, as removing the
> > device from it's previous owner will allow releasing the IOMMU DID
> > used by the device without having cleaned up the context entry."
> 
> ... use this as replacement.

Fine.

Thanks, Roger.
Tian, Kevin March 14, 2022, 4:23 a.m. UTC | #5
> From: Roger Pau Monne <roger.pau@citrix.com>
> Sent: Friday, February 25, 2022 12:37 AM
> 
> Introduce a new field to mark devices as broken: having it set
> prevents the device from being assigned to guests. Use the field in
> order to mark ATS devices that have failed a flush as broken, thus
> preventing them to be assigned to any guest.
> 
> This allows the device IOMMU context entry to be cleaned up properly,
> as calling _pci_hide_device will just change the ownership of the
> device, but the IOMMU context entry of the device would be left as-is.
> It would also leak a Domain ID, as removing the device from it's
> previous owner will allow releasing the DID used by the device without
> having cleaned up the context entry.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

> ---
> Cc: Oleksandr Andrushchenko <andr2000@gmail.com>
> ---
> Changes since v1:
>  - Allow assigning broken devices to dom_io or the hardware domain.
> ---
>  xen/drivers/passthrough/pci.c        | 11 +++++++----
>  xen/drivers/passthrough/vtd/qinval.c |  8 +++++++-
>  xen/include/xen/pci.h                |  3 +++
>  3 files changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> index 70b6684981..91b43a3f04 100644
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -501,7 +501,7 @@ static void free_pdev(struct pci_seg *pseg, struct
> pci_dev *pdev)
>      xfree(pdev);
>  }
> 
> -static void _pci_hide_device(struct pci_dev *pdev)
> +static void __init _pci_hide_device(struct pci_dev *pdev)
>  {
>      if ( pdev->domain )
>          return;
> @@ -1487,6 +1487,11 @@ static int assign_device(struct domain *d, u16 seg,
> u8 bus, u8 devfn, u32 flag)
>      ASSERT(pdev && (pdev->domain == hardware_domain ||
>                      pdev->domain == dom_io));
> 
> +    /* Do not allow broken devices to be assigned to guests. */
> +    rc = -EBADF;
> +    if ( pdev->broken && d != hardware_domain && d != dom_io )
> +        goto done;
> +
>      rc = pdev_msix_assign(d, pdev);
>      if ( rc )
>          goto done;
> @@ -1585,9 +1590,7 @@ void iommu_dev_iotlb_flush_timeout(struct
> domain *d, struct pci_dev *pdev)
>          return;
>      }
> 
> -    list_del(&pdev->domain_list);
> -    pdev->domain = NULL;
> -    _pci_hide_device(pdev);
> +    pdev->broken = true;
> 
>      if ( !d->is_shutting_down && printk_ratelimit() )
>          printk(XENLOG_ERR "dom%d: ATS device %pp flush failed\n",
> diff --git a/xen/drivers/passthrough/vtd/qinval.c
> b/xen/drivers/passthrough/vtd/qinval.c
> index 9f291f47e5..510961a203 100644
> --- a/xen/drivers/passthrough/vtd/qinval.c
> +++ b/xen/drivers/passthrough/vtd/qinval.c
> @@ -227,7 +227,7 @@ static int __must_check dev_invalidate_sync(struct
> vtd_iommu *iommu,
> 
>      ASSERT(iommu->qinval_maddr);
>      rc = queue_invalidate_wait(iommu, 0, 1, 1, 1);
> -    if ( rc == -ETIMEDOUT )
> +    if ( rc == -ETIMEDOUT && !pdev->broken )
>      {
>          struct domain *d = rcu_lock_domain_by_id(did_to_domain_id(iommu,
> did));
> 
> @@ -241,6 +241,12 @@ static int __must_check dev_invalidate_sync(struct
> vtd_iommu *iommu,
>          iommu_dev_iotlb_flush_timeout(d, pdev);
>          rcu_unlock_domain(d);
>      }
> +    else if ( rc == -ETIMEDOUT )
> +        /*
> +         * The device is already marked as broken, ignore the error in order to
> +         * allow {de,}assign to succeed.
> +         */
> +        rc = 0;
> 
>      return rc;
>  }
> diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
> index b6d7e454f8..02b31f7259 100644
> --- a/xen/include/xen/pci.h
> +++ b/xen/include/xen/pci.h
> @@ -108,6 +108,9 @@ struct pci_dev {
>      /* Device with errata, ignore the BARs. */
>      bool ignore_bars;
> 
> +    /* Device misbehaving, prevent assigning it to guests. */
> +    bool broken;
> +
>      enum pdev_type {
>          DEV_TYPE_PCI_UNKNOWN,
>          DEV_TYPE_PCIe_ENDPOINT,
> --
> 2.34.1
diff mbox series

Patch

diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 70b6684981..91b43a3f04 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -501,7 +501,7 @@  static void free_pdev(struct pci_seg *pseg, struct pci_dev *pdev)
     xfree(pdev);
 }
 
-static void _pci_hide_device(struct pci_dev *pdev)
+static void __init _pci_hide_device(struct pci_dev *pdev)
 {
     if ( pdev->domain )
         return;
@@ -1487,6 +1487,11 @@  static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
     ASSERT(pdev && (pdev->domain == hardware_domain ||
                     pdev->domain == dom_io));
 
+    /* Do not allow broken devices to be assigned to guests. */
+    rc = -EBADF;
+    if ( pdev->broken && d != hardware_domain && d != dom_io )
+        goto done;
+
     rc = pdev_msix_assign(d, pdev);
     if ( rc )
         goto done;
@@ -1585,9 +1590,7 @@  void iommu_dev_iotlb_flush_timeout(struct domain *d, struct pci_dev *pdev)
         return;
     }
 
-    list_del(&pdev->domain_list);
-    pdev->domain = NULL;
-    _pci_hide_device(pdev);
+    pdev->broken = true;
 
     if ( !d->is_shutting_down && printk_ratelimit() )
         printk(XENLOG_ERR "dom%d: ATS device %pp flush failed\n",
diff --git a/xen/drivers/passthrough/vtd/qinval.c b/xen/drivers/passthrough/vtd/qinval.c
index 9f291f47e5..510961a203 100644
--- a/xen/drivers/passthrough/vtd/qinval.c
+++ b/xen/drivers/passthrough/vtd/qinval.c
@@ -227,7 +227,7 @@  static int __must_check dev_invalidate_sync(struct vtd_iommu *iommu,
 
     ASSERT(iommu->qinval_maddr);
     rc = queue_invalidate_wait(iommu, 0, 1, 1, 1);
-    if ( rc == -ETIMEDOUT )
+    if ( rc == -ETIMEDOUT && !pdev->broken )
     {
         struct domain *d = rcu_lock_domain_by_id(did_to_domain_id(iommu, did));
 
@@ -241,6 +241,12 @@  static int __must_check dev_invalidate_sync(struct vtd_iommu *iommu,
         iommu_dev_iotlb_flush_timeout(d, pdev);
         rcu_unlock_domain(d);
     }
+    else if ( rc == -ETIMEDOUT )
+        /*
+         * The device is already marked as broken, ignore the error in order to
+         * allow {de,}assign to succeed.
+         */
+        rc = 0;
 
     return rc;
 }
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index b6d7e454f8..02b31f7259 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -108,6 +108,9 @@  struct pci_dev {
     /* Device with errata, ignore the BARs. */
     bool ignore_bars;
 
+    /* Device misbehaving, prevent assigning it to guests. */
+    bool broken;
+
     enum pdev_type {
         DEV_TYPE_PCI_UNKNOWN,
         DEV_TYPE_PCIe_ENDPOINT,