diff mbox

[v5,7/7] VT-d: Fix vt-d Device-TLB flush timeout issue.

Message ID 1454667507-79751-8-git-send-email-quan.xu@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Quan Xu Feb. 5, 2016, 10:18 a.m. UTC
If Device-TLB flush is timeout, we'll hide the target ATS
device and crash the domain owning this ATS device.

If impacted domain is hardware domain, just throw out a warning.

The hidden device will be disallowed to be further assigned to
any domain.

Signed-off-by: Quan Xu <quan.xu@intel.com>
---
 xen/drivers/passthrough/pci.c         |  2 +-
 xen/drivers/passthrough/vtd/extern.h  |  8 +++-
 xen/drivers/passthrough/vtd/qinval.c  | 77 ++++++++++++++++++++++++++++++++++-
 xen/drivers/passthrough/vtd/x86/ats.c | 14 ++++++-
 4 files changed, 95 insertions(+), 6 deletions(-)

Comments

Jan Beulich Feb. 17, 2016, 2:40 p.m. UTC | #1
>>> On 05.02.16 at 11:18, <quan.xu@intel.com> wrote:
> --- a/xen/drivers/passthrough/vtd/qinval.c
> +++ b/xen/drivers/passthrough/vtd/qinval.c
> @@ -229,6 +229,69 @@ int qinval_device_iotlb(struct iommu *iommu,
>      return 0;
>  }
>  
> +static void dev_invalidate_iotlb_timeout(struct iommu *iommu, u16 did,
> +                                         u16 seg, u8 bus, u8 devfn,
> +                                         unsigned int lock)
> +{
> +    struct domain *d = NULL;
> +    struct pci_dev *pdev;
> +
> +    if ( test_bit(did, iommu->domid_bitmap) )
> +        d = rcu_lock_domain_by_id(iommu->domid_map[did]);
> +
> +    if ( d == NULL )
> +        return;
> +
> +    for_each_pdev(d, pdev)
> +    {
> +        if ( (pdev->seg == seg) &&
> +             (pdev->bus == bus) &&
> +             (pdev->devfn == devfn) )
> +        {
> +            ASSERT ( pdev->domain );

Stray blanks inside the parentheses.

> +            list_del(&pdev->domain_list);
> +            pdev->domain = NULL;
> +
> +            if ( !(lock & PCIDEVS_LOCK) )
> +                spin_lock(&pcidevs_lock);

This is too late - you may not iterate over pdev-s without holding
that lock, and the list_del() may not be done in that case either.
Plus this should be accompanied be an ASSERT() in the else case.

> +            if ( pci_hide_device(bus, devfn) )

But now I'm _really_ puzzled: You acquire the exact lock that
pci_hide_device() acquires. Hence, unless I've overlooked an earlier
change, I can't see this as other than an unconditional dead lock. Did
you test this code path at all?

> +            {
> +                printk(XENLOG_ERR
> +                       "IOMMU hide device %04x:%02x:%02x.%02x error.",
> +                       seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
> +            }

I hate to repeat myself: SBDF need to be printed in canonical
ssss:bb:dd.f form, to be grep-able. Also no full stops at the end
of log messages please. And (also mentioned before) if there are
error codes available, log them to aid diagnosis of problems.

Also - unnecessary figure braces.

> @@ -350,9 +413,19 @@ static int flush_iotlb_qi(
>          queue_invalidate_iotlb(iommu,
>                                 type >> DMA_TLB_FLUSH_GRANU_OFFSET, dr,
>                                 dw, did, size_order, 0, addr);
> -        if ( flush_dev_iotlb )
> -            ret = dev_invalidate_iotlb(iommu, did, addr, size_order, type);
> +
> +        /*
> +         * Before Device-TLB invalidation we need to synchronize
> +         * invalidation completions with hardware.
> +         */
>          rc = invalidate_sync(iommu);
> +        if ( rc )
> +             return rc;
> +
> +        if ( flush_dev_iotlb )
> +            ret = dev_invalidate_iotlb(iommu, did, addr, size_order,
> +                                       type, lock);
> +
>          if ( !ret )
>              ret = rc;

These two lines have now become pointless, and hence should
be deleted.

> @@ -162,6 +162,18 @@ int dev_invalidate_iotlb(struct iommu *iommu, u16 did,
>              return -EOPNOTSUPP;
>          }
>  
> +        /*
> +         * Synchronize with hardware for Device-TLB invalidate
> +         * descriptor.
> +         */
> +        ret = dev_invalidate_iotlb_sync(iommu, did, pdev->seg,
> +                                        pdev->bus, pdev->devfn, lock);
> +        if ( ret )
> +            printk(XENLOG_ERR
> +                   "Flush error %d on device %04x:%02x:%02x.%02x \n",
> +                   ret, pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
> +                   PCI_FUNC(pdev->devfn));
> +
>          if ( !ret )
>              ret = rc;

The two context lines here show that you're now clobbering "ret".

Jan
Quan Xu Feb. 18, 2016, 7:36 a.m. UTC | #2
> On February 17, 2016 10:41pm, <JBeulich@suse.com> wrote:
> >>> On 05.02.16 at 11:18, <quan.xu@intel.com> wrote:
> > --- a/xen/drivers/passthrough/vtd/qinval.c
> > +++ b/xen/drivers/passthrough/vtd/qinval.c
> > +            if ( pci_hide_device(bus, devfn) )
> 
> But now I'm _really_ puzzled: You acquire the exact lock that
> pci_hide_device() acquires. Hence, unless I've overlooked an earlier change, I
> can't see this as other than an unconditional dead lock. Did you test this code
> path at all?

Sorry, I didn't test this code path.
I did test the follows:
   1) Create domain with ATS device.
   2) Attach / Detach ATS device.

I think I could add a variation of pci_hide_device(), without "spin_lock(&pcidevs_lock) / spin_unlock(&pcidevs_lock)"
Or "__init".

But it is sure that different lock state is possible for different call trees when to flush an ATS device.
I verify it as follows:
1.print pcidevs_lock status in flush_iotlb_qi()

flush_iotlb_qi()
{
...
+    printk("__ pcidevs_lock : %d *__\n", spin_is_locked(&pcidevs_lock));
...
}

2. attach ATS device
      $xl pci-attach TestDom 0000:81:00.0
  #The print is "(XEN) __ pcidevs_lock : 1 *__"

3. reset memory of domain
      $ xl mem-set TestDom 2047m
  #the print is "(XEN) __ pcidevs_lock : 0 *__"

Quan
Jan Beulich Feb. 18, 2016, 8:35 a.m. UTC | #3
>>> On 18.02.16 at 08:36, <quan.xu@intel.com> wrote:
>>  On February 17, 2016 10:41pm, <JBeulich@suse.com> wrote:
>> >>> On 05.02.16 at 11:18, <quan.xu@intel.com> wrote:
>> > --- a/xen/drivers/passthrough/vtd/qinval.c
>> > +++ b/xen/drivers/passthrough/vtd/qinval.c
>> > +            if ( pci_hide_device(bus, devfn) )
>> 
>> But now I'm _really_ puzzled: You acquire the exact lock that
>> pci_hide_device() acquires. Hence, unless I've overlooked an earlier change, 
> I
>> can't see this as other than an unconditional dead lock. Did you test this 
> code
>> path at all?
> 
> Sorry, I didn't test this code path.
> I did test the follows:
>    1) Create domain with ATS device.
>    2) Attach / Detach ATS device.
> 
> I think I could add a variation of pci_hide_device(), without 
> "spin_lock(&pcidevs_lock) / spin_unlock(&pcidevs_lock)"
> Or "__init".

Which we already have - _pci_hide_device(). The function would
just need to be made non-static, but would otherwise fit your
purposes since you also don't need the alloc_pdev() the other
function does.

> But it is sure that different lock state is possible for different call 
> trees when to flush an ATS device.

Sure, that's why you pass down the flag. Presumably easiest
(albeit not nicest) will be to call the respective of the two functions
depending on the lock holding flag.

Jan
Quan Xu Feb. 18, 2016, 8:47 a.m. UTC | #4
> On February 18, 2016 4:36pm, <JBeulich@suse.com> wrote:
> >>> On 18.02.16 at 08:36, <quan.xu@intel.com> wrote:
> >>  On February 17, 2016 10:41pm, <JBeulich@suse.com> wrote:
> >> >>> On 05.02.16 at 11:18, <quan.xu@intel.com> wrote:
> >> > --- a/xen/drivers/passthrough/vtd/qinval.c
> >> > +++ b/xen/drivers/passthrough/vtd/qinval.c
> >> > +            if ( pci_hide_device(bus, devfn) )
> >>
> >> But now I'm _really_ puzzled: You acquire the exact lock that
> >> pci_hide_device() acquires. Hence, unless I've overlooked an earlier
> >> change,
> > I
> >> can't see this as other than an unconditional dead lock. Did you test
> >> this
> > code
> >> path at all?
> >
> > Sorry, I didn't test this code path.
> > I did test the follows:
> >    1) Create domain with ATS device.
> >    2) Attach / Detach ATS device.
> >
> > I think I could add a variation of pci_hide_device(), without
> > "spin_lock(&pcidevs_lock) / spin_unlock(&pcidevs_lock)"
> > Or "__init".
> 
> Which we already have - _pci_hide_device(). The function would just need to be
> made non-static, but would otherwise fit your purposes since you also don't
> need the alloc_pdev() the other function does.
> 

But the _pci_hide_device() starts with '_', I think it is not a good idea to make it non-static.
I'd better introduce a new function to wrap the '_pci_hide_device()'. It is difficult to name the new function.
 

> > But it is sure that different lock state is possible for different
> > call trees when to flush an ATS device.
> 
> Sure, that's why you pass down the flag. Presumably easiest (albeit not nicest)
> will be to call the respective of the two functions depending on the lock holding
> flag.
> 

It is really a good idea. Thanks you for your advice.

Quan
Jan Beulich Feb. 18, 2016, 9:21 a.m. UTC | #5
>>> On 18.02.16 at 09:47, <quan.xu@intel.com> wrote:
>>  On February 18, 2016 4:36pm, <JBeulich@suse.com> wrote:
>> >>> On 18.02.16 at 08:36, <quan.xu@intel.com> wrote:
>> >>  On February 17, 2016 10:41pm, <JBeulich@suse.com> wrote:
>> >> >>> On 05.02.16 at 11:18, <quan.xu@intel.com> wrote:
>> >> > --- a/xen/drivers/passthrough/vtd/qinval.c
>> >> > +++ b/xen/drivers/passthrough/vtd/qinval.c
>> >> > +            if ( pci_hide_device(bus, devfn) )
>> >>
>> >> But now I'm _really_ puzzled: You acquire the exact lock that
>> >> pci_hide_device() acquires. Hence, unless I've overlooked an earlier
>> >> change,
>> > I
>> >> can't see this as other than an unconditional dead lock. Did you test
>> >> this
>> > code
>> >> path at all?
>> >
>> > Sorry, I didn't test this code path.
>> > I did test the follows:
>> >    1) Create domain with ATS device.
>> >    2) Attach / Detach ATS device.
>> >
>> > I think I could add a variation of pci_hide_device(), without
>> > "spin_lock(&pcidevs_lock) / spin_unlock(&pcidevs_lock)"
>> > Or "__init".
>> 
>> Which we already have - _pci_hide_device(). The function would just need to be
>> made non-static, but would otherwise fit your purposes since you also don't
>> need the alloc_pdev() the other function does.
> 
> But the _pci_hide_device() starts with '_', I think it is not a good idea to 
> make it non-static.
> I'd better introduce a new function to wrap the '_pci_hide_device()'. It is 
> difficult to name the new function.

Why add a wrapper - just rename the function (I indeed appreciate
you not wanting to globalize a function whose name starts with an
underscore), e.g. to pci_hide_device_locked() or
pci_hide_existing_device().

Jan
Quan Xu Feb. 18, 2016, 10 a.m. UTC | #6
> On  February 18, 2016 5:21pm,  <JBeulich@suse.com> wrote:
> >>> On 18.02.16 at 09:47, <quan.xu@intel.com> wrote:
> >>  On February 18, 2016 4:36pm, <JBeulich@suse.com> wrote:
> >> >>> On 18.02.16 at 08:36, <quan.xu@intel.com> wrote:
> >> >>  On February 17, 2016 10:41pm, <JBeulich@suse.com> wrote:
> >> >> >>> On 05.02.16 at 11:18, <quan.xu@intel.com> wrote:
> >> >> > --- a/xen/drivers/passthrough/vtd/qinval.c
> >> >> > +++ b/xen/drivers/passthrough/vtd/qinval.c
> >> >> > +            if ( pci_hide_device(bus, devfn) )
> >> >>
> >> >> But now I'm _really_ puzzled: You acquire the exact lock that
> >> >> pci_hide_device() acquires. Hence, unless I've overlooked an
> >> >> earlier change,
> >> > I
> >> >> can't see this as other than an unconditional dead lock. Did you
> >> >> test this
> >> > code
> >> >> path at all?
> >> >
> >> > Sorry, I didn't test this code path.
> >> > I did test the follows:
> >> >    1) Create domain with ATS device.
> >> >    2) Attach / Detach ATS device.
> >> >
> >> > I think I could add a variation of pci_hide_device(), without
> >> > "spin_lock(&pcidevs_lock) / spin_unlock(&pcidevs_lock)"
> >> > Or "__init".
> >>
> >> Which we already have - _pci_hide_device(). The function would just
> >> need to be made non-static, but would otherwise fit your purposes
> >> since you also don't need the alloc_pdev() the other function does.
> >
> > But the _pci_hide_device() starts with '_', I think it is not a good
> > idea to make it non-static.
> > I'd better introduce a new function to wrap the '_pci_hide_device()'.
> > It is difficult to name the new function.
> 
> Why add a wrapper - just rename the function (I indeed appreciate you not
> wanting to globalize a function whose name starts with an underscore), e.g. to
> pci_hide_device_locked() or pci_hide_existing_device().
>

Okay, I will rename the _pci_hide_device() to pci_hide_existing_device() in v6.
Quan
diff mbox

Patch

diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 27b3ca7..2d7dc59 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -407,7 +407,7 @@  static void _pci_hide_device(struct pci_dev *pdev)
     list_add(&pdev->domain_list, &dom_xen->arch.pdev_list);
 }
 
-int __init pci_hide_device(int bus, int devfn)
+int pci_hide_device(int bus, int devfn)
 {
     struct pci_dev *pdev;
     int rc = -ENOMEM;
diff --git a/xen/drivers/passthrough/vtd/extern.h b/xen/drivers/passthrough/vtd/extern.h
index ec9c513..a129460 100644
--- a/xen/drivers/passthrough/vtd/extern.h
+++ b/xen/drivers/passthrough/vtd/extern.h
@@ -56,8 +56,12 @@  struct acpi_drhd_unit * find_ats_dev_drhd(struct iommu *iommu);
 
 int ats_device(const struct pci_dev *, const struct acpi_drhd_unit *);
 
-int dev_invalidate_iotlb(struct iommu *iommu, u16 did,
-                         u64 addr, unsigned int size_order, u64 type);
+int dev_invalidate_iotlb(struct iommu *iommu, u16 did, u64 addr,
+                         unsigned int size_order, u64 type,
+                         unsigned int lock);
+int dev_invalidate_iotlb_sync(struct iommu *iommu, u16 did,
+                              u16 seg, u8 bus, u8 devfn,
+                              unsigned int lock);
 
 int qinval_device_iotlb(struct iommu *iommu,
                         u32 max_invs_pend, u16 sid, u16 size, u64 addr);
diff --git a/xen/drivers/passthrough/vtd/qinval.c b/xen/drivers/passthrough/vtd/qinval.c
index f2e7ffb..76046a7 100644
--- a/xen/drivers/passthrough/vtd/qinval.c
+++ b/xen/drivers/passthrough/vtd/qinval.c
@@ -229,6 +229,69 @@  int qinval_device_iotlb(struct iommu *iommu,
     return 0;
 }
 
+static void dev_invalidate_iotlb_timeout(struct iommu *iommu, u16 did,
+                                         u16 seg, u8 bus, u8 devfn,
+                                         unsigned int lock)
+{
+    struct domain *d = NULL;
+    struct pci_dev *pdev;
+
+    if ( test_bit(did, iommu->domid_bitmap) )
+        d = rcu_lock_domain_by_id(iommu->domid_map[did]);
+
+    if ( d == NULL )
+        return;
+
+    for_each_pdev(d, pdev)
+    {
+        if ( (pdev->seg == seg) &&
+             (pdev->bus == bus) &&
+             (pdev->devfn == devfn) )
+        {
+            ASSERT ( pdev->domain );
+            list_del(&pdev->domain_list);
+            pdev->domain = NULL;
+
+            if ( !(lock & PCIDEVS_LOCK) )
+                spin_lock(&pcidevs_lock);
+
+            if ( pci_hide_device(bus, devfn) )
+            {
+                printk(XENLOG_ERR
+                       "IOMMU hide device %04x:%02x:%02x.%02x error.",
+                       seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
+            }
+
+            if ( !(lock & PCIDEVS_LOCK) )
+                spin_unlock(&pcidevs_lock);
+
+            break;
+        }
+    }
+
+    if ( !is_hardware_domain(d) )
+        domain_crash(d);
+    rcu_unlock_domain(d);
+}
+
+int dev_invalidate_iotlb_sync(struct iommu *iommu, u16 did,
+                              u16 seg, u8 bus, u8 devfn,
+                              unsigned int lock)
+{
+    struct qi_ctrl *qi_ctrl = iommu_qi_ctrl(iommu);
+    int rc = 0;
+
+    if ( qi_ctrl->qinval_maddr )
+    {
+        rc = queue_invalidate_wait(iommu, 0, 1, 1);
+        if ( rc == -ETIMEDOUT )
+            dev_invalidate_iotlb_timeout(iommu, did,
+                                         seg, bus, devfn, lock);
+    }
+
+    return rc;
+}
+
 static void queue_invalidate_iec(struct iommu *iommu, u8 granu, u8 im, u16 iidx)
 {
     unsigned long flags;
@@ -350,9 +413,19 @@  static int flush_iotlb_qi(
         queue_invalidate_iotlb(iommu,
                                type >> DMA_TLB_FLUSH_GRANU_OFFSET, dr,
                                dw, did, size_order, 0, addr);
-        if ( flush_dev_iotlb )
-            ret = dev_invalidate_iotlb(iommu, did, addr, size_order, type);
+
+        /*
+         * Before Device-TLB invalidation we need to synchronize
+         * invalidation completions with hardware.
+         */
         rc = invalidate_sync(iommu);
+        if ( rc )
+             return rc;
+
+        if ( flush_dev_iotlb )
+            ret = dev_invalidate_iotlb(iommu, did, addr, size_order,
+                                       type, lock);
+
         if ( !ret )
             ret = rc;
     }
diff --git a/xen/drivers/passthrough/vtd/x86/ats.c b/xen/drivers/passthrough/vtd/x86/ats.c
index 7c797f6..dcc3394 100644
--- a/xen/drivers/passthrough/vtd/x86/ats.c
+++ b/xen/drivers/passthrough/vtd/x86/ats.c
@@ -106,7 +106,7 @@  out:
 }
 
 int dev_invalidate_iotlb(struct iommu *iommu, u16 did,
-    u64 addr, unsigned int size_order, u64 type)
+    u64 addr, unsigned int size_order, u64 type, unsigned int lock)
 {
     struct pci_ats_dev *pdev;
     int ret = 0;
@@ -162,6 +162,18 @@  int dev_invalidate_iotlb(struct iommu *iommu, u16 did,
             return -EOPNOTSUPP;
         }
 
+        /*
+         * Synchronize with hardware for Device-TLB invalidate
+         * descriptor.
+         */
+        ret = dev_invalidate_iotlb_sync(iommu, did, pdev->seg,
+                                        pdev->bus, pdev->devfn, lock);
+        if ( ret )
+            printk(XENLOG_ERR
+                   "Flush error %d on device %04x:%02x:%02x.%02x \n",
+                   ret, pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
+                   PCI_FUNC(pdev->devfn));
+
         if ( !ret )
             ret = rc;
     }