diff mbox series

x86/msi: fix locking for SRIOV devices

Message ID 20240805210956.364624-1-stewart.hildebrand@amd.com (mailing list archive)
State New
Headers show
Series x86/msi: fix locking for SRIOV devices | expand

Commit Message

Stewart Hildebrand Aug. 5, 2024, 9:09 p.m. UTC
In commit 4f78438b45e2 ("vpci: use per-domain PCI lock to protect vpci
structure") a lock moved from allocate_and_map_msi_pirq() to the caller
and changed from pcidevs_lock() to read_lock(&d->pci_lock). However, one
call path wasn't updated to reflect the change, leading to a failed
assertion observed on debug builds of Xen when an SRIOV device is
present with one or more VFs enabled:

(XEN) Assertion 'd || pcidevs_locked()' failed at drivers/passthrough/pci.c:535
(XEN) ----[ Xen-4.20-unstable  x86_64  debug=y  Not tainted ]----
...
(XEN) Xen call trace:
(XEN)    [<ffff82d040284da8>] R pci_get_pdev+0x4c/0xab
(XEN)    [<ffff82d040344f5c>] F arch/x86/msi.c#read_pci_mem_bar+0x58/0x272
(XEN)    [<ffff82d04034530e>] F arch/x86/msi.c#msix_capability_init+0x198/0x755
(XEN)    [<ffff82d040345dad>] F arch/x86/msi.c#__pci_enable_msix+0x82/0xe8
(XEN)    [<ffff82d0403463e5>] F pci_enable_msi+0x3f/0x78
(XEN)    [<ffff82d04034be2b>] F map_domain_pirq+0x2a4/0x6dc
(XEN)    [<ffff82d04034d4d5>] F allocate_and_map_msi_pirq+0x103/0x262
(XEN)    [<ffff82d04035da5d>] F physdev_map_pirq+0x210/0x259
(XEN)    [<ffff82d04035e798>] F do_physdev_op+0x9c3/0x1454
(XEN)    [<ffff82d040329475>] F pv_hypercall+0x5ac/0x6af
(XEN)    [<ffff82d0402012d3>] F lstar_enter+0x143/0x150

Fix it by passing the struct domain pointer to pci_get_pdev() in this
call path.

Fixes: 4f78438b45e2 ("vpci: use per-domain PCI lock to protect vpci structure")
Reported-by: Teddy Astie <teddy.astie@vates.tech>
Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
---
Candidate for backport to 4.19
---
 xen/arch/x86/msi.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

Comments

Jan Beulich Aug. 6, 2024, 6:25 a.m. UTC | #1
On 05.08.2024 23:09, Stewart Hildebrand wrote:
> In commit 4f78438b45e2 ("vpci: use per-domain PCI lock to protect vpci
> structure") a lock moved from allocate_and_map_msi_pirq() to the caller
> and changed from pcidevs_lock() to read_lock(&d->pci_lock). However, one
> call path wasn't updated to reflect the change, leading to a failed
> assertion observed on debug builds of Xen when an SRIOV device is
> present with one or more VFs enabled:
> 
> (XEN) Assertion 'd || pcidevs_locked()' failed at drivers/passthrough/pci.c:535
> (XEN) ----[ Xen-4.20-unstable  x86_64  debug=y  Not tainted ]----

There must be more to this than just "SR-IOV with VFs enabled"? The
locking change has been there for quite a while, yet the issue was
noticed only know.

> @@ -829,7 +829,8 @@ static int msix_capability_init(struct pci_dev *dev,
>              vf = dev->sbdf.bdf;
>          }
>  
> -        table_paddr = read_pci_mem_bar(seg, pbus, pslot, pfunc, bir, vf);
> +        table_paddr = read_pci_mem_bar(dev->domain, seg, pbus, pslot, pfunc,
> +                                       bir, vf);

While dev->domain is the owner of dev (seg:bus:slot.func), it may not be
the owner of seg:pbus:pslot.pfunc. Or if it (reliably) is, I'd expect the
guarantees towards that to be spelled out in the description. (Dependency
on ordering of toolstack operations likely wouldn't count as a guarantee
to me. Yet if I'm not mistaken there's a problem in all cases if the VF
was already moved to DomIO, before the DomU is started.)

Further to this, until realizing that the code path leading here is
accessible only for Dom0, I suspected a security angle to this, and hence
didn't respond publicly on Matrix. I noticed, however, that apparently
there's another instance of the same issue (via the other call site of
allocate_and_map_msi_pirq(), i.e. from vpci_msix_arch_enable_entry()
through vpci_msi_enable()). Imo that wants dealing with at the same time
(potentially requiring a different overall approach).

The code path leading here being accessible only for Dom0 likely is also
a mistake (read: overly strict). In principle it ought to be possible to
move a PF to a driver domain, for the VFs to then be assigned to
individual DomU-s. In such a case I expect it shouldn't be Dom0 to invoke
PHYSDEVOP_prepare_msix.

Jan
Stewart Hildebrand Aug. 7, 2024, 5:14 a.m. UTC | #2
On 8/6/24 02:25, Jan Beulich wrote:
> On 05.08.2024 23:09, Stewart Hildebrand wrote:
>> In commit 4f78438b45e2 ("vpci: use per-domain PCI lock to protect vpci
>> structure") a lock moved from allocate_and_map_msi_pirq() to the caller
>> and changed from pcidevs_lock() to read_lock(&d->pci_lock). However, one
>> call path wasn't updated to reflect the change, leading to a failed
>> assertion observed on debug builds of Xen when an SRIOV device is
>> present with one or more VFs enabled:
>>
>> (XEN) Assertion 'd || pcidevs_locked()' failed at drivers/passthrough/pci.c:535
>> (XEN) ----[ Xen-4.20-unstable  x86_64  debug=y  Not tainted ]----
> 
> There must be more to this than just "SR-IOV with VFs enabled"? The
> locking change has been there for quite a while, yet the issue was
> noticed only know.

Yes, the conditions as far as I can tell are:
* PV dom0
* Debug build (debug=y) only
* There is a SR-IOV device in the system with one or more VFs enabled/configured
* Dom0 has loaded the driver for the VF and enabled MSI-X

The assert triggers when dom0 loads the VF driver and enables MSI-X
before PHYSDEVOP_prepare_msix is invoked.

>> @@ -829,7 +829,8 @@ static int msix_capability_init(struct pci_dev *dev,
>>              vf = dev->sbdf.bdf;
>>          }
>>  
>> -        table_paddr = read_pci_mem_bar(seg, pbus, pslot, pfunc, bir, vf);
>> +        table_paddr = read_pci_mem_bar(dev->domain, seg, pbus, pslot, pfunc,
>> +                                       bir, vf);
> 
> While dev->domain is the owner of dev (seg:bus:slot.func), it may not be
> the owner of seg:pbus:pslot.pfunc. Or if it (reliably) is, I'd expect the
> guarantees towards that to be spelled out in the description. (Dependency
> on ordering of toolstack operations likely wouldn't count as a guarantee
> to me. Yet if I'm not mistaken there's a problem in all cases if the VF
> was already moved to DomIO, before the DomU is started.)

On the possibility that they're not the same (now or in the future),
disregard this patch.

> Further to this, until realizing that the code path leading here is
> accessible only for Dom0, I suspected a security angle to this, and hence
> didn't respond publicly on Matrix. I noticed, however, that apparently
> there's another instance of the same issue (via the other call site of
> allocate_and_map_msi_pirq(), i.e. from vpci_msix_arch_enable_entry()
> through vpci_msi_enable()). Imo that wants dealing with at the same time
> (potentially requiring a different overall approach).

There's a simpler way to solve this: the VF doesn't strictly need to
obtain the pdev of its associated PF, so the call to pci_get_pdev() can
be removed from read_pci_mem_bar(). Patch to follow...

> The code path leading here being accessible only for Dom0 likely is also
> a mistake (read: overly strict). In principle it ought to be possible to
> move a PF to a driver domain, for the VFs to then be assigned to
> individual DomU-s. In such a case I expect it shouldn't be Dom0 to invoke
> PHYSDEVOP_prepare_msix.
> 
> Jan
diff mbox series

Patch

diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
index 0c97fbb3fc03..95a48615a1a4 100644
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -662,7 +662,8 @@  static int msi_capability_init(struct pci_dev *dev,
     return 0;
 }
 
-static u64 read_pci_mem_bar(u16 seg, u8 bus, u8 slot, u8 func, u8 bir, int vf)
+static u64 read_pci_mem_bar(struct domain *d, u16 seg, u8 bus, u8 slot, u8 func,
+                            u8 bir, int vf)
 {
     u8 limit;
     u32 addr, base = PCI_BASE_ADDRESS_0;
@@ -670,8 +671,7 @@  static u64 read_pci_mem_bar(u16 seg, u8 bus, u8 slot, u8 func, u8 bir, int vf)
 
     if ( vf >= 0 )
     {
-        struct pci_dev *pdev = pci_get_pdev(NULL,
-                                            PCI_SBDF(seg, bus, slot, func));
+        struct pci_dev *pdev = pci_get_pdev(d, PCI_SBDF(seg, bus, slot, func));
         unsigned int pos;
         uint16_t ctrl, num_vf, offset, stride;
 
@@ -829,7 +829,8 @@  static int msix_capability_init(struct pci_dev *dev,
             vf = dev->sbdf.bdf;
         }
 
-        table_paddr = read_pci_mem_bar(seg, pbus, pslot, pfunc, bir, vf);
+        table_paddr = read_pci_mem_bar(dev->domain, seg, pbus, pslot, pfunc,
+                                       bir, vf);
         WARN_ON(msi && msi->table_base != table_paddr);
         if ( !table_paddr )
         {
@@ -852,7 +853,8 @@  static int msix_capability_init(struct pci_dev *dev,
 
         pba_offset = pci_conf_read32(dev->sbdf, msix_pba_offset_reg(pos));
         bir = (u8)(pba_offset & PCI_MSIX_BIRMASK);
-        pba_paddr = read_pci_mem_bar(seg, pbus, pslot, pfunc, bir, vf);
+        pba_paddr = read_pci_mem_bar(dev->domain, seg, pbus, pslot, pfunc, bir,
+                                     vf);
         WARN_ON(!pba_paddr);
         pba_paddr += pba_offset & ~PCI_MSIX_BIRMASK;