diff mbox series

[v12,11/15] vpci: add initial support for virtual PCI bus topology

Message ID 20240109215145.430207-12-stewart.hildebrand@amd.com (mailing list archive)
State Superseded
Headers show
Series PCI devices passthrough on Arm, part 3 | expand

Commit Message

Stewart Hildebrand Jan. 9, 2024, 9:51 p.m. UTC
From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

Assign SBDF to the PCI devices being passed through with bus 0.
The resulting topology is where PCIe devices reside on the bus 0 of the
root complex itself (embedded endpoints).
This implementation is limited to 32 devices which are allowed on
a single PCI bus.

Please note, that at the moment only function 0 of a multifunction
device can be passed through.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
---
In v11:
- Fixed code formatting
- Removed bogus write_unlock() call
- Fixed type for new_dev_number
In v10:
- Removed ASSERT(pcidevs_locked())
- Removed redundant code (local sbdf variable, clearing sbdf during
device removal, etc)
- Added __maybe_unused attribute to "out:" label
- Introduced HAS_VPCI_GUEST_SUPPORT Kconfig option, as this is the
  first patch where it is used (previously was in "vpci: add hooks for
  PCI device assign/de-assign")
In v9:
- Lock in add_virtual_device() replaced with ASSERT (thanks, Stewart)
In v8:
- Added write lock in add_virtual_device
Since v6:
- re-work wrt new locking scheme
- OT: add ASSERT(pcidevs_write_locked()); to add_virtual_device()
Since v5:
- s/vpci_add_virtual_device/add_virtual_device and make it static
- call add_virtual_device from vpci_assign_device and do not use
  REGISTER_VPCI_INIT machinery
- add pcidevs_locked ASSERT
- use DECLARE_BITMAP for vpci_dev_assigned_map
Since v4:
- moved and re-worked guest sbdf initializers
- s/set_bit/__set_bit
- s/clear_bit/__clear_bit
- minor comment fix s/Virtual/Guest/
- added VPCI_MAX_VIRT_DEV constant (PCI_SLOT(~0) + 1) which will be used
  later for counting the number of MMIO handlers required for a guest
  (Julien)
Since v3:
 - make use of VPCI_INIT
 - moved all new code to vpci.c which belongs to it
 - changed open-coded 31 to PCI_SLOT(~0)
 - added comments and code to reject multifunction devices with
   functions other than 0
 - updated comment about vpci_dev_next and made it unsigned int
 - implement roll back in case of error while assigning/deassigning devices
 - s/dom%pd/%pd
Since v2:
 - remove casts that are (a) malformed and (b) unnecessary
 - add new line for better readability
 - remove CONFIG_HAS_VPCI_GUEST_SUPPORT ifdef's as the relevant vPCI
    functions are now completely gated with this config
 - gate common code with CONFIG_HAS_VPCI_GUEST_SUPPORT
New in v2
---
 xen/drivers/Kconfig     |  4 +++
 xen/drivers/vpci/vpci.c | 57 +++++++++++++++++++++++++++++++++++++++++
 xen/include/xen/sched.h |  8 ++++++
 xen/include/xen/vpci.h  | 11 ++++++++
 4 files changed, 80 insertions(+)

Comments

George Dunlap Jan. 12, 2024, 11:46 a.m. UTC | #1
On Tue, Jan 9, 2024 at 9:54 PM Stewart Hildebrand
<stewart.hildebrand@amd.com> wrote:
> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> index 37f5922f3206..b58a822847be 100644
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -484,6 +484,14 @@ struct domain
>       * 2. pdev->vpci->lock
>       */
>      rwlock_t pci_lock;
> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> +    /*
> +     * The bitmap which shows which device numbers are already used by the
> +     * virtual PCI bus topology and is used to assign a unique SBDF to the
> +     * next passed through virtual PCI device.
> +     */
> +    DECLARE_BITMAP(vpci_dev_assigned_map, VPCI_MAX_VIRT_DEV);
> +#endif
>  #endif

Without digging through the whole series, how big do we expect this
bitmap to be on typical systems?

If it's only going to be a handful of bytes, keeping it around for all
guests would be OK; but it's large, it would be better as a pointer,
since it's unused on the vast majority of guests.

 -George
Stewart Hildebrand Jan. 12, 2024, 1:50 p.m. UTC | #2
On 1/12/24 06:46, George Dunlap wrote:
> On Tue, Jan 9, 2024 at 9:54 PM Stewart Hildebrand
> <stewart.hildebrand@amd.com> wrote:
>> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
>> index 37f5922f3206..b58a822847be 100644
>> --- a/xen/include/xen/sched.h
>> +++ b/xen/include/xen/sched.h
>> @@ -484,6 +484,14 @@ struct domain
>>       * 2. pdev->vpci->lock
>>       */
>>      rwlock_t pci_lock;
>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>> +    /*
>> +     * The bitmap which shows which device numbers are already used by the
>> +     * virtual PCI bus topology and is used to assign a unique SBDF to the
>> +     * next passed through virtual PCI device.
>> +     */
>> +    DECLARE_BITMAP(vpci_dev_assigned_map, VPCI_MAX_VIRT_DEV);
>> +#endif
>>  #endif
> 
> Without digging through the whole series, how big do we expect this
> bitmap to be on typical systems?
> 
> If it's only going to be a handful of bytes, keeping it around for all
> guests would be OK; but it's large, it would be better as a pointer,
> since it's unused on the vast majority of guests.

Since the bitmap is an unsigned long type it will typically be 8 bytes, although only 4 bytes are actually used. VPCI_MAX_VIRT_DEV is currently fixed at 32, as we are only tracking D (not the whole SBDF) in the bitmap so far.
George Dunlap Jan. 15, 2024, 11:48 a.m. UTC | #3
On Fri, Jan 12, 2024 at 1:50 PM Stewart Hildebrand
<stewart.hildebrand@amd.com> wrote:
>
> On 1/12/24 06:46, George Dunlap wrote:
> > On Tue, Jan 9, 2024 at 9:54 PM Stewart Hildebrand
> > <stewart.hildebrand@amd.com> wrote:
> >> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> >> index 37f5922f3206..b58a822847be 100644
> >> --- a/xen/include/xen/sched.h
> >> +++ b/xen/include/xen/sched.h
> >> @@ -484,6 +484,14 @@ struct domain
> >>       * 2. pdev->vpci->lock
> >>       */
> >>      rwlock_t pci_lock;
> >> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> >> +    /*
> >> +     * The bitmap which shows which device numbers are already used by the
> >> +     * virtual PCI bus topology and is used to assign a unique SBDF to the
> >> +     * next passed through virtual PCI device.
> >> +     */
> >> +    DECLARE_BITMAP(vpci_dev_assigned_map, VPCI_MAX_VIRT_DEV);
> >> +#endif
> >>  #endif
> >
> > Without digging through the whole series, how big do we expect this
> > bitmap to be on typical systems?
> >
> > If it's only going to be a handful of bytes, keeping it around for all
> > guests would be OK; but it's large, it would be better as a pointer,
> > since it's unused on the vast majority of guests.
>
> Since the bitmap is an unsigned long type it will typically be 8 bytes, although only 4 bytes are actually used. VPCI_MAX_VIRT_DEV is currently fixed at 32, as we are only tracking D (not the whole SBDF) in the bitmap so far.

OK, that's fine with me.  (FYI I replied because I thought you needed
my ack specifically for sched.h; looks like any of THE REST will do,
however.)

 -George
Jan Beulich Jan. 25, 2024, 4 p.m. UTC | #4
On 09.01.2024 22:51, Stewart Hildebrand wrote:
> --- a/xen/drivers/Kconfig
> +++ b/xen/drivers/Kconfig
> @@ -15,4 +15,8 @@ source "drivers/video/Kconfig"
>  config HAS_VPCI
>  	bool
>  
> +config HAS_VPCI_GUEST_SUPPORT
> +	bool
> +	depends on HAS_VPCI

Wouldn't this better be "select", or even just "imply"?

> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -40,6 +40,49 @@ extern vpci_register_init_t *const __start_vpci_array[];
>  extern vpci_register_init_t *const __end_vpci_array[];
>  #define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
>  
> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> +static int add_virtual_device(struct pci_dev *pdev)
> +{
> +    struct domain *d = pdev->domain;
> +    unsigned int new_dev_number;
> +
> +    if ( is_hardware_domain(d) )
> +        return 0;
> +
> +    ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
> +
> +    /*
> +     * Each PCI bus supports 32 devices/slots at max or up to 256 when
> +     * there are multi-function ones which are not yet supported.
> +     */
> +    if ( pdev->info.is_extfn && !pdev->info.is_virtfn )
> +    {
> +        gdprintk(XENLOG_ERR, "%pp: only function 0 passthrough supported\n",
> +                 &pdev->sbdf);

The message suggests you ought to check pdev->devfn to have the low
three bits clear. Yet what you check are two booleans.

Further doesn't this require the multi-function bit to be emulated
clear? And finally don't you then also need to disallow assignment of
devices with phantom functions?

> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -484,6 +484,14 @@ struct domain
>       * 2. pdev->vpci->lock
>       */
>      rwlock_t pci_lock;
> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> +    /*
> +     * The bitmap which shows which device numbers are already used by the
> +     * virtual PCI bus topology and is used to assign a unique SBDF to the
> +     * next passed through virtual PCI device.
> +     */
> +    DECLARE_BITMAP(vpci_dev_assigned_map, VPCI_MAX_VIRT_DEV);
> +#endif
>  #endif

With this the 2nd #endif would likely better gain a comment.

> --- a/xen/include/xen/vpci.h
> +++ b/xen/include/xen/vpci.h
> @@ -21,6 +21,13 @@ typedef int vpci_register_init_t(struct pci_dev *dev);
>  
>  #define VPCI_ECAM_BDF(addr)     (((addr) & 0x0ffff000) >> 12)
>  
> +/*
> + * Maximum number of devices supported by the virtual bus topology:
> + * each PCI bus supports 32 devices/slots at max or up to 256 when
> + * there are multi-function ones which are not yet supported.
> + */
> +#define VPCI_MAX_VIRT_DEV       (PCI_SLOT(~0) + 1)

The limit being this means only bus 0 / seg 0 is supported, which I
think the comment would better also say. (In add_virtual_device(),
which has a similar comment, there's then at least a 2nd one saying
so.)

Jan
Stewart Hildebrand Feb. 2, 2024, 3:30 a.m. UTC | #5
On 1/25/24 11:00, Jan Beulich wrote:
> On 09.01.2024 22:51, Stewart Hildebrand wrote:
>> --- a/xen/drivers/Kconfig
>> +++ b/xen/drivers/Kconfig
>> @@ -15,4 +15,8 @@ source "drivers/video/Kconfig"
>>  config HAS_VPCI
>>  	bool
>>  
>> +config HAS_VPCI_GUEST_SUPPORT
>> +	bool
>> +	depends on HAS_VPCI
> 
> Wouldn't this better be "select", or even just "imply"?

I prefer "select"

> 
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -40,6 +40,49 @@ extern vpci_register_init_t *const __start_vpci_array[];
>>  extern vpci_register_init_t *const __end_vpci_array[];
>>  #define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
>>  
>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>> +static int add_virtual_device(struct pci_dev *pdev)
>> +{
>> +    struct domain *d = pdev->domain;
>> +    unsigned int new_dev_number;
>> +
>> +    if ( is_hardware_domain(d) )
>> +        return 0;
>> +
>> +    ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
>> +
>> +    /*
>> +     * Each PCI bus supports 32 devices/slots at max or up to 256 when
>> +     * there are multi-function ones which are not yet supported.
>> +     */
>> +    if ( pdev->info.is_extfn && !pdev->info.is_virtfn )
>> +    {
>> +        gdprintk(XENLOG_ERR, "%pp: only function 0 passthrough supported\n",
>> +                 &pdev->sbdf);
> 
> The message suggests you ought to check pdev->devfn to have the low
> three bits clear. Yet what you check are two booleans.

I'll check pdev->sbdf.fn

> 
> Further doesn't this require the multi-function bit to be emulated
> clear?

I consider this to be future work. The header type register, where the
bit resides, is not yet emulated for domUs. I have a series in the works
for emulating additional registers (including PCI_HEADER_TYPE), but I'm
planning to wait to submit it until after the current series is finished
so as to not delay the current series any further.

> And finally don't you then also need to disallow assignment of
> devices with phantom functions?

No, I don't think so. My understanding is that there is no configuration
space associated with the phantom SBDFs. There's no special handling
required in vPCI per se, because the phantom function RIDs get mapped
in the IOMMU when the device gets assigned. Future work would include
exposing the PCI Express Capability, including device control register
with the phantom function enable bit. I say this having only done
limited testing of phantom functions on ARM, and by faking it using the
pci-phantom= Xen arg because I don't have a real device with phantom
function capability.

> 
>> --- a/xen/include/xen/sched.h
>> +++ b/xen/include/xen/sched.h
>> @@ -484,6 +484,14 @@ struct domain
>>       * 2. pdev->vpci->lock
>>       */
>>      rwlock_t pci_lock;
>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>> +    /*
>> +     * The bitmap which shows which device numbers are already used by the
>> +     * virtual PCI bus topology and is used to assign a unique SBDF to the
>> +     * next passed through virtual PCI device.
>> +     */
>> +    DECLARE_BITMAP(vpci_dev_assigned_map, VPCI_MAX_VIRT_DEV);
>> +#endif
>>  #endif
> 
> With this the 2nd #endif would likely better gain a comment.

I will add it. Actually, I see no harm in adding a comment for both of
these #endifs.

> 
>> --- a/xen/include/xen/vpci.h
>> +++ b/xen/include/xen/vpci.h
>> @@ -21,6 +21,13 @@ typedef int vpci_register_init_t(struct pci_dev *dev);
>>  
>>  #define VPCI_ECAM_BDF(addr)     (((addr) & 0x0ffff000) >> 12)
>>  
>> +/*
>> + * Maximum number of devices supported by the virtual bus topology:
>> + * each PCI bus supports 32 devices/slots at max or up to 256 when
>> + * there are multi-function ones which are not yet supported.
>> + */
>> +#define VPCI_MAX_VIRT_DEV       (PCI_SLOT(~0) + 1)
> 
> The limit being this means only bus 0 / seg 0 is supported, which I
> think the comment would better also say. (In add_virtual_device(),
> which has a similar comment, there's then at least a 2nd one saying
> so.)

OK, I'll adjust the comment.
diff mbox series

Patch

diff --git a/xen/drivers/Kconfig b/xen/drivers/Kconfig
index db94393f47a6..780490cf8e39 100644
--- a/xen/drivers/Kconfig
+++ b/xen/drivers/Kconfig
@@ -15,4 +15,8 @@  source "drivers/video/Kconfig"
 config HAS_VPCI
 	bool
 
+config HAS_VPCI_GUEST_SUPPORT
+	bool
+	depends on HAS_VPCI
+
 endmenu
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index a0e8b1012509..57cfabfd9ad3 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -40,6 +40,49 @@  extern vpci_register_init_t *const __start_vpci_array[];
 extern vpci_register_init_t *const __end_vpci_array[];
 #define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
 
+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+static int add_virtual_device(struct pci_dev *pdev)
+{
+    struct domain *d = pdev->domain;
+    unsigned int new_dev_number;
+
+    if ( is_hardware_domain(d) )
+        return 0;
+
+    ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
+
+    /*
+     * Each PCI bus supports 32 devices/slots at max or up to 256 when
+     * there are multi-function ones which are not yet supported.
+     */
+    if ( pdev->info.is_extfn && !pdev->info.is_virtfn )
+    {
+        gdprintk(XENLOG_ERR, "%pp: only function 0 passthrough supported\n",
+                 &pdev->sbdf);
+        return -EOPNOTSUPP;
+    }
+    new_dev_number = find_first_zero_bit(d->vpci_dev_assigned_map,
+                                         VPCI_MAX_VIRT_DEV);
+    if ( new_dev_number == VPCI_MAX_VIRT_DEV )
+        return -ENOSPC;
+
+    __set_bit(new_dev_number, &d->vpci_dev_assigned_map);
+
+    /*
+     * Both segment and bus number are 0:
+     *  - we emulate a single host bridge for the guest, e.g. segment 0
+     *  - with bus 0 the virtual devices are seen as embedded
+     *    endpoints behind the root complex
+     *
+     * TODO: add support for multi-function devices.
+     */
+    pdev->vpci->guest_sbdf = PCI_SBDF(0, 0, new_dev_number, 0);
+
+    return 0;
+}
+
+#endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
+
 void vpci_deassign_device(struct pci_dev *pdev)
 {
     unsigned int i;
@@ -49,6 +92,12 @@  void vpci_deassign_device(struct pci_dev *pdev)
     if ( !has_vpci(pdev->domain) || !pdev->vpci )
         return;
 
+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+    if ( pdev->vpci->guest_sbdf.sbdf != ~0 )
+        __clear_bit(pdev->vpci->guest_sbdf.dev,
+                    &pdev->domain->vpci_dev_assigned_map);
+#endif
+
     spin_lock(&pdev->vpci->lock);
     while ( !list_empty(&pdev->vpci->handlers) )
     {
@@ -105,6 +154,13 @@  int vpci_assign_device(struct pci_dev *pdev)
     INIT_LIST_HEAD(&pdev->vpci->handlers);
     spin_lock_init(&pdev->vpci->lock);
 
+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+    pdev->vpci->guest_sbdf.sbdf = ~0;
+    rc = add_virtual_device(pdev);
+    if ( rc )
+        goto out;
+#endif
+
     for ( i = 0; i < NUM_VPCI_INIT; i++ )
     {
         rc = __start_vpci_array[i](pdev);
@@ -112,6 +168,7 @@  int vpci_assign_device(struct pci_dev *pdev)
             break;
     }
 
+ out: __maybe_unused;
     if ( rc )
         vpci_deassign_device(pdev);
 
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 37f5922f3206..b58a822847be 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -484,6 +484,14 @@  struct domain
      * 2. pdev->vpci->lock
      */
     rwlock_t pci_lock;
+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+    /*
+     * The bitmap which shows which device numbers are already used by the
+     * virtual PCI bus topology and is used to assign a unique SBDF to the
+     * next passed through virtual PCI device.
+     */
+    DECLARE_BITMAP(vpci_dev_assigned_map, VPCI_MAX_VIRT_DEV);
+#endif
 #endif
 
 #ifdef CONFIG_HAS_PASSTHROUGH
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 77320a667e55..053467f04982 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -21,6 +21,13 @@  typedef int vpci_register_init_t(struct pci_dev *dev);
 
 #define VPCI_ECAM_BDF(addr)     (((addr) & 0x0ffff000) >> 12)
 
+/*
+ * Maximum number of devices supported by the virtual bus topology:
+ * each PCI bus supports 32 devices/slots at max or up to 256 when
+ * there are multi-function ones which are not yet supported.
+ */
+#define VPCI_MAX_VIRT_DEV       (PCI_SLOT(~0) + 1)
+
 #define REGISTER_VPCI_INIT(x, p)                \
   static vpci_register_init_t *const x##_entry  \
                __used_section(".data.vpci." p) = x
@@ -175,6 +182,10 @@  struct vpci {
             struct vpci_arch_msix_entry arch;
         } entries[];
     } *msix;
+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+    /* Guest SBDF of the device. */
+    pci_sbdf_t guest_sbdf;
+#endif
 #endif
 };