
[v3,10/11] vpci: Add initial support for virtual PCI bus topology

Message ID 20210930075223.860329-11-andr2000@gmail.com (mailing list archive)
State Superseded
Series PCI devices passthrough on Arm, part 3

Commit Message

Oleksandr Andrushchenko Sept. 30, 2021, 7:52 a.m. UTC
From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

Assign SBDF to the PCI devices being passed through with bus 0.
The resulting topology is where PCIe devices reside on the bus 0 of the
root complex itself (embedded endpoints).
This implementation is limited to 32 devices which are allowed on
a single PCI bus.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

---
Since v2:
 - remove casts that are (a) malformed and (b) unnecessary
 - add new line for better readability
 - remove CONFIG_HAS_VPCI_GUEST_SUPPORT ifdef's as the relevant vPCI
    functions are now completely gated with this config
 - gate common code with CONFIG_HAS_VPCI_GUEST_SUPPORT
New in v2
---
 xen/common/domain.c           |  3 ++
 xen/drivers/passthrough/pci.c | 60 +++++++++++++++++++++++++++++++++++
 xen/drivers/vpci/vpci.c       | 14 +++++++-
 xen/include/xen/pci.h         | 22 +++++++++++++
 xen/include/xen/sched.h       |  8 +++++
 5 files changed, 106 insertions(+), 1 deletion(-)

Comments

Jan Beulich Sept. 30, 2021, 8:51 a.m. UTC | #1
On 30.09.2021 09:52, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> Assign SBDF to the PCI devices being passed through with bus 0.

This reads a little odd: If bus is already known (and I think you imply
segment to also be known), it's only DF which get assigned.

> The resulting topology is where PCIe devices reside on the bus 0 of the
> root complex itself (embedded endpoints).
> This implementation is limited to 32 devices which are allowed on
> a single PCI bus.

Or up to 256 when there are multi-function ones. Imo you at least want
to spell out how that case is intended to be handled (even if maybe
the code doesn't cover that case yet, in which case a respective code
comment would also want leaving).

> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -831,6 +831,66 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
>      return ret;
>  }
>  
> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT

May I ask why the code enclosed by this conditional has been put here
rather than under drivers/vpci/?

> +static struct vpci_dev *pci_find_virtual_device(const struct domain *d,
> +                                                const struct pci_dev *pdev)
> +{
> +    struct vpci_dev *vdev;
> +
> +    list_for_each_entry ( vdev, &d->vdev_list, list )
> +        if ( vdev->pdev == pdev )
> +            return vdev;
> +    return NULL;
> +}

No locking here or ...

> +int pci_add_virtual_device(struct domain *d, const struct pci_dev *pdev)
> +{
> +    struct vpci_dev *vdev;
> +
> +    ASSERT(!pci_find_virtual_device(d, pdev));

... in this first caller that I've managed to spot? See also below
regarding the pcidevs-lock being held further up the call tree (which at
the very least you would then want to ASSERT() for here).

> +    /* Each PCI bus supports 32 devices/slots at max. */
> +    if ( d->vpci_dev_next > 31 )
> +        return -ENOSPC;

Please avoid open-coding literals when they can be suitably expressed.

> +    vdev = xzalloc(struct vpci_dev);
> +    if ( !vdev )
> +        return -ENOMEM;
> +
> +    /* We emulate a single host bridge for the guest, so segment is always 0. */
> +    vdev->seg = 0;
> +
> +    /*
> +     * The bus number is set to 0, so virtual devices are seen
> +     * as embedded endpoints behind the root complex.
> +     */
> +    vdev->bus = 0;

Strictly speaking both of these assignments are redundant with you
using xzalloc(). I'd prefer if there was just a comment, as the compiler
has no way recognizing this in order to eliminate these stores.
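
E.g. (just a sketch, re-using the patch's own fields and comments):

    vdev = xzalloc(struct vpci_dev);
    if ( !vdev )
        return -ENOMEM;

    /*
     * We emulate a single host bridge for the guest, so vdev->seg is
     * left at 0 (from xzalloc()), and vdev->bus likewise stays 0 so
     * that virtual devices are seen as embedded endpoints behind the
     * root complex.
     */
    vdev->devfn = PCI_DEVFN(d->vpci_dev_next++, 0);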

> +    vdev->devfn = PCI_DEVFN(d->vpci_dev_next++, 0);
> +
> +    vdev->pdev = pdev;
> +    vdev->domain = d;
> +
> +    pcidevs_lock();
> +    list_add_tail(&vdev->list, &d->vdev_list);
> +    pcidevs_unlock();

I don't support a global lock getting (ab)used for per-domain list
management.

Apart from that I don't understand why you acquire the lock here. Does
that mean the functions further up were truly left without any locking,
by you not having noticed that this lock is already being held by the
sole caller?

> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -91,20 +91,32 @@ int __hwdom_init vpci_add_handlers(struct pci_dev *pdev)
>  /* Notify vPCI that device is assigned to guest. */
>  int vpci_assign_device(struct domain *d, const struct pci_dev *dev)
>  {
> +    int rc;
> +
>      /* It only makes sense to assign for hwdom or guest domain. */
>      if ( is_system_domain(d) || !has_vpci(d) )
>          return 0;
>  
> -    return vpci_bar_add_handlers(d, dev);
> +    rc = vpci_bar_add_handlers(d, dev);
> +    if ( rc )
> +        return rc;
> +
> +    return pci_add_virtual_device(d, dev);
>  }

I've peeked at the earlier patch, and both there and here I'm struggling to
see how undoing partially completed steps or fully completed earlier steps
is intended to work. I'm not convinced it is legitimate to leave handlers
in place until the tool stack manages to roll back the failed device
assignment.
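
Something along these lines, perhaps (untested sketch; it merely re-uses
vpci_bar_remove_handlers() from the earlier patch as the undo step):

    rc = vpci_bar_add_handlers(d, dev);
    if ( rc )
        return rc;

    rc = pci_add_virtual_device(d, dev);
    if ( rc )
        /* Roll back the handlers installed above. */
        vpci_bar_remove_handlers(d, dev);

    return rc;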

>  /* Notify vPCI that device is de-assigned from guest. */
>  int vpci_deassign_device(struct domain *d, const struct pci_dev *dev)
>  {
> +    int rc;
> +
>      /* It only makes sense to de-assign from hwdom or guest domain. */
>      if ( is_system_domain(d) || !has_vpci(d) )
>          return 0;
>  
> +    rc = pci_remove_virtual_device(d, dev);
> +    if ( rc )
> +        return rc;
> +
>      return vpci_bar_remove_handlers(d, dev);
>  }

So what's the ultimate effect of a partially de-assigned device, where
one of the later steps failed? In a number of places we do best-effort
full cleanup, by recording errors but nevertheless continuing with
subsequent cleanup steps. I wonder whether that's a model to use here
as well.
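
I.e. something like this (untested sketch of the error-recording pattern):

    int vpci_deassign_device(struct domain *d, const struct pci_dev *dev)
    {
        int rc, ret;

        if ( is_system_domain(d) || !has_vpci(d) )
            return 0;

        /* Record the first error, but attempt all cleanup steps anyway. */
        rc = pci_remove_virtual_device(d, dev);
        ret = vpci_bar_remove_handlers(d, dev);

        return rc ?: ret;
    }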

> --- a/xen/include/xen/pci.h
> +++ b/xen/include/xen/pci.h
> @@ -137,6 +137,24 @@ struct pci_dev {
>      struct vpci *vpci;
>  };
>  
> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> +struct vpci_dev {
> +    struct list_head list;
> +    /* Physical PCI device this virtual device is connected to. */
> +    const struct pci_dev *pdev;
> +    /* Virtual SBDF of the device. */
> +    union {
> +        struct {
> +            uint8_t devfn;
> +            uint8_t bus;
> +            uint16_t seg;
> +        };
> +        pci_sbdf_t sbdf;

Could you explain to me why pci_sbdf_t (a typedef of a union) isn't
providing all you need? By putting it in a union with a custom
struct you set yourself up for things going out of sync if anyone
chose to alter pci_sbdf_t's layout.
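
I.e. simply (sketch):

    struct vpci_dev {
        struct list_head list;
        /* Physical PCI device this virtual device is connected to. */
        const struct pci_dev *pdev;
        /* Virtual SBDF of the device; parts accessible as sbdf.seg etc. */
        pci_sbdf_t sbdf;
        struct domain *domain;
    };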

> @@ -167,6 +185,10 @@ const unsigned long *pci_get_ro_map(u16 seg);
>  int pci_add_device(u16 seg, u8 bus, u8 devfn,
>                     const struct pci_dev_info *, nodeid_t node);
>  int pci_remove_device(u16 seg, u8 bus, u8 devfn);
> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> +int pci_add_virtual_device(struct domain *d, const struct pci_dev *pdev);
> +int pci_remove_virtual_device(struct domain *d, const struct pci_dev *pdev);
> +#endif

Like for their definitions I question the placement of these
declarations.

> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -444,6 +444,14 @@ struct domain
>  
>  #ifdef CONFIG_HAS_PCI
>      struct list_head pdev_list;
> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> +    struct list_head vdev_list;
> +    /*
> +     * Current device number used by the virtual PCI bus topology
> +     * to assign a unique SBDF to a passed through virtual PCI device.
> +     */
> +    int vpci_dev_next;

In how far can the number stored here be negative? If it can't be,
please use unsigned int.

As to the comment - "current" is ambiguous: Is it the number that
was used last, or the next one to be used?
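
I.e. perhaps:

    /*
     * Device number (slot) to be used for the next virtual PCI device
     * assigned to this domain's virtual bus, so that every passed
     * through device gets a unique virtual SBDF.
     */
    unsigned int vpci_dev_next;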

Jan
Oleksandr Andrushchenko Sept. 30, 2021, 9:34 a.m. UTC | #2
On 30.09.21 11:51, Jan Beulich wrote:
> On 30.09.2021 09:52, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> Assign SBDF to the PCI devices being passed through with bus 0.
> This reads a little odd: If bus is already known (and I think you imply
> segment to also be known), it's only DF which get assigned.
But at the end of the day we set all the parts of that SBDF.
Otherwise I would have to write "Assign DF, as we know that bus and
segment are 0".
>
>> The resulting topology is where PCIe devices reside on the bus 0 of the
>> root complex itself (embedded endpoints).
>> This implementation is limited to 32 devices which are allowed on
>> a single PCI bus.
> Or up to 256 when there are multi-function ones. Imo you at least want
> to spell out how that case is intended to be handled (even if maybe
> the code doesn't cover that case yet, in which case a respective code
> comment would also want leaving).
We are not supporting multi-function yet, so I'll add a comment.
>
>> --- a/xen/drivers/passthrough/pci.c
>> +++ b/xen/drivers/passthrough/pci.c
>> @@ -831,6 +831,66 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
>>       return ret;
>>   }
>>   
>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> May I ask why the code enclosed by this conditional has been put here
> rather than under drivers/vpci/?
Indeed this can be moved to xen/drivers/vpci/vpci.c.
I'll move and update function names accordingly.
>
>> +static struct vpci_dev *pci_find_virtual_device(const struct domain *d,
>> +                                                const struct pci_dev *pdev)
>> +{
>> +    struct vpci_dev *vdev;
>> +
>> +    list_for_each_entry ( vdev, &d->vdev_list, list )
>> +        if ( vdev->pdev == pdev )
>> +            return vdev;
>> +    return NULL;
>> +}
> No locking here or ...
>
>> +int pci_add_virtual_device(struct domain *d, const struct pci_dev *pdev)
>> +{
>> +    struct vpci_dev *vdev;
>> +
>> +    ASSERT(!pci_find_virtual_device(d, pdev));
> ... in this first caller that I've managed to spot? See also below
> regarding the pcidevs-lock being held further up the call tree (which at
> the very least you would then want to ASSERT() for here).
I will move the code to vpci and ensure proper locking there
>
>> +    /* Each PCI bus supports 32 devices/slots at max. */
>> +    if ( d->vpci_dev_next > 31 )
>> +        return -ENOSPC;
> Please avoid open-coding literals when they can be suitably expressed.
I failed to find a suitable constant for that. Could you please point
me to the one I can use here?
>
>> +    vdev = xzalloc(struct vpci_dev);
>> +    if ( !vdev )
>> +        return -ENOMEM;
>> +
>> +    /* We emulate a single host bridge for the guest, so segment is always 0. */
>> +    vdev->seg = 0;
>> +
>> +    /*
>> +     * The bus number is set to 0, so virtual devices are seen
>> +     * as embedded endpoints behind the root complex.
>> +     */
>> +    vdev->bus = 0;
> Strictly speaking both of these assignments are redundant with you
> using xzalloc(). I'd prefer if there was just a comment, as the compiler
> has no way recognizing this in order to eliminate these stores.
Yes, I added the assignments just so it is explicitly seen here that
these fields are 0. I will remove them and put a comment instead.
>
>> +    vdev->devfn = PCI_DEVFN(d->vpci_dev_next++, 0);
>> +
>> +    vdev->pdev = pdev;
>> +    vdev->domain = d;
>> +
>> +    pcidevs_lock();
>> +    list_add_tail(&vdev->list, &d->vdev_list);
>> +    pcidevs_unlock();
> I don't support a global lock getting (ab)used for per-domain list
> management.
>
> Apart from that I don't understand why you acquire the lock here. Does
> that mean the functions further up were truly left without any locking,
> by you not having noticed that this lock is already being held by the
> sole caller?
I'll re-work the locking with respect to the new location of this code, i.e. vpci
>
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -91,20 +91,32 @@ int __hwdom_init vpci_add_handlers(struct pci_dev *pdev)
>>   /* Notify vPCI that device is assigned to guest. */
>>   int vpci_assign_device(struct domain *d, const struct pci_dev *dev)
>>   {
>> +    int rc;
>> +
>>       /* It only makes sense to assign for hwdom or guest domain. */
>>       if ( is_system_domain(d) || !has_vpci(d) )
>>           return 0;
>>   
>> -    return vpci_bar_add_handlers(d, dev);
>> +    rc = vpci_bar_add_handlers(d, dev);
>> +    if ( rc )
>> +        return rc;
>> +
>> +    return pci_add_virtual_device(d, dev);
>>   }
> I've peeked at the earlier patch, and both there and here I'm struggling to
> see how undoing partially completed steps or fully completed earlier steps
> is intended to work. I'm not convinced it is legitimate to leave handlers
> in place until the tool stack manages to roll back the failed device
> assignment.
I'll see what and how we can roll back in case of error
>
>>   /* Notify vPCI that device is de-assigned from guest. */
>>   int vpci_deassign_device(struct domain *d, const struct pci_dev *dev)
>>   {
>> +    int rc;
>> +
>>       /* It only makes sense to de-assign from hwdom or guest domain. */
>>       if ( is_system_domain(d) || !has_vpci(d) )
>>           return 0;
>>   
>> +    rc = pci_remove_virtual_device(d, dev);
>> +    if ( rc )
>> +        return rc;
>> +
>>       return vpci_bar_remove_handlers(d, dev);
>>   }
> So what's the ultimate effect of a partially de-assigned device, where
> one of the later steps failed? In a number of places we do best-effort
> full cleanup, by recording errors but nevertheless continuing with
> subsequent cleanup steps. I wonder whether that's a model to use here
> as well.
>
>> --- a/xen/include/xen/pci.h
>> +++ b/xen/include/xen/pci.h
>> @@ -137,6 +137,24 @@ struct pci_dev {
>>       struct vpci *vpci;
>>   };
>>   
>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>> +struct vpci_dev {
>> +    struct list_head list;
>> +    /* Physical PCI device this virtual device is connected to. */
>> +    const struct pci_dev *pdev;
>> +    /* Virtual SBDF of the device. */
>> +    union {
>> +        struct {
>> +            uint8_t devfn;
>> +            uint8_t bus;
>> +            uint16_t seg;
>> +        };
>> +        pci_sbdf_t sbdf;
> Could you explain to me why pci_sbdf_t (a typedef of a union) isn't
> providing all you need? By putting it in a union with a custom
> struct you set yourself up for things going out of sync if anyone
> chose to alter pci_sbdf_t's layout.
Sure, pci_sbdf_t should be enough
>
>> @@ -167,6 +185,10 @@ const unsigned long *pci_get_ro_map(u16 seg);
>>   int pci_add_device(u16 seg, u8 bus, u8 devfn,
>>                      const struct pci_dev_info *, nodeid_t node);
>>   int pci_remove_device(u16 seg, u8 bus, u8 devfn);
>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>> +int pci_add_virtual_device(struct domain *d, const struct pci_dev *pdev);
>> +int pci_remove_virtual_device(struct domain *d, const struct pci_dev *pdev);
>> +#endif
> Like for their definitions I question the placement of these
> declarations.
Will move to vpci.h
>
>> --- a/xen/include/xen/sched.h
>> +++ b/xen/include/xen/sched.h
>> @@ -444,6 +444,14 @@ struct domain
>>   
>>   #ifdef CONFIG_HAS_PCI
>>       struct list_head pdev_list;
>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>> +    struct list_head vdev_list;
>> +    /*
>> +     * Current device number used by the virtual PCI bus topology
>> +     * to assign a unique SBDF to a passed through virtual PCI device.
>> +     */
>> +    int vpci_dev_next;
> In how far can the number stored here be negative? If it can't be,
> please use unsigned int.
Will use unsigned
>
> As to the comment - "current" is ambiguous: Is it the number that
> was used last, or the next one to be used?
I will update the comment to remove ambiguity
> Jan
>
Thank you,
Oleksandr
Jan Beulich Sept. 30, 2021, 10:23 a.m. UTC | #3
On 30.09.2021 11:34, Oleksandr Andrushchenko wrote:
> On 30.09.21 11:51, Jan Beulich wrote:
>> On 30.09.2021 09:52, Oleksandr Andrushchenko wrote:
>>> --- a/xen/drivers/passthrough/pci.c
>>> +++ b/xen/drivers/passthrough/pci.c
>>> @@ -831,6 +831,66 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
>>>       return ret;
>>>   }
>>>   
>>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>> May I ask why the code enclosed by this conditional has been put here
>> rather than under drivers/vpci/?
> Indeed this can be moved to xen/drivers/vpci/vpci.c.
> I'll move and update function names accordingly.
>>
>>> +static struct vpci_dev *pci_find_virtual_device(const struct domain *d,
>>> +                                                const struct pci_dev *pdev)
>>> +{
>>> +    struct vpci_dev *vdev;
>>> +
>>> +    list_for_each_entry ( vdev, &d->vdev_list, list )
>>> +        if ( vdev->pdev == pdev )
>>> +            return vdev;
>>> +    return NULL;
>>> +}
>> No locking here or ...
>>
>>> +int pci_add_virtual_device(struct domain *d, const struct pci_dev *pdev)
>>> +{
>>> +    struct vpci_dev *vdev;
>>> +
>>> +    ASSERT(!pci_find_virtual_device(d, pdev));
>> ... in this first caller that I've managed to spot? See also below
>> regarding the pcidevs-lock being held further up the call tree (which at
>> the very least you would then want to ASSERT() for here).
> I will move the code to vpci and ensure proper locking there
>>
>>> +    /* Each PCI bus supports 32 devices/slots at max. */
>>> +    if ( d->vpci_dev_next > 31 )
>>> +        return -ENOSPC;
>> Please avoid open-coding literals when they can be suitably expressed.
> I failed to find a suitable constant for that. Could you please point
> me to the one I can use here?

I wasn't hinting at a constant, but at an expression. If you grep, you
will find e.g. at least one instance of PCI_FUNC(~0); I'd suggest to
use PCI_SLOT(~0) here. (My rule of thumb is: Before I write a literal
number anywhere outside of a #define, and not 0 or 1 or some such
starting a loop, I try to think hard how that number can instead be
expressed. Such expressions then often also serve as documentation for
what the number actually means, helping future readers.)
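
I.e. in your case:

    /* Each PCI bus supports 32 devices/slots at max. */
    if ( d->vpci_dev_next > PCI_SLOT(~0) )
        return -ENOSPC;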

Jan
Oleksandr Andrushchenko Sept. 30, 2021, 10:26 a.m. UTC | #4
On 30.09.21 13:23, Jan Beulich wrote:
> On 30.09.2021 11:34, Oleksandr Andrushchenko wrote:
>> On 30.09.21 11:51, Jan Beulich wrote:
>>> On 30.09.2021 09:52, Oleksandr Andrushchenko wrote:
>>>> --- a/xen/drivers/passthrough/pci.c
>>>> +++ b/xen/drivers/passthrough/pci.c
>>>> @@ -831,6 +831,66 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
>>>>        return ret;
>>>>    }
>>>>    
>>>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>>> May I ask why the code enclosed by this conditional has been put here
>>> rather than under drivers/vpci/?
>> Indeed this can be moved to xen/drivers/vpci/vpci.c.
>> I'll move and update function names accordingly.
>>>> +static struct vpci_dev *pci_find_virtual_device(const struct domain *d,
>>>> +                                                const struct pci_dev *pdev)
>>>> +{
>>>> +    struct vpci_dev *vdev;
>>>> +
>>>> +    list_for_each_entry ( vdev, &d->vdev_list, list )
>>>> +        if ( vdev->pdev == pdev )
>>>> +            return vdev;
>>>> +    return NULL;
>>>> +}
>>> No locking here or ...
>>>
>>>> +int pci_add_virtual_device(struct domain *d, const struct pci_dev *pdev)
>>>> +{
>>>> +    struct vpci_dev *vdev;
>>>> +
>>>> +    ASSERT(!pci_find_virtual_device(d, pdev));
>>> ... in this first caller that I've managed to spot? See also below
>>> regarding the pcidevs-lock being held further up the call tree (which at
>>> the very least you would then want to ASSERT() for here).
>> I will move the code to vpci and ensure proper locking there
>>>> +    /* Each PCI bus supports 32 devices/slots at max. */
>>>> +    if ( d->vpci_dev_next > 31 )
>>>> +        return -ENOSPC;
>>> Please avoid open-coding literals when they can be suitably expressed.
>> I failed to find a suitable constant for that. Could you please point
>> me to the one I can use here?
> I wasn't hinting at a constant, but at an expression. If you grep, you
> will find e.g. at least one instance of PCI_FUNC(~0); I'd suggest to
> use PCI_SLOT(~0) here.
Great, will use this. It indeed does the job in a clear way.
Thank you!!
>   (My rule of thumb is: Before I write a literal
> number anywhere outside of a #define, and not 0 or 1 or some such
> starting a loop, I try to think hard how that number can instead be
> expressed. Such expressions then often also serve as documentation for
> what the number actually means, helping future readers.)
Sounds good
> Jan
>
>
Thank you,
Oleksandr
Roger Pau Monné Oct. 26, 2021, 11:33 a.m. UTC | #5
On Thu, Sep 30, 2021 at 10:52:22AM +0300, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> Assign SBDF to the PCI devices being passed through with bus 0.
> The resulting topology is where PCIe devices reside on the bus 0 of the
> root complex itself (embedded endpoints).
> This implementation is limited to 32 devices which are allowed on
> a single PCI bus.
> 
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> ---
> Since v2:
>  - remove casts that are (a) malformed and (b) unnecessary
>  - add new line for better readability
>  - remove CONFIG_HAS_VPCI_GUEST_SUPPORT ifdef's as the relevant vPCI
>     functions are now completely gated with this config
>  - gate common code with CONFIG_HAS_VPCI_GUEST_SUPPORT
> New in v2
> ---
>  xen/common/domain.c           |  3 ++
>  xen/drivers/passthrough/pci.c | 60 +++++++++++++++++++++++++++++++++++
>  xen/drivers/vpci/vpci.c       | 14 +++++++-
>  xen/include/xen/pci.h         | 22 +++++++++++++
>  xen/include/xen/sched.h       |  8 +++++
>  5 files changed, 106 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index 40d67ec34232..e0170087612d 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -601,6 +601,9 @@ struct domain *domain_create(domid_t domid,
>  
>  #ifdef CONFIG_HAS_PCI
>      INIT_LIST_HEAD(&d->pdev_list);
> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> +    INIT_LIST_HEAD(&d->vdev_list);
> +#endif
>  #endif
>  
>      /* All error paths can depend on the above setup. */
> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> index 805ab86ed555..5b963d75d1ba 100644
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -831,6 +831,66 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
>      return ret;
>  }
>  
> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> +static struct vpci_dev *pci_find_virtual_device(const struct domain *d,
> +                                                const struct pci_dev *pdev)
> +{
> +    struct vpci_dev *vdev;
> +
> +    list_for_each_entry ( vdev, &d->vdev_list, list )
> +        if ( vdev->pdev == pdev )
> +            return vdev;
> +    return NULL;
> +}
> +
> +int pci_add_virtual_device(struct domain *d, const struct pci_dev *pdev)
> +{
> +    struct vpci_dev *vdev;
> +
> +    ASSERT(!pci_find_virtual_device(d, pdev));
> +
> +    /* Each PCI bus supports 32 devices/slots at max. */
> +    if ( d->vpci_dev_next > 31 )
> +        return -ENOSPC;
> +
> +    vdev = xzalloc(struct vpci_dev);
> +    if ( !vdev )
> +        return -ENOMEM;
> +
> +    /* We emulate a single host bridge for the guest, so segment is always 0. */
> +    vdev->seg = 0;
> +
> +    /*
> +     * The bus number is set to 0, so virtual devices are seen
> +     * as embedded endpoints behind the root complex.
> +     */
> +    vdev->bus = 0;
> +    vdev->devfn = PCI_DEVFN(d->vpci_dev_next++, 0);

This would likely be better as a bitmap where you set the bits of
in-use slots. Then you can use find_first_bit or similar to get a free
slot.

Long term you might want to allow the caller to provide a pre-selected
slot, as it's possible for users to request the device to appear at a
specific slot on the emulated bus.
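
Something along these lines, maybe (untested sketch; vpci_slots and
VPCI_MAX_SLOTS are made-up names):

    /* In struct domain: DECLARE_BITMAP(vpci_slots, VPCI_MAX_SLOTS); */
    #define VPCI_MAX_SLOTS (PCI_SLOT(~0) + 1)

    unsigned int slot = find_first_zero_bit(d->vpci_slots, VPCI_MAX_SLOTS);

    if ( slot >= VPCI_MAX_SLOTS )
        return -ENOSPC;

    __set_bit(slot, d->vpci_slots);
    vdev->devfn = PCI_DEVFN(slot, 0);

with a matching __clear_bit(slot, d->vpci_slots) in the removal path.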

> +
> +    vdev->pdev = pdev;
> +    vdev->domain = d;
> +
> +    pcidevs_lock();
> +    list_add_tail(&vdev->list, &d->vdev_list);
> +    pcidevs_unlock();
> +
> +    return 0;
> +}
> +
> +int pci_remove_virtual_device(struct domain *d, const struct pci_dev *pdev)
> +{
> +    struct vpci_dev *vdev;
> +
> +    pcidevs_lock();
> +    vdev = pci_find_virtual_device(d, pdev);
> +    if ( vdev )
> +        list_del(&vdev->list);
> +    pcidevs_unlock();
> +    xfree(vdev);
> +    return 0;
> +}
> +#endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
> +
>  /* Caller should hold the pcidevs_lock */
>  static int deassign_device(struct domain *d, uint16_t seg, uint8_t bus,
>                             uint8_t devfn)
> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> index 702f7b5d5dda..d787f13e679e 100644
> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -91,20 +91,32 @@ int __hwdom_init vpci_add_handlers(struct pci_dev *pdev)
>  /* Notify vPCI that device is assigned to guest. */
>  int vpci_assign_device(struct domain *d, const struct pci_dev *dev)
>  {
> +    int rc;
> +
>      /* It only makes sense to assign for hwdom or guest domain. */
>      if ( is_system_domain(d) || !has_vpci(d) )
>          return 0;
>  
> -    return vpci_bar_add_handlers(d, dev);
> +    rc = vpci_bar_add_handlers(d, dev);
> +    if ( rc )
> +        return rc;
> +
> +    return pci_add_virtual_device(d, dev);
>  }
>  
>  /* Notify vPCI that device is de-assigned from guest. */
>  int vpci_deassign_device(struct domain *d, const struct pci_dev *dev)
>  {
> +    int rc;
> +
>      /* It only makes sense to de-assign from hwdom or guest domain. */
>      if ( is_system_domain(d) || !has_vpci(d) )
>          return 0;
>  
> +    rc = pci_remove_virtual_device(d, dev);
> +    if ( rc )
> +        return rc;
> +
>      return vpci_bar_remove_handlers(d, dev);
>  }
>  #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
> diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
> index 43b8a0817076..33033a3a8f8d 100644
> --- a/xen/include/xen/pci.h
> +++ b/xen/include/xen/pci.h
> @@ -137,6 +137,24 @@ struct pci_dev {
>      struct vpci *vpci;
>  };
>  
> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> +struct vpci_dev {
> +    struct list_head list;
> +    /* Physical PCI device this virtual device is connected to. */
> +    const struct pci_dev *pdev;
> +    /* Virtual SBDF of the device. */
> +    union {
> +        struct {
> +            uint8_t devfn;
> +            uint8_t bus;
> +            uint16_t seg;
> +        };
> +        pci_sbdf_t sbdf;
> +    };
> +    struct domain *domain;
> +};
> +#endif

I wonder whether this is strictly needed. Won't it be enough to store
the virtual (ie: guest) sbdf inside the existing vpci struct?

It would avoid the overhead of the translation you do from pdev ->
vdev, and there doesn't seem to be anything relevant stored in
vpci_dev apart from the virtual sbdf.
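
I.e. something like (sketch; guest_sbdf being a suggested name):

    struct vpci {
        /* ... existing fields ... */
    #ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
        /* Guest (virtual) SBDF the device is exposed at. */
        pci_sbdf_t guest_sbdf;
    #endif
    };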

Thanks, Roger.
Oleksandr Andrushchenko Nov. 3, 2021, 6:34 a.m. UTC | #6
Hi, Roger

On 26.10.21 14:33, Roger Pau Monné wrote:
> On Thu, Sep 30, 2021 at 10:52:22AM +0300, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> Assign SBDF to the PCI devices being passed through with bus 0.
>> The resulting topology is where PCIe devices reside on the bus 0 of the
>> root complex itself (embedded endpoints).
>> This implementation is limited to 32 devices which are allowed on
>> a single PCI bus.
>>
>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> ---
>> Since v2:
>>   - remove casts that are (a) malformed and (b) unnecessary
>>   - add new line for better readability
>>   - remove CONFIG_HAS_VPCI_GUEST_SUPPORT ifdef's as the relevant vPCI
>>      functions are now completely gated with this config
>>   - gate common code with CONFIG_HAS_VPCI_GUEST_SUPPORT
>> New in v2
>> ---
>>   xen/common/domain.c           |  3 ++
>>   xen/drivers/passthrough/pci.c | 60 +++++++++++++++++++++++++++++++++++
>>   xen/drivers/vpci/vpci.c       | 14 +++++++-
>>   xen/include/xen/pci.h         | 22 +++++++++++++
>>   xen/include/xen/sched.h       |  8 +++++
>>   5 files changed, 106 insertions(+), 1 deletion(-)
>>
>> diff --git a/xen/common/domain.c b/xen/common/domain.c
>> index 40d67ec34232..e0170087612d 100644
>> --- a/xen/common/domain.c
>> +++ b/xen/common/domain.c
>> @@ -601,6 +601,9 @@ struct domain *domain_create(domid_t domid,
>>   
>>   #ifdef CONFIG_HAS_PCI
>>       INIT_LIST_HEAD(&d->pdev_list);
>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>> +    INIT_LIST_HEAD(&d->vdev_list);
>> +#endif
>>   #endif
>>   
>>       /* All error paths can depend on the above setup. */
>> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
>> index 805ab86ed555..5b963d75d1ba 100644
>> --- a/xen/drivers/passthrough/pci.c
>> +++ b/xen/drivers/passthrough/pci.c
>> @@ -831,6 +831,66 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
>>       return ret;
>>   }
>>   
>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>> +static struct vpci_dev *pci_find_virtual_device(const struct domain *d,
>> +                                                const struct pci_dev *pdev)
>> +{
>> +    struct vpci_dev *vdev;
>> +
>> +    list_for_each_entry ( vdev, &d->vdev_list, list )
>> +        if ( vdev->pdev == pdev )
>> +            return vdev;
>> +    return NULL;
>> +}
>> +
>> +int pci_add_virtual_device(struct domain *d, const struct pci_dev *pdev)
>> +{
>> +    struct vpci_dev *vdev;
>> +
>> +    ASSERT(!pci_find_virtual_device(d, pdev));
>> +
>> +    /* Each PCI bus supports 32 devices/slots at max. */
>> +    if ( d->vpci_dev_next > 31 )
>> +        return -ENOSPC;
>> +
>> +    vdev = xzalloc(struct vpci_dev);
>> +    if ( !vdev )
>> +        return -ENOMEM;
>> +
>> +    /* We emulate a single host bridge for the guest, so segment is always 0. */
>> +    vdev->seg = 0;
>> +
>> +    /*
>> +     * The bus number is set to 0, so virtual devices are seen
>> +     * as embedded endpoints behind the root complex.
>> +     */
>> +    vdev->bus = 0;
>> +    vdev->devfn = PCI_DEVFN(d->vpci_dev_next++, 0);
> This would likely be better as a bitmap where you set the bits of
> in-use slots. Then you can use find_first_bit or similar to get a free
> slot.
>
> Long term you might want to allow the caller to provide a pre-selected
> slot, as it's possible for users to request the device to appear at a
> specific slot on the emulated bus.
>
>> +
>> +    vdev->pdev = pdev;
>> +    vdev->domain = d;
>> +
>> +    pcidevs_lock();
>> +    list_add_tail(&vdev->list, &d->vdev_list);
>> +    pcidevs_unlock();
>> +
>> +    return 0;
>> +}
>> +
>> +int pci_remove_virtual_device(struct domain *d, const struct pci_dev *pdev)
>> +{
>> +    struct vpci_dev *vdev;
>> +
>> +    pcidevs_lock();
>> +    vdev = pci_find_virtual_device(d, pdev);
>> +    if ( vdev )
>> +        list_del(&vdev->list);
>> +    pcidevs_unlock();
>> +    xfree(vdev);
>> +    return 0;
>> +}
>> +#endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
>> +
>>   /* Caller should hold the pcidevs_lock */
>>   static int deassign_device(struct domain *d, uint16_t seg, uint8_t bus,
>>                              uint8_t devfn)
>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>> index 702f7b5d5dda..d787f13e679e 100644
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -91,20 +91,32 @@ int __hwdom_init vpci_add_handlers(struct pci_dev *pdev)
>>   /* Notify vPCI that device is assigned to guest. */
>>   int vpci_assign_device(struct domain *d, const struct pci_dev *dev)
>>   {
>> +    int rc;
>> +
>>       /* It only makes sense to assign for hwdom or guest domain. */
>>       if ( is_system_domain(d) || !has_vpci(d) )
>>           return 0;
>>   
>> -    return vpci_bar_add_handlers(d, dev);
>> +    rc = vpci_bar_add_handlers(d, dev);
>> +    if ( rc )
>> +        return rc;
>> +
>> +    return pci_add_virtual_device(d, dev);
>>   }
>>   
>>   /* Notify vPCI that device is de-assigned from guest. */
>>   int vpci_deassign_device(struct domain *d, const struct pci_dev *dev)
>>   {
>> +    int rc;
>> +
>>       /* It only makes sense to de-assign from hwdom or guest domain. */
>>       if ( is_system_domain(d) || !has_vpci(d) )
>>           return 0;
>>   
>> +    rc = pci_remove_virtual_device(d, dev);
>> +    if ( rc )
>> +        return rc;
>> +
>>       return vpci_bar_remove_handlers(d, dev);
>>   }
>>   #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
>> diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
>> index 43b8a0817076..33033a3a8f8d 100644
>> --- a/xen/include/xen/pci.h
>> +++ b/xen/include/xen/pci.h
>> @@ -137,6 +137,24 @@ struct pci_dev {
>>       struct vpci *vpci;
>>   };
>>   
>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>> +struct vpci_dev {
>> +    struct list_head list;
>> +    /* Physical PCI device this virtual device is connected to. */
>> +    const struct pci_dev *pdev;
>> +    /* Virtual SBDF of the device. */
>> +    union {
>> +        struct {
>> +            uint8_t devfn;
>> +            uint8_t bus;
>> +            uint16_t seg;
>> +        };
>> +        pci_sbdf_t sbdf;
>> +    };
>> +    struct domain *domain;
>> +};
>> +#endif
> I wonder whether this is strictly needed. Won't it be enough to store
> the virtual (ie: guest) sbdf inside the existing vpci struct?
>
> It would avoid the overhead of the translation you do from pdev ->
> vdev, and there doesn't seem to be anything relevant stored in
> vpci_dev apart from the virtual sbdf.
TL;DR It seems it might be needed from a performance POV: without it, for
every MMIO trap we use a global PCI lock, i.e. pcidevs_{lock|unlock}.
Note: the pcidevs lock is a recursive lock.

There are 2 sources of access to virtual devices:
1. During initialization, when we add, assign or de-assign a PCI device
2. At run-time, when we trap configuration space access and need to
translate a virtual SBDF into a physical SBDF
Note that at least de-assign can run concurrently with the MMIO handlers.

Now let's see which locks are in use while doing that.

1. No struct vpci_dev is used.
1.1. We remove the structure and just add pdev->vpci->guest_sbdf as you suggest
1.2. To protect virtual devices we use pcidevs_{lock|unlock}
1.3. Locking happens on system level

2. struct vpci_dev is used
2.1. We have a per-domain lock vdev_lock
2.2. Locking happens on per domain level

To compare the two:

1. Without vpci_dev
pros: much simpler code
pros/cons: global lock is used during MMIO handling, but it is a recursive lock

2. With vpci_dev
pros: per-domain locking
cons: more code

I have implemented the two methods and we need to decide
which route we go.
> Thanks, Roger.
Thank you,
Oleksandr
Jan Beulich Nov. 3, 2021, 8:41 a.m. UTC | #7
On 03.11.2021 07:34, Oleksandr Andrushchenko wrote:
> Hi, Roger
> 
> On 26.10.21 14:33, Roger Pau Monné wrote:
>> On Thu, Sep 30, 2021 at 10:52:22AM +0300, Oleksandr Andrushchenko wrote:
>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>
>>> Assign SBDF to the PCI devices being passed through with bus 0.
>>> The resulting topology is where PCIe devices reside on the bus 0 of the
>>> root complex itself (embedded endpoints).
>>> This implementation is limited to 32 devices which are allowed on
>>> a single PCI bus.
>>>
>>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>
>>> ---
>>> Since v2:
>>>   - remove casts that are (a) malformed and (b) unnecessary
>>>   - add new line for better readability
>>>   - remove CONFIG_HAS_VPCI_GUEST_SUPPORT ifdef's as the relevant vPCI
>>>      functions are now completely gated with this config
>>>   - gate common code with CONFIG_HAS_VPCI_GUEST_SUPPORT
>>> New in v2
>>> ---
>>>   xen/common/domain.c           |  3 ++
>>>   xen/drivers/passthrough/pci.c | 60 +++++++++++++++++++++++++++++++++++
>>>   xen/drivers/vpci/vpci.c       | 14 +++++++-
>>>   xen/include/xen/pci.h         | 22 +++++++++++++
>>>   xen/include/xen/sched.h       |  8 +++++
>>>   5 files changed, 106 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/xen/common/domain.c b/xen/common/domain.c
>>> index 40d67ec34232..e0170087612d 100644
>>> --- a/xen/common/domain.c
>>> +++ b/xen/common/domain.c
>>> @@ -601,6 +601,9 @@ struct domain *domain_create(domid_t domid,
>>>   
>>>   #ifdef CONFIG_HAS_PCI
>>>       INIT_LIST_HEAD(&d->pdev_list);
>>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>>> +    INIT_LIST_HEAD(&d->vdev_list);
>>> +#endif
>>>   #endif
>>>   
>>>       /* All error paths can depend on the above setup. */
>>> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
>>> index 805ab86ed555..5b963d75d1ba 100644
>>> --- a/xen/drivers/passthrough/pci.c
>>> +++ b/xen/drivers/passthrough/pci.c
>>> @@ -831,6 +831,66 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
>>>       return ret;
>>>   }
>>>   
>>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>>> +static struct vpci_dev *pci_find_virtual_device(const struct domain *d,
>>> +                                                const struct pci_dev *pdev)
>>> +{
>>> +    struct vpci_dev *vdev;
>>> +
>>> +    list_for_each_entry ( vdev, &d->vdev_list, list )
>>> +        if ( vdev->pdev == pdev )
>>> +            return vdev;
>>> +    return NULL;
>>> +}
>>> +
>>> +int pci_add_virtual_device(struct domain *d, const struct pci_dev *pdev)
>>> +{
>>> +    struct vpci_dev *vdev;
>>> +
>>> +    ASSERT(!pci_find_virtual_device(d, pdev));
>>> +
>>> +    /* Each PCI bus supports 32 devices/slots at max. */
>>> +    if ( d->vpci_dev_next > 31 )
>>> +        return -ENOSPC;
>>> +
>>> +    vdev = xzalloc(struct vpci_dev);
>>> +    if ( !vdev )
>>> +        return -ENOMEM;
>>> +
>>> +    /* We emulate a single host bridge for the guest, so segment is always 0. */
>>> +    vdev->seg = 0;
>>> +
>>> +    /*
>>> +     * The bus number is set to 0, so virtual devices are seen
>>> +     * as embedded endpoints behind the root complex.
>>> +     */
>>> +    vdev->bus = 0;
>>> +    vdev->devfn = PCI_DEVFN(d->vpci_dev_next++, 0);
>> This would likely be better as a bitmap where you set the bits of
>> in-use slots. Then you can use find_first_bit or similar to get a free
>> slot.
>>
>> Long term you might want to allow the caller to provide a pre-selected
>> slot, as it's possible for users to request the device to appear at a
>> specific slot on the emulated bus.
>>
>>> +
>>> +    vdev->pdev = pdev;
>>> +    vdev->domain = d;
>>> +
>>> +    pcidevs_lock();
>>> +    list_add_tail(&vdev->list, &d->vdev_list);
>>> +    pcidevs_unlock();
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +int pci_remove_virtual_device(struct domain *d, const struct pci_dev *pdev)
>>> +{
>>> +    struct vpci_dev *vdev;
>>> +
>>> +    pcidevs_lock();
>>> +    vdev = pci_find_virtual_device(d, pdev);
>>> +    if ( vdev )
>>> +        list_del(&vdev->list);
>>> +    pcidevs_unlock();
>>> +    xfree(vdev);
>>> +    return 0;
>>> +}
>>> +#endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
>>> +
>>>   /* Caller should hold the pcidevs_lock */
>>>   static int deassign_device(struct domain *d, uint16_t seg, uint8_t bus,
>>>                              uint8_t devfn)
>>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>>> index 702f7b5d5dda..d787f13e679e 100644
>>> --- a/xen/drivers/vpci/vpci.c
>>> +++ b/xen/drivers/vpci/vpci.c
>>> @@ -91,20 +91,32 @@ int __hwdom_init vpci_add_handlers(struct pci_dev *pdev)
>>>   /* Notify vPCI that device is assigned to guest. */
>>>   int vpci_assign_device(struct domain *d, const struct pci_dev *dev)
>>>   {
>>> +    int rc;
>>> +
>>>       /* It only makes sense to assign for hwdom or guest domain. */
>>>       if ( is_system_domain(d) || !has_vpci(d) )
>>>           return 0;
>>>   
>>> -    return vpci_bar_add_handlers(d, dev);
>>> +    rc = vpci_bar_add_handlers(d, dev);
>>> +    if ( rc )
>>> +        return rc;
>>> +
>>> +    return pci_add_virtual_device(d, dev);
>>>   }
>>>   
>>>   /* Notify vPCI that device is de-assigned from guest. */
>>>   int vpci_deassign_device(struct domain *d, const struct pci_dev *dev)
>>>   {
>>> +    int rc;
>>> +
>>>       /* It only makes sense to de-assign from hwdom or guest domain. */
>>>       if ( is_system_domain(d) || !has_vpci(d) )
>>>           return 0;
>>>   
>>> +    rc = pci_remove_virtual_device(d, dev);
>>> +    if ( rc )
>>> +        return rc;
>>> +
>>>       return vpci_bar_remove_handlers(d, dev);
>>>   }
>>>   #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
>>> diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
>>> index 43b8a0817076..33033a3a8f8d 100644
>>> --- a/xen/include/xen/pci.h
>>> +++ b/xen/include/xen/pci.h
>>> @@ -137,6 +137,24 @@ struct pci_dev {
>>>       struct vpci *vpci;
>>>   };
>>>   
>>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>>> +struct vpci_dev {
>>> +    struct list_head list;
>>> +    /* Physical PCI device this virtual device is connected to. */
>>> +    const struct pci_dev *pdev;
>>> +    /* Virtual SBDF of the device. */
>>> +    union {
>>> +        struct {
>>> +            uint8_t devfn;
>>> +            uint8_t bus;
>>> +            uint16_t seg;
>>> +        };
>>> +        pci_sbdf_t sbdf;
>>> +    };
>>> +    struct domain *domain;
>>> +};
>>> +#endif
>> I wonder whether this is strictly needed. Won't it be enough to store
>> the virtual (ie: guest) sbdf inside the existing vpci struct?
>>
>> It would avoid the overhead of the translation you do from pdev ->
>> vdev, and there doesn't seem to be anything relevant stored in
>> vpci_dev apart from the virtual sbdf.
> TL;DR It seems it might be needed from a performance POV: without it, for
> every MMIO trap we use a global PCI lock, i.e. pcidevs_{lock|unlock}.
> Note: the pcidevs lock is a recursive lock.
> 
> There are 2 sources of access to virtual devices:
> 1. During initialization, when we add, assign or de-assign a PCI device
> 2. At run-time, when we trap configuration space access and need to
> translate a virtual SBDF into a physical SBDF
> Note that at least de-assign can run concurrently with the MMIO handlers.
> 
> Now let's see which locks are in use while doing that.
> 
> 1. No struct vpci_dev is used.
> 1.1. We remove the structure and just add pdev->vpci->guest_sbdf as you suggest
> 1.2. To protect virtual devices we use pcidevs_{lock|unlock}
> 1.3. Locking happens on system level
> 
> 2. struct vpci_dev is used
> 2.1. We have a per-domain lock vdev_lock
> 2.2. Locking happens on per domain level
> 
> To compare the two:
> 
> 1. Without vpci_dev
> pros: much simpler code
> pros/cons: global lock is used during MMIO handling, but it is a recursive lock

Could you point out to me in which way the recursive nature of the lock
is relevant here? Afaict that aspect is of no interest when considering
the performance effects of using a global lock vs one with more narrow
scope.

Jan
Roger Pau Monné Nov. 3, 2021, 8:52 a.m. UTC | #8
On Wed, Nov 03, 2021 at 06:34:16AM +0000, Oleksandr Andrushchenko wrote:
> Hi, Roger
> 
> On 26.10.21 14:33, Roger Pau Monné wrote:
> > On Thu, Sep 30, 2021 at 10:52:22AM +0300, Oleksandr Andrushchenko wrote:
> >> diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
> >> index 43b8a0817076..33033a3a8f8d 100644
> >> --- a/xen/include/xen/pci.h
> >> +++ b/xen/include/xen/pci.h
> >> @@ -137,6 +137,24 @@ struct pci_dev {
> >>       struct vpci *vpci;
> >>   };
> >>   
> >> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> >> +struct vpci_dev {
> >> +    struct list_head list;
> >> +    /* Physical PCI device this virtual device is connected to. */
> >> +    const struct pci_dev *pdev;
> >> +    /* Virtual SBDF of the device. */
> >> +    union {
> >> +        struct {
> >> +            uint8_t devfn;
> >> +            uint8_t bus;
> >> +            uint16_t seg;
> >> +        };
> >> +        pci_sbdf_t sbdf;
> >> +    };
> >> +    struct domain *domain;
> >> +};
> >> +#endif
> > I wonder whether this is strictly needed. Won't it be enough to store
> > the virtual (ie: guest) sbdf inside the existing vpci struct?
> >
> > It would avoid the overhead of the translation you do from pdev ->
> > vdev, and there doesn't seem to be anything relevant stored in
> > vpci_dev apart from the virtual sbdf.
> TL;DR It seems it might be needed from a performance POV: without it, for
> every MMIO trap we use a global PCI lock, i.e. pcidevs_{lock|unlock}.
> Note: the pcidevs lock is a recursive lock.
> 
> There are 2 sources of access to virtual devices:
> 1. During initialization, when we add, assign or de-assign a PCI device
> 2. At run-time, when we trap configuration space access and need to
> translate a virtual SBDF into a physical SBDF
> Note that at least de-assign can run concurrently with the MMIO handlers.
> 
> Now let's see which locks are in use while doing that.
> 
> 1. No struct vpci_dev is used.
> 1.1. We remove the structure and just add pdev->vpci->guest_sbdf as you suggest
> 1.2. To protect virtual devices we use pcidevs_{lock|unlock}
> 1.3. Locking happens on system level
> 
> 2. struct vpci_dev is used
> 2.1. We have a per-domain lock vdev_lock
> 2.2. Locking happens on per domain level
> 
> To compare the two:
> 
> 1. Without vpci_dev
> pros: much simpler code
> pros/cons: global lock is used during MMIO handling, but it is a recursive lock
> 
> 2. With vpci_dev
> pros: per-domain locking
> cons: more code
> 
> I have implemented the two methods and we need to decide
> which route we go.

We could always see about converting the pcidevs lock into a rw one if
it turns out there's too much contention. PCI config space accesses
shouldn't be that common or performance critical, so having some
contention might not be noticeable.

TBH I would start with the simpler solution (add guest_sbdf and use
pci lock) and move to something more complex once issues are
identified.
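
The translation could then be a plain scan of the domain's pdev list
(untested sketch; vpci_find_pdev and guest_sbdf are illustrative names):

    static const struct pci_dev *vpci_find_pdev(const struct domain *d,
                                                pci_sbdf_t sbdf)
    {
        const struct pci_dev *pdev;

        ASSERT(pcidevs_locked());

        for_each_pdev ( d, pdev )
            /* Match the guest view of the SBDF against the stored one. */
            if ( pdev->vpci && pdev->vpci->guest_sbdf.sbdf == sbdf.sbdf )
                return pdev;

        return NULL;
    }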

Regards, Roger.
Oleksandr Andrushchenko Nov. 3, 2021, 8:57 a.m. UTC | #9
On 03.11.21 10:41, Jan Beulich wrote:
> On 03.11.2021 07:34, Oleksandr Andrushchenko wrote:
>> Hi, Roger
>>
>> On 26.10.21 14:33, Roger Pau Monné wrote:
>>> On Thu, Sep 30, 2021 at 10:52:22AM +0300, Oleksandr Andrushchenko wrote:
>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>
>>>> Assign SBDF to the PCI devices being passed through with bus 0.
>>>> The resulting topology is where PCIe devices reside on the bus 0 of the
>>>> root complex itself (embedded endpoints).
>>>> This implementation is limited to 32 devices which are allowed on
>>>> a single PCI bus.
>>>>
>>>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>
>>>> ---
>>>> Since v2:
>>>>    - remove casts that are (a) malformed and (b) unnecessary
>>>>    - add new line for better readability
>>>>    - remove CONFIG_HAS_VPCI_GUEST_SUPPORT ifdef's as the relevant vPCI
>>>>       functions are now completely gated with this config
>>>>    - gate common code with CONFIG_HAS_VPCI_GUEST_SUPPORT
>>>> New in v2
>>>> ---
>>>>    xen/common/domain.c           |  3 ++
>>>>    xen/drivers/passthrough/pci.c | 60 +++++++++++++++++++++++++++++++++++
>>>>    xen/drivers/vpci/vpci.c       | 14 +++++++-
>>>>    xen/include/xen/pci.h         | 22 +++++++++++++
>>>>    xen/include/xen/sched.h       |  8 +++++
>>>>    5 files changed, 106 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/xen/common/domain.c b/xen/common/domain.c
>>>> index 40d67ec34232..e0170087612d 100644
>>>> --- a/xen/common/domain.c
>>>> +++ b/xen/common/domain.c
>>>> @@ -601,6 +601,9 @@ struct domain *domain_create(domid_t domid,
>>>>    
>>>>    #ifdef CONFIG_HAS_PCI
>>>>        INIT_LIST_HEAD(&d->pdev_list);
>>>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>>>> +    INIT_LIST_HEAD(&d->vdev_list);
>>>> +#endif
>>>>    #endif
>>>>    
>>>>        /* All error paths can depend on the above setup. */
>>>> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
>>>> index 805ab86ed555..5b963d75d1ba 100644
>>>> --- a/xen/drivers/passthrough/pci.c
>>>> +++ b/xen/drivers/passthrough/pci.c
>>>> @@ -831,6 +831,66 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
>>>>        return ret;
>>>>    }
>>>>    
>>>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>>>> +static struct vpci_dev *pci_find_virtual_device(const struct domain *d,
>>>> +                                                const struct pci_dev *pdev)
>>>> +{
>>>> +    struct vpci_dev *vdev;
>>>> +
>>>> +    list_for_each_entry ( vdev, &d->vdev_list, list )
>>>> +        if ( vdev->pdev == pdev )
>>>> +            return vdev;
>>>> +    return NULL;
>>>> +}
>>>> +
>>>> +int pci_add_virtual_device(struct domain *d, const struct pci_dev *pdev)
>>>> +{
>>>> +    struct vpci_dev *vdev;
>>>> +
>>>> +    ASSERT(!pci_find_virtual_device(d, pdev));
>>>> +
>>>> +    /* Each PCI bus supports 32 devices/slots at max. */
>>>> +    if ( d->vpci_dev_next > 31 )
>>>> +        return -ENOSPC;
>>>> +
>>>> +    vdev = xzalloc(struct vpci_dev);
>>>> +    if ( !vdev )
>>>> +        return -ENOMEM;
>>>> +
>>>> +    /* We emulate a single host bridge for the guest, so segment is always 0. */
>>>> +    vdev->seg = 0;
>>>> +
>>>> +    /*
>>>> +     * The bus number is set to 0, so virtual devices are seen
>>>> +     * as embedded endpoints behind the root complex.
>>>> +     */
>>>> +    vdev->bus = 0;
>>>> +    vdev->devfn = PCI_DEVFN(d->vpci_dev_next++, 0);
>>> This would likely be better as a bitmap where you set the bits of
>>> in-use slots. Then you can use find_first_bit or similar to get a free
>>> slot.
>>>
>>> Long term you might want to allow the caller to provide a pre-selected
>>> slot, as it's possible for users to request the device to appear at a
>>> specific slot on the emulated bus.
>>>
>>>> +
>>>> +    vdev->pdev = pdev;
>>>> +    vdev->domain = d;
>>>> +
>>>> +    pcidevs_lock();
>>>> +    list_add_tail(&vdev->list, &d->vdev_list);
>>>> +    pcidevs_unlock();
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +int pci_remove_virtual_device(struct domain *d, const struct pci_dev *pdev)
>>>> +{
>>>> +    struct vpci_dev *vdev;
>>>> +
>>>> +    pcidevs_lock();
>>>> +    vdev = pci_find_virtual_device(d, pdev);
>>>> +    if ( vdev )
>>>> +        list_del(&vdev->list);
>>>> +    pcidevs_unlock();
>>>> +    xfree(vdev);
>>>> +    return 0;
>>>> +}
>>>> +#endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
>>>> +
>>>>    /* Caller should hold the pcidevs_lock */
>>>>    static int deassign_device(struct domain *d, uint16_t seg, uint8_t bus,
>>>>                               uint8_t devfn)
>>>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>>>> index 702f7b5d5dda..d787f13e679e 100644
>>>> --- a/xen/drivers/vpci/vpci.c
>>>> +++ b/xen/drivers/vpci/vpci.c
>>>> @@ -91,20 +91,32 @@ int __hwdom_init vpci_add_handlers(struct pci_dev *pdev)
>>>>    /* Notify vPCI that device is assigned to guest. */
>>>>    int vpci_assign_device(struct domain *d, const struct pci_dev *dev)
>>>>    {
>>>> +    int rc;
>>>> +
>>>>        /* It only makes sense to assign for hwdom or guest domain. */
>>>>        if ( is_system_domain(d) || !has_vpci(d) )
>>>>            return 0;
>>>>    
>>>> -    return vpci_bar_add_handlers(d, dev);
>>>> +    rc = vpci_bar_add_handlers(d, dev);
>>>> +    if ( rc )
>>>> +        return rc;
>>>> +
>>>> +    return pci_add_virtual_device(d, dev);
>>>>    }
>>>>    
>>>>    /* Notify vPCI that device is de-assigned from guest. */
>>>>    int vpci_deassign_device(struct domain *d, const struct pci_dev *dev)
>>>>    {
>>>> +    int rc;
>>>> +
>>>>        /* It only makes sense to de-assign from hwdom or guest domain. */
>>>>        if ( is_system_domain(d) || !has_vpci(d) )
>>>>            return 0;
>>>>    
>>>> +    rc = pci_remove_virtual_device(d, dev);
>>>> +    if ( rc )
>>>> +        return rc;
>>>> +
>>>>        return vpci_bar_remove_handlers(d, dev);
>>>>    }
>>>>    #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
>>>> diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
>>>> index 43b8a0817076..33033a3a8f8d 100644
>>>> --- a/xen/include/xen/pci.h
>>>> +++ b/xen/include/xen/pci.h
>>>> @@ -137,6 +137,24 @@ struct pci_dev {
>>>>        struct vpci *vpci;
>>>>    };
>>>>    
>>>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>>>> +struct vpci_dev {
>>>> +    struct list_head list;
>>>> +    /* Physical PCI device this virtual device is connected to. */
>>>> +    const struct pci_dev *pdev;
>>>> +    /* Virtual SBDF of the device. */
>>>> +    union {
>>>> +        struct {
>>>> +            uint8_t devfn;
>>>> +            uint8_t bus;
>>>> +            uint16_t seg;
>>>> +        };
>>>> +        pci_sbdf_t sbdf;
>>>> +    };
>>>> +    struct domain *domain;
>>>> +};
>>>> +#endif
>>> I wonder whether this is strictly needed. Won't it be enough to store
>>> the virtual (ie: guest) sbdf inside the existing vpci struct?
>>>
>>> It would avoid the overhead of the translation you do from pdev ->
>>> vdev, and there doesn't seem to be anything relevant stored in
>>> vpci_dev apart from the virtual sbdf.
>> TL;DR It seems it might be needed from a performance POV: without it, for
>> every MMIO trap we use a global PCI lock, i.e. pcidevs_{lock|unlock}.
>> Note: the pcidevs lock is a recursive lock.
>>
>> There are 2 sources of access to virtual devices:
>> 1. During initialization, when we add, assign or de-assign a PCI device
>> 2. At run-time, when we trap configuration space access and need to
>> translate a virtual SBDF into a physical SBDF
>> Note that at least de-assign can run concurrently with the MMIO handlers.
>>
>> Now let's see which locks are in use while doing that.
>>
>> 1. No struct vpci_dev is used.
>> 1.1. We remove the structure and just add pdev->vpci->guest_sbdf as you suggest
>> 1.2. To protect virtual devices we use pcidevs_{lock|unlock}
>> 1.3. Locking happens on system level
>>
>> 2. struct vpci_dev is used
>> 2.1. We have a per-domain lock vdev_lock
>> 2.2. Locking happens on per domain level
>>
>> To compare the two:
>>
>> 1. Without vpci_dev
>> pros: much simpler code
>> pros/cons: global lock is used during MMIO handling, but it is a recursive lock
> Could you point out to me in which way the recursive nature of the lock
> is relevant here? Afaict that aspect is of no interest when considering
> the performance effects of using a global lock vs one with more narrow
> scope.
I just tried to find some excuses to defend the global pcidevs lock,
so even the lock's recursion could be an argument here. A weak one.
Besides that I do agree that this is still a global lock.
>
> Jan
>
Oleksandr Andrushchenko Nov. 3, 2021, 8:59 a.m. UTC | #10
On 03.11.21 10:52, Roger Pau Monné wrote:
> On Wed, Nov 03, 2021 at 06:34:16AM +0000, Oleksandr Andrushchenko wrote:
>> Hi, Roger
>>
>> On 26.10.21 14:33, Roger Pau Monné wrote:
>>> On Thu, Sep 30, 2021 at 10:52:22AM +0300, Oleksandr Andrushchenko wrote:
>>>> diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
>>>> index 43b8a0817076..33033a3a8f8d 100644
>>>> --- a/xen/include/xen/pci.h
>>>> +++ b/xen/include/xen/pci.h
>>>> @@ -137,6 +137,24 @@ struct pci_dev {
>>>>        struct vpci *vpci;
>>>>    };
>>>>    
>>>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>>>> +struct vpci_dev {
>>>> +    struct list_head list;
>>>> +    /* Physical PCI device this virtual device is connected to. */
>>>> +    const struct pci_dev *pdev;
>>>> +    /* Virtual SBDF of the device. */
>>>> +    union {
>>>> +        struct {
>>>> +            uint8_t devfn;
>>>> +            uint8_t bus;
>>>> +            uint16_t seg;
>>>> +        };
>>>> +        pci_sbdf_t sbdf;
>>>> +    };
>>>> +    struct domain *domain;
>>>> +};
>>>> +#endif
>>> I wonder whether this is strictly needed. Won't it be enough to store
>>> the virtual (ie: guest) sbdf inside the existing vpci struct?
>>>
>>> It would avoid the overhead of the translation you do from pdev ->
>>> vdev, and there doesn't seem to be anything relevant stored in
>>> vpci_dev apart from the virtual sbdf.
>> TL;DR It seems it might be needed from a performance POV: without it, for
>> every MMIO trap we use a global PCI lock, i.e. pcidevs_{lock|unlock}.
>> Note: the pcidevs lock is a recursive lock.
>>
>> There are 2 sources of access to virtual devices:
>> 1. During initialization, when we add, assign or de-assign a PCI device
>> 2. At run-time, when we trap configuration space access and need to
>> translate a virtual SBDF into a physical SBDF
>> Note that at least de-assign can run concurrently with the MMIO handlers.
>>
>> Now let's see which locks are in use while doing that.
>>
>> 1. No struct vpci_dev is used.
>> 1.1. We remove the structure and just add pdev->vpci->guest_sbdf as you suggest
>> 1.2. To protect virtual devices we use pcidevs_{lock|unlock}
>> 1.3. Locking happens on system level
>>
>> 2. struct vpci_dev is used
>> 2.1. We have a per-domain lock vdev_lock
>> 2.2. Locking happens on per domain level
>>
>> To compare the two:
>>
>> 1. Without vpci_dev
>> pros: much simpler code
>> pros/cons: global lock is used during MMIO handling, but it is a recursive lock
>>
>> 2. With vpci_dev
>> pros: per-domain locking
>> cons: more code
>>
>> I have implemented the two methods and we need to decide
>> which route we go.
> We could always see about converting the pcidevs lock into a rw one if
> it turns out there's too much contention. PCI config space accesses
> shouldn't be that common or performance critical, so having some
> contention might not be noticeable.
>
> TBH I would start with the simpler solution (add guest_sbdf and use
> pci lock) and move to something more complex once issues are
> identified.
Ok, the code is indeed way simpler with guest_sbdf and the pcidevs lock,
so I'll use this approach for now.
>
> Regards, Roger.
Thank you,
Oleksandr

Patch

diff --git a/xen/common/domain.c b/xen/common/domain.c
index 40d67ec34232..e0170087612d 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -601,6 +601,9 @@  struct domain *domain_create(domid_t domid,
 
 #ifdef CONFIG_HAS_PCI
     INIT_LIST_HEAD(&d->pdev_list);
+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+    INIT_LIST_HEAD(&d->vdev_list);
+#endif
 #endif
 
     /* All error paths can depend on the above setup. */
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 805ab86ed555..5b963d75d1ba 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -831,6 +831,66 @@  int pci_remove_device(u16 seg, u8 bus, u8 devfn)
     return ret;
 }
 
+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+static struct vpci_dev *pci_find_virtual_device(const struct domain *d,
+                                                const struct pci_dev *pdev)
+{
+    struct vpci_dev *vdev;
+
+    list_for_each_entry ( vdev, &d->vdev_list, list )
+        if ( vdev->pdev == pdev )
+            return vdev;
+    return NULL;
+}
+
+int pci_add_virtual_device(struct domain *d, const struct pci_dev *pdev)
+{
+    struct vpci_dev *vdev;
+
+    ASSERT(!pci_find_virtual_device(d, pdev));
+
+    /* Each PCI bus supports 32 devices/slots at max. */
+    if ( d->vpci_dev_next > 31 )
+        return -ENOSPC;
+
+    vdev = xzalloc(struct vpci_dev);
+    if ( !vdev )
+        return -ENOMEM;
+
+    /* We emulate a single host bridge for the guest, so segment is always 0. */
+    vdev->seg = 0;
+
+    /*
+     * The bus number is set to 0, so virtual devices are seen
+     * as embedded endpoints behind the root complex.
+     */
+    vdev->bus = 0;
+    vdev->devfn = PCI_DEVFN(d->vpci_dev_next++, 0);
+
+    vdev->pdev = pdev;
+    vdev->domain = d;
+
+    pcidevs_lock();
+    list_add_tail(&vdev->list, &d->vdev_list);
+    pcidevs_unlock();
+
+    return 0;
+}
+
+int pci_remove_virtual_device(struct domain *d, const struct pci_dev *pdev)
+{
+    struct vpci_dev *vdev;
+
+    pcidevs_lock();
+    vdev = pci_find_virtual_device(d, pdev);
+    if ( vdev )
+        list_del(&vdev->list);
+    pcidevs_unlock();
+    xfree(vdev);
+    return 0;
+}
+#endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
+
 /* Caller should hold the pcidevs_lock */
 static int deassign_device(struct domain *d, uint16_t seg, uint8_t bus,
                            uint8_t devfn)
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 702f7b5d5dda..d787f13e679e 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -91,20 +91,32 @@  int __hwdom_init vpci_add_handlers(struct pci_dev *pdev)
 /* Notify vPCI that device is assigned to guest. */
 int vpci_assign_device(struct domain *d, const struct pci_dev *dev)
 {
+    int rc;
+
     /* It only makes sense to assign for hwdom or guest domain. */
     if ( is_system_domain(d) || !has_vpci(d) )
         return 0;
 
-    return vpci_bar_add_handlers(d, dev);
+    rc = vpci_bar_add_handlers(d, dev);
+    if ( rc )
+        return rc;
+
+    return pci_add_virtual_device(d, dev);
 }
 
 /* Notify vPCI that device is de-assigned from guest. */
 int vpci_deassign_device(struct domain *d, const struct pci_dev *dev)
 {
+    int rc;
+
     /* It only makes sense to de-assign from hwdom or guest domain. */
     if ( is_system_domain(d) || !has_vpci(d) )
         return 0;
 
+    rc = pci_remove_virtual_device(d, dev);
+    if ( rc )
+        return rc;
+
     return vpci_bar_remove_handlers(d, dev);
 }
 #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 43b8a0817076..33033a3a8f8d 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -137,6 +137,24 @@  struct pci_dev {
     struct vpci *vpci;
 };
 
+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+struct vpci_dev {
+    struct list_head list;
+    /* Physical PCI device this virtual device is connected to. */
+    const struct pci_dev *pdev;
+    /* Virtual SBDF of the device. */
+    union {
+        struct {
+            uint8_t devfn;
+            uint8_t bus;
+            uint16_t seg;
+        };
+        pci_sbdf_t sbdf;
+    };
+    struct domain *domain;
+};
+#endif
+
 #define for_each_pdev(domain, pdev) \
     list_for_each_entry(pdev, &(domain)->pdev_list, domain_list)
 
@@ -167,6 +185,10 @@  const unsigned long *pci_get_ro_map(u16 seg);
 int pci_add_device(u16 seg, u8 bus, u8 devfn,
                    const struct pci_dev_info *, nodeid_t node);
 int pci_remove_device(u16 seg, u8 bus, u8 devfn);
+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+int pci_add_virtual_device(struct domain *d, const struct pci_dev *pdev);
+int pci_remove_virtual_device(struct domain *d, const struct pci_dev *pdev);
+#endif
 int pci_ro_device(int seg, int bus, int devfn);
 int pci_hide_device(unsigned int seg, unsigned int bus, unsigned int devfn);
 struct pci_dev *pci_get_pdev(int seg, int bus, int devfn);
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 28146ee404e6..ecdb04b4f7fc 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -444,6 +444,14 @@  struct domain
 
 #ifdef CONFIG_HAS_PCI
     struct list_head pdev_list;
+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+    struct list_head vdev_list;
+    /*
+     * Current device number used by the virtual PCI bus topology
+     * to assign a unique SBDF to a passed through virtual PCI device.
+     */
+    int vpci_dev_next;
+#endif
 #endif
 
 #ifdef CONFIG_HAS_PASSTHROUGH