[XEN,v12,1/7] xen/pci: Add hypercall to support reset of pcidev

Message ID 20240708114124.407797-2-Jiqian.Chen@amd.com (mailing list archive)
State Superseded
Series Support device passthrough when dom0 is PVH on Xen

Commit Message

Jiqian Chen July 8, 2024, 11:41 a.m. UTC
When a device has been reset on the dom0 side, the Xen hypervisor
doesn't get a notification, so the cached state in vpci is out of
date compared with the real device state.

To solve that problem, add a new hypercall to support the reset
of a pcidev and clear the vpci state of the device, so that once
the state of the device is reset on the dom0 side, dom0 can call
this hypercall to notify the hypervisor.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Reviewed-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
---
 xen/arch/x86/hvm/hypercall.c |  1 +
 xen/drivers/pci/physdev.c    | 52 ++++++++++++++++++++++++++++++++++++
 xen/drivers/vpci/vpci.c      | 10 +++++++
 xen/include/public/physdev.h | 16 +++++++++++
 xen/include/xen/vpci.h       |  8 ++++++
 5 files changed, 87 insertions(+)
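
For illustration only, here is a minimal sketch of how a dom0-side caller
might fill in the new structure after resetting a device. The structure
layout and constants are copied from the physdev.h hunk of this patch; the
bus and devfn members are assumed (the hunk only shows seg), and
fake_physdev_op() and notify_device_reset() are hypothetical stand-ins, not
functions from Xen or any dom0 kernel.

```c
#include <assert.h>
#include <stdint.h>

#define PHYSDEVOP_pci_device_state_reset 32

/* bus and devfn are assumed here; the quoted hunk only shows seg. */
struct physdev_pci_device {
    uint16_t seg;
    uint8_t bus;
    uint8_t devfn;
};
typedef struct physdev_pci_device physdev_pci_device_t;

struct pci_device_state_reset {
    physdev_pci_device_t dev;
#define PCI_DEVICE_STATE_RESET_COLD 0
#define PCI_DEVICE_STATE_RESET_WARM 1
#define PCI_DEVICE_STATE_RESET_HOT  2
#define PCI_DEVICE_STATE_RESET_FLR  3
    uint32_t reset_type;
};

/* State recorded by the fake hypercall so a caller can inspect it. */
static int last_cmd = -1;
static struct pci_device_state_reset last_arg;

/* Hypothetical stand-in for the real hypercall entry point. */
int fake_physdev_op(int cmd, const void *arg)
{
    last_cmd = cmd;
    last_arg = *(const struct pci_device_state_reset *)arg;
    return 0;
}

/* Hypothetical helper: tell the hypervisor that seg:bus:devfn was reset. */
int notify_device_reset(uint16_t seg, uint8_t bus, uint8_t devfn,
                        uint32_t reset_type)
{
    struct pci_device_state_reset op = {
        .dev = { .seg = seg, .bus = bus, .devfn = devfn },
        .reset_type = reset_type,
    };

    /* Only the four reset types defined by the patch are valid. */
    assert(reset_type <= PCI_DEVICE_STATE_RESET_FLR);
    return fake_physdev_op(PHYSDEVOP_pci_device_state_reset, &op);
}
```

The intent is that the call is issued after the reset has completed, so that
the hypervisor regenerates its cached vpci state from the real device.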

Comments

Jan Beulich July 8, 2024, 2:56 p.m. UTC | #1
On 08.07.2024 13:41, Jiqian Chen wrote:
> When a device has been reset on the dom0 side, the Xen hypervisor
> doesn't get a notification, so the cached state in vpci is out of
> date compared with the real device state.
> 
> To solve that problem, add a new hypercall to support the reset
> of a pcidev and clear the vpci state of the device, so that once
> the state of the device is reset on the dom0 side, dom0 can call
> this hypercall to notify the hypervisor.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> Reviewed-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

Just to double check: You're sure the other two R-b are still applicable,
despite the various changes that have been made?

As a purely cosmetic remark: I think I would have preferred if the new
identifiers didn't have "state" as a part; I simply don't think this adds
much value, while at the same time making these pretty long.

Jan
Jiqian Chen July 9, 2024, 2:47 a.m. UTC | #2
On 2024/7/8 22:56, Jan Beulich wrote:
> On 08.07.2024 13:41, Jiqian Chen wrote:
>> When a device has been reset on the dom0 side, the Xen hypervisor
>> doesn't get a notification, so the cached state in vpci is out of
>> date compared with the real device state.
>>
>> To solve that problem, add a new hypercall to support the reset
>> of a pcidev and clear the vpci state of the device, so that once
>> the state of the device is reset on the dom0 side, dom0 can call
>> this hypercall to notify the hypervisor.
>>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> Reviewed-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
>> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
> 
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Thank you very much!

> 
> Just to double check: You're sure the other two R-b are still applicable,
> despite the various changes that have been made?
Will remove in next version.

> 
> As a purely cosmetic remark: I think I would have preferred if the new
> identifiers didn't have "state" as a part; I simply don't think this adds
> much value, while at the same time making these pretty long.
Do you mean: remove "state" identifier on all the new codes?

> 
> Jan
Jan Beulich July 9, 2024, 6:01 a.m. UTC | #3
On 09.07.2024 04:47, Chen, Jiqian wrote:
> On 2024/7/8 22:56, Jan Beulich wrote:
>> On 08.07.2024 13:41, Jiqian Chen wrote:
>>> When a device has been reset on the dom0 side, the Xen hypervisor
>>> doesn't get a notification, so the cached state in vpci is out of
>>> date compared with the real device state.
>>>
>>> To solve that problem, add a new hypercall to support the reset
>>> of a pcidev and clear the vpci state of the device, so that once
>>> the state of the device is reset on the dom0 side, dom0 can call
>>> this hypercall to notify the hypervisor.
>>>
>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>> Reviewed-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
>>> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
>>
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> Thank you very much!
> 
>>
>> Just to double check: You're sure the other two R-b are still applicable,
>> despite the various changes that have been made?
> Will remove in next version.
> 
>>
>> As a purely cosmetic remark: I think I would have preferred if the new
>> identifiers didn't have "state" as a part; I simply don't think this adds
>> much value, while at the same time making these pretty long.
> Do you mean: remove "state" identifier on all the new codes?

"part of identifiers", yes. As that's a personal view, I wouldn't insist
though, unless others shared my perspective.

Jan
Roger Pau Monné July 31, 2024, 3:55 p.m. UTC | #4
On Mon, Jul 08, 2024 at 07:41:18PM +0800, Jiqian Chen wrote:
> When a device has been reset on the dom0 side, the Xen hypervisor
> doesn't get a notification, so the cached state in vpci is out of
> date compared with the real device state.
> 
> To solve that problem, add a new hypercall to support the reset
> of a pcidev and clear the vpci state of the device, so that once
> the state of the device is reset on the dom0 side, dom0 can call
> this hypercall to notify the hypervisor.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> Reviewed-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

Thanks, just a couple of nits.

This is missing a changelog between versions, and I haven't been
following all the versions, so some of my questions might have been
answered in previous revisions.

> ---
>  xen/arch/x86/hvm/hypercall.c |  1 +
>  xen/drivers/pci/physdev.c    | 52 ++++++++++++++++++++++++++++++++++++
>  xen/drivers/vpci/vpci.c      | 10 +++++++
>  xen/include/public/physdev.h | 16 +++++++++++
>  xen/include/xen/vpci.h       |  8 ++++++
>  5 files changed, 87 insertions(+)
> 
> diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
> index 7fb3136f0c7c..0fab670a4871 100644
> --- a/xen/arch/x86/hvm/hypercall.c
> +++ b/xen/arch/x86/hvm/hypercall.c
> @@ -83,6 +83,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>      case PHYSDEVOP_pci_mmcfg_reserved:
>      case PHYSDEVOP_pci_device_add:
>      case PHYSDEVOP_pci_device_remove:
> +    case PHYSDEVOP_pci_device_state_reset:
>      case PHYSDEVOP_dbgp_op:
>          if ( !is_hardware_domain(currd) )
>              return -ENOSYS;
> diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c
> index 42db3e6d133c..c0f47945d955 100644
> --- a/xen/drivers/pci/physdev.c
> +++ b/xen/drivers/pci/physdev.c
> @@ -2,6 +2,7 @@
>  #include <xen/guest_access.h>
>  #include <xen/hypercall.h>
>  #include <xen/init.h>
> +#include <xen/vpci.h>
>  
>  #ifndef COMPAT
>  typedef long ret_t;
> @@ -67,6 +68,57 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>          break;
>      }
>  
> +    case PHYSDEVOP_pci_device_state_reset:
> +    {
> +        struct pci_device_state_reset dev_reset;
> +        struct pci_dev *pdev;
> +        pci_sbdf_t sbdf;
> +
> +        ret = -EOPNOTSUPP;
> +        if ( !is_pci_passthrough_enabled() )
> +            break;
> +
> +        ret = -EFAULT;
> +        if ( copy_from_guest(&dev_reset, arg, 1) != 0 )
> +            break;
> +
> +        sbdf = PCI_SBDF(dev_reset.dev.seg,
> +                        dev_reset.dev.bus,
> +                        dev_reset.dev.devfn);
> +
> +        ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
> +        if ( ret )
> +            break;
> +
> +        pcidevs_lock();
> +        pdev = pci_get_pdev(NULL, sbdf);
> +        if ( !pdev )
> +        {
> +            pcidevs_unlock();
> +            ret = -ENODEV;
> +            break;
> +        }
> +
> +        write_lock(&pdev->domain->pci_lock);
> +        pcidevs_unlock();
> +        switch ( dev_reset.reset_type )
> +        {
> +        case PCI_DEVICE_STATE_RESET_COLD:
> +        case PCI_DEVICE_STATE_RESET_WARM:
> +        case PCI_DEVICE_STATE_RESET_HOT:
> +        case PCI_DEVICE_STATE_RESET_FLR:
> +            ret = vpci_reset_device_state(pdev, dev_reset.reset_type);
> +            break;
> +
> +        default:
> +            ret = -EOPNOTSUPP;
> +            break;
> +        }
> +        write_unlock(&pdev->domain->pci_lock);
> +
> +        break;
> +    }
> +
>      default:
>          ret = -ENOSYS;
>          break;
> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> index 1e6aa5d799b9..7e914d1eff9f 100644
> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -172,6 +172,16 @@ int vpci_assign_device(struct pci_dev *pdev)
>  
>      return rc;
>  }
> +
> +int vpci_reset_device_state(struct pci_dev *pdev,
> +                            uint32_t reset_type)

There's probably no use in passing reset_type to
vpci_reset_device_state() if it's ignored?

> +{
> +    ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
> +
> +    vpci_deassign_device(pdev);
> +    return vpci_assign_device(pdev);
> +}
> +
>  #endif /* __XEN__ */
>  
>  static int vpci_register_cmp(const struct vpci_register *r1,
> diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h
> index f0c0d4727c0b..3cfde3fd2389 100644
> --- a/xen/include/public/physdev.h
> +++ b/xen/include/public/physdev.h
> @@ -296,6 +296,13 @@ DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_add_t);
>   */
>  #define PHYSDEVOP_prepare_msix          30
>  #define PHYSDEVOP_release_msix          31
> +/*
> + * Notify the hypervisor that a PCI device has been reset, so that any
> + * internally cached state is regenerated.  Should be called after any
> + * device reset performed by the hardware domain.
> + */
> +#define PHYSDEVOP_pci_device_state_reset 32
> +
>  struct physdev_pci_device {
>      /* IN */
>      uint16_t seg;
> @@ -305,6 +312,15 @@ struct physdev_pci_device {
>  typedef struct physdev_pci_device physdev_pci_device_t;
>  DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_t);
>  
> +struct pci_device_state_reset {
> +    physdev_pci_device_t dev;
> +#define PCI_DEVICE_STATE_RESET_COLD 0
> +#define PCI_DEVICE_STATE_RESET_WARM 1
> +#define PCI_DEVICE_STATE_RESET_HOT  2
> +#define PCI_DEVICE_STATE_RESET_FLR  3
> +    uint32_t reset_type;

This might want to be a flags field, with the low 2 bits (or maybe 3
bits to cope if more reset modes are added in the future) being used to
signal the reset type.  We can always do that later if flags need to
be added.
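
As a rough sketch of what such a flags field could look like (an
illustration, not part of the posted patch): the low two bits carry the
reset type, and every other bit is reserved and must be zero until it is
given a meaning. PCI_DEVICE_STATE_RESET_TYPE_MASK and decode_reset_flags()
are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Enum-like reset types, as in the posted patch. */
#define PCI_DEVICE_STATE_RESET_COLD 0
#define PCI_DEVICE_STATE_RESET_WARM 1
#define PCI_DEVICE_STATE_RESET_HOT  2
#define PCI_DEVICE_STATE_RESET_FLR  3

/* Hypothetical: the low two bits of the flags field hold the reset type. */
#define PCI_DEVICE_STATE_RESET_TYPE_MASK 0x3u

/*
 * Hypothetical helper: split a flags word into its reset type.
 * Bits outside the type mask have no meaning yet, so they are rejected
 * rather than silently ignored; that keeps them available for later use.
 */
int decode_reset_flags(uint32_t flags, uint32_t *type)
{
    if ( flags & ~PCI_DEVICE_STATE_RESET_TYPE_MASK )
        return -EINVAL;

    *type = flags & PCI_DEVICE_STATE_RESET_TYPE_MASK;
    return 0;
}
```

Rejecting unknown bits up front is what allows flags to be added later
without changing the layout of the structure.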

Seeing as reset_type has no impact on the hypercall, I would like to
ask for some reasoning for its presence to be added to the commit
message, otherwise it feels like pointless code churn.

> +};
> +
>  #define PHYSDEVOP_DBGP_RESET_PREPARE    1
>  #define PHYSDEVOP_DBGP_RESET_DONE       2
>  
> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
> index da8d0f41e6f4..6be812dbc04a 100644
> --- a/xen/include/xen/vpci.h
> +++ b/xen/include/xen/vpci.h
> @@ -38,6 +38,8 @@ int __must_check vpci_assign_device(struct pci_dev *pdev);
>  
>  /* Remove all handlers and free vpci related structures. */
>  void vpci_deassign_device(struct pci_dev *pdev);
> +int __must_check vpci_reset_device_state(struct pci_dev *pdev,
> +                                         uint32_t reset_type);
>  
>  /* Add/remove a register handler. */
>  int __must_check vpci_add_register_mask(struct vpci *vpci,
> @@ -282,6 +284,12 @@ static inline int vpci_assign_device(struct pci_dev *pdev)
>  
>  static inline void vpci_deassign_device(struct pci_dev *pdev) { }
>  
> +static inline int __must_check vpci_reset_device_state(struct pci_dev *pdev,
> +                                                       uint32_t reset_type)
> +{
> +    return 0;
> +}
> +

Maybe it turns out to be more complicated than the current approach,
but vpci_reset_device_state() could be a static inline function in
vpci.h defined regardless of whether CONFIG_HAS_VPCI is selected or
not, as the underlying functions vpci_{de}assign_device() are always
defined.

Thanks, Roger.
Jan Beulich July 31, 2024, 3:58 p.m. UTC | #5
On 31.07.2024 17:55, Roger Pau Monné wrote:
> On Mon, Jul 08, 2024 at 07:41:18PM +0800, Jiqian Chen wrote:
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -172,6 +172,16 @@ int vpci_assign_device(struct pci_dev *pdev)
>>  
>>      return rc;
>>  }
>> +
>> +int vpci_reset_device_state(struct pci_dev *pdev,
>> +                            uint32_t reset_type)
> 
> There's probably no use in passing reset_type to
> vpci_reset_device_state() if it's ignored?

I consider this forward-looking. It seems rather unlikely that in the
longer run the reset type doesn't matter.

Jan
Roger Pau Monné July 31, 2024, 4:13 p.m. UTC | #6
On Wed, Jul 31, 2024 at 05:58:54PM +0200, Jan Beulich wrote:
> On 31.07.2024 17:55, Roger Pau Monné wrote:
> > On Mon, Jul 08, 2024 at 07:41:18PM +0800, Jiqian Chen wrote:
> >> --- a/xen/drivers/vpci/vpci.c
> >> +++ b/xen/drivers/vpci/vpci.c
> >> @@ -172,6 +172,16 @@ int vpci_assign_device(struct pci_dev *pdev)
> >>  
> >>      return rc;
> >>  }
> >> +
> >> +int vpci_reset_device_state(struct pci_dev *pdev,
> >> +                            uint32_t reset_type)
> > 
> > There's probably no use in passing reset_type to
> > vpci_reset_device_state() if it's ignored?
> 
> I consider this forward-looking. It seems rather unlikely that in the
> longer run the reset type doesn't matter.

I'm fine with having it in the hypercall interface, but passing it to
vpci_reset_device_state() can be done once there's a purpose for it,
and it won't change any public facing interface.
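
A small model of that split, for illustration only: the hypercall layer
still validates reset_type from the public interface, while the internal
helper takes just the device. The struct pci_dev below is a toy stand-in
with a hypothetical counter, and handle_state_reset() is a hypothetical
name for the relevant part of the PHYSDEVOP handler; the real helper would
call vpci_deassign_device() and vpci_assign_device().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define PCI_DEVICE_STATE_RESET_COLD 0
#define PCI_DEVICE_STATE_RESET_WARM 1
#define PCI_DEVICE_STATE_RESET_HOT  2
#define PCI_DEVICE_STATE_RESET_FLR  3

/* Toy stand-in for Xen's struct pci_dev; resets is a hypothetical counter. */
struct pci_dev {
    unsigned int resets;
};

/* Internal helper: no reset_type parameter until there is a use for it. */
int vpci_reset_device_state(struct pci_dev *pdev)
{
    pdev->resets++; /* models the deassign followed by assign */
    return 0;
}

/* Hypothetical model of the handler: the public field is still checked. */
int handle_state_reset(struct pci_dev *pdev, uint32_t reset_type)
{
    switch ( reset_type )
    {
    case PCI_DEVICE_STATE_RESET_COLD:
    case PCI_DEVICE_STATE_RESET_WARM:
    case PCI_DEVICE_STATE_RESET_HOT:
    case PCI_DEVICE_STATE_RESET_FLR:
        return vpci_reset_device_state(pdev);

    default:
        return -EOPNOTSUPP;
    }
}
```

Adding the parameter back to the internal helper later only touches
hypervisor-internal callers, since guests only ever see the structure.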

Thanks, Roger.
Jan Beulich Aug. 1, 2024, 6:49 a.m. UTC | #7
On 31.07.2024 18:13, Roger Pau Monné wrote:
> On Wed, Jul 31, 2024 at 05:58:54PM +0200, Jan Beulich wrote:
>> On 31.07.2024 17:55, Roger Pau Monné wrote:
>>> On Mon, Jul 08, 2024 at 07:41:18PM +0800, Jiqian Chen wrote:
>>>> --- a/xen/drivers/vpci/vpci.c
>>>> +++ b/xen/drivers/vpci/vpci.c
>>>> @@ -172,6 +172,16 @@ int vpci_assign_device(struct pci_dev *pdev)
>>>>  
>>>>      return rc;
>>>>  }
>>>> +
>>>> +int vpci_reset_device_state(struct pci_dev *pdev,
>>>> +                            uint32_t reset_type)
>>>
>>> There's probably no use in passing reset_type to
>>> vpci_reset_device_state() if it's ignored?
>>
>> I consider this forward-looking. It seems rather unlikely that in the
>> longer run the reset type doesn't matter.
> 
> I'm fine with having it in the hypercall interface, but passing it to
> vpci_reset_device_state() can be done once there's a purpose for it,
> and it won't change any public facing interface.

Jiqian, just to clarify: I'm okay either way.

Jan
Jiqian Chen Aug. 2, 2024, 2:55 a.m. UTC | #8
On 2024/7/31 23:55, Roger Pau Monné wrote:
> On Mon, Jul 08, 2024 at 07:41:18PM +0800, Jiqian Chen wrote:
>> When a device has been reset on the dom0 side, the Xen hypervisor
>> doesn't get a notification, so the cached state in vpci is out of
>> date compared with the real device state.
>>
>> To solve that problem, add a new hypercall to support the reset
>> of a pcidev and clear the vpci state of the device, so that once
>> the state of the device is reset on the dom0 side, dom0 can call
>> this hypercall to notify the hypervisor.
>>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> Reviewed-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
>> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
> 
> Thanks, just a couple of nits.
> 
> This is missing a changelog between versions, and I haven't been
> following all the versions, so some of my questions might have been
> answered in previous revisions.
Sorry, I will add changelogs here in next version.

> 
>> ---
>>  xen/arch/x86/hvm/hypercall.c |  1 +
>>  xen/drivers/pci/physdev.c    | 52 ++++++++++++++++++++++++++++++++++++
>>  xen/drivers/vpci/vpci.c      | 10 +++++++
>>  xen/include/public/physdev.h | 16 +++++++++++
>>  xen/include/xen/vpci.h       |  8 ++++++
>>  5 files changed, 87 insertions(+)
>>
>> diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
>> index 7fb3136f0c7c..0fab670a4871 100644
>> --- a/xen/arch/x86/hvm/hypercall.c
>> +++ b/xen/arch/x86/hvm/hypercall.c
>> @@ -83,6 +83,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>      case PHYSDEVOP_pci_mmcfg_reserved:
>>      case PHYSDEVOP_pci_device_add:
>>      case PHYSDEVOP_pci_device_remove:
>> +    case PHYSDEVOP_pci_device_state_reset:
>>      case PHYSDEVOP_dbgp_op:
>>          if ( !is_hardware_domain(currd) )
>>              return -ENOSYS;
>> diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c
>> index 42db3e6d133c..c0f47945d955 100644
>> --- a/xen/drivers/pci/physdev.c
>> +++ b/xen/drivers/pci/physdev.c
>> @@ -2,6 +2,7 @@
>>  #include <xen/guest_access.h>
>>  #include <xen/hypercall.h>
>>  #include <xen/init.h>
>> +#include <xen/vpci.h>
>>  
>>  #ifndef COMPAT
>>  typedef long ret_t;
>> @@ -67,6 +68,57 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>          break;
>>      }
>>  
>> +    case PHYSDEVOP_pci_device_state_reset:
>> +    {
>> +        struct pci_device_state_reset dev_reset;
>> +        struct pci_dev *pdev;
>> +        pci_sbdf_t sbdf;
>> +
>> +        ret = -EOPNOTSUPP;
>> +        if ( !is_pci_passthrough_enabled() )
>> +            break;
>> +
>> +        ret = -EFAULT;
>> +        if ( copy_from_guest(&dev_reset, arg, 1) != 0 )
>> +            break;
>> +
>> +        sbdf = PCI_SBDF(dev_reset.dev.seg,
>> +                        dev_reset.dev.bus,
>> +                        dev_reset.dev.devfn);
>> +
>> +        ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
>> +        if ( ret )
>> +            break;
>> +
>> +        pcidevs_lock();
>> +        pdev = pci_get_pdev(NULL, sbdf);
>> +        if ( !pdev )
>> +        {
>> +            pcidevs_unlock();
>> +            ret = -ENODEV;
>> +            break;
>> +        }
>> +
>> +        write_lock(&pdev->domain->pci_lock);
>> +        pcidevs_unlock();
>> +        switch ( dev_reset.reset_type )
>> +        {
>> +        case PCI_DEVICE_STATE_RESET_COLD:
>> +        case PCI_DEVICE_STATE_RESET_WARM:
>> +        case PCI_DEVICE_STATE_RESET_HOT:
>> +        case PCI_DEVICE_STATE_RESET_FLR:
>> +            ret = vpci_reset_device_state(pdev, dev_reset.reset_type);
>> +            break;
>> +
>> +        default:
>> +            ret = -EOPNOTSUPP;
>> +            break;
>> +        }
>> +        write_unlock(&pdev->domain->pci_lock);
>> +
>> +        break;
>> +    }
>> +
>>      default:
>>          ret = -ENOSYS;
>>          break;
>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>> index 1e6aa5d799b9..7e914d1eff9f 100644
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -172,6 +172,16 @@ int vpci_assign_device(struct pci_dev *pdev)
>>  
>>      return rc;
>>  }
>> +
>> +int vpci_reset_device_state(struct pci_dev *pdev,
>> +                            uint32_t reset_type)
> 
> There's probably no use in passing reset_type to
> vpci_reset_device_state() if it's ignored?
> 
>> +{
>> +    ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
>> +
>> +    vpci_deassign_device(pdev);
>> +    return vpci_assign_device(pdev);
>> +}
>> +
>>  #endif /* __XEN__ */
>>  
>>  static int vpci_register_cmp(const struct vpci_register *r1,
>> diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h
>> index f0c0d4727c0b..3cfde3fd2389 100644
>> --- a/xen/include/public/physdev.h
>> +++ b/xen/include/public/physdev.h
>> @@ -296,6 +296,13 @@ DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_add_t);
>>   */
>>  #define PHYSDEVOP_prepare_msix          30
>>  #define PHYSDEVOP_release_msix          31
>> +/*
>> + * Notify the hypervisor that a PCI device has been reset, so that any
>> + * internally cached state is regenerated.  Should be called after any
>> + * device reset performed by the hardware domain.
>> + */
>> +#define PHYSDEVOP_pci_device_state_reset 32
>> +
>>  struct physdev_pci_device {
>>      /* IN */
>>      uint16_t seg;
>> @@ -305,6 +312,15 @@ struct physdev_pci_device {
>>  typedef struct physdev_pci_device physdev_pci_device_t;
>>  DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_t);
>>  
>> +struct pci_device_state_reset {
>> +    physdev_pci_device_t dev;
>> +#define PCI_DEVICE_STATE_RESET_COLD 0
>> +#define PCI_DEVICE_STATE_RESET_WARM 1
>> +#define PCI_DEVICE_STATE_RESET_HOT  2
>> +#define PCI_DEVICE_STATE_RESET_FLR  3
>> +    uint32_t reset_type;
> 
> This might want to be a flags field, with the low 2 bits (or maybe 3
> bits to cope if more reset modes are added in the future) being used to
> signal the reset type.  We can always do that later if flags need to
> be added.
Do you mean this?
+struct pci_device_state_reset {
+    physdev_pci_device_t dev;
+#define _PCI_DEVICE_STATE_RESET_COLD 0
+#define PCI_DEVICE_STATE_RESET_COLD  (1U<<_PCI_DEVICE_STATE_RESET_COLD)
+#define _PCI_DEVICE_STATE_RESET_WARM 1
+#define PCI_DEVICE_STATE_RESET_WARM  (1U<<_PCI_DEVICE_STATE_RESET_WARM)
+#define _PCI_DEVICE_STATE_RESET_HOT  2
+#define PCI_DEVICE_STATE_RESET_HOT   (1U<<_PCI_DEVICE_STATE_RESET_HOT)
+#define _PCI_DEVICE_STATE_RESET_FLR  3
+#define PCI_DEVICE_STATE_RESET_FLR   (1U<<_PCI_DEVICE_STATE_RESET_FLR)
+    uint32_t reset_type;
+};

> 
> Seeing as reset_type has no impact on the hypercall, I would like to
> ask for some reasoning for its presence to be added to the commit
> message, otherwise it feels like pointless code churn.
OK, I will add an explanation to the commit message that reset_type is
forward-looking: it allows different reset types to be handled
differently in the future.

> 
>> +};
>> +
>>  #define PHYSDEVOP_DBGP_RESET_PREPARE    1
>>  #define PHYSDEVOP_DBGP_RESET_DONE       2
>>  
>> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
>> index da8d0f41e6f4..6be812dbc04a 100644
>> --- a/xen/include/xen/vpci.h
>> +++ b/xen/include/xen/vpci.h
>> @@ -38,6 +38,8 @@ int __must_check vpci_assign_device(struct pci_dev *pdev);
>>  
>>  /* Remove all handlers and free vpci related structures. */
>>  void vpci_deassign_device(struct pci_dev *pdev);
>> +int __must_check vpci_reset_device_state(struct pci_dev *pdev,
>> +                                         uint32_t reset_type);
>>  
>>  /* Add/remove a register handler. */
>>  int __must_check vpci_add_register_mask(struct vpci *vpci,
>> @@ -282,6 +284,12 @@ static inline int vpci_assign_device(struct pci_dev *pdev)
>>  
>>  static inline void vpci_deassign_device(struct pci_dev *pdev) { }
>>  
>> +static inline int __must_check vpci_reset_device_state(struct pci_dev *pdev,
>> +                                                       uint32_t reset_type)
>> +{
>> +    return 0;
>> +}
>> +
> 
> Maybe it turns out to be more complicated than the current approach,
> but vpci_reset_device_state() could be a static inline function in
> vpci.h defined regardless of whether CONFIG_HAS_VPCI is selected or
> not, as the underlying functions vpci_{de}assign_device() are always
> defined.
OK, will change to this in next version.
+static inline int __must_check vpci_reset_device_state(struct pci_dev *pdev)
+{
+    ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
+
+    vpci_deassign_device(pdev);
+    return vpci_assign_device(pdev);
+}

> 
> Thanks, Roger.
Jiqian Chen Aug. 2, 2024, 2:56 a.m. UTC | #9
On 2024/8/1 14:49, Jan Beulich wrote:
> On 31.07.2024 18:13, Roger Pau Monné wrote:
>> On Wed, Jul 31, 2024 at 05:58:54PM +0200, Jan Beulich wrote:
>>> On 31.07.2024 17:55, Roger Pau Monné wrote:
>>>> On Mon, Jul 08, 2024 at 07:41:18PM +0800, Jiqian Chen wrote:
>>>>> --- a/xen/drivers/vpci/vpci.c
>>>>> +++ b/xen/drivers/vpci/vpci.c
>>>>> @@ -172,6 +172,16 @@ int vpci_assign_device(struct pci_dev *pdev)
>>>>>  
>>>>>      return rc;
>>>>>  }
>>>>> +
>>>>> +int vpci_reset_device_state(struct pci_dev *pdev,
>>>>> +                            uint32_t reset_type)
>>>>
>>>> There's probably no use in passing reset_type to
>>>> vpci_reset_device_state() if it's ignored?
>>>
>>> I consider this forward-looking. It seems rather unlikely that in the
>>> longer run the reset type doesn't matter.
>>
>> I'm fine with having it in the hypercall interface, but passing it to
>> vpci_reset_device_state() can be done once there's a purpose for it,
>> and it won't change any public facing interface.
> 
> Jiqian, just to clarify: I'm okay either way.
Thank you very much! You dispelled my concerns.
I will remove reset_type in next version.

> 
> Jan
Jan Beulich Aug. 2, 2024, 6:25 a.m. UTC | #10
On 02.08.2024 04:55, Chen, Jiqian wrote:
> On 2024/7/31 23:55, Roger Pau Monné wrote:
>> On Mon, Jul 08, 2024 at 07:41:18PM +0800, Jiqian Chen wrote:
>>> When a device has been reset on the dom0 side, the Xen hypervisor
>>> doesn't get a notification, so the cached state in vpci is out of
>>> date compared with the real device state.
>>>
>>> To solve that problem, add a new hypercall to support the reset
>>> of a pcidev and clear the vpci state of the device, so that once
>>> the state of the device is reset on the dom0 side, dom0 can call
>>> this hypercall to notify the hypervisor.
>>>
>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>> Reviewed-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
>>> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
>>
>> Thanks, just a couple of nits.
>>
>> This is missing a changelog between versions, and I haven't been
>> following all the versions, so some of my questions might have been
>> answered in previous revisions.
> Sorry, I will add changelogs here in next version.
> 
>>
>>> ---
>>>  xen/arch/x86/hvm/hypercall.c |  1 +
>>>  xen/drivers/pci/physdev.c    | 52 ++++++++++++++++++++++++++++++++++++
>>>  xen/drivers/vpci/vpci.c      | 10 +++++++
>>>  xen/include/public/physdev.h | 16 +++++++++++
>>>  xen/include/xen/vpci.h       |  8 ++++++
>>>  5 files changed, 87 insertions(+)
>>>
>>> diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
>>> index 7fb3136f0c7c..0fab670a4871 100644
>>> --- a/xen/arch/x86/hvm/hypercall.c
>>> +++ b/xen/arch/x86/hvm/hypercall.c
>>> @@ -83,6 +83,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>      case PHYSDEVOP_pci_mmcfg_reserved:
>>>      case PHYSDEVOP_pci_device_add:
>>>      case PHYSDEVOP_pci_device_remove:
>>> +    case PHYSDEVOP_pci_device_state_reset:
>>>      case PHYSDEVOP_dbgp_op:
>>>          if ( !is_hardware_domain(currd) )
>>>              return -ENOSYS;
>>> diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c
>>> index 42db3e6d133c..c0f47945d955 100644
>>> --- a/xen/drivers/pci/physdev.c
>>> +++ b/xen/drivers/pci/physdev.c
>>> @@ -2,6 +2,7 @@
>>>  #include <xen/guest_access.h>
>>>  #include <xen/hypercall.h>
>>>  #include <xen/init.h>
>>> +#include <xen/vpci.h>
>>>  
>>>  #ifndef COMPAT
>>>  typedef long ret_t;
>>> @@ -67,6 +68,57 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>          break;
>>>      }
>>>  
>>> +    case PHYSDEVOP_pci_device_state_reset:
>>> +    {
>>> +        struct pci_device_state_reset dev_reset;
>>> +        struct pci_dev *pdev;
>>> +        pci_sbdf_t sbdf;
>>> +
>>> +        ret = -EOPNOTSUPP;
>>> +        if ( !is_pci_passthrough_enabled() )
>>> +            break;
>>> +
>>> +        ret = -EFAULT;
>>> +        if ( copy_from_guest(&dev_reset, arg, 1) != 0 )
>>> +            break;
>>> +
>>> +        sbdf = PCI_SBDF(dev_reset.dev.seg,
>>> +                        dev_reset.dev.bus,
>>> +                        dev_reset.dev.devfn);
>>> +
>>> +        ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
>>> +        if ( ret )
>>> +            break;
>>> +
>>> +        pcidevs_lock();
>>> +        pdev = pci_get_pdev(NULL, sbdf);
>>> +        if ( !pdev )
>>> +        {
>>> +            pcidevs_unlock();
>>> +            ret = -ENODEV;
>>> +            break;
>>> +        }
>>> +
>>> +        write_lock(&pdev->domain->pci_lock);
>>> +        pcidevs_unlock();
>>> +        switch ( dev_reset.reset_type )
>>> +        {
>>> +        case PCI_DEVICE_STATE_RESET_COLD:
>>> +        case PCI_DEVICE_STATE_RESET_WARM:
>>> +        case PCI_DEVICE_STATE_RESET_HOT:
>>> +        case PCI_DEVICE_STATE_RESET_FLR:
>>> +            ret = vpci_reset_device_state(pdev, dev_reset.reset_type);
>>> +            break;
>>> +
>>> +        default:
>>> +            ret = -EOPNOTSUPP;
>>> +            break;
>>> +        }
>>> +        write_unlock(&pdev->domain->pci_lock);
>>> +
>>> +        break;
>>> +    }
>>> +
>>>      default:
>>>          ret = -ENOSYS;
>>>          break;
>>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>>> index 1e6aa5d799b9..7e914d1eff9f 100644
>>> --- a/xen/drivers/vpci/vpci.c
>>> +++ b/xen/drivers/vpci/vpci.c
>>> @@ -172,6 +172,16 @@ int vpci_assign_device(struct pci_dev *pdev)
>>>  
>>>      return rc;
>>>  }
>>> +
>>> +int vpci_reset_device_state(struct pci_dev *pdev,
>>> +                            uint32_t reset_type)
>>
>> There's probably no use in passing reset_type to
>> vpci_reset_device_state() if it's ignored?
>>
>>> +{
>>> +    ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
>>> +
>>> +    vpci_deassign_device(pdev);
>>> +    return vpci_assign_device(pdev);
>>> +}
>>> +
>>>  #endif /* __XEN__ */
>>>  
>>>  static int vpci_register_cmp(const struct vpci_register *r1,
>>> diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h
>>> index f0c0d4727c0b..3cfde3fd2389 100644
>>> --- a/xen/include/public/physdev.h
>>> +++ b/xen/include/public/physdev.h
>>> @@ -296,6 +296,13 @@ DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_add_t);
>>>   */
>>>  #define PHYSDEVOP_prepare_msix          30
>>>  #define PHYSDEVOP_release_msix          31
>>> +/*
>>> + * Notify the hypervisor that a PCI device has been reset, so that any
>>> + * internally cached state is regenerated.  Should be called after any
>>> + * device reset performed by the hardware domain.
>>> + */
>>> +#define PHYSDEVOP_pci_device_state_reset 32
>>> +
>>>  struct physdev_pci_device {
>>>      /* IN */
>>>      uint16_t seg;
>>> @@ -305,6 +312,15 @@ struct physdev_pci_device {
>>>  typedef struct physdev_pci_device physdev_pci_device_t;
>>>  DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_t);
>>>  
>>> +struct pci_device_state_reset {
>>> +    physdev_pci_device_t dev;
>>> +#define PCI_DEVICE_STATE_RESET_COLD 0
>>> +#define PCI_DEVICE_STATE_RESET_WARM 1
>>> +#define PCI_DEVICE_STATE_RESET_HOT  2
>>> +#define PCI_DEVICE_STATE_RESET_FLR  3
>>> +    uint32_t reset_type;
>>
>> This might want to be a flags field, with the low 2 bits (or maybe 3
>> bits to cope if more reset modes are added in the future) being used to
>> signal the reset type.  We can always do that later if flags need to
>> be added.
> Do you mean this?
> +struct pci_device_state_reset {
> +    physdev_pci_device_t dev;
> +#define _PCI_DEVICE_STATE_RESET_COLD 0
> +#define PCI_DEVICE_STATE_RESET_COLD  (1U<<_PCI_DEVICE_STATE_RESET_COLD)
> +#define _PCI_DEVICE_STATE_RESET_WARM 1
> +#define PCI_DEVICE_STATE_RESET_WARM  (1U<<_PCI_DEVICE_STATE_RESET_WARM)
> +#define _PCI_DEVICE_STATE_RESET_HOT  2
> +#define PCI_DEVICE_STATE_RESET_HOT   (1U<<_PCI_DEVICE_STATE_RESET_HOT)
> +#define _PCI_DEVICE_STATE_RESET_FLR  3
> +#define PCI_DEVICE_STATE_RESET_FLR   (1U<<_PCI_DEVICE_STATE_RESET_FLR)
> +    uint32_t reset_type;
> +};

That's four bits, not two. I'm pretty sure Roger meant to keep the enum-
like #define-s, but additionally define a 2-bit mask constant (0x3). I
don't think it needs to be three bits right away - we can decide what to
do there when any of the higher bits are to be assigned a meaning.

Jan
Jiqian Chen Aug. 2, 2024, 7:41 a.m. UTC | #11
On 2024/8/2 14:25, Jan Beulich wrote:
> On 02.08.2024 04:55, Chen, Jiqian wrote:
>> On 2024/7/31 23:55, Roger Pau Monné wrote:
>>> On Mon, Jul 08, 2024 at 07:41:18PM +0800, Jiqian Chen wrote:
>>>> When a device has been reset on dom0 side, the Xen hypervisor
>>>> doesn't get notification, so the cached state in vpci is all
>>>> out of date compared with the real device state.
>>>>
>>>> To solve that problem, add a new hypercall to support the reset
>>>> of pcidev and clear the vpci state of device. So that once the
>>>> state of device is reset on dom0 side, dom0 can call this
>>>> hypercall to notify hypervisor.
>>>>
>>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>>> Reviewed-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
>>>> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
>>>
>>> Thanks, just a couple of nits.
>>>
>>> This is missing a changelog between versions, and I haven't been
>>> following all the versions, so some of my questions might have been
>>> answered in previous revisions.
>> Sorry, I will add changelogs here in next version.
>>
>>>
>>>> ---
>>>>  xen/arch/x86/hvm/hypercall.c |  1 +
>>>>  xen/drivers/pci/physdev.c    | 52 ++++++++++++++++++++++++++++++++++++
>>>>  xen/drivers/vpci/vpci.c      | 10 +++++++
>>>>  xen/include/public/physdev.h | 16 +++++++++++
>>>>  xen/include/xen/vpci.h       |  8 ++++++
>>>>  5 files changed, 87 insertions(+)
>>>>
>>>> diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
>>>> index 7fb3136f0c7c..0fab670a4871 100644
>>>> --- a/xen/arch/x86/hvm/hypercall.c
>>>> +++ b/xen/arch/x86/hvm/hypercall.c
>>>> @@ -83,6 +83,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>>      case PHYSDEVOP_pci_mmcfg_reserved:
>>>>      case PHYSDEVOP_pci_device_add:
>>>>      case PHYSDEVOP_pci_device_remove:
>>>> +    case PHYSDEVOP_pci_device_state_reset:
>>>>      case PHYSDEVOP_dbgp_op:
>>>>          if ( !is_hardware_domain(currd) )
>>>>              return -ENOSYS;
>>>> diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c
>>>> index 42db3e6d133c..c0f47945d955 100644
>>>> --- a/xen/drivers/pci/physdev.c
>>>> +++ b/xen/drivers/pci/physdev.c
>>>> @@ -2,6 +2,7 @@
>>>>  #include <xen/guest_access.h>
>>>>  #include <xen/hypercall.h>
>>>>  #include <xen/init.h>
>>>> +#include <xen/vpci.h>
>>>>  
>>>>  #ifndef COMPAT
>>>>  typedef long ret_t;
>>>> @@ -67,6 +68,57 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>>          break;
>>>>      }
>>>>  
>>>> +    case PHYSDEVOP_pci_device_state_reset:
>>>> +    {
>>>> +        struct pci_device_state_reset dev_reset;
>>>> +        struct pci_dev *pdev;
>>>> +        pci_sbdf_t sbdf;
>>>> +
>>>> +        ret = -EOPNOTSUPP;
>>>> +        if ( !is_pci_passthrough_enabled() )
>>>> +            break;
>>>> +
>>>> +        ret = -EFAULT;
>>>> +        if ( copy_from_guest(&dev_reset, arg, 1) != 0 )
>>>> +            break;
>>>> +
>>>> +        sbdf = PCI_SBDF(dev_reset.dev.seg,
>>>> +                        dev_reset.dev.bus,
>>>> +                        dev_reset.dev.devfn);
>>>> +
>>>> +        ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
>>>> +        if ( ret )
>>>> +            break;
>>>> +
>>>> +        pcidevs_lock();
>>>> +        pdev = pci_get_pdev(NULL, sbdf);
>>>> +        if ( !pdev )
>>>> +        {
>>>> +            pcidevs_unlock();
>>>> +            ret = -ENODEV;
>>>> +            break;
>>>> +        }
>>>> +
>>>> +        write_lock(&pdev->domain->pci_lock);
>>>> +        pcidevs_unlock();
>>>> +        switch ( dev_reset.reset_type )
>>>> +        {
>>>> +        case PCI_DEVICE_STATE_RESET_COLD:
>>>> +        case PCI_DEVICE_STATE_RESET_WARM:
>>>> +        case PCI_DEVICE_STATE_RESET_HOT:
>>>> +        case PCI_DEVICE_STATE_RESET_FLR:
>>>> +            ret = vpci_reset_device_state(pdev, dev_reset.reset_type);
>>>> +            break;
>>>> +
>>>> +        default:
>>>> +            ret = -EOPNOTSUPP;
>>>> +            break;
>>>> +        }
>>>> +        write_unlock(&pdev->domain->pci_lock);
>>>> +
>>>> +        break;
>>>> +    }
>>>> +
>>>>      default:
>>>>          ret = -ENOSYS;
>>>>          break;
>>>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>>>> index 1e6aa5d799b9..7e914d1eff9f 100644
>>>> --- a/xen/drivers/vpci/vpci.c
>>>> +++ b/xen/drivers/vpci/vpci.c
>>>> @@ -172,6 +172,16 @@ int vpci_assign_device(struct pci_dev *pdev)
>>>>  
>>>>      return rc;
>>>>  }
>>>> +
>>>> +int vpci_reset_device_state(struct pci_dev *pdev,
>>>> +                            uint32_t reset_type)
>>>
>>> There's probably no use in passing reset_type to
>>> vpci_reset_device_state() if it's ignored?
>>>
>>>> +{
>>>> +    ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
>>>> +
>>>> +    vpci_deassign_device(pdev);
>>>> +    return vpci_assign_device(pdev);
>>>> +}
>>>> +
>>>>  #endif /* __XEN__ */
>>>>  
>>>>  static int vpci_register_cmp(const struct vpci_register *r1,
>>>> diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h
>>>> index f0c0d4727c0b..3cfde3fd2389 100644
>>>> --- a/xen/include/public/physdev.h
>>>> +++ b/xen/include/public/physdev.h
>>>> @@ -296,6 +296,13 @@ DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_add_t);
>>>>   */
>>>>  #define PHYSDEVOP_prepare_msix          30
>>>>  #define PHYSDEVOP_release_msix          31
>>>> +/*
>>>> + * Notify the hypervisor that a PCI device has been reset, so that any
>>>> + * internally cached state is regenerated.  Should be called after any
>>>> + * device reset performed by the hardware domain.
>>>> + */
>>>> +#define PHYSDEVOP_pci_device_state_reset 32
>>>> +
>>>>  struct physdev_pci_device {
>>>>      /* IN */
>>>>      uint16_t seg;
>>>> @@ -305,6 +312,15 @@ struct physdev_pci_device {
>>>>  typedef struct physdev_pci_device physdev_pci_device_t;
>>>>  DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_t);
>>>>  
>>>> +struct pci_device_state_reset {
>>>> +    physdev_pci_device_t dev;
>>>> +#define PCI_DEVICE_STATE_RESET_COLD 0
>>>> +#define PCI_DEVICE_STATE_RESET_WARM 1
>>>> +#define PCI_DEVICE_STATE_RESET_HOT  2
>>>> +#define PCI_DEVICE_STATE_RESET_FLR  3
>>>> +    uint32_t reset_type;
>>>
>>> This might want to be a flags field, with the low 2 bits (or maybe 3
>>> bits to cope if more reset modes are added in the future) being used to
>>> signal the reset type.  We can always do that later if flags need to
>>> be added.
>> Do you mean this?
>> +struct pci_device_state_reset {
>> +    physdev_pci_device_t dev;
>> +#define _PCI_DEVICE_STATE_RESET_COLD 0
>> +#define PCI_DEVICE_STATE_RESET_COLD  (1U<<_PCI_DEVICE_STATE_RESET_COLD)
>> +#define _PCI_DEVICE_STATE_RESET_WARM 1
>> +#define PCI_DEVICE_STATE_RESET_WARM  (1U<<_PCI_DEVICE_STATE_RESET_WARM)
>> +#define _PCI_DEVICE_STATE_RESET_HOT  2
>> +#define PCI_DEVICE_STATE_RESET_HOT   (1U<<_PCI_DEVICE_STATE_RESET_HOT)
>> +#define _PCI_DEVICE_STATE_RESET_FLR  3
>> +#define PCI_DEVICE_STATE_RESET_FLR   (1U<<_PCI_DEVICE_STATE_RESET_FLR)
>> +    uint32_t reset_type;
>> +};
> 
> That's four bits, not two. I'm pretty sure Roger meant to keep the enum-
> like #define-s, but additionally define a 2-bit mask constant (0x3). I
> don't think it needs to be three bits right away - we can decide what to
> do there when any of the higher bits are to be assigned a meaning.
Like this?
struct pci_device_state_reset {
    physdev_pci_device_t dev;
#define PCI_DEVICE_STATE_RESET_COLD 0x0
#define PCI_DEVICE_STATE_RESET_WARM 0x1
#define PCI_DEVICE_STATE_RESET_HOT  0x2
#define PCI_DEVICE_STATE_RESET_FLR  0x3
#define PCI_DEVICE_STATE_RESET_MASK  0x3
    uint32_t flags;
};

> 
> Jan
Jan Beulich Aug. 2, 2024, 7:43 a.m. UTC | #12
On 02.08.2024 09:41, Chen, Jiqian wrote:
> On 2024/8/2 14:25, Jan Beulich wrote:
>> On 02.08.2024 04:55, Chen, Jiqian wrote:
>>> On 2024/7/31 23:55, Roger Pau Monné wrote:
>>>> On Mon, Jul 08, 2024 at 07:41:18PM +0800, Jiqian Chen wrote:
>>>>> @@ -305,6 +312,15 @@ struct physdev_pci_device {
>>>>>  typedef struct physdev_pci_device physdev_pci_device_t;
>>>>>  DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_t);
>>>>>  
>>>>> +struct pci_device_state_reset {
>>>>> +    physdev_pci_device_t dev;
>>>>> +#define PCI_DEVICE_STATE_RESET_COLD 0
>>>>> +#define PCI_DEVICE_STATE_RESET_WARM 1
>>>>> +#define PCI_DEVICE_STATE_RESET_HOT  2
>>>>> +#define PCI_DEVICE_STATE_RESET_FLR  3
>>>>> +    uint32_t reset_type;
>>>>
>>>> This might want to be a flags field, with the low 2 bits (or maybe 3
>>>> bits to cope if more reset modes are added in the future) being used to
>>>> signal the reset type.  We can always do that later if flags need to
>>>> be added.
>>> Do you mean this?
>>> +struct pci_device_state_reset {
>>> +    physdev_pci_device_t dev;
>>> +#define _PCI_DEVICE_STATE_RESET_COLD 0
>>> +#define PCI_DEVICE_STATE_RESET_COLD  (1U<<_PCI_DEVICE_STATE_RESET_COLD)
>>> +#define _PCI_DEVICE_STATE_RESET_WARM 1
>>> +#define PCI_DEVICE_STATE_RESET_WARM  (1U<<_PCI_DEVICE_STATE_RESET_WARM)
>>> +#define _PCI_DEVICE_STATE_RESET_HOT  2
>>> +#define PCI_DEVICE_STATE_RESET_HOT   (1U<<_PCI_DEVICE_STATE_RESET_HOT)
>>> +#define _PCI_DEVICE_STATE_RESET_FLR  3
>>> +#define PCI_DEVICE_STATE_RESET_FLR   (1U<<_PCI_DEVICE_STATE_RESET_FLR)
>>> +    uint32_t reset_type;
>>> +};
>>
>> That's four bits, not two. I'm pretty sure Roger meant to keep the enum-
>> like #define-s, but additionally define a 2-bit mask constant (0x3). I
>> don't think it needs to be three bits right away - we can decide what to
>> do there when any of the higher bits are to be assigned a meaning.
> Like this?
> struct pci_device_state_reset {
>     physdev_pci_device_t dev;
> #define PCI_DEVICE_STATE_RESET_COLD 0x0
> #define PCI_DEVICE_STATE_RESET_WARM 0x1
> #define PCI_DEVICE_STATE_RESET_HOT  0x2
> #define PCI_DEVICE_STATE_RESET_FLR  0x3
> #define PCI_DEVICE_STATE_RESET_MASK  0x3
>     uint32_t flags;
> };

Yes, with the last #define adjusted such that columns align.

Jan
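
For reference, a minimal sketch of the layout agreed on above, with the mask #define aligned to the same column as the reset types. The physdev_pci_device_t definition here is only a stand-in so the fragment compiles on its own: its field names follow the handler's use of dev.seg, dev.bus and dev.devfn, and the field widths are an assumption. The static assertion is not part of the proposal; it just documents that every reset type fits in the 2-bit mask.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Stand-in for the public physdev_pci_device_t (assumed field widths),
 * present only so this fragment is self-contained.
 */
typedef struct physdev_pci_device {
    uint16_t seg;
    uint8_t bus;
    uint8_t devfn;
} physdev_pci_device_t;

struct pci_device_state_reset {
    physdev_pci_device_t dev;
#define PCI_DEVICE_STATE_RESET_COLD 0x0
#define PCI_DEVICE_STATE_RESET_WARM 0x1
#define PCI_DEVICE_STATE_RESET_HOT  0x2
#define PCI_DEVICE_STATE_RESET_FLR  0x3
#define PCI_DEVICE_STATE_RESET_MASK 0x3
    uint32_t flags;
};

/* The largest reset type must not spill outside the 2-bit mask. */
static_assert((PCI_DEVICE_STATE_RESET_FLR & ~PCI_DEVICE_STATE_RESET_MASK) == 0,
              "reset types must fit in PCI_DEVICE_STATE_RESET_MASK");
```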
Roger Pau Monné Aug. 2, 2024, 7:44 a.m. UTC | #13
On Fri, Aug 02, 2024 at 08:25:58AM +0200, Jan Beulich wrote:
> On 02.08.2024 04:55, Chen, Jiqian wrote:
> > On 2024/7/31 23:55, Roger Pau Monné wrote:
> >> On Mon, Jul 08, 2024 at 07:41:18PM +0800, Jiqian Chen wrote:
> >>> When a device has been reset on dom0 side, the Xen hypervisor
> >>> doesn't get notification, so the cached state in vpci is all
> >>> out of date compared with the real device state.
> >>>
> >>> To solve that problem, add a new hypercall to support the reset
> >>> of pcidev and clear the vpci state of device. So that once the
> >>> state of device is reset on dom0 side, dom0 can call this
> >>> hypercall to notify hypervisor.
> >>>
> >>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> >>> Signed-off-by: Huang Rui <ray.huang@amd.com>
> >>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> >>> Reviewed-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
> >>> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
> >>
> >> Thanks, just a couple of nits.
> >>
> >> This is missing a changelog between versions, and I haven't been
> >> following all the versions, so some of my questions might have been
> >> answered in previous revisions.
> > Sorry, I will add changelogs here in next version.
> > 
> >>
> >>> ---
> >>>  xen/arch/x86/hvm/hypercall.c |  1 +
> >>>  xen/drivers/pci/physdev.c    | 52 ++++++++++++++++++++++++++++++++++++
> >>>  xen/drivers/vpci/vpci.c      | 10 +++++++
> >>>  xen/include/public/physdev.h | 16 +++++++++++
> >>>  xen/include/xen/vpci.h       |  8 ++++++
> >>>  5 files changed, 87 insertions(+)
> >>>
> >>> diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
> >>> index 7fb3136f0c7c..0fab670a4871 100644
> >>> --- a/xen/arch/x86/hvm/hypercall.c
> >>> +++ b/xen/arch/x86/hvm/hypercall.c
> >>> @@ -83,6 +83,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
> >>>      case PHYSDEVOP_pci_mmcfg_reserved:
> >>>      case PHYSDEVOP_pci_device_add:
> >>>      case PHYSDEVOP_pci_device_remove:
> >>> +    case PHYSDEVOP_pci_device_state_reset:
> >>>      case PHYSDEVOP_dbgp_op:
> >>>          if ( !is_hardware_domain(currd) )
> >>>              return -ENOSYS;
> >>> diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c
> >>> index 42db3e6d133c..c0f47945d955 100644
> >>> --- a/xen/drivers/pci/physdev.c
> >>> +++ b/xen/drivers/pci/physdev.c
> >>> @@ -2,6 +2,7 @@
> >>>  #include <xen/guest_access.h>
> >>>  #include <xen/hypercall.h>
> >>>  #include <xen/init.h>
> >>> +#include <xen/vpci.h>
> >>>  
> >>>  #ifndef COMPAT
> >>>  typedef long ret_t;
> >>> @@ -67,6 +68,57 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
> >>>          break;
> >>>      }
> >>>  
> >>> +    case PHYSDEVOP_pci_device_state_reset:
> >>> +    {
> >>> +        struct pci_device_state_reset dev_reset;
> >>> +        struct pci_dev *pdev;
> >>> +        pci_sbdf_t sbdf;
> >>> +
> >>> +        ret = -EOPNOTSUPP;
> >>> +        if ( !is_pci_passthrough_enabled() )
> >>> +            break;
> >>> +
> >>> +        ret = -EFAULT;
> >>> +        if ( copy_from_guest(&dev_reset, arg, 1) != 0 )
> >>> +            break;
> >>> +
> >>> +        sbdf = PCI_SBDF(dev_reset.dev.seg,
> >>> +                        dev_reset.dev.bus,
> >>> +                        dev_reset.dev.devfn);
> >>> +
> >>> +        ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
> >>> +        if ( ret )
> >>> +            break;
> >>> +
> >>> +        pcidevs_lock();
> >>> +        pdev = pci_get_pdev(NULL, sbdf);
> >>> +        if ( !pdev )
> >>> +        {
> >>> +            pcidevs_unlock();
> >>> +            ret = -ENODEV;
> >>> +            break;
> >>> +        }
> >>> +
> >>> +        write_lock(&pdev->domain->pci_lock);
> >>> +        pcidevs_unlock();
> >>> +        switch ( dev_reset.reset_type )
> >>> +        {
> >>> +        case PCI_DEVICE_STATE_RESET_COLD:
> >>> +        case PCI_DEVICE_STATE_RESET_WARM:
> >>> +        case PCI_DEVICE_STATE_RESET_HOT:
> >>> +        case PCI_DEVICE_STATE_RESET_FLR:
> >>> +            ret = vpci_reset_device_state(pdev, dev_reset.reset_type);
> >>> +            break;
> >>> +
> >>> +        default:
> >>> +            ret = -EOPNOTSUPP;
> >>> +            break;
> >>> +        }
> >>> +        write_unlock(&pdev->domain->pci_lock);
> >>> +
> >>> +        break;
> >>> +    }
> >>> +
> >>>      default:
> >>>          ret = -ENOSYS;
> >>>          break;
> >>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> >>> index 1e6aa5d799b9..7e914d1eff9f 100644
> >>> --- a/xen/drivers/vpci/vpci.c
> >>> +++ b/xen/drivers/vpci/vpci.c
> >>> @@ -172,6 +172,16 @@ int vpci_assign_device(struct pci_dev *pdev)
> >>>  
> >>>      return rc;
> >>>  }
> >>> +
> >>> +int vpci_reset_device_state(struct pci_dev *pdev,
> >>> +                            uint32_t reset_type)
> >>
> >> There's probably no use in passing reset_type to
> >> vpci_reset_device_state() if it's ignored?
> >>
> >>> +{
> >>> +    ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
> >>> +
> >>> +    vpci_deassign_device(pdev);
> >>> +    return vpci_assign_device(pdev);
> >>> +}
> >>> +
> >>>  #endif /* __XEN__ */
> >>>  
> >>>  static int vpci_register_cmp(const struct vpci_register *r1,
> >>> diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h
> >>> index f0c0d4727c0b..3cfde3fd2389 100644
> >>> --- a/xen/include/public/physdev.h
> >>> +++ b/xen/include/public/physdev.h
> >>> @@ -296,6 +296,13 @@ DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_add_t);
> >>>   */
> >>>  #define PHYSDEVOP_prepare_msix          30
> >>>  #define PHYSDEVOP_release_msix          31
> >>> +/*
> >>> + * Notify the hypervisor that a PCI device has been reset, so that any
> >>> + * internally cached state is regenerated.  Should be called after any
> >>> + * device reset performed by the hardware domain.
> >>> + */
> >>> +#define PHYSDEVOP_pci_device_state_reset 32
> >>> +
> >>>  struct physdev_pci_device {
> >>>      /* IN */
> >>>      uint16_t seg;
> >>> @@ -305,6 +312,15 @@ struct physdev_pci_device {
> >>>  typedef struct physdev_pci_device physdev_pci_device_t;
> >>>  DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_t);
> >>>  
> >>> +struct pci_device_state_reset {
> >>> +    physdev_pci_device_t dev;
> >>> +#define PCI_DEVICE_STATE_RESET_COLD 0
> >>> +#define PCI_DEVICE_STATE_RESET_WARM 1
> >>> +#define PCI_DEVICE_STATE_RESET_HOT  2
> >>> +#define PCI_DEVICE_STATE_RESET_FLR  3
> >>> +    uint32_t reset_type;
> >>
> >> This might want to be a flags field, with the low 2 bits (or maybe 3
> >> bits to cope if more reset modes are added in the future) being used to
> >> signal the reset type.  We can always do that later if flags need to
> >> be added.
> > Do you mean this?
> > +struct pci_device_state_reset {
> > +    physdev_pci_device_t dev;
> > +#define _PCI_DEVICE_STATE_RESET_COLD 0
> > +#define PCI_DEVICE_STATE_RESET_COLD  (1U<<_PCI_DEVICE_STATE_RESET_COLD)
> > +#define _PCI_DEVICE_STATE_RESET_WARM 1
> > +#define PCI_DEVICE_STATE_RESET_WARM  (1U<<_PCI_DEVICE_STATE_RESET_WARM)
> > +#define _PCI_DEVICE_STATE_RESET_HOT  2
> > +#define PCI_DEVICE_STATE_RESET_HOT   (1U<<_PCI_DEVICE_STATE_RESET_HOT)
> > +#define _PCI_DEVICE_STATE_RESET_FLR  3
> > +#define PCI_DEVICE_STATE_RESET_FLR   (1U<<_PCI_DEVICE_STATE_RESET_FLR)
> > +    uint32_t reset_type;
> > +};
> 
> That's four bits, not two. I'm pretty sure Roger meant to keep the enum-
> like #define-s, but additionally define a 2-bit mask constant (0x3). I
> don't think it needs to be three bits right away - we can decide what to
> do there when any of the higher bits are to be assigned a meaning.

Indeed, what I was requesting is just a cosmetic change; it doesn't
change the enum values at all.

The field, however, would be better named "flags" or something more
generic, so that in the future it can accommodate other flags not
related to the reset type.

Thanks, Roger.
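
To illustrate the point about a generic "flags" field, here is a rough sketch of how a handler could pull the reset type out of the low two bits. The helper name reset_type_from_flags() is hypothetical and does not appear in the patch. Rejecting any set upper bit with -EINVAL is likewise an assumption made for this sketch; the thread leaves the handling of the higher bits to be decided when they are assigned a meaning.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define PCI_DEVICE_STATE_RESET_COLD 0x0
#define PCI_DEVICE_STATE_RESET_WARM 0x1
#define PCI_DEVICE_STATE_RESET_HOT  0x2
#define PCI_DEVICE_STATE_RESET_FLR  0x3
#define PCI_DEVICE_STATE_RESET_MASK 0x3

/*
 * Hypothetical helper, not part of the patch: extract the reset type
 * from the low bits of flags.  Any bit outside the mask is treated as
 * reserved and rejected (an assumption for this sketch).
 */
static int reset_type_from_flags(uint32_t flags, uint32_t *reset_type)
{
    if ( flags & ~(uint32_t)PCI_DEVICE_STATE_RESET_MASK )
        return -EINVAL;

    *reset_type = flags & PCI_DEVICE_STATE_RESET_MASK;

    return 0;
}
```

Because the four reset types keep their enum-like values 0 to 3, existing callers that pass a bare reset type would decode to the same value under this scheme.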
Patch

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 7fb3136f0c7c..0fab670a4871 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -83,6 +83,7 @@  long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     case PHYSDEVOP_pci_mmcfg_reserved:
     case PHYSDEVOP_pci_device_add:
     case PHYSDEVOP_pci_device_remove:
+    case PHYSDEVOP_pci_device_state_reset:
     case PHYSDEVOP_dbgp_op:
         if ( !is_hardware_domain(currd) )
             return -ENOSYS;
diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c
index 42db3e6d133c..c0f47945d955 100644
--- a/xen/drivers/pci/physdev.c
+++ b/xen/drivers/pci/physdev.c
@@ -2,6 +2,7 @@ 
 #include <xen/guest_access.h>
 #include <xen/hypercall.h>
 #include <xen/init.h>
+#include <xen/vpci.h>
 
 #ifndef COMPAT
 typedef long ret_t;
@@ -67,6 +68,57 @@  ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         break;
     }
 
+    case PHYSDEVOP_pci_device_state_reset:
+    {
+        struct pci_device_state_reset dev_reset;
+        struct pci_dev *pdev;
+        pci_sbdf_t sbdf;
+
+        ret = -EOPNOTSUPP;
+        if ( !is_pci_passthrough_enabled() )
+            break;
+
+        ret = -EFAULT;
+        if ( copy_from_guest(&dev_reset, arg, 1) != 0 )
+            break;
+
+        sbdf = PCI_SBDF(dev_reset.dev.seg,
+                        dev_reset.dev.bus,
+                        dev_reset.dev.devfn);
+
+        ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
+        if ( ret )
+            break;
+
+        pcidevs_lock();
+        pdev = pci_get_pdev(NULL, sbdf);
+        if ( !pdev )
+        {
+            pcidevs_unlock();
+            ret = -ENODEV;
+            break;
+        }
+
+        write_lock(&pdev->domain->pci_lock);
+        pcidevs_unlock();
+        switch ( dev_reset.reset_type )
+        {
+        case PCI_DEVICE_STATE_RESET_COLD:
+        case PCI_DEVICE_STATE_RESET_WARM:
+        case PCI_DEVICE_STATE_RESET_HOT:
+        case PCI_DEVICE_STATE_RESET_FLR:
+            ret = vpci_reset_device_state(pdev, dev_reset.reset_type);
+            break;
+
+        default:
+            ret = -EOPNOTSUPP;
+            break;
+        }
+        write_unlock(&pdev->domain->pci_lock);
+
+        break;
+    }
+
     default:
         ret = -ENOSYS;
         break;
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 1e6aa5d799b9..7e914d1eff9f 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -172,6 +172,16 @@  int vpci_assign_device(struct pci_dev *pdev)
 
     return rc;
 }
+
+int vpci_reset_device_state(struct pci_dev *pdev,
+                            uint32_t reset_type)
+{
+    ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
+
+    vpci_deassign_device(pdev);
+    return vpci_assign_device(pdev);
+}
+
 #endif /* __XEN__ */
 
 static int vpci_register_cmp(const struct vpci_register *r1,
diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h
index f0c0d4727c0b..3cfde3fd2389 100644
--- a/xen/include/public/physdev.h
+++ b/xen/include/public/physdev.h
@@ -296,6 +296,13 @@  DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_add_t);
  */
 #define PHYSDEVOP_prepare_msix          30
 #define PHYSDEVOP_release_msix          31
+/*
+ * Notify the hypervisor that a PCI device has been reset, so that any
+ * internally cached state is regenerated.  Should be called after any
+ * device reset performed by the hardware domain.
+ */
+#define PHYSDEVOP_pci_device_state_reset 32
+
 struct physdev_pci_device {
     /* IN */
     uint16_t seg;
@@ -305,6 +312,15 @@  struct physdev_pci_device {
 typedef struct physdev_pci_device physdev_pci_device_t;
 DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_t);
 
+struct pci_device_state_reset {
+    physdev_pci_device_t dev;
+#define PCI_DEVICE_STATE_RESET_COLD 0
+#define PCI_DEVICE_STATE_RESET_WARM 1
+#define PCI_DEVICE_STATE_RESET_HOT  2
+#define PCI_DEVICE_STATE_RESET_FLR  3
+    uint32_t reset_type;
+};
+
 #define PHYSDEVOP_DBGP_RESET_PREPARE    1
 #define PHYSDEVOP_DBGP_RESET_DONE       2
 
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index da8d0f41e6f4..6be812dbc04a 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -38,6 +38,8 @@  int __must_check vpci_assign_device(struct pci_dev *pdev);
 
 /* Remove all handlers and free vpci related structures. */
 void vpci_deassign_device(struct pci_dev *pdev);
+int __must_check vpci_reset_device_state(struct pci_dev *pdev,
+                                         uint32_t reset_type);
 
 /* Add/remove a register handler. */
 int __must_check vpci_add_register_mask(struct vpci *vpci,
@@ -282,6 +284,12 @@  static inline int vpci_assign_device(struct pci_dev *pdev)
 
 static inline void vpci_deassign_device(struct pci_dev *pdev) { }
 
+static inline int __must_check vpci_reset_device_state(struct pci_dev *pdev,
+                                                       uint32_t reset_type)
+{
+    return 0;
+}
+
 static inline void vpci_dump_msi(void) { }
 
 static inline uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg,