[v3,0/6] pci-hyper: Fix race condition bugs for fast device hotplug

Message ID 20230420024037.5921-1-decui@microsoft.com (mailing list archive)

Message

Dexuan Cui April 20, 2023, 2:40 a.m. UTC
Before the guest finishes probing a device, the host may already be starting
to remove the device. Currently there are multiple race condition bugs in the
pci-hyperv driver, which can cause the guest to panic.  The patchset fixes
the crashes.

The patchset also does some cleanup work: patch 3 removes the useless
hv_pcichild_state, and patch 4 reverts an old patch which is not really
useful (without patch 4, it would be hard to make patch 5 clean).

Patch 6 removes the use of a global mutex lock, and enables async-probing
to allow concurrent device probing for faster boot.

v3 is based on v6.3-rc5. No code change since v2. I just added Michael's
and Long Li's Reviewed-by.

The patchset is also available in my github branch:
https://github.com/dcui/tdx/commits/decui/vpci/v6.3-rc5-v3

v2 can be found here:
https://lwn.net/ml/linux-kernel/20230404020545.32359-1-decui@microsoft.com/

Please review. Thanks!


Dexuan Cui (6):
  PCI: hv: Fix a race condition bug in hv_pci_query_relations()
  PCI: hv: Fix a race condition in hv_irq_unmask() that can cause panic
  PCI: hv: Remove the useless hv_pcichild_state from struct hv_pci_dev
  Revert "PCI: hv: Fix a timing issue which causes kdump to fail
    occasionally"
  PCI: hv: Add a per-bus mutex state_lock
  PCI: hv: Use async probing to reduce boot time

 drivers/pci/controller/pci-hyperv.c | 145 +++++++++++++++++-----------
 1 file changed, 86 insertions(+), 59 deletions(-)
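
As an illustration of the locking change described for patches 5 and 6 (a per-bus mutex instead of a global one, so that devices can be probed concurrently), here is a minimal userspace sketch using pthreads. It is not code from the patchset: `struct demo_bus`, `demo_bus_probe()`, `demo_bus_remove()` and `demo_probe_all()` are hypothetical names, and only the field name `state_lock` is borrowed from the title of patch 5. The assumption is simply that each bus carries its own lock and state, so probing one bus never waits on another, while a probe that races with removal of the same bus sees the removal and backs off.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define DEMO_MAX_BUSES 16

/* Hypothetical stand-in for a per-bus object; not the driver's real struct. */
enum demo_bus_state { DEMO_BUS_INIT, DEMO_BUS_PROBED, DEMO_BUS_REMOVING };

struct demo_bus {
    pthread_mutex_t state_lock;   /* per-bus lock (name taken from patch 5) */
    enum demo_bus_state state;
    int probe_count;
};

static void demo_bus_init(struct demo_bus *bus)
{
    pthread_mutex_init(&bus->state_lock, NULL);
    bus->state = DEMO_BUS_INIT;
    bus->probe_count = 0;
}

/* Probe one bus. Returns 0 on success, -1 if removal has already begun. */
static int demo_bus_probe(struct demo_bus *bus)
{
    int ret = -1;

    pthread_mutex_lock(&bus->state_lock);
    if (bus->state != DEMO_BUS_REMOVING) {
        bus->state = DEMO_BUS_PROBED;
        bus->probe_count++;
        ret = 0;
    }
    pthread_mutex_unlock(&bus->state_lock);
    return ret;
}

/* Mark a bus as being removed; later probes of this bus fail. */
static void demo_bus_remove(struct demo_bus *bus)
{
    pthread_mutex_lock(&bus->state_lock);
    bus->state = DEMO_BUS_REMOVING;
    pthread_mutex_unlock(&bus->state_lock);
}

static void *demo_probe_thread(void *arg)
{
    struct demo_bus *bus = arg;

    demo_bus_probe(bus);
    return NULL;
}

/*
 * Probe n buses concurrently, one thread per bus. Each thread takes only
 * its own bus's lock, so the probes do not serialize against each other.
 * Returns the number of buses left in the PROBED state.
 */
static int demo_probe_all(struct demo_bus *buses, size_t n)
{
    pthread_t tids[DEMO_MAX_BUSES];
    int started[DEMO_MAX_BUSES];
    int probed = 0;
    size_t i;

    assert(n <= DEMO_MAX_BUSES);

    for (i = 0; i < n; i++) {
        started[i] = (pthread_create(&tids[i], NULL,
                                     demo_probe_thread, &buses[i]) == 0);
        if (!started[i])
            demo_bus_probe(&buses[i]);   /* fall back to a synchronous probe */
    }
    for (i = 0; i < n; i++)
        if (started[i])
            pthread_join(tids[i], NULL);

    for (i = 0; i < n; i++)
        if (buses[i].state == DEMO_BUS_PROBED)
            probed++;
    return probed;
}
```

With a single global mutex, every call to `demo_bus_probe()` would contend on the same lock; with the lock embedded in each bus object, only a probe and a removal of the same bus serialize against each other.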

Comments

Dexuan Cui April 21, 2023, 2:04 a.m. UTC | #1
> From: Dexuan Cui <decui@microsoft.com>
> Sent: Wednesday, April 19, 2023 7:41 PM
> ...
> Before the guest finishes probing a device, the host may be already starting
> to remove the device. Currently there are multiple race condition bugs in the
> pci-hyperv driver, which can cause the guest to panic.  The patchset fixes
> the crashes.
> 
> The patchset also does some cleanup work: patch 3 removes the useless
> hv_pcichild_state, and patch 4 reverts an old patch which is not really
> useful (without patch 4, it would be hard to make patch 5 clean).
> 
> Patch 6 removes the use of a global mutex lock, and enables async-probing
> to allow concurrent device probing for faster boot.
> 
> v3 is based on v6.3-rc5. No code change since v2. I just added Michael's
> and Long Li's Reviewed-by.
> ...
> 
> Dexuan Cui (6):
>   PCI: hv: Fix a race condition bug in hv_pci_query_relations()
>   PCI: hv: Fix a race condition in hv_irq_unmask() that can cause panic
>   PCI: hv: Remove the useless hv_pcichild_state from struct hv_pci_dev
>   Revert "PCI: hv: Fix a timing issue which causes kdump to fail
>     occasionally"
>   PCI: hv: Add a per-bus mutex state_lock
>   PCI: hv: Use async probing to reduce boot time
> 
>  drivers/pci/controller/pci-hyperv.c | 145 +++++++++++++++++-----------
>  1 file changed, 86 insertions(+), 59 deletions(-)

Hi Bjorn, Lorenzo, since basically this patchset is Hyper-V stuff, I would
like it to go through the hyper-v tree if you have no objection.

The hyper-v tree already has one big PCI patch from Michael:
https://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux.git/commit/?h=hyperv-next&id=2c6ba4216844ca7918289b49ed5f3f7138ee2402

Thanks,
Dexuan

Dexuan Cui April 21, 2023, 10:23 p.m. UTC | #2
> From: Dexuan Cui
> Sent: Thursday, April 20, 2023 7:04 PM
> > ...
> >
> > Dexuan Cui (6):
> >   PCI: hv: Fix a race condition bug in hv_pci_query_relations()
> >   PCI: hv: Fix a race condition in hv_irq_unmask() that can cause panic
> >   PCI: hv: Remove the useless hv_pcichild_state from struct hv_pci_dev
> >   Revert "PCI: hv: Fix a timing issue which causes kdump to fail
> >     occasionally"
> >   PCI: hv: Add a per-bus mutex state_lock
> >   PCI: hv: Use async probing to reduce boot time
> >
> >  drivers/pci/controller/pci-hyperv.c | 145 +++++++++++++++++-----------
> >  1 file changed, 86 insertions(+), 59 deletions(-)
> 
> Hi Bjorn, Lorenzo, since basically this patchset is Hyper-V stuff, I would
> like it to go through the hyper-v tree if you have no objection.
> 
> The hyper-v tree already has one big PCI patch from Michael:
> https://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux.git/commit/?h=
> hyperv-next&id=2c6ba4216844ca7918289b49ed5f3f7138ee2402
> 
> Thanks,
> Dexuan

Hi Lorenzo, thanks for Ack'ing the patch:
  Re: [PATCH v2] PCI: hv: Replace retarget_msi_interrupt_params with hyperv_pcpu_input_arg

It would be great if you and/or Bjorn can Ack this patchset as well :-)

v1 of this patchset was posted on March 28:
https://lwn.net/ml/linux-kernel/20230328045122.25850-1-decui%40microsoft.com/
and v3 got Michael Kelley's and Long Li's Reviewed-by.

I have done long-haul testing of the patchset and it worked reliably
without causing any issues: without the patchset, the VM usually crashes
within 1~2 days; with the patchset, the VM is still running fine
after 2 weeks.

Thanks,
Dexuan

Wei Liu May 8, 2023, 4:52 p.m. UTC | #3
On Fri, Apr 21, 2023 at 10:23:03PM +0000, Dexuan Cui wrote:
> > From: Dexuan Cui
> > Sent: Thursday, April 20, 2023 7:04 PM
> > > ...
> > >
> > > Dexuan Cui (6):
> > >   PCI: hv: Fix a race condition bug in hv_pci_query_relations()
> > >   PCI: hv: Fix a race condition in hv_irq_unmask() that can cause panic
> > >   PCI: hv: Remove the useless hv_pcichild_state from struct hv_pci_dev
> > >   Revert "PCI: hv: Fix a timing issue which causes kdump to fail
> > >     occasionally"
> > >   PCI: hv: Add a per-bus mutex state_lock
> > >   PCI: hv: Use async probing to reduce boot time
> > >
> > >  drivers/pci/controller/pci-hyperv.c | 145 +++++++++++++++++-----------
> > >  1 file changed, 86 insertions(+), 59 deletions(-)
> > 
> > Hi Bjorn, Lorenzo, since basically this patchset is Hyper-V stuff, I would
> > like it to go through the hyper-v tree if you have no objection.
> > 
> > The hyper-v tree already has one big PCI patch from Michael:
> > https://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux.git/commit/?h=
> > hyperv-next&id=2c6ba4216844ca7918289b49ed5f3f7138ee2402
> > 
> > Thanks,
> > Dexuan
> 
> Hi Lorenzo, thanks for Ack'ing the patch:
>   Re: [PATCH v2] PCI: hv: Replace retarget_msi_interrupt_params with hyperv_pcpu_input_arg
> 
> It would be great if you and/or Bjorn can Ack this patchset as well :-)
> 

Lorenzo and Bjorn, are you happy with these patches? I can collect them
via the hyperv-fixes tree.

Thanks,
Wei.

> v1 of this patchset was posted on 3/28:
> https://lwn.net/ml/linux-kernel/20230328045122.25850-1-decui%40microsoft.com/
> and v3 got Michael Kelley's and Long Li's Reviewed-by.
> 
> I have done long-haul testing of the patchset and it worked reliably
> without causing any issues: without the patchset, the VM usually crashes
> within 1~2 days; with the patchset, the VM is still running fine
> after 2 weeks.
> 
> Thanks,
> Dexuan