PCI: hv: Fix a crash in hv_pci_restore_msi_msg() during hibernation

Message ID 20230813001218.19716-1-decui@microsoft.com
State Superseded
Delegated to: Lorenzo Pieralisi
Series PCI: hv: Fix a crash in hv_pci_restore_msi_msg() during hibernation

Commit Message

Dexuan Cui Aug. 13, 2023, 12:12 a.m. UTC
For a Linux VM with an NVIDIA GPU running on Hyper-V, before the GPU driver
is installed, hibernating the VM will trigger a panic: if the GPU driver
is not installed and loaded, MSI-X/MSI is not enabled on the device, so
pdev->dev.msi.data is NULL, and msi_lock_descs(&pdev->dev) causes a
NULL pointer dereference. Fix this by checking pdev->dev.msi.data.

Fixes: dc2b453290c4 ("PCI: hv: Rework MSI handling")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
---
 drivers/pci/controller/pci-hyperv.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Michael Kelley (LINUX) Aug. 16, 2023, 12:35 a.m. UTC | #1
From: Dexuan Cui <decui@microsoft.com> Sent: Saturday, August 12, 2023 5:12 PM
> 
> For a Linux VM with an NVIDIA GPU running on Hyper-V, before the GPU driver
> is installed, hibernating the VM will trigger a panic: if the GPU driver
> is not installed and loaded, MSI-X/MSI is not enabled on the device, so
> pdev->dev.msi.data is NULL, and msi_lock_descs(&pdev->dev) causes a
> NULL pointer dereference. Fix this by checking pdev->dev.msi.data.

Is the scenario here a little broader than just the NVIDIA GPU driver?  For
any virtual PCI device that is presented in the guest VM as a VMBus device,
the driver might not be installed.  There could have been some initial
problem getting the driver installed, or it might have been manually
uninstalled later in the life of the VM.  Also the host might have rescinded
the virtual PCI device and added it back later, creating another opportunity
where the driver might not be loaded.  In any case, it seems like we could
have the VMBus aspects of the device set up, but not the driver for the
device.  This suspend/resume code in pci-hyperv.c is all about handling
the VMBus aspects of the device anyway.

Assuming my thinking is correct, is there some Hyper-V/VMBus setting
owned by the pci-hyperv.c driver that would be better to test here than
the low-level dev.msi.data pointer?  The MSI code rework that added
the descriptor lock encapsulates the internals with appropriate accessor
functions, and reaching in to directly test dev.msi.data violates that
encapsulation.

That said, I think this code works as written.  I'm picking at details.

Michael

> 
> Fixes: dc2b453290c4 ("PCI: hv: Rework MSI handling")
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
>  drivers/pci/controller/pci-hyperv.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index 2d93d0c4f10d..fdd01bfb8e10 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -3983,6 +3983,9 @@ static int hv_pci_restore_msi_msg(struct pci_dev *pdev, void *arg)
>  	struct msi_desc *entry;
>  	int ret = 0;
> 
> +	if (!pdev->dev.msi.data)
> +		return 0;
> +
>  	msi_lock_descs(&pdev->dev);
>  	msi_for_each_desc(entry, &pdev->dev, MSI_DESC_ASSOCIATED) {
>  		irq_data = irq_get_irq_data(entry->irq);
> --
> 2.25.1
Dexuan Cui Aug. 16, 2023, 3:59 a.m. UTC | #2
> From: Michael Kelley (LINUX) <mikelley@microsoft.com>
> Sent: Tuesday, August 15, 2023 5:35 PM
> > [...]
> > For a Linux VM with an NVIDIA GPU running on Hyper-V, before the GPU driver
> > is installed, hibernating the VM will trigger a panic: if the GPU driver
> > is not installed and loaded, MSI-X/MSI is not enabled on the device, so
> > pdev->dev.msi.data is NULL, and msi_lock_descs(&pdev->dev) causes a
> > NULL pointer dereference. Fix this by checking pdev->dev.msi.data.
> 
> Is the scenario here a little broader than just the NVIDIA GPU driver?  For
> any virtual PCI device that is presented in the guest VM as a VMBus device,
> the driver might not be installed.  There could have been some initial
> problem getting the driver installed, or it might have been manually
> uninstalled later in the life of the VM.  Also the host might have rescinded
> the virtual PCI device and added it back later, creating another opportunity
> where the driver might not be loaded.  In any case, it seems like we could
> have the VMBus aspects of the device set up, but not the driver for the
> device.  This suspend/resume code in pci-hyperv.c is all about handling
> the VMBus aspects of the device anyway.

Good point! The bug also affects other PCI devices, e.g. if I unload mlx5_core
and let the VM with a Mellanox VF NIC hibernate, I hit the same NULL
pointer dereference.

> Assuming my thinking is correct, is there some Hyper-V/VMBus setting
> owned by the pci-hyperv.c driver that would be better to test here than
> the low-level dev.msi.data pointer?  The MSI code rework that added

IMO there is no easy and reliable way in Hyper-V/VMBus/pci-hyperv to
tell whether MSI/MSI-X is enabled for a PCI device. We could potentially
track the MSI/MSI-X irqs in hv_compose_msi_msg() and hv_irq_unmask(), but
that would not be straightforward and could easily be inaccurate.
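
To illustrate, a purely hypothetical sketch of what such tracking might
look like (the field and helper below are invented for illustration, not
proposed code):

	/*
	 * Hypothetical only: bump the count where hv_compose_msi_msg()
	 * programs a vector, drop it on the corresponding teardown path,
	 * and test it at suspend/resume. Setup and teardown are not
	 * cleanly paired in pci-hyperv (e.g. a rescind can take vectors
	 * away), so keeping the count accurate is the hard part.
	 */
	struct hv_pci_dev {
		/* ... existing fields ... */
		atomic_t msi_vectors_in_use;	/* invented field */
	};

	static bool hv_pci_dev_has_live_msi(struct hv_pci_dev *hpdev)
	{
		return atomic_read(&hpdev->msi_vectors_in_use) > 0;
	}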

> the descriptor lock encapsulates the internals with appropriate accessor
> functions, and reaching in to directly test dev.msi.data violates that
> encapsulation.

I agree. 

Compared with:
	if (!pdev->dev.msi.data)
		return 0;

I think it's better to use this:
	if (!pdev->msi_enabled && !pdev->msix_enabled)
		return 0;

pdev->msix_enabled has been used in many drivers, e.g.

"arch/x86/pci/xen.c": xen_pv_teardown_msi_irqs()
"drivers/hid/intel-ish-hid/ipc/pci-ish.c": ish_probe()
"drivers/infiniband/hw/vmw_pvrdma/pvrdma_main.c": pvrdma_intr0_handler()
"drivers/scsi/vmw_pvscsi.c": pvscsi_probe()
and more.

So it looks like pdev->msix_enabled is a legit and stable API.
I'll post v2 with it. I'll update the changelog accordingly.
Please let me know if you have concerns about it.
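
For reference, a sketch of what the v2 hunk would look like with that check
(the comment wording is only illustrative; the actual v2 may differ):

@@ -3983,6 +3983,9 @@ static int hv_pci_restore_msi_msg(struct pci_dev *pdev, void *arg)
 	struct msi_desc *entry;
 	int ret = 0;
 
+	/* A device whose driver never enabled MSI/MSI-X has nothing to restore. */
+	if (!pdev->msi_enabled && !pdev->msix_enabled)
+		return 0;
+
 	msi_lock_descs(&pdev->dev);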
Michael Kelley (LINUX) Aug. 16, 2023, 4:55 a.m. UTC | #3
From: Dexuan Cui <decui@microsoft.com> Sent: Tuesday, August 15, 2023 9:00 PM
> 
> > From: Michael Kelley (LINUX) <mikelley@microsoft.com>
> > Sent: Tuesday, August 15, 2023 5:35 PM
> > > [...]
> > > For a Linux VM with an NVIDIA GPU running on Hyper-V, before the GPU driver
> > > is installed, hibernating the VM will trigger a panic: if the GPU driver
> > > is not installed and loaded, MSI-X/MSI is not enabled on the device, so
> > > pdev->dev.msi.data is NULL, and msi_lock_descs(&pdev->dev) causes a
> > > NULL pointer dereference. Fix this by checking pdev->dev.msi.data.
> >
> > Is the scenario here a little broader than just the NVIDIA GPU driver?  For
> > any virtual PCI device that is presented in the guest VM as a VMBus device,
> > the driver might not be installed.  There could have been some initial
> > problem getting the driver installed, or it might have been manually
> > uninstalled later in the life of the VM.  Also the host might have rescinded
> > the virtual PCI device and added it back later, creating another opportunity
> > where the driver might not be loaded.  In any case, it seems like we could
> > have the VMBus aspects of the device set up, but not the driver for the
> > device.  This suspend/resume code in pci-hyperv.c is all about handling
> > the VMBus aspects of the device anyway.
> 
> Good point! The bug also affects other PCI devices, e.g. if I unload mlx5_core
> and let the VM with a Mellanox VF NIC hibernate, I hit the same NULL
> pointer dereference.
> 
> > Assuming my thinking is correct, is there some Hyper-V/VMBus setting
> > owned by the pci-hyperv.c driver that would be better to test here than
> > the low-level dev.msi.data pointer?  The MSI code rework that added
> 
> IMO there is no easy and reliable way in Hyper-V/VMBus/pci-hyperv to
> tell whether MSI/MSI-X is enabled for a PCI device. We could potentially
> track the MSI/MSI-X irqs in hv_compose_msi_msg() and hv_irq_unmask(), but
> that would not be straightforward and could easily be inaccurate.
> 
> > the descriptor lock encapsulates the internals with appropriate accessor
> > functions, and reaching in to directly test dev.msi.data violates that
> > encapsulation.
> 
> I agree.
> 
> Compared with:
> 	if (!pdev->dev.msi.data)
> 		return 0;
> 
> I think it's better to use this:
> 	if (!pdev->msi_enabled && !pdev->msix_enabled)
> 		return 0;
> 
> pdev->msix_enabled has been used in many drivers, e.g.
> 
> "arch/x86/pci/xen.c": xen_pv_teardown_msi_irqs()
> "drivers/hid/intel-ish-hid/ipc/pci-ish.c": ish_probe()
> "drivers/infiniband/hw/vmw_pvrdma/pvrdma_main.c": pvrdma_intr0_handler()
> "drivers/scsi/vmw_pvscsi.c": pvscsi_probe()
> and more.
> 
> So it looks like pdev->msix_enabled is a legit and stable API.
> I'll post v2 with it. I'll update the changelog accordingly.
> Please let me know if you have concerns about it.

Sounds good.  I agree that what you propose is a better approach.

Michael

Patch

diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 2d93d0c4f10d..fdd01bfb8e10 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -3983,6 +3983,9 @@ static int hv_pci_restore_msi_msg(struct pci_dev *pdev, void *arg)
 	struct msi_desc *entry;
 	int ret = 0;
 
+	if (!pdev->dev.msi.data)
+		return 0;
+
 	msi_lock_descs(&pdev->dev);
 	msi_for_each_desc(entry, &pdev->dev, MSI_DESC_ASSOCIATED) {
 		irq_data = irq_get_irq_data(entry->irq);