Message ID | 20200905025450.45528-1-decui@microsoft.com |
---|---|
State | Superseded, archived |
Delegated to: | Lorenzo Pieralisi |
Series | PCI: hv: Fix hibernation in case interrupts are not re-created |
> -----Original Message-----
> From: Dexuan Cui <decui@microsoft.com>
> Sent: Friday, September 4, 2020 7:55 PM
> To: wei.liu@kernel.org; KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
> <haiyangz@microsoft.com>; Stephen Hemminger <sthemmin@microsoft.com>;
> lorenzo.pieralisi@arm.com; bhelgaas@google.com; linux-hyperv@vger.kernel.org;
> linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org; Michael Kelley
> <mikelley@microsoft.com>
> Cc: Dexuan Cui <decui@microsoft.com>; Jake Oshins <jakeo@microsoft.com>
> Subject: [PATCH] PCI: hv: Fix hibernation in case interrupts are not re-created
>
> [...]

Reviewed-by: Jake Oshins (jakeo@microsoft.com)
From: Dexuan Cui <decui@microsoft.com> Sent: Friday, September 4, 2020 7:55 PM
>
> [...]
>
> @@ -1211,6 +1211,21 @@ static void hv_irq_unmask(struct irq_data *data)
>  	pbus = pdev->bus;
>  	hbus = container_of(pbus->sysdata, struct hv_pcibus_device, sysdata);
>
> +	if (hbus->state == hv_pcibus_removing) {
> +		/*
> +		 * During hibernatin, when a CPU is offlined, the kernel tries

s/hiberatin/hibernation/

> [...]
> From: Michael Kelley <mikelley@microsoft.com>
> Sent: Tuesday, September 8, 2020 2:16 PM
>
> > @@ -1211,6 +1211,21 @@ static void hv_irq_unmask(struct irq_data *data)
> >  	pbus = pdev->bus;
> >  	hbus = container_of(pbus->sysdata, struct hv_pcibus_device, sysdata);
> >
> > +	if (hbus->state == hv_pcibus_removing) {
> > +		/*
> > +		 * During hibernatin, when a CPU is offlined, the kernel tries
>
> s/hiberatin/hibernation/

Thanks! I'll post a v2 shortly with this typo fixed, and with Jake's Reviewed-by.

Thanks,
-- Dexuan
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index fc4c3a15e570..abefff9a20e1 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -1211,6 +1211,21 @@ static void hv_irq_unmask(struct irq_data *data)
 	pbus = pdev->bus;
 	hbus = container_of(pbus->sysdata, struct hv_pcibus_device, sysdata);
 
+	if (hbus->state == hv_pcibus_removing) {
+		/*
+		 * During hibernatin, when a CPU is offlined, the kernel tries
+		 * to move the interrupt to the remaining CPUs that haven't
+		 * been offlined yet. In this case, the below hv_do_hypercall()
+		 * always fails since the vmbus channel has been closed, so we
+		 * should not call the hypercall, but we still need
+		 * pci_msi_unmask_irq() to reset the mask bit in desc->masked:
+		 * see cpu_disable_common() -> fixup_irqs() ->
+		 * irq_migrate_all_off_this_cpu() -> migrate_one_irq().
+		 */
+		pci_msi_unmask_irq(data);
+		return;
+	}
+
 	spin_lock_irqsave(&hbus->retarget_msi_interrupt_lock, flags);
 
 	params = &hbus->retarget_msi_interrupt_params;
@@ -3372,6 +3387,33 @@ static int hv_pci_suspend(struct hv_device *hdev)
 	return 0;
 }
 
+static int hv_pci_restore_msi_msg(struct pci_dev *pdev, void *arg)
+{
+	struct msi_desc *entry;
+	struct irq_data *irq_data;
+
+	for_each_pci_msi_entry(entry, pdev) {
+		irq_data = irq_get_irq_data(entry->irq);
+		if (WARN_ON_ONCE(!irq_data))
+			return -EINVAL;
+
+		hv_compose_msi_msg(irq_data, &entry->msg);
+	}
+
+	return 0;
+}
+
+/*
+ * Upon resume, pci_restore_msi_state() -> ... -> __pci_write_msi_msg()
+ * re-writes the MSI/MSI-X registers, but since Hyper-V doesn't trap and
+ * emulate the accesses, we have to call hv_compose_msi_msg() to ask
+ * Hyper-V to re-create the IOMMU Interrupt Remapping Table Entries.
+ */
+static void hv_pci_restore_msi_state(struct hv_pcibus_device *hbus)
+{
+	pci_walk_bus(hbus->pci_bus, hv_pci_restore_msi_msg, NULL);
+}
+
 static int hv_pci_resume(struct hv_device *hdev)
 {
 	struct hv_pcibus_device *hbus = hv_get_drvdata(hdev);
@@ -3405,6 +3447,8 @@ static int hv_pci_resume(struct hv_device *hdev)
 
 	prepopulate_bars(hbus);
 
+	hv_pci_restore_msi_state(hbus);
+
 	hbus->state = hv_pcibus_installed;
 	return 0;
 out:
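The control flow of the hv_irq_unmask() hunk can be modeled in user space to show why the early return matters. This is a minimal sketch, not driver code: every `fake_*` name is hypothetical, the closed VMBus channel is modeled as a boolean, and it assumes (as the patch comment implies) that when the retarget hypercall fails, hv_irq_unmask() reports the error and returns before pci_msi_unmask_irq() runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the driver's bus states; only the
 * "removing" vs. "installed" distinction matters here. */
enum fake_bus_state { FAKE_BUS_INSTALLED, FAKE_BUS_REMOVING };

/* Hypothetical per-interrupt record: 'masked' plays the role of
 * desc->masked, 'retargets' counts successful host hypercalls. */
struct fake_irq {
	bool masked;
	int retargets;
};

/* Hypothetical hypercall model: it fails whenever the VMBus channel is
 * closed, which the patch comment says is the case during hibernation. */
static int fake_retarget_hypercall(struct fake_irq *irq, bool channel_open)
{
	if (!channel_open)
		return -1;
	irq->retargets++;
	return 0;
}

/* Models the patched hv_irq_unmask() control flow. */
static void fake_irq_unmask(struct fake_irq *irq, enum fake_bus_state state,
			    bool channel_open)
{
	assert(irq != NULL);

	if (state == FAKE_BUS_REMOVING) {
		/* Channel already closed: skip the hypercall, but still
		 * clear the mask bit (the pci_msi_unmask_irq() step). */
		irq->masked = false;
		return;
	}

	if (fake_retarget_hypercall(irq, channel_open) != 0) {
		/* Assumed error path: report and return *without*
		 * unmasking, which is what would leave the bit stale. */
		fprintf(stderr, "retarget hypercall failed\n");
		return;
	}
	irq->masked = false;
}
```

Calling fake_irq_unmask() with FAKE_BUS_INSTALLED and a closed channel shows the failure mode the guard avoids: the hypercall fails and the mask bit stays set. With FAKE_BUS_REMOVING the bit is cleared and no hypercall is attempted.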
Hyper-V doesn't trap and emulate accesses to the MSI/MSI-X registers, so we must use hv_compose_msi_msg() to ask Hyper-V to create the IOMMU Interrupt Remapping Table Entries. This is not an issue for many PCI device drivers (e.g. the NVMe driver and the Mellanox NIC drivers), which destroy and re-create their interrupts across hibernation, so hv_compose_msi_msg() is called automatically. However, some other PCI device drivers (e.g. the Nvidia driver) may not destroy and re-create their interrupts across hibernation, so hv_pci_resume() has to call hv_compose_msi_msg(); otherwise those drivers can no longer receive MSI/MSI-X interrupts after hibernation.

Fixes: ac82fc832708 ("PCI: hv: Add hibernation support")
Cc: Jake Oshins <jakeo@microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
---
 drivers/pci/controller/pci-hyperv.c | 44 +++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)
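The resume-time fix walks every device on the bus and re-composes every MSI/MSI-X message, whether or not the device's driver would have re-created its interrupts. The user-space sketch below models that walk; all `fake_*` names, the message encoding, and the use of `irq < 0` to stand for a missing irq_data are hypothetical. It mirrors one property of the kernel's pci_walk_bus(): the walk stops at the first device whose callback returns nonzero, so a device that hits the WARN_ON_ONCE() path leaves later devices without re-created entries.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_MAX_ENTRIES 4

/* Hypothetical model of one MSI/MSI-X vector (cf. struct msi_desc). */
struct fake_msi_entry {
	int irq;           /* Linux IRQ number; < 0 models a missing irq_data */
	uint32_t msg_data; /* stands in for entry->msg */
	bool host_irte;    /* true once the "host" holds a remapping entry */
};

/* Hypothetical model of a PCI function behind the Hyper-V PCI bus. */
struct fake_pci_dev {
	const char *name;
	size_t nr_entries;
	struct fake_msi_entry entries[FAKE_MAX_ENTRIES];
};

/* Stand-in for hv_compose_msi_msg(): fill in the message and ask the
 * simulated host to create the remapping entry. Encoding is made up. */
static void fake_compose_msi_msg(int irq, struct fake_msi_entry *e)
{
	e->msg_data = 0x4000u | (uint32_t)irq;
	e->host_irte = true;
}

/* Per-device callback, mirroring hv_pci_restore_msi_msg(). */
static int fake_restore_msi_msg(struct fake_pci_dev *dev, void *arg)
{
	(void)arg; /* unused, like the NULL passed in the patch */

	for (size_t i = 0; i < dev->nr_entries; i++) {
		struct fake_msi_entry *e = &dev->entries[i];

		if (e->irq < 0) /* models WARN_ON_ONCE(!irq_data) */
			return -EINVAL;
		fake_compose_msi_msg(e->irq, e);
	}
	return 0;
}

/* Mirrors pci_walk_bus(): call cb on each device and stop at the first
 * nonzero return value. */
static void fake_walk_bus(struct fake_pci_dev *devs, size_t n,
			  int (*cb)(struct fake_pci_dev *, void *), void *arg)
{
	assert(cb != NULL);
	for (size_t i = 0; i < n; i++) {
		if (cb(&devs[i], arg))
			break;
	}
}

/* Mirrors hv_pci_restore_msi_state(): re-compose every vector of every
 * device, regardless of whether its driver re-creates interrupts. */
static void fake_restore_msi_state(struct fake_pci_dev *devs, size_t n)
{
	fake_walk_bus(devs, n, fake_restore_msi_msg, NULL);
}
```

Re-composing unconditionally means the resume path in this sketch never needs to know which drivers tear down and rebuild their interrupts across hibernation.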