Message ID | 20191008034238.2503-1-jian-hong@endlessm.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
Series | PCI/MSI: Fix restoring of MSI-X vector control's mask bit | expand |
On Tue, Oct 08, 2019 at 11:42:39AM +0800, Jian-Hong Pan wrote: > MSI-X vector control's bit 0 is the mask bit, which masks the > corresponding interrupt request, or not. Other reserved bits might be > used for other purpose by device vendors. For example, the values of > Kingston NVMe SSD's MSI-X vector control are neither 0, nor 1, but other > values [1]. > > The original restoring logic in system resuming uses the whole MSI-X > vector control value as the flag to set/clear the mask bit. However, > this logic conflicts the idea mentioned above. It may mislead system to > disable the MSI-X vector entries. That makes system get no interrupt > from Kingston NVMe SSD after resume and usually get NVMe I/O timeout > error. > > [ 174.715534] nvme nvme0: I/O 978 QID 3 timeout, completion polled > > This patch takes only the mask bit of original MSI-X vector control > value as the flag to fix this issue. > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=204887#c8 > > Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=204887 > Fixed: f2440d9acbe8 ("PCI MSI: Refactor interrupt masking code") > Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com> > --- > drivers/pci/msi.c | 7 +++++-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c > index 0884bedcfc7a..deae3d5acaf6 100644 > --- a/drivers/pci/msi.c > +++ b/drivers/pci/msi.c > @@ -433,6 +433,7 @@ static void __pci_restore_msi_state(struct pci_dev *dev) > static void __pci_restore_msix_state(struct pci_dev *dev) > { > struct msi_desc *entry; > + u32 flag; > > if (!dev->msix_enabled) > return; > @@ -444,8 +445,10 @@ static void __pci_restore_msix_state(struct pci_dev *dev) > PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL); > > arch_restore_msi_irqs(dev); > - for_each_pci_msi_entry(entry, dev) > - msix_mask_irq(entry, entry->masked); > + for_each_pci_msi_entry(entry, dev) { > + flag = entry->masked & PCI_MSIX_ENTRY_CTRL_MASKBIT; > + msix_mask_irq(entry, flag); This makes good sense: before your patch, when we restore, we set the mask bit if *any* bits in the Vector Control register are set. There are other paths leading to __pci_msix_desc_mask_irq(), so I think it would be better to do the masking there, e.g., if (flag & PCI_MSIX_ENTRY_CTRL_MASKBIT) mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT; I think the other paths all pass either 0 or 1, so they're all safe today. But doing the masking in __pci_msix_desc_mask_irq() removes that assumption from the callers. I applied the patch below to pci/msi, let me know if it doesn't work for you. > + } > > pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0); > } commit 1a828a554650 Author: Jian-Hong Pan <jian-hong@endlessm.com> Date: Tue Oct 8 11:42:39 2019 +0800 PCI/MSI: Fix incorrect MSI-X masking on resume When a driver enables MSI-X, msix_program_entries() reads the MSI-X Vector Control register for each vector and saves it in desc->masked. Each register is 32 bits and bit 0 is the actual Mask bit. When we restored these registers during resume, we previously set the Mask bit if *any* bit in desc->masked was set instead of when the Mask bit itself was set: pci_restore_state pci_restore_msi_state __pci_restore_msix_state for_each_pci_msi_entry msix_mask_irq(entry, entry->masked) <-- entire u32 word __pci_msix_desc_mask_irq(desc, flag) mask_bits = desc->masked & ~PCI_MSIX_ENTRY_CTRL_MASKBIT if (flag) <-- testing entire u32, not just bit 0 mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT writel(mask_bits, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL) This means that after resume, MSI-X vectors were masked when they shouldn't be, which leads to timeouts like this: nvme nvme0: I/O 978 QID 3 timeout, completion polled On resume, set the Mask bit only when the saved Mask bit from suspend was set. This should remove the need for 19ea025e1d28 ("nvme: Add quirk for Kingston NVME SSD running FW E8FK11.T"). [bhelgaas: commit log, move fix to __pci_msix_desc_mask_irq()] Fixes: f2440d9acbe8 ("PCI MSI: Refactor interrupt masking code") Link: https://bugzilla.kernel.org/show_bug.cgi?id=204887 Link: https://lore.kernel.org/r/20191008034238.2503-1-jian-hong@endlessm.com Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index 0884bedcfc7a..771041784e64 100644 --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -213,12 +213,13 @@ u32 __pci_msix_desc_mask_irq(struct msi_desc *desc, u32 flag) if (pci_msi_ignore_mask) return 0; + desc_addr = pci_msix_desc_addr(desc); if (!desc_addr) return 0; mask_bits &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT; - if (flag) + if (flag & PCI_MSIX_ENTRY_CTRL_MASKBIT) mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT; writel(mask_bits, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL);
On Wed, Oct 23, 2019 at 04:12:35PM -0500, Bjorn Helgaas wrote: > On Tue, Oct 08, 2019 at 11:42:39AM +0800, Jian-Hong Pan wrote: > > MSI-X vector control's bit 0 is the mask bit, which masks the > > corresponding interrupt request, or not. Other reserved bits might be > > used for other purpose by device vendors. For example, the values of > > Kingston NVMe SSD's MSI-X vector control are neither 0, nor 1, but other > > values [1]. > > > > The original restoring logic in system resuming uses the whole MSI-X > > vector control value as the flag to set/clear the mask bit. However, > > this logic conflicts the idea mentioned above. It may mislead system to > > disable the MSI-X vector entries. That makes system get no interrupt > > from Kingston NVMe SSD after resume and usually get NVMe I/O timeout > > error. > > > > [ 174.715534] nvme nvme0: I/O 978 QID 3 timeout, completion polled > > > > This patch takes only the mask bit of original MSI-X vector control > > value as the flag to fix this issue. > > > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=204887#c8 > > > > Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=204887 > > Fixed: f2440d9acbe8 ("PCI MSI: Refactor interrupt masking code") > > Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com> > > --- > > drivers/pci/msi.c | 7 +++++-- > > 1 file changed, 5 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c > > index 0884bedcfc7a..deae3d5acaf6 100644 > > --- a/drivers/pci/msi.c > > +++ b/drivers/pci/msi.c > > @@ -433,6 +433,7 @@ static void __pci_restore_msi_state(struct pci_dev *dev) > > static void __pci_restore_msix_state(struct pci_dev *dev) > > { > > struct msi_desc *entry; > > + u32 flag; > > > > if (!dev->msix_enabled) > > return; > > @@ -444,8 +445,10 @@ static void __pci_restore_msix_state(struct pci_dev *dev) > > PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL); > > > > arch_restore_msi_irqs(dev); > > - for_each_pci_msi_entry(entry, dev) > > - msix_mask_irq(entry, entry->masked); > > + for_each_pci_msi_entry(entry, dev) { > > + flag = entry->masked & PCI_MSIX_ENTRY_CTRL_MASKBIT; > > + msix_mask_irq(entry, flag); > > This makes good sense: before your patch, when we restore, we set the > mask bit if *any* bits in the Vector Control register are set. > > There are other paths leading to __pci_msix_desc_mask_irq(), so I > think it would be better to do the masking there, e.g., > > if (flag & PCI_MSIX_ENTRY_CTRL_MASKBIT) > mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT; > > I think the other paths all pass either 0 or 1, so they're all safe > today. But doing the masking in __pci_msix_desc_mask_irq() removes > that assumption from the callers. > > I applied the patch below to pci/msi, let me know if it doesn't work > for you. > > > + } > > > > pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0); > > } > > commit 1a828a554650 > Author: Jian-Hong Pan <jian-hong@endlessm.com> > Date: Tue Oct 8 11:42:39 2019 +0800 > > PCI/MSI: Fix incorrect MSI-X masking on resume > > When a driver enables MSI-X, msix_program_entries() reads the MSI-X Vector > Control register for each vector and saves it in desc->masked. Each > register is 32 bits and bit 0 is the actual Mask bit. > > When we restored these registers during resume, we previously set the Mask > bit if *any* bit in desc->masked was set instead of when the Mask bit > itself was set: > > pci_restore_state > pci_restore_msi_state > __pci_restore_msix_state > for_each_pci_msi_entry > msix_mask_irq(entry, entry->masked) <-- entire u32 word > __pci_msix_desc_mask_irq(desc, flag) > mask_bits = desc->masked & ~PCI_MSIX_ENTRY_CTRL_MASKBIT > if (flag) <-- testing entire u32, not just bit 0 > mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT > writel(mask_bits, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL) > > This means that after resume, MSI-X vectors were masked when they shouldn't > be, which leads to timeouts like this: > > nvme nvme0: I/O 978 QID 3 timeout, completion polled > > On resume, set the Mask bit only when the saved Mask bit from suspend was > set. > > This should remove the need for 19ea025e1d28 ("nvme: Add quirk for Kingston > NVME SSD running FW E8FK11.T"). > > [bhelgaas: commit log, move fix to __pci_msix_desc_mask_irq()] > Fixes: f2440d9acbe8 ("PCI MSI: Refactor interrupt masking code") > Link: https://bugzilla.kernel.org/show_bug.cgi?id=204887 > Link: https://lore.kernel.org/r/20191008034238.2503-1-jian-hong@endlessm.com > Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com> > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> I forgot; this probably should be marked for stable, too. > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c > index 0884bedcfc7a..771041784e64 100644 > --- a/drivers/pci/msi.c > +++ b/drivers/pci/msi.c > @@ -213,12 +213,13 @@ u32 __pci_msix_desc_mask_irq(struct msi_desc *desc, u32 flag) > > if (pci_msi_ignore_mask) > return 0; > + > desc_addr = pci_msix_desc_addr(desc); > if (!desc_addr) > return 0; > > mask_bits &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT; > - if (flag) > + if (flag & PCI_MSIX_ENTRY_CTRL_MASKBIT) > mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT; > > writel(mask_bits, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL); > > _______________________________________________ > Linux-nvme mailing list > Linux-nvme@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-nvme
Bjorn Helgaas <helgaas@kernel.org> 於 2019年10月24日 週四 上午5:43寫道: > > On Wed, Oct 23, 2019 at 04:12:35PM -0500, Bjorn Helgaas wrote: > > On Tue, Oct 08, 2019 at 11:42:39AM +0800, Jian-Hong Pan wrote: > > > MSI-X vector control's bit 0 is the mask bit, which masks the > > > corresponding interrupt request, or not. Other reserved bits might be > > > used for other purpose by device vendors. For example, the values of > > > Kingston NVMe SSD's MSI-X vector control are neither 0, nor 1, but other > > > values [1]. > > > > > > The original restoring logic in system resuming uses the whole MSI-X > > > vector control value as the flag to set/clear the mask bit. However, > > > this logic conflicts the idea mentioned above. It may mislead system to > > > disable the MSI-X vector entries. That makes system get no interrupt > > > from Kingston NVMe SSD after resume and usually get NVMe I/O timeout > > > error. > > > > > > [ 174.715534] nvme nvme0: I/O 978 QID 3 timeout, completion polled > > > > > > This patch takes only the mask bit of original MSI-X vector control > > > value as the flag to fix this issue. > > > > > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=204887#c8 > > > > > > Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=204887 > > > Fixed: f2440d9acbe8 ("PCI MSI: Refactor interrupt masking code") > > > Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com> > > > --- > > > drivers/pci/msi.c | 7 +++++-- > > > 1 file changed, 5 insertions(+), 2 deletions(-) > > > > > > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c > > > index 0884bedcfc7a..deae3d5acaf6 100644 > > > --- a/drivers/pci/msi.c > > > +++ b/drivers/pci/msi.c > > > @@ -433,6 +433,7 @@ static void __pci_restore_msi_state(struct pci_dev *dev) > > > static void __pci_restore_msix_state(struct pci_dev *dev) > > > { > > > struct msi_desc *entry; > > > + u32 flag; > > > > > > if (!dev->msix_enabled) > > > return; > > > @@ -444,8 +445,10 @@ static void __pci_restore_msix_state(struct pci_dev *dev) > > > PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL); > > > > > > arch_restore_msi_irqs(dev); > > > - for_each_pci_msi_entry(entry, dev) > > > - msix_mask_irq(entry, entry->masked); > > > + for_each_pci_msi_entry(entry, dev) { > > > + flag = entry->masked & PCI_MSIX_ENTRY_CTRL_MASKBIT; > > > + msix_mask_irq(entry, flag); > > > > This makes good sense: before your patch, when we restore, we set the > > mask bit if *any* bits in the Vector Control register are set. > > > > There are other paths leading to __pci_msix_desc_mask_irq(), so I > > think it would be better to do the masking there, e.g., > > > > if (flag & PCI_MSIX_ENTRY_CTRL_MASKBIT) > > mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT; > > > > I think the other paths all pass either 0 or 1, so they're all safe > > today. But doing the masking in __pci_msix_desc_mask_irq() removes > > that assumption from the callers. > > > > I applied the patch below to pci/msi, let me know if it doesn't work > > for you. > > > > > + } > > > > > > pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0); > > > } > > > > commit 1a828a554650 > > Author: Jian-Hong Pan <jian-hong@endlessm.com> > > Date: Tue Oct 8 11:42:39 2019 +0800 > > > > PCI/MSI: Fix incorrect MSI-X masking on resume > > > > When a driver enables MSI-X, msix_program_entries() reads the MSI-X Vector > > Control register for each vector and saves it in desc->masked. Each > > register is 32 bits and bit 0 is the actual Mask bit. > > > > When we restored these registers during resume, we previously set the Mask > > bit if *any* bit in desc->masked was set instead of when the Mask bit > > itself was set: > > > > pci_restore_state > > pci_restore_msi_state > > __pci_restore_msix_state > > for_each_pci_msi_entry > > msix_mask_irq(entry, entry->masked) <-- entire u32 word > > __pci_msix_desc_mask_irq(desc, flag) > > mask_bits = desc->masked & ~PCI_MSIX_ENTRY_CTRL_MASKBIT > > if (flag) <-- testing entire u32, not just bit 0 > > mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT > > writel(mask_bits, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL) > > > > This means that after resume, MSI-X vectors were masked when they shouldn't > > be, which leads to timeouts like this: > > > > nvme nvme0: I/O 978 QID 3 timeout, completion polled > > > > On resume, set the Mask bit only when the saved Mask bit from suspend was > > set. > > > > This should remove the need for 19ea025e1d28 ("nvme: Add quirk for Kingston > > NVME SSD running FW E8FK11.T"). > > > > [bhelgaas: commit log, move fix to __pci_msix_desc_mask_irq()] > > Fixes: f2440d9acbe8 ("PCI MSI: Refactor interrupt masking code") > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=204887 > > Link: https://lore.kernel.org/r/20191008034238.2503-1-jian-hong@endlessm.com > > Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com> > > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> The modified patch also fixes the issue. > I forgot; this probably should be marked for stable, too. Oh! Yes, please! Thank you, Jian-Hong Pan > > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c > > index 0884bedcfc7a..771041784e64 100644 > > --- a/drivers/pci/msi.c > > +++ b/drivers/pci/msi.c > > @@ -213,12 +213,13 @@ u32 __pci_msix_desc_mask_irq(struct msi_desc *desc, u32 flag) > > > > if (pci_msi_ignore_mask) > > return 0; > > + > > desc_addr = pci_msix_desc_addr(desc); > > if (!desc_addr) > > return 0; > > > > mask_bits &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT; > > - if (flag) > > + if (flag & PCI_MSIX_ENTRY_CTRL_MASKBIT) > > mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT; > > > > writel(mask_bits, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL); > > > > _______________________________________________ > > Linux-nvme mailing list > > Linux-nvme@lists.infradead.org > > http://lists.infradead.org/mailman/listinfo/linux-nvme
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index 0884bedcfc7a..deae3d5acaf6 100644 --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -433,6 +433,7 @@ static void __pci_restore_msi_state(struct pci_dev *dev) static void __pci_restore_msix_state(struct pci_dev *dev) { struct msi_desc *entry; + u32 flag; if (!dev->msix_enabled) return; @@ -444,8 +445,10 @@ static void __pci_restore_msix_state(struct pci_dev *dev) PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL); arch_restore_msi_irqs(dev); - for_each_pci_msi_entry(entry, dev) - msix_mask_irq(entry, entry->masked); + for_each_pci_msi_entry(entry, dev) { + flag = entry->masked & PCI_MSIX_ENTRY_CTRL_MASKBIT; + msix_mask_irq(entry, flag); + } pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0); }
MSI-X vector control's bit 0 is the mask bit, which masks the corresponding interrupt request, or not. Other reserved bits might be used for other purpose by device vendors. For example, the values of Kingston NVMe SSD's MSI-X vector control are neither 0, nor 1, but other values [1]. The original restoring logic in system resuming uses the whole MSI-X vector control value as the flag to set/clear the mask bit. However, this logic conflicts the idea mentioned above. It may mislead system to disable the MSI-X vector entries. That makes system get no interrupt from Kingston NVMe SSD after resume and usually get NVMe I/O timeout error. [ 174.715534] nvme nvme0: I/O 978 QID 3 timeout, completion polled This patch takes only the mask bit of original MSI-X vector control value as the flag to fix this issue. [1] https://bugzilla.kernel.org/show_bug.cgi?id=204887#c8 Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=204887 Fixed: f2440d9acbe8 ("PCI MSI: Refactor interrupt masking code") Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com> --- drivers/pci/msi.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)