diff mbox series

[v7,7/8] PCI: Enable NO_BUS_RESET quirk for Nvidia GPUs

Message ID: 20210608054857.18963-8-ameynarkhede03@gmail.com
State: Superseded
Delegated to: Bjorn Helgaas
Series: Expose and manage PCI device reset

Commit Message

Amey Narkhede June 8, 2021, 5:48 a.m. UTC
From: Shanker Donthineni <sdonthineni@nvidia.com>

On select platforms, some Nvidia GPU devices do not work with SBR.
Triggering SBR would leave the device inoperable for the current
system boot. It requires a system hard-reboot to get the GPU device
back to normal operating condition post-SBR. For the affected
devices, enable NO_BUS_RESET quirk to fix the issue.

This issue will be fixed in the next generation of hardware.

Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
---
 drivers/pci/quirks.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)
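
For readers unfamiliar with the quirk mechanism, here is a minimal userspace sketch of how a per-device "no bus reset" flag can veto a secondary bus reset (SBR). This is not the kernel implementation. The `model_*` names, the struct layout, and the flag's bit position are assumptions made for illustration only. The sketch shows just the idea from the commit message: once the quirk sets the flag on a device, a reset path that checks the flag refuses to reset the bus that device sits on.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's device flag; the bit position is illustrative. */
enum model_dev_flags {
	MODEL_DEV_FLAGS_NO_BUS_RESET = 1u << 6,
};

/* Hypothetical, heavily reduced stand-in for struct pci_dev. */
struct model_pci_dev {
	unsigned short vendor;
	unsigned short device;
	unsigned int dev_flags;
};

/* Same shape as quirk_no_bus_reset() in the patch: mark the device. */
static void model_quirk_no_bus_reset(struct model_pci_dev *dev)
{
	dev->dev_flags |= MODEL_DEV_FLAGS_NO_BUS_RESET;
}

/*
 * Assumed policy for this sketch: a bus may be reset only if no device
 * on it carries the NO_BUS_RESET flag.
 */
static bool model_bus_reset_allowed(const struct model_pci_dev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (devs[i].dev_flags & MODEL_DEV_FLAGS_NO_BUS_RESET)
			return false;
	}
	return true;
}
```

In this model, calling `model_quirk_no_bus_reset()` on any one device makes `model_bus_reset_allowed()` return false for the whole array, which mirrors why flagging a single GPU is enough to keep SBR away from it.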

Comments

Bjorn Helgaas June 10, 2021, 11:16 p.m. UTC | #1
On Tue, Jun 08, 2021 at 11:18:56AM +0530, Amey Narkhede wrote:
> From: Shanker Donthineni <sdonthineni@nvidia.com>
> 
> On select platforms, some Nvidia GPU devices do not work with SBR.

Interesting that you say "on select platforms."  Apparently SBR does
work for some of these GPUs, but not on all platforms?  If you have
any clarification here, I can still update the commit log.

> Triggering SBR would leave the device inoperable for the current
> system boot. It requires a system hard-reboot to get the GPU device
> back to normal operating condition post-SBR. For the affected
> devices, enable NO_BUS_RESET quirk to fix the issue.
> 
> This issue will be fixed in the next generation of hardware.
> 
> Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
> Reviewed-by: Sinan Kaya <okaya@kernel.org>

This patch doesn't seem to have any dependencies or particular
connection to the rest of the reset series, so I applied this patch by
itself to for-linus for v5.13 and marked it for stable.

If that's not right, let me know.

> ---
>  drivers/pci/quirks.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index e86cf4a3b..45a8c3caa 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3546,6 +3546,18 @@ static void quirk_no_bus_reset(struct pci_dev *dev)
>  	dev->dev_flags |= PCI_DEV_FLAGS_NO_BUS_RESET;
>  }
>  
> +/*
> + * Some Nvidia GPU devices do not work with bus reset, SBR needs to be
> + * prevented for those affected devices.
> + */
> +static void quirk_nvidia_no_bus_reset(struct pci_dev *dev)
> +{
> +	if ((dev->device & 0xffc0) == 0x2340)
> +		quirk_no_bus_reset(dev);
> +}
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID,
> +			 quirk_nvidia_no_bus_reset);
> +
>  /*
>   * Some Atheros AR9xxx and QCA988x chips do not behave after a bus reset.
>   * The device will throw a Link Down error on AER-capable systems and
> -- 
> 2.31.1
>
Shanker R Donthineni June 10, 2021, 11:33 p.m. UTC | #2
Hi Bjorn,

On 6/10/21 6:16 PM, Bjorn Helgaas wrote:
>> Triggering SBR would leave the device inoperable for the current
>> system boot. It requires a system hard-reboot to get the GPU device
>> back to normal operating condition post-SBR. For the affected
>> devices, enable NO_BUS_RESET quirk to fix the issue.
>>
>> This issue will be fixed in the next generation of hardware.
>>
>> Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
>> Reviewed-by: Sinan Kaya <okaya@kernel.org>
> This patch doesn't seem to have any dependencies or particular
> connection to the rest of the reset series, so I applied this patch by
> itself to for-linus for v5.13 and marked it for stable.
>
> If that's not right, let me know.
>

Yes, you're right: this patch has no dependency on the reset method series.
Shanker R Donthineni June 10, 2021, 11:43 p.m. UTC | #3
Hi Bjorn,

On 6/10/21 6:16 PM, Bjorn Helgaas wrote:
>> From: Shanker Donthineni <sdonthineni@nvidia.com>
>>
>> On select platforms, some Nvidia GPU devices do not work with SBR.
> Interesting that you say "on select platforms."  Apparently SBR does
> work for some of these GPUs, but not on all platforms?  If you have
> any clarification here, I can still update the commit log.
>
Yes, SBR works for some GPUs but GPUs which are listed in this quirk will
not work and these GPUs are available only on selected server platforms.
I believe commit text reflects the issue but please update if needed. 

Bjorn Helgaas June 10, 2021, 11:53 p.m. UTC | #4
On Thu, Jun 10, 2021 at 06:43:26PM -0500, Shanker R Donthineni wrote:
> On 6/10/21 6:16 PM, Bjorn Helgaas wrote:
> >> From: Shanker Donthineni <sdonthineni@nvidia.com>
> >>
> >> On select platforms, some Nvidia GPU devices do not work with SBR.
> > Interesting that you say "on select platforms."  Apparently SBR does
> > work for some of these GPUs, but not on all platforms?  If you have
> > any clarification here, I can still update the commit log.
> >
> Yes, SBR works for some GPUs but GPUs which are listed in this quirk will
> not work and these GPUs are available only on selected server platforms.
> I believe commit text reflects the issue but please update if needed. 

It sounds like there is no actual dependency on the platform.  So even
though these GPUs are only available on certain platforms, if one were
to move one of them to a different, non-supported platform, SBR would
still not work.

So I think I'll remove the reference to "select platforms" since it
doesn't add any useful information and might suggest that SBR should
work on some platforms, if you could only find the right ones.

Bjorn
Shanker R Donthineni June 11, 2021, 4:15 a.m. UTC | #5
Hi Bjorn,

On 6/10/21 6:53 PM, Bjorn Helgaas wrote:
> It sounds like there is no actual dependency on the platform.  So even
> though these GPUs are only available on certain platforms, if one were
> to move one of them to a different, non-supported platform, SBR would
> still not work.
>
> So I think I'll remove the reference to "select platforms" since it
> doesn't add any useful information and might suggest that SBR should
> work on some platforms, if you could only find the right ones.

I appreciate your time reviewing the code, providing better text, and picking
up the patch for v5.14. Please let us know if you have any code improvements or
suggestions for the remaining reset patch series, so that it can be considered
for v5.14.

Patch

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index e86cf4a3b..45a8c3caa 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3546,6 +3546,18 @@  static void quirk_no_bus_reset(struct pci_dev *dev)
 	dev->dev_flags |= PCI_DEV_FLAGS_NO_BUS_RESET;
 }
 
+/*
+ * Some Nvidia GPU devices do not work with bus reset, SBR needs to be
+ * prevented for those affected devices.
+ */
+static void quirk_nvidia_no_bus_reset(struct pci_dev *dev)
+{
+	if ((dev->device & 0xffc0) == 0x2340)
+		quirk_no_bus_reset(dev);
+}
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID,
+			 quirk_nvidia_no_bus_reset);
+
 /*
  * Some Atheros AR9xxx and QCA988x chips do not behave after a bus reset.
  * The device will throw a Link Down error on AER-capable systems and
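
The device-ID test in `quirk_nvidia_no_bus_reset()` can be read as a range check: masking with `0xffc0` discards the low six bits, so the comparison with `0x2340` matches the 64 consecutive IDs `0x2340` through `0x237f`. The standalone sketch below reproduces only that arithmetic in userspace. The helper names are hypothetical and are not part of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same predicate as the quirk: ignore the low six bits of the device ID. */
static bool nvidia_id_matches_quirk(uint16_t device_id)
{
	return (device_id & 0xffc0) == 0x2340;
}

/* Walk the whole 16-bit device-ID space and count the IDs the mask selects. */
static unsigned int count_matching_ids(void)
{
	unsigned int count = 0;

	for (uint32_t id = 0; id <= 0xffff; id++) {
		if (nvidia_id_matches_quirk((uint16_t)id))
			count++;
	}
	return count;
}
```

Because the fixup is declared with `PCI_ANY_ID`, the callback runs for every Nvidia device and this mask is what narrows the quirk to the affected ID block.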