[RFC,net-next,04/12] PCI: Add no PM reset quirk for NVIDIA Spectrum devices

Message ID 20231017074257.3389177-5-idosch@nvidia.com (mailing list archive)
State Handled Elsewhere
Delegated to: Bjorn Helgaas
Series mlxsw: Add support for new reset flow

Commit Message

Ido Schimmel Oct. 17, 2023, 7:42 a.m. UTC
Spectrum-{1,2,3,4} devices report that a D3hot->D0 transition causes a
reset (i.e., they advertise NoSoftRst-). However, this transition seems
to have no effect on the device: It continues to be operational and
network ports remain up. Advertising this support makes it seem as if a
PM reset is viable for these devices. Mark it as unavailable to skip it
when testing reset methods.

Before:

 # cat /sys/bus/pci/devices/0000\:03\:00.0/reset_method
 pm bus

After:

 # cat /sys/bus/pci/devices/0000\:03\:00.0/reset_method
 bus

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
 drivers/pci/quirks.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

Comments

Bjorn Helgaas Oct. 18, 2023, 7:40 p.m. UTC | #1
On Tue, Oct 17, 2023 at 10:42:49AM +0300, Ido Schimmel wrote:
> Spectrum-{1,2,3,4} devices report that a D3hot->D0 transition causes a
> reset (i.e., they advertise NoSoftRst-). However, this transition seems
> to have no effect on the device: It continues to be operational and
> network ports remain up. Advertising this support makes it seem as if a
> PM reset is viable for these devices. Mark it as unavailable to skip it
> when testing reset methods.
> 
> Before:
> 
>  # cat /sys/bus/pci/devices/0000\:03\:00.0/reset_method
>  pm bus
> 
> After:
> 
>  # cat /sys/bus/pci/devices/0000\:03\:00.0/reset_method
>  bus
> 
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>

Acked-by: Bjorn Helgaas <bhelgaas@google.com>

Hopefully since these are NVIDIA parts and you work at NVIDIA, this is
stronger than "this transition *seems* to have no effect" :)

The spec actually says NoSoftRst- means internal state is "undefined"
after a D3hot->D0 transition, so preserving it would not be a defect
per spec.  The kernel assumption that NoSoftRst- means the device will
do a reset is perhaps a little too aggressive.

> ---
>  drivers/pci/quirks.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index eeec1d6f9023..23f6bd2184e2 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3784,6 +3784,19 @@ static void quirk_no_pm_reset(struct pci_dev *dev)
>  DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_VENDOR_ID_ATI, PCI_ANY_ID,
>  			       PCI_CLASS_DISPLAY_VGA, 8, quirk_no_pm_reset);
>  
> +/*
> + * Spectrum-{1,2,3,4} devices report that a D3hot->D0 transition causes a reset
> + * (i.e., they advertise NoSoftRst-). However, this transition seems to have no
> + * effect on the device: It continues to be operational and network ports
> + * remain up. Advertising this support makes it seem as if a PM reset is viable
> + * for these devices. Mark it as unavailable to skip it when testing reset
> + * methods.
> + */
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MELLANOX, 0xcb84, quirk_no_pm_reset);
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MELLANOX, 0xcf6c, quirk_no_pm_reset);
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MELLANOX, 0xcf70, quirk_no_pm_reset);
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MELLANOX, 0xcf80, quirk_no_pm_reset);
> +
>  /*
>   * Thunderbolt controllers with broken MSI hotplug signaling:
>   * Entire 1st generation (Light Ridge, Eagle Ridge, Light Peak) and part
> -- 
> 2.40.1
>
Ido Schimmel Oct. 22, 2023, 8:23 a.m. UTC | #2
On Wed, Oct 18, 2023 at 02:40:41PM -0500, Bjorn Helgaas wrote:
> On Tue, Oct 17, 2023 at 10:42:49AM +0300, Ido Schimmel wrote:
> > Spectrum-{1,2,3,4} devices report that a D3hot->D0 transition causes a
> > reset (i.e., they advertise NoSoftRst-). However, this transition seems
> > to have no effect on the device: It continues to be operational and
> > network ports remain up. Advertising this support makes it seem as if a
> > PM reset is viable for these devices. Mark it as unavailable to skip it
> > when testing reset methods.
> > 
> > Before:
> > 
> >  # cat /sys/bus/pci/devices/0000\:03\:00.0/reset_method
> >  pm bus
> > 
> > After:
> > 
> >  # cat /sys/bus/pci/devices/0000\:03\:00.0/reset_method
> >  bus
> > 
> > Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> 
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> 
> Hopefully since these are NVIDIA parts and you work at NVIDIA, this is
> stronger than "this transition *seems* to have no effect" :)

Yes. Reworded to "this transition does not have any effect on the
device" and kept your tag.

FYI, new devices will not advertise support for PM reset so I don't
expect this list to grow.

Patch

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index eeec1d6f9023..23f6bd2184e2 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3784,6 +3784,19 @@  static void quirk_no_pm_reset(struct pci_dev *dev)
 DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_VENDOR_ID_ATI, PCI_ANY_ID,
 			       PCI_CLASS_DISPLAY_VGA, 8, quirk_no_pm_reset);
 
+/*
+ * Spectrum-{1,2,3,4} devices report that a D3hot->D0 transition causes a reset
+ * (i.e., they advertise NoSoftRst-). However, this transition seems to have no
+ * effect on the device: It continues to be operational and network ports
+ * remain up. Advertising this support makes it seem as if a PM reset is viable
+ * for these devices. Mark it as unavailable to skip it when testing reset
+ * methods.
+ */
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MELLANOX, 0xcb84, quirk_no_pm_reset);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MELLANOX, 0xcf6c, quirk_no_pm_reset);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MELLANOX, 0xcf70, quirk_no_pm_reset);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MELLANOX, 0xcf80, quirk_no_pm_reset);
+
 /*
  * Thunderbolt controllers with broken MSI hotplug signaling:
  * Entire 1st generation (Light Ridge, Eagle Ridge, Light Peak) and part