diff mbox series

[2/2] PCI: NVMe device specific reset quirk

Message ID 20180723222431.4371.25962.stgit@gimli.home (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show
Series PCI: NVMe reset quirk | expand

Commit Message

Alex Williamson July 23, 2018, 10:24 p.m. UTC
Take advantage of NVMe devices using a standard interface to quiesce
the controller prior to reset, including device specific delays before
and after that reset.  This resolves several NVMe device assignment
scenarios with two different vendors.  The Intel DC P3700 controller
has been shown to only work as a VM boot device on the initial VM
startup, failing after reset or reboot, and also fails to initialize
after hot-plug into a VM.  Adding a delay after FLR resolves these
cases.  The Samsung SM961/PM961 (960 EVO) sometimes fails to return
from FLR with the PCI config space reading back as -1.  A reproducible
instance of this behavior is resolved by clearing the enable bit in
the configuration register and waiting for the ready status to clear
(disabling the NVMe controller) prior to FLR.

As all NVMe devices make use of this standard interface and the NVMe
specification also requires PCIe FLR support, we can apply this quirk
to all devices with matching class code.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/pci/quirks.c |  112 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 112 insertions(+)

Comments

Keith Busch July 23, 2018, 10:45 p.m. UTC | #1
On Mon, Jul 23, 2018 at 04:24:31PM -0600, Alex Williamson wrote:
> Take advantage of NVMe devices using a standard interface to quiesce
> the controller prior to reset, including device specific delays before
> and after that reset.  This resolves several NVMe device assignment
> scenarios with two different vendors.  The Intel DC P3700 controller
> has been shown to only work as a VM boot device on the initial VM
> startup, failing after reset or reboot, and also fails to initialize
> after hot-plug into a VM.  Adding a delay after FLR resolves these
> cases.  The Samsung SM961/PM961 (960 EVO) sometimes fails to return
> from FLR with the PCI config space reading back as -1.  A reproducible
> instance of this behavior is resolved by clearing the enable bit in
> the configuration register and waiting for the ready status to clear
> (disabling the NVMe controller) prior to FLR.
> 
> As all NVMe devices make use of this standard interface and the NVMe
> specification also requires PCIe FLR support, we can apply this quirk
> to all devices with matching class code.

Shouldn't this go in the nvme driver's reset_prepare/reset_done callbacks?
Bjorn Helgaas July 23, 2018, 10:45 p.m. UTC | #2
On Mon, Jul 23, 2018 at 04:24:31PM -0600, Alex Williamson wrote:
> Take advantage of NVMe devices using a standard interface to quiesce
> the controller prior to reset, including device specific delays before
> and after that reset.  This resolves several NVMe device assignment
> scenarios with two different vendors.  The Intel DC P3700 controller
> has been shown to only work as a VM boot device on the initial VM
> startup, failing after reset or reboot, and also fails to initialize
> after hot-plug into a VM.  Adding a delay after FLR resolves these
> cases.  The Samsung SM961/PM961 (960 EVO) sometimes fails to return
> from FLR with the PCI config space reading back as -1.  A reproducible
> instance of this behavior is resolved by clearing the enable bit in
> the configuration register and waiting for the ready status to clear
> (disabling the NVMe controller) prior to FLR.
> 
> As all NVMe devices make use of this standard interface and the NVMe
> specification also requires PCIe FLR support, we can apply this quirk
> to all devices with matching class code.

Do you have any pointers to problem reports or bugzilla entries that
we could include here?

> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
>  drivers/pci/quirks.c |  112 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 112 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index e72c8742aafa..83853562f220 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -28,6 +28,7 @@
>  #include <linux/platform_data/x86/apple.h>
>  #include <linux/pm_runtime.h>
>  #include <linux/switchtec.h>
> +#include <linux/nvme.h>
>  #include <asm/dma.h>	/* isa_dma_bridge_buggy */
>  #include "pci.h"
>  
> @@ -3669,6 +3670,116 @@ static int reset_chelsio_generic_dev(struct pci_dev *dev, int probe)
>  #define PCI_DEVICE_ID_INTEL_IVB_M_VGA      0x0156
>  #define PCI_DEVICE_ID_INTEL_IVB_M2_VGA     0x0166
>  
> +/* NVMe controller needs delay before testing ready status */
> +#define NVME_QUIRK_CHK_RDY_DELAY	(1 << 0)
> +/* NVMe controller needs post-FLR delay */
> +#define NVME_QUIRK_POST_FLR_DELAY	(1 << 1)
> +
> +static const struct pci_device_id nvme_reset_tbl[] = {
> +	{ PCI_DEVICE(0x1bb1, 0x0100),   /* Seagate Nytro Flash Storage */
> +		.driver_data = NVME_QUIRK_CHK_RDY_DELAY, },
> +	{ PCI_DEVICE(0x1c58, 0x0003),   /* HGST adapter */
> +		.driver_data = NVME_QUIRK_CHK_RDY_DELAY, },
> +	{ PCI_DEVICE(0x1c58, 0x0023),   /* WDC SN200 adapter */
> +		.driver_data = NVME_QUIRK_CHK_RDY_DELAY, },
> +	{ PCI_DEVICE(0x1c5f, 0x0540),   /* Memblaze Pblaze4 adapter */
> +		.driver_data = NVME_QUIRK_CHK_RDY_DELAY, },
> +	{ PCI_DEVICE(0x144d, 0xa821),   /* Samsung PM1725 */

We do have PCI_VENDOR_ID_SAMSUNG if you want to use it here.  I
don't see Seagate, HGST, etc.

> +		.driver_data = NVME_QUIRK_CHK_RDY_DELAY, },
> +	{ PCI_DEVICE(0x144d, 0xa822),   /* Samsung PM1725a */
> +		.driver_data = NVME_QUIRK_CHK_RDY_DELAY, },
> +	{ PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x0953),   /* Intel DC P3700 */
> +		.driver_data = NVME_QUIRK_POST_FLR_DELAY, },
> +	{ PCI_DEVICE_CLASS(PCI_CLASS_STORAGE_EXPRESS, 0xffffff) },
> +	{ 0 }
> +};
> +
> +/*
> + * The NVMe specification requires that controllers support PCIe FLR, but
> + * but some Samsung SM961/PM961 controllers fail to recover after FLR (-1
> + * config space) unless the device is quiesced prior to FLR.  Do this for
> + * all NVMe devices by disabling the controller before reset.  Some Intel
> + * controllers also require an additional post-FLR delay or else attempts
> + * to re-enable will timeout, do that here as well with heuristically
> + * determined delay value.  Also maintain the delay between disabling and
> + * checking ready status as used by the native NVMe driver.
> + */
> +static int reset_nvme(struct pci_dev *dev, int probe)
> +{
> +	const struct pci_device_id *id;
> +	void __iomem *bar;
> +	u16 cmd;
> +	u32 cfg;
> +
> +	id = pci_match_id(nvme_reset_tbl, dev);
> +	if (!id || !pcie_has_flr(dev) || !pci_resource_start(dev, 0))
> +		return -ENOTTY;
> +
> +	if (probe)
> +		return 0;
> +
> +	bar = pci_iomap(dev, 0, NVME_REG_CC + sizeof(cfg));
> +	if (!bar)
> +		return -ENOTTY;
> +
> +	pci_read_config_word(dev, PCI_COMMAND, &cmd);
> +	pci_write_config_word(dev, PCI_COMMAND, cmd | PCI_COMMAND_MEMORY);
> +
> +	cfg = readl(bar + NVME_REG_CC);

Apparently this is part of some NVMe spec and all controllers support
this?  Is there a public reference you could cite for the details?

> +
> +	/* Disable controller if enabled */
> +	if (cfg & NVME_CC_ENABLE) {
> +		u64 cap = readq(bar + NVME_REG_CAP);
> +		unsigned long timeout;
> +
> +		/*
> +		 * Per nvme_disable_ctrl() skip shutdown notification as it
> +		 * could complete commands to the admin queue.  We only intend
> +		 * to quiesce the device before reset.
> +		 */
> +		cfg &= ~(NVME_CC_SHN_MASK | NVME_CC_ENABLE);
> +
> +		writel(cfg, bar + NVME_REG_CC);
> +
> +		/* A heuristic value, matches NVME_QUIRK_DELAY_AMOUNT */
> +		if (id->driver_data & NVME_QUIRK_CHK_RDY_DELAY)
> +			msleep(2300);
> +
> +		/* Cap register provides max timeout in 500ms increments */
> +		timeout = ((NVME_CAP_TIMEOUT(cap) + 1) * HZ / 2) + jiffies;
> +
> +		for (;;) {
> +			u32 status = readl(bar + NVME_REG_CSTS);
> +
> +			/* Ready status becomes zero on disable complete */
> +			if (!(status & NVME_CSTS_RDY))
> +				break;
> +
> +			msleep(100);
> +
> +			if (time_after(jiffies, timeout)) {
> +				pci_warn(dev, "Timeout waiting for NVMe ready status to clear after disable\n");
> +				break;
> +			}
> +		}
> +	}
> +
> +	pci_iounmap(dev, bar);
> +
> +	/*
> +	 * We could use the optional NVM Subsystem Reset here, hardware
> +	 * supporting this is simply unavailable at the time of this code
> +	 * to validate in comparison to PCIe FLR.  NVMe spec dictates that
> +	 * NVMe devices shall implement PCIe FLR.
> +	 */
> +	pcie_flr(dev);
> +
> +	if (id->driver_data & NVME_QUIRK_POST_FLR_DELAY)
> +		msleep(250); /* Heuristic based on Intel DC P3700 */
> +
> +	return 0;
> +}
> +
>  static const struct pci_dev_reset_methods pci_dev_reset_methods[] = {
>  	{ PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82599_SFP_VF,
>  		 reset_intel_82599_sfp_virtfn },
> @@ -3678,6 +3789,7 @@ static const struct pci_dev_reset_methods pci_dev_reset_methods[] = {
>  		reset_ivb_igd },
>  	{ PCI_VENDOR_ID_CHELSIO, PCI_ANY_ID,
>  		reset_chelsio_generic_dev },
> +	{ PCI_ANY_ID, PCI_ANY_ID, reset_nvme },
>  	{ 0 }
>  };
>  
> 
> 
> _______________________________________________
> Linux-nvme mailing list
> Linux-nvme@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-nvme
Alex Williamson July 23, 2018, 11:08 p.m. UTC | #3
On Mon, 23 Jul 2018 16:45:08 -0600
Keith Busch <keith.busch@intel.com> wrote:

> On Mon, Jul 23, 2018 at 04:24:31PM -0600, Alex Williamson wrote:
> > Take advantage of NVMe devices using a standard interface to quiesce
> > the controller prior to reset, including device specific delays before
> > and after that reset.  This resolves several NVMe device assignment
> > scenarios with two different vendors.  The Intel DC P3700 controller
> > has been shown to only work as a VM boot device on the initial VM
> > startup, failing after reset or reboot, and also fails to initialize
> > after hot-plug into a VM.  Adding a delay after FLR resolves these
> > cases.  The Samsung SM961/PM961 (960 EVO) sometimes fails to return
> > from FLR with the PCI config space reading back as -1.  A reproducible
> > instance of this behavior is resolved by clearing the enable bit in
> > the configuration register and waiting for the ready status to clear
> > (disabling the NVMe controller) prior to FLR.
> > 
> > As all NVMe devices make use of this standard interface and the NVMe
> > specification also requires PCIe FLR support, we can apply this quirk
> > to all devices with matching class code.  
> 
> Shouldn't this go in the nvme driver's reset_prepare/reset_done callbacks?

The scenario I'm trying to fix is device assignment, the nvme driver
isn't in play there.  The device is bound to the vfio-pci driver at the
time of these resets.  Thanks,

Alex
Alex Williamson July 24, 2018, 12:11 a.m. UTC | #4
On Mon, 23 Jul 2018 17:45:33 -0500
Bjorn Helgaas <helgaas@kernel.org> wrote:

> On Mon, Jul 23, 2018 at 04:24:31PM -0600, Alex Williamson wrote:
> > Take advantage of NVMe devices using a standard interface to quiesce
> > the controller prior to reset, including device specific delays before
> > and after that reset.  This resolves several NVMe device assignment
> > scenarios with two different vendors.  The Intel DC P3700 controller
> > has been shown to only work as a VM boot device on the initial VM
> > startup, failing after reset or reboot, and also fails to initialize
> > after hot-plug into a VM.  Adding a delay after FLR resolves these
> > cases.  The Samsung SM961/PM961 (960 EVO) sometimes fails to return
> > from FLR with the PCI config space reading back as -1.  A reproducible
> > instance of this behavior is resolved by clearing the enable bit in
> > the configuration register and waiting for the ready status to clear
> > (disabling the NVMe controller) prior to FLR.
> > 
> > As all NVMe devices make use of this standard interface and the NVMe
> > specification also requires PCIe FLR support, we can apply this quirk
> > to all devices with matching class code.  
> 
> Do you have any pointers to problem reports or bugzilla entries that
> we could include here?

Yes, https://bugzilla.redhat.com/show_bug.cgi?id=1592654

This only covers the Intel P3700 issue.  The Samsung issue has been
reported via a couple bugs, but was not reproducible until recently.
Those bugs were previously closed due to lack of information.  I'll add
the above link.

> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > ---
> >  drivers/pci/quirks.c |  112 ++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 112 insertions(+)
> > 
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index e72c8742aafa..83853562f220 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -28,6 +28,7 @@
> >  #include <linux/platform_data/x86/apple.h>
> >  #include <linux/pm_runtime.h>
> >  #include <linux/switchtec.h>
> > +#include <linux/nvme.h>
> >  #include <asm/dma.h>	/* isa_dma_bridge_buggy */
> >  #include "pci.h"
> >  
> > @@ -3669,6 +3670,116 @@ static int reset_chelsio_generic_dev(struct pci_dev *dev, int probe)
> >  #define PCI_DEVICE_ID_INTEL_IVB_M_VGA      0x0156
> >  #define PCI_DEVICE_ID_INTEL_IVB_M2_VGA     0x0166
> >  
> > +/* NVMe controller needs delay before testing ready status */
> > +#define NVME_QUIRK_CHK_RDY_DELAY	(1 << 0)
> > +/* NVMe controller needs post-FLR delay */
> > +#define NVME_QUIRK_POST_FLR_DELAY	(1 << 1)
> > +
> > +static const struct pci_device_id nvme_reset_tbl[] = {
> > +	{ PCI_DEVICE(0x1bb1, 0x0100),   /* Seagate Nytro Flash Storage */
> > +		.driver_data = NVME_QUIRK_CHK_RDY_DELAY, },
> > +	{ PCI_DEVICE(0x1c58, 0x0003),   /* HGST adapter */
> > +		.driver_data = NVME_QUIRK_CHK_RDY_DELAY, },
> > +	{ PCI_DEVICE(0x1c58, 0x0023),   /* WDC SN200 adapter */
> > +		.driver_data = NVME_QUIRK_CHK_RDY_DELAY, },
> > +	{ PCI_DEVICE(0x1c5f, 0x0540),   /* Memblaze Pblaze4 adapter */
> > +		.driver_data = NVME_QUIRK_CHK_RDY_DELAY, },
> > +	{ PCI_DEVICE(0x144d, 0xa821),   /* Samsung PM1725 */  
> 
> We do have PCI_VENDOR_ID_SAMSUNG if you want to use it here.  I
> don't see Seagate, HGST, etc.

Oops, cut and pasted those from the nvme driver, I'll use the Samsung
macro.
 
> > +		.driver_data = NVME_QUIRK_CHK_RDY_DELAY, },
> > +	{ PCI_DEVICE(0x144d, 0xa822),   /* Samsung PM1725a */
> > +		.driver_data = NVME_QUIRK_CHK_RDY_DELAY, },
> > +	{ PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x0953),   /* Intel DC P3700 */
> > +		.driver_data = NVME_QUIRK_POST_FLR_DELAY, },
> > +	{ PCI_DEVICE_CLASS(PCI_CLASS_STORAGE_EXPRESS, 0xffffff) },
> > +	{ 0 }
> > +};
> > +
> > +/*
> > + * The NVMe specification requires that controllers support PCIe FLR, but
> > + * but some Samsung SM961/PM961 controllers fail to recover after FLR (-1
> > + * config space) unless the device is quiesced prior to FLR.  Do this for
> > + * all NVMe devices by disabling the controller before reset.  Some Intel
> > + * controllers also require an additional post-FLR delay or else attempts
> > + * to re-enable will timeout, do that here as well with heuristically
> > + * determined delay value.  Also maintain the delay between disabling and
> > + * checking ready status as used by the native NVMe driver.
> > + */
> > +static int reset_nvme(struct pci_dev *dev, int probe)
> > +{
> > +	const struct pci_device_id *id;
> > +	void __iomem *bar;
> > +	u16 cmd;
> > +	u32 cfg;
> > +
> > +	id = pci_match_id(nvme_reset_tbl, dev);
> > +	if (!id || !pcie_has_flr(dev) || !pci_resource_start(dev, 0))
> > +		return -ENOTTY;
> > +
> > +	if (probe)
> > +		return 0;
> > +
> > +	bar = pci_iomap(dev, 0, NVME_REG_CC + sizeof(cfg));
> > +	if (!bar)
> > +		return -ENOTTY;
> > +
> > +	pci_read_config_word(dev, PCI_COMMAND, &cmd);
> > +	pci_write_config_word(dev, PCI_COMMAND, cmd | PCI_COMMAND_MEMORY);
> > +
> > +	cfg = readl(bar + NVME_REG_CC);  
> 
> Apparently this is part of some NVMe spec and all controllers support
> this?  Is there a public reference you could cite for the details?

Yep, I'll add a link in the comments, it's all public.  Thanks,

Alex

> > +
> > +	/* Disable controller if enabled */
> > +	if (cfg & NVME_CC_ENABLE) {
> > +		u64 cap = readq(bar + NVME_REG_CAP);
> > +		unsigned long timeout;
> > +
> > +		/*
> > +		 * Per nvme_disable_ctrl() skip shutdown notification as it
> > +		 * could complete commands to the admin queue.  We only intend
> > +		 * to quiesce the device before reset.
> > +		 */
> > +		cfg &= ~(NVME_CC_SHN_MASK | NVME_CC_ENABLE);
> > +
> > +		writel(cfg, bar + NVME_REG_CC);
> > +
> > +		/* A heuristic value, matches NVME_QUIRK_DELAY_AMOUNT */
> > +		if (id->driver_data & NVME_QUIRK_CHK_RDY_DELAY)
> > +			msleep(2300);
> > +
> > +		/* Cap register provides max timeout in 500ms increments */
> > +		timeout = ((NVME_CAP_TIMEOUT(cap) + 1) * HZ / 2) + jiffies;
> > +
> > +		for (;;) {
> > +			u32 status = readl(bar + NVME_REG_CSTS);
> > +
> > +			/* Ready status becomes zero on disable complete */
> > +			if (!(status & NVME_CSTS_RDY))
> > +				break;
> > +
> > +			msleep(100);
> > +
> > +			if (time_after(jiffies, timeout)) {
> > +				pci_warn(dev, "Timeout waiting for NVMe ready status to clear after disable\n");
> > +				break;
> > +			}
> > +		}
> > +	}
> > +
> > +	pci_iounmap(dev, bar);
> > +
> > +	/*
> > +	 * We could use the optional NVM Subsystem Reset here, hardware
> > +	 * supporting this is simply unavailable at the time of this code
> > +	 * to validate in comparison to PCIe FLR.  NVMe spec dictates that
> > +	 * NVMe devices shall implement PCIe FLR.
> > +	 */
> > +	pcie_flr(dev);
> > +
> > +	if (id->driver_data & NVME_QUIRK_POST_FLR_DELAY)
> > +		msleep(250); /* Heuristic based on Intel DC P3700 */
> > +
> > +	return 0;
> > +}
> > +
> >  static const struct pci_dev_reset_methods pci_dev_reset_methods[] = {
> >  	{ PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82599_SFP_VF,
> >  		 reset_intel_82599_sfp_virtfn },
> > @@ -3678,6 +3789,7 @@ static const struct pci_dev_reset_methods pci_dev_reset_methods[] = {
> >  		reset_ivb_igd },
> >  	{ PCI_VENDOR_ID_CHELSIO, PCI_ANY_ID,
> >  		reset_chelsio_generic_dev },
> > +	{ PCI_ANY_ID, PCI_ANY_ID, reset_nvme },
> >  	{ 0 }
> >  };
> >  
> > 
> > 
> > _______________________________________________
> > Linux-nvme mailing list
> > Linux-nvme@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-nvme
diff mbox series

Patch

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index e72c8742aafa..83853562f220 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -28,6 +28,7 @@ 
 #include <linux/platform_data/x86/apple.h>
 #include <linux/pm_runtime.h>
 #include <linux/switchtec.h>
+#include <linux/nvme.h>
 #include <asm/dma.h>	/* isa_dma_bridge_buggy */
 #include "pci.h"
 
@@ -3669,6 +3670,116 @@  static int reset_chelsio_generic_dev(struct pci_dev *dev, int probe)
 #define PCI_DEVICE_ID_INTEL_IVB_M_VGA      0x0156
 #define PCI_DEVICE_ID_INTEL_IVB_M2_VGA     0x0166
 
+/* NVMe controller needs delay before testing ready status */
+#define NVME_QUIRK_CHK_RDY_DELAY	(1 << 0)
+/* NVMe controller needs post-FLR delay */
+#define NVME_QUIRK_POST_FLR_DELAY	(1 << 1)
+
+static const struct pci_device_id nvme_reset_tbl[] = {
+	{ PCI_DEVICE(0x1bb1, 0x0100),   /* Seagate Nytro Flash Storage */
+		.driver_data = NVME_QUIRK_CHK_RDY_DELAY, },
+	{ PCI_DEVICE(0x1c58, 0x0003),   /* HGST adapter */
+		.driver_data = NVME_QUIRK_CHK_RDY_DELAY, },
+	{ PCI_DEVICE(0x1c58, 0x0023),   /* WDC SN200 adapter */
+		.driver_data = NVME_QUIRK_CHK_RDY_DELAY, },
+	{ PCI_DEVICE(0x1c5f, 0x0540),   /* Memblaze Pblaze4 adapter */
+		.driver_data = NVME_QUIRK_CHK_RDY_DELAY, },
+	{ PCI_DEVICE(0x144d, 0xa821),   /* Samsung PM1725 */
+		.driver_data = NVME_QUIRK_CHK_RDY_DELAY, },
+	{ PCI_DEVICE(0x144d, 0xa822),   /* Samsung PM1725a */
+		.driver_data = NVME_QUIRK_CHK_RDY_DELAY, },
+	{ PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x0953),   /* Intel DC P3700 */
+		.driver_data = NVME_QUIRK_POST_FLR_DELAY, },
+	{ PCI_DEVICE_CLASS(PCI_CLASS_STORAGE_EXPRESS, 0xffffff) },
+	{ 0 }
+};
+
+/*
+ * The NVMe specification requires that controllers support PCIe FLR, but
+ * but some Samsung SM961/PM961 controllers fail to recover after FLR (-1
+ * config space) unless the device is quiesced prior to FLR.  Do this for
+ * all NVMe devices by disabling the controller before reset.  Some Intel
+ * controllers also require an additional post-FLR delay or else attempts
+ * to re-enable will timeout, do that here as well with heuristically
+ * determined delay value.  Also maintain the delay between disabling and
+ * checking ready status as used by the native NVMe driver.
+ */
+static int reset_nvme(struct pci_dev *dev, int probe)
+{
+	const struct pci_device_id *id;
+	void __iomem *bar;
+	u16 cmd;
+	u32 cfg;
+
+	id = pci_match_id(nvme_reset_tbl, dev);
+	if (!id || !pcie_has_flr(dev) || !pci_resource_start(dev, 0))
+		return -ENOTTY;
+
+	if (probe)
+		return 0;
+
+	bar = pci_iomap(dev, 0, NVME_REG_CC + sizeof(cfg));
+	if (!bar)
+		return -ENOTTY;
+
+	pci_read_config_word(dev, PCI_COMMAND, &cmd);
+	pci_write_config_word(dev, PCI_COMMAND, cmd | PCI_COMMAND_MEMORY);
+
+	cfg = readl(bar + NVME_REG_CC);
+
+	/* Disable controller if enabled */
+	if (cfg & NVME_CC_ENABLE) {
+		u64 cap = readq(bar + NVME_REG_CAP);
+		unsigned long timeout;
+
+		/*
+		 * Per nvme_disable_ctrl() skip shutdown notification as it
+		 * could complete commands to the admin queue.  We only intend
+		 * to quiesce the device before reset.
+		 */
+		cfg &= ~(NVME_CC_SHN_MASK | NVME_CC_ENABLE);
+
+		writel(cfg, bar + NVME_REG_CC);
+
+		/* A heuristic value, matches NVME_QUIRK_DELAY_AMOUNT */
+		if (id->driver_data & NVME_QUIRK_CHK_RDY_DELAY)
+			msleep(2300);
+
+		/* Cap register provides max timeout in 500ms increments */
+		timeout = ((NVME_CAP_TIMEOUT(cap) + 1) * HZ / 2) + jiffies;
+
+		for (;;) {
+			u32 status = readl(bar + NVME_REG_CSTS);
+
+			/* Ready status becomes zero on disable complete */
+			if (!(status & NVME_CSTS_RDY))
+				break;
+
+			msleep(100);
+
+			if (time_after(jiffies, timeout)) {
+				pci_warn(dev, "Timeout waiting for NVMe ready status to clear after disable\n");
+				break;
+			}
+		}
+	}
+
+	pci_iounmap(dev, bar);
+
+	/*
+	 * We could use the optional NVM Subsystem Reset here, hardware
+	 * supporting this is simply unavailable at the time of this code
+	 * to validate in comparison to PCIe FLR.  NVMe spec dictates that
+	 * NVMe devices shall implement PCIe FLR.
+	 */
+	pcie_flr(dev);
+
+	if (id->driver_data & NVME_QUIRK_POST_FLR_DELAY)
+		msleep(250); /* Heuristic based on Intel DC P3700 */
+
+	return 0;
+}
+
 static const struct pci_dev_reset_methods pci_dev_reset_methods[] = {
 	{ PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82599_SFP_VF,
 		 reset_intel_82599_sfp_virtfn },
@@ -3678,6 +3789,7 @@  static const struct pci_dev_reset_methods pci_dev_reset_methods[] = {
 		reset_ivb_igd },
 	{ PCI_VENDOR_ID_CHELSIO, PCI_ANY_ID,
 		reset_chelsio_generic_dev },
+	{ PCI_ANY_ID, PCI_ANY_ID, reset_nvme },
 	{ 0 }
 };