diff mbox series

drivers/pci: wait downstream hierarchy ready instead of slot itself ready, after secondary bus reset

Message ID 20220516173047.123317-1-windy.bi.enflame@gmail.com (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show
Series drivers/pci: wait downstream hierarchy ready instead of slot itself ready, after secondary bus reset | expand

Commit Message

Sheng Bi May 16, 2022, 5:30 p.m. UTC
While I do reset test of a PCIe endpoint device on a server, I find that
the EP device always been removed and re-inserted again by hotplug module,
 after secondary bus reset.

After checking I find:
1> "pciehp_reset_slot()" always disable slot's DLLSC interrupt before
   doing reset and restore after reset, to try to filter the hotplug
   event happened during reset.
2> "pci_bridge_secondary_bus_reset()" sleep 1 seconad and "pci_dev_wait()"
   until device ready with "PCIE_RESET_READY_POLL_MS" timeout.
3> There is a PCIe switch between CPU and the EP devicem the topology as:
   CPU <-> Switch <-> EP.
4> While trigger sbr reset at the switch's downstream port, it needs 1.5
   seconds for internal enumeration.

About why 1.5 seconds ready time is not filtered by "pci_dev_wait()" with
"PCIE_RESET_READY_POLL_MS" timeout, I find it is because in
"pci_bridge_secondary_bus_reset()", the function is operating slot's
config space to trigger sbr and also wait slot itself ready by input same
"dev" parameter. Different from other resets like FLR which is triggered
by operating the config space of EP device itself, sbr is triggered by
up slot but need to wait downstream devices' ready, so I think function
"pci_dev_wait()" works for resets like FLR but not for sbr.

In this proposed patch, I'm changing the waiting function used in sbr to
"pci_bridge_secondary_bus_wait()" which will wait all the downstream
hierarchy ready with the same timeout setting "PCIE_RESET_READY_POLL_MS".
In "pci_bridge_secondary_bus_wait()" the "subordinate" and
"subordinate->devices" will be checked firstly, and then downstream
devices' present state.

Signed-off-by: windy.bi.enflame <windy.bi.enflame@gmail.com>
---
 drivers/pci/pci.c | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

Comments

Bjorn Helgaas May 16, 2022, 8:28 p.m. UTC | #1
[+cc Lukas, pciehp expert; Alex, reset person]

Thanks for the testing, analysis, and patch!

Run "git log --oneline drivers/pci/pci.c" and make your subject line
similar.

On Tue, May 17, 2022 at 01:30:47AM +0800, windy.bi.enflame wrote:
> While I do reset test of a PCIe endpoint device on a server, I find that
> the EP device always been removed and re-inserted again by hotplug module,
>  after secondary bus reset.
> 
> After checking I find:
> 1> "pciehp_reset_slot()" always disable slot's DLLSC interrupt before
>    doing reset and restore after reset, to try to filter the hotplug
>    event happened during reset.
> 2> "pci_bridge_secondary_bus_reset()" sleep 1 seconad and "pci_dev_wait()"
>    until device ready with "PCIE_RESET_READY_POLL_MS" timeout.
> 3> There is a PCIe switch between CPU and the EP devicem the topology as:
>    CPU <-> Switch <-> EP.
> 4> While trigger sbr reset at the switch's downstream port, it needs 1.5
>    seconds for internal enumeration.

s/seconad/second/
s/devicem/device/
s/sbr/SBR/
s/"pciehp_reset_slot()"/pciehp_reset_slot()/ also for other functions

> About why 1.5 seconds ready time is not filtered by "pci_dev_wait()" with
> "PCIE_RESET_READY_POLL_MS" timeout, I find it is because in
> "pci_bridge_secondary_bus_reset()", the function is operating slot's
> config space to trigger sbr and also wait slot itself ready by input same
> "dev" parameter. Different from other resets like FLR which is triggered
> by operating the config space of EP device itself, sbr is triggered by
> up slot but need to wait downstream devices' ready, so I think function
> "pci_dev_wait()" works for resets like FLR but not for sbr.
> 
> In this proposed patch, I'm changing the waiting function used in sbr to
> "pci_bridge_secondary_bus_wait()" which will wait all the downstream
> hierarchy ready with the same timeout setting "PCIE_RESET_READY_POLL_MS".
> In "pci_bridge_secondary_bus_wait()" the "subordinate" and
> "subordinate->devices" will be checked firstly, and then downstream
> devices' present state.
> 
> Signed-off-by: windy.bi.enflame <windy.bi.enflame@gmail.com>

See https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?id=v5.17#n407
regarding pseudonyms.

> ---
>  drivers/pci/pci.c | 25 ++++++++++++++++++++++++-
>  1 file changed, 24 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 9ecce435fb3f..d7ec3859268b 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5002,6 +5002,29 @@ void pci_bridge_wait_for_secondary_bus(struct pci_dev *dev)
>  	}
>  }
>  
> +int pci_bridge_secondary_bus_wait(struct pci_dev *bridge, int timeout)
> +{
> +	struct pci_dev *dev;
> +	int delay = 1;
> +
> +	if (!bridge->subordinate || list_empty(&bridge->subordinate->devices))
> +		return 0;
> +
> +	list_for_each_entry(dev, &bridge->subordinate->devices, bus_list) {
> +		while (!pci_device_is_present(dev)) {
> +			if (delay > timeout) {
> +				pci_warn(dev, "secondary bus not ready after %dms\n", delay);
> +				return -ENOTTY;
> +			}
> +
> +			msleep(delay);
> +			delay *= 2;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
>  void pci_reset_secondary_bus(struct pci_dev *dev)
>  {
>  	u16 ctrl;
> @@ -5045,7 +5068,7 @@ int pci_bridge_secondary_bus_reset(struct pci_dev *dev)
>  {
>  	pcibios_reset_secondary_bus(dev);
>  
> -	return pci_dev_wait(dev, "bus reset", PCIE_RESET_READY_POLL_MS);
> +	return pci_bridge_secondary_bus_wait(dev, PCIE_RESET_READY_POLL_MS);
>  }
>  EXPORT_SYMBOL_GPL(pci_bridge_secondary_bus_reset);
>  
> -- 
> 2.36.1
>
Alex Williamson May 16, 2022, 10:57 p.m. UTC | #2
On Mon, 16 May 2022 15:28:25 -0500
Bjorn Helgaas <helgaas@kernel.org> wrote:

> [+cc Lukas, pciehp expert; Alex, reset person]
> 
> Thanks for the testing, analysis, and patch!
> 
> Run "git log --oneline drivers/pci/pci.c" and make your subject line
> similar.
> 
> On Tue, May 17, 2022 at 01:30:47AM +0800, windy.bi.enflame wrote:
> > While I do reset test of a PCIe endpoint device on a server, I find that
> > the EP device always been removed and re-inserted again by hotplug module,
> >  after secondary bus reset.
> > 
> > After checking I find:  
> > 1> "pciehp_reset_slot()" always disable slot's DLLSC interrupt before  
> >    doing reset and restore after reset, to try to filter the hotplug
> >    event happened during reset.  
> > 2> "pci_bridge_secondary_bus_reset()" sleep 1 seconad and "pci_dev_wait()"  
> >    until device ready with "PCIE_RESET_READY_POLL_MS" timeout.  
> > 3> There is a PCIe switch between CPU and the EP devicem the topology as:  
> >    CPU <-> Switch <-> EP.  
> > 4> While trigger sbr reset at the switch's downstream port, it needs 1.5  
> >    seconds for internal enumeration.  
> 
> s/seconad/second/
> s/devicem/device/
> s/sbr/SBR/
> s/"pciehp_reset_slot()"/pciehp_reset_slot()/ also for other functions
> 
> > About why 1.5 seconds ready time is not filtered by "pci_dev_wait()" with
> > "PCIE_RESET_READY_POLL_MS" timeout, I find it is because in
> > "pci_bridge_secondary_bus_reset()", the function is operating slot's
> > config space to trigger sbr and also wait slot itself ready by input same
> > "dev" parameter. Different from other resets like FLR which is triggered
> > by operating the config space of EP device itself, sbr is triggered by
> > up slot but need to wait downstream devices' ready, so I think function
> > "pci_dev_wait()" works for resets like FLR but not for sbr.

Is the unexpected hotplug occurring then because the device is not
ready after the 1s sleep after the sbr and we re-trigger the hotplug
controller which then triggers because the link status is still down?

> > In this proposed patch, I'm changing the waiting function used in sbr to
> > "pci_bridge_secondary_bus_wait()" which will wait all the downstream
> > hierarchy ready with the same timeout setting "PCIE_RESET_READY_POLL_MS".
> > In "pci_bridge_secondary_bus_wait()" the "subordinate" and
> > "subordinate->devices" will be checked firstly, and then downstream
> > devices' present state.
> > 
> > Signed-off-by: windy.bi.enflame <windy.bi.enflame@gmail.com>  
> 
> See https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?id=v5.17#n407
> regarding pseudonyms.
> 
> > ---
> >  drivers/pci/pci.c | 25 ++++++++++++++++++++++++-
> >  1 file changed, 24 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 9ecce435fb3f..d7ec3859268b 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -5002,6 +5002,29 @@ void pci_bridge_wait_for_secondary_bus(struct pci_dev *dev)
> >  	}
> >  }
> >  
> > +int pci_bridge_secondary_bus_wait(struct pci_dev *bridge, int timeout)
> > +{
> > +	struct pci_dev *dev;
> > +	int delay = 1;
> > +
> > +	if (!bridge->subordinate || list_empty(&bridge->subordinate->devices))
> > +		return 0;
> > +
> > +	list_for_each_entry(dev, &bridge->subordinate->devices, bus_list) {
> > +		while (!pci_device_is_present(dev)) {
> > +			if (delay > timeout) {
> > +				pci_warn(dev, "secondary bus not ready after %dms\n", delay);
> > +				return -ENOTTY;
> > +			}
> > +
> > +			msleep(delay);
> > +			delay *= 2;
> > +		}
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> >  void pci_reset_secondary_bus(struct pci_dev *dev)
> >  {
> >  	u16 ctrl;
> > @@ -5045,7 +5068,7 @@ int pci_bridge_secondary_bus_reset(struct pci_dev *dev)
> >  {
> >  	pcibios_reset_secondary_bus(dev);
> >  
> > -	return pci_dev_wait(dev, "bus reset", PCIE_RESET_READY_POLL_MS);

I assume pci_dev_wait here was always a no-op because we're testing the
wrong device, maybe this should be marked as:

Fixes: 6b2f1351af56 ("PCI: Wait for device to become ready after secondary bus reset")

> > +	return pci_bridge_secondary_bus_wait(dev, PCIE_RESET_READY_POLL_MS);

The theory looks reasonable to me, but I'd hope we cold get a better
commit log and improve the dev_warn message.  It seems to make sense to
use pci_device_is_present() since we shouldn't be dealing with VFs
after a bus reset, but I wonder if we want to enumerate all the missing
devices.  Since the timeout has passed, we shouldn't incur any extra
delays beyond the first device that doesn't re-appear.  Thanks,

Alex

> >  }
> >  EXPORT_SYMBOL_GPL(pci_bridge_secondary_bus_reset);
> >  
> > -- 
> > 2.36.1
> >   
>
kernel test robot May 17, 2022, 5:34 a.m. UTC | #3
Hi "windy.bi.enflame",

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on helgaas-pci/next]
[also build test WARNING on v5.18-rc7 next-20220516]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/windy-bi-enflame/drivers-pci-wait-downstream-hierarchy-ready-instead-of-slot-itself-ready-after-secondary-bus-reset/20220517-013158
base:   https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git next
config: alpha-defconfig (https://download.01.org/0day-ci/archive/20220517/202205171330.ye71SisD-lkp@intel.com/config)
compiler: alpha-linux-gcc (GCC) 11.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/99d829ca818d01cbd8bd4f95353f58a01723fe21
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review windy-bi-enflame/drivers-pci-wait-downstream-hierarchy-ready-instead-of-slot-itself-ready-after-secondary-bus-reset/20220517-013158
        git checkout 99d829ca818d01cbd8bd4f95353f58a01723fe21
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross W=1 O=build_dir ARCH=alpha SHELL=/bin/bash drivers/pci/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> drivers/pci/pci.c:5052:5: warning: no previous prototype for 'pci_bridge_secondary_bus_wait' [-Wmissing-prototypes]
    5052 | int pci_bridge_secondary_bus_wait(struct pci_dev *bridge, int timeout)
         |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~


vim +/pci_bridge_secondary_bus_wait +5052 drivers/pci/pci.c

  5051	
> 5052	int pci_bridge_secondary_bus_wait(struct pci_dev *bridge, int timeout)
  5053	{
  5054		struct pci_dev *dev;
  5055		int delay = 1;
  5056	
  5057		if (!bridge->subordinate || list_empty(&bridge->subordinate->devices))
  5058			return 0;
  5059	
  5060		list_for_each_entry(dev, &bridge->subordinate->devices, bus_list) {
  5061			while (!pci_device_is_present(dev)) {
  5062				if (delay > timeout) {
  5063					pci_warn(dev, "secondary bus not ready after %dms\n", delay);
  5064					return -ENOTTY;
  5065				}
  5066	
  5067				msleep(delay);
  5068				delay *= 2;
  5069			}
  5070		}
  5071	
  5072		return 0;
  5073	}
  5074
Sheng Bi May 17, 2022, 2:56 p.m. UTC | #4
Hi Bjorn, Alex

Thank you for reviewing the patch and comments below, I will amend the
violation of
submission rule in patch V2.

Thanks

On Tue, May 17, 2022 at 6:57 AM Alex Williamson
<alex.williamson@redhat.com> wrote:
>
> On Mon, 16 May 2022 15:28:25 -0500
> Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> > [+cc Lukas, pciehp expert; Alex, reset person]
> >
> > Thanks for the testing, analysis, and patch!
> >
> > Run "git log --oneline drivers/pci/pci.c" and make your subject line
> > similar.
> >
> > On Tue, May 17, 2022 at 01:30:47AM +0800, windy.bi.enflame wrote:
> > > While I do reset test of a PCIe endpoint device on a server, I find that
> > > the EP device always been removed and re-inserted again by hotplug module,
> > >  after secondary bus reset.
> > >
> > > After checking I find:
> > > 1> "pciehp_reset_slot()" always disable slot's DLLSC interrupt before
> > >    doing reset and restore after reset, to try to filter the hotplug
> > >    event happened during reset.
> > > 2> "pci_bridge_secondary_bus_reset()" sleep 1 seconad and "pci_dev_wait()"
> > >    until device ready with "PCIE_RESET_READY_POLL_MS" timeout.
> > > 3> There is a PCIe switch between CPU and the EP devicem the topology as:
> > >    CPU <-> Switch <-> EP.
> > > 4> While trigger sbr reset at the switch's downstream port, it needs 1.5
> > >    seconds for internal enumeration.
> >
> > s/seconad/second/
> > s/devicem/device/
> > s/sbr/SBR/
> > s/"pciehp_reset_slot()"/pciehp_reset_slot()/ also for other functions
> >
> > > About why 1.5 seconds ready time is not filtered by "pci_dev_wait()" with
> > > "PCIE_RESET_READY_POLL_MS" timeout, I find it is because in
> > > "pci_bridge_secondary_bus_reset()", the function is operating slot's
> > > config space to trigger sbr and also wait slot itself ready by input same
> > > "dev" parameter. Different from other resets like FLR which is triggered
> > > by operating the config space of EP device itself, sbr is triggered by
> > > up slot but need to wait downstream devices' ready, so I think function
> > > "pci_dev_wait()" works for resets like FLR but not for sbr.
>
> Is the unexpected hotplug occurring then because the device is not
> ready after the 1s sleep after the sbr and we re-trigger the hotplug
> controller which then triggers because the link status is still down?

Yes, the device becomes accessible at ~1.5s after SBR while hotplug
interrupt was re-enabled after 1s sleep. Then the hotplug event at 1.5s
was been judged as real hotplug.

>
> > > In this proposed patch, I'm changing the waiting function used in sbr to
> > > "pci_bridge_secondary_bus_wait()" which will wait all the downstream
> > > hierarchy ready with the same timeout setting "PCIE_RESET_READY_POLL_MS".
> > > In "pci_bridge_secondary_bus_wait()" the "subordinate" and
> > > "subordinate->devices" will be checked firstly, and then downstream
> > > devices' present state.
> > >
> > > Signed-off-by: windy.bi.enflame <windy.bi.enflame@gmail.com>
> >
> > See https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?id=v5.17#n407
> > regarding pseudonyms.
> >
> > > ---
> > >  drivers/pci/pci.c | 25 ++++++++++++++++++++++++-
> > >  1 file changed, 24 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > index 9ecce435fb3f..d7ec3859268b 100644
> > > --- a/drivers/pci/pci.c
> > > +++ b/drivers/pci/pci.c
> > > @@ -5002,6 +5002,29 @@ void pci_bridge_wait_for_secondary_bus(struct pci_dev *dev)
> > >     }
> > >  }
> > >
> > > +int pci_bridge_secondary_bus_wait(struct pci_dev *bridge, int timeout)
> > > +{
> > > +   struct pci_dev *dev;
> > > +   int delay = 1;
> > > +
> > > +   if (!bridge->subordinate || list_empty(&bridge->subordinate->devices))
> > > +           return 0;
> > > +
> > > +   list_for_each_entry(dev, &bridge->subordinate->devices, bus_list) {
> > > +           while (!pci_device_is_present(dev)) {
> > > +                   if (delay > timeout) {
> > > +                           pci_warn(dev, "secondary bus not ready after %dms\n", delay);
> > > +                           return -ENOTTY;
> > > +                   }
> > > +
> > > +                   msleep(delay);
> > > +                   delay *= 2;
> > > +           }
> > > +   }
> > > +
> > > +   return 0;
> > > +}
> > > +
> > >  void pci_reset_secondary_bus(struct pci_dev *dev)
> > >  {
> > >     u16 ctrl;
> > > @@ -5045,7 +5068,7 @@ int pci_bridge_secondary_bus_reset(struct pci_dev *dev)
> > >  {
> > >     pcibios_reset_secondary_bus(dev);
> > >
> > > -   return pci_dev_wait(dev, "bus reset", PCIE_RESET_READY_POLL_MS);
>
> I assume pci_dev_wait here was always a no-op because we're testing the
> wrong device, maybe this should be marked as:
>
> Fixes: 6b2f1351af56 ("PCI: Wait for device to become ready after secondary bus reset")

I think so too, will mark it if we all aligned.

>
> > > +   return pci_bridge_secondary_bus_wait(dev, PCIE_RESET_READY_POLL_MS);
>
> The theory looks reasonable to me, but I'd hope we cold get a better
> commit log and improve the dev_warn message.  It seems to make sense to
> use pci_device_is_present() since we shouldn't be dealing with VFs
> after a bus reset, but I wonder if we want to enumerate all the missing
> devices.  Since the timeout has passed, we shouldn't incur any extra
> delays beyond the first device that doesn't re-appear.  Thanks,
>
> Alex

Thanks for your suggestion. I thought to enumerate all the missing
devices because SBR affects all the downstream hierarchy and
devices need to be re-enumerated as possible as we can.
I agree that we shouldn't incur any extra delays once the timeout has
already passed, since SBR fails as long as one device fails.

>
> > >  }
> > >  EXPORT_SYMBOL_GPL(pci_bridge_secondary_bus_reset);
> > >
> > > --
> > > 2.36.1
> > >
> >
>
diff mbox series

Patch

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 9ecce435fb3f..d7ec3859268b 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5002,6 +5002,29 @@  void pci_bridge_wait_for_secondary_bus(struct pci_dev *dev)
 	}
 }
 
+int pci_bridge_secondary_bus_wait(struct pci_dev *bridge, int timeout)
+{
+	struct pci_dev *dev;
+	int delay = 1;
+
+	if (!bridge->subordinate || list_empty(&bridge->subordinate->devices))
+		return 0;
+
+	list_for_each_entry(dev, &bridge->subordinate->devices, bus_list) {
+		while (!pci_device_is_present(dev)) {
+			if (delay > timeout) {
+				pci_warn(dev, "secondary bus not ready after %dms\n", delay);
+				return -ENOTTY;
+			}
+
+			msleep(delay);
+			delay *= 2;
+		}
+	}
+
+	return 0;
+}
+
 void pci_reset_secondary_bus(struct pci_dev *dev)
 {
 	u16 ctrl;
@@ -5045,7 +5068,7 @@  int pci_bridge_secondary_bus_reset(struct pci_dev *dev)
 {
 	pcibios_reset_secondary_bus(dev);
 
-	return pci_dev_wait(dev, "bus reset", PCIE_RESET_READY_POLL_MS);
+	return pci_bridge_secondary_bus_wait(dev, PCIE_RESET_READY_POLL_MS);
 }
 EXPORT_SYMBOL_GPL(pci_bridge_secondary_bus_reset);