
[v3,3/3] PCI: Avoid slot reset for Cavium cn8xxx root ports

Message ID 20170907074011.GA13490@hc (mailing list archive)
State New, archived

Commit Message

Jan Glauber Sept. 7, 2017, 7:40 a.m. UTC
On Thu, Aug 31, 2017 at 10:01:30AM -0600, Alex Williamson wrote:
> On Thu, 31 Aug 2017 11:40:52 +0200
> Jan Glauber <jan.glauber@caviumnetworks.com> wrote:
> 
> > On Wed, Aug 30, 2017 at 08:40:12AM -0600, Alex Williamson wrote:
> > > On Wed, 30 Aug 2017 16:24:54 +0200
> > > Jan Glauber <jglauber@cavium.com> wrote:
> > >   
> > > > Root ports of cn8xxx do not function after a slot reset when used with
> > > > some e1000e and LSI HBA devices. Add a quirk to prevent slot reset on
> > > > these root ports.
> > > > 
> > > > Signed-off-by: Jan Glauber <jglauber@cavium.com>
> > > > ---
> > > >  drivers/pci/quirks.c | 16 ++++++++++++++++
> > > >  1 file changed, 16 insertions(+)
> > > > 
> > > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > > > index 85191b8..6679971 100644
> > > > --- a/drivers/pci/quirks.c
> > > > +++ b/drivers/pci/quirks.c
> > > > @@ -845,6 +845,22 @@ static void quirk_cavium_sriov_rnm_link(struct pci_dev *dev)
> > > >  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CAVIUM, 0xa018, quirk_cavium_sriov_rnm_link);
> > > >  #endif
> > > >  
> > > > +/*
> > > > + * Root ports on some Cavium CN8xxx chips do not successfully complete
> > > > + * a bus reset when used with certain types of child devices. Config
> > > > + * space access to the child may quit responding. Flag all devices under
> > > > + * the secondary bus as non-resettable.
> > > > + */
> > > > +static void quirk_CN8xxx_secondary_bus(struct pci_dev *dev)
> > > > +{
> > > > +	struct pci_dev *pdev;
> > > > +
> > > > +	dev_warn(&dev->dev, "Cavium CN8xxx quirk detected; reset for devices on secondary bus disabled\n");
> > > > +	list_for_each_entry(pdev, &dev->subordinate->devices, bus_list)
> > > > +		pdev->dev_flags |= PCI_DEV_FLAGS_NO_BUS_RESET;
> > > > +}
> > > > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CAVIUM, 0xa100, quirk_CN8xxx_secondary_bus);
> > > > +
> > > >  /*
> > > >   * Some settings of MMRBC can lead to data corruption so block changes.
> > > >   * See AMD 8131 HyperTransport PCI-X Tunnel Revision Guide  
> > > 
> > > 
> > > This doesn't seem reliable, doesn't the user just need to remove and
> > > reprobe the slot and the device would re-appear without this flag set?  
> > 
> > No, I tried before to disable the slot with "echo 0 > /sys/bus/pci/slots/3/power"
> > but that does not work as it is not supported.
> > 
> > I'm not familiar with the quirk types, would another one be better
> > suited here (even if we don't have the problem you described)?
> 
> The scenario I'm mentioning is to "echo 1 > /sys/bus/pci/devices/<some
> device under the slot>/remove", then "echo <that device address> >
> /sys/bus/pci/rescan".  This would break the ordering implicit in using
> a fixup defined for the root port.  It seems like it'd make a lot more
> sense to add a test on the parent bridge more similar to how the bus
> reset works.  It's not the subordinate devices imposing the
> no-bus-reset flag, it's the bridge device and the objects and code
> should support and reflect that.  Thanks,

Doing "echo <that device address> > /sys/bus/pci/rescan" after the
remove did not work for me, but maybe the format of the device address
needs to be different. Anyway, the sequence
  echo 1 > /sys/bus/pci/devices/<some device under the slot>/remove
  echo 1 > /sys/bus/pci/rescan
still triggers the panic as you mentioned above.

I agree that the subordinate devices are not causing the issue, still
I need to make pci_slot_resetable() return false in our case.

So what if we add an additional check like:


--Jan
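
As a rough illustration of the remove/rescan concern discussed above, the following userspace C sketch contrasts a flag set once on the child devices with a test on the parent bridge. Every name in it (mock_dev, MOCK_NO_BUS_RESET, fixup_mark_children, remove_and_rescan and the two reset_blocked_by_* helpers) is a hypothetical stand-in invented for this example, not a kernel API; MOCK_NO_BUS_RESET only plays the role of PCI_DEV_FLAGS_NO_BUS_RESET.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for PCI_DEV_FLAGS_NO_BUS_RESET. */
#define MOCK_NO_BUS_RESET 0x1u

/* Hypothetical, heavily reduced model of a PCI device; not struct pci_dev. */
struct mock_dev {
	unsigned int flags;
	struct mock_dev *parent;	/* upstream bridge, NULL for a root port */
};

/* v3 approach: a one-time fixup marks the children present at that moment. */
static void fixup_mark_children(struct mock_dev **children, size_t n)
{
	assert(children != NULL);
	for (size_t i = 0; i < n; i++)
		children[i]->flags |= MOCK_NO_BUS_RESET;
}

/*
 * Models remove + rescan by clearing the flags, since a newly enumerated
 * device would start without them and the root port fixup is not re-run.
 */
static void remove_and_rescan(struct mock_dev *child)
{
	child->flags = 0;
}

/* Per-child test: only sees what the one-time fixup left behind. */
static bool reset_blocked_by_child(const struct mock_dev *child)
{
	return (child->flags & MOCK_NO_BUS_RESET) != 0;
}

/* Bridge-based test: consults the parent, which was not re-enumerated. */
static bool reset_blocked_by_bridge(const struct mock_dev *child)
{
	return child->parent &&
	       (child->parent->flags & MOCK_NO_BUS_RESET) != 0;
}
```

In this model the per-child test stops blocking the reset once remove_and_rescan() has run, while the bridge-based test keeps blocking it as long as the flag stays on the parent.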

Comments

Jan Glauber Sept. 7, 2017, 7:49 a.m. UTC | #1
On Thu, Sep 07, 2017 at 09:40:11AM +0200, Jan Glauber wrote:
> So what if we add an additional check like:
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index fdf65a6..389db4b 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -4389,6 +4389,9 @@ static bool pci_slot_resetable(struct pci_slot *slot)
>  {
>         struct pci_dev *dev;
>  
> +       if (slot->bus->self & PCI_DEV_FLAGS_NO_BUS_RESET)
> +               return false;
> +
>         list_for_each_entry(dev, &slot->bus->devices, bus_list) {
>                 if (!dev->slot || dev->slot != slot)
>                         continue;

Obviously I meant:
if (slot->bus->self->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET)

--Jan

Alex Williamson Sept. 7, 2017, 4:52 p.m. UTC | #2
On Thu, 7 Sep 2017 09:49:04 +0200
Jan Glauber <jan.glauber@caviumnetworks.com> wrote:

> On Thu, Sep 07, 2017 at 09:40:11AM +0200, Jan Glauber wrote:
> > So what if we add an additional check like:
> > 
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index fdf65a6..389db4b 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -4389,6 +4389,9 @@ static bool pci_slot_resetable(struct pci_slot *slot)
> >  {
> >         struct pci_dev *dev;
> >  
> > +       if (slot->bus->self & PCI_DEV_FLAGS_NO_BUS_RESET)
> > +               return false;
> > +
> >         list_for_each_entry(dev, &slot->bus->devices, bus_list) {
> >                 if (!dev->slot || dev->slot != slot)
> >                         continue;  
> 
> Obviously I meant:
> if (slot->bus->self->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET)

Much better, perhaps even incorporate the bus->self check for good
measure... is it possible to have a slot on a root bus?  Taking
different approaches for bus vs slot reset should have been a giant red
flag that something is wrong.  Thanks,

Alex
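
One possible reading of the "bus->self check" suggestion is to guard against a bus that has no upstream bridge before looking at the bridge's flags. The sketch below shows that shape in self-contained userspace C. The types sketch_dev, sketch_bus and sketch_slot, the macro SKETCH_NO_BUS_RESET and the function slot_reset_allowed_by_bridge are all hypothetical names made up for this example; they only mimic the fields used in the thread (bus->self and dev_flags) and are not the code that was eventually merged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for PCI_DEV_FLAGS_NO_BUS_RESET. */
#define SKETCH_NO_BUS_RESET 0x1u

/* Hypothetical, reduced models; not the kernel structures. */
struct sketch_dev  { unsigned int dev_flags; };
struct sketch_bus  { struct sketch_dev *self; };	/* NULL models a root bus */
struct sketch_slot { struct sketch_bus *bus; };

/* Returns false when the bridge above the slot forbids a bus reset. */
static bool slot_reset_allowed_by_bridge(const struct sketch_slot *slot)
{
	assert(slot != NULL && slot->bus != NULL);

	const struct sketch_dev *bridge = slot->bus->self;

	/* No upstream bridge (root bus): nothing here can veto the reset. */
	if (!bridge)
		return true;

	return (bridge->dev_flags & SKETCH_NO_BUS_RESET) == 0;
}
```

The NULL test runs first so that the flag lookup never dereferences a missing bridge, which covers the "slot on a root bus" case raised in the question above.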

Patch

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index fdf65a6..389db4b 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4389,6 +4389,9 @@  static bool pci_slot_resetable(struct pci_slot *slot)
 {
        struct pci_dev *dev;
 
+       if (slot->bus->self->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET)
+               return false;
+
        list_for_each_entry(dev, &slot->bus->devices, bus_list) {
                if (!dev->slot || dev->slot != slot)
                        continue;