diff mbox

[REPOST] Extend PCIE_BUS_PEER2PEER to set MRSS=128 to fix CNS3xxx BM DMA.

Message ID m3h9fghbfh.fsf@t19.piap.pl (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show

Commit Message

Krzysztof Hałasa March 21, 2016, 9:39 a.m. UTC
I think this bug needs to be fixed, this way or another.

The platform in question is Cavium CNS3xxx, ARMv6.

A recent patch by Arnd Bergmann (498a92d42596 "ARM: cns3xxx: pci: avoid
potential stack overflow") converted an explicit setting of
PCI_EXP_DEVCTL_READRQ = 0 (i.e., max 128 bytes for bus-mastering PCIe DMA
read request) to:
+    pcie_bus_config = PCIE_BUS_PEER2PEER;

with the following commentary:
    "The second part is how the driver sets up the Max_Read_Request_Size
    value for the first device/function on bus 1, i.e. the device
    plugged directly into the PCIe root port.
    For all I can tell, this is in fact incomplete, as it does not
    perform the same setting on devices attached to a PCIe switch,
    or multi-function devices.
    The solution for this part fortunately is even easier: if we
    just set the global pcie_bus_config variable to PCIE_BUS_PEER2PEER,
    all PCIe devices in the system are limited to 128 byte MPS, which
    in turn limits the MRRS to 128 bytes for all devices, and we
    no longer even need to touch any devices."

The problem is the MRRS setting is never written to the hardware.
I propose the following, though I'm not sure if we can do this safely,
especially given the comments in probe.c. OTOH, this change may be
required in other (all?) cases when the user requests
PCIE_BUS_PEER2PEER.

On this Laguna GW-2388 the following patch fixes BM DMA with:
0000:00:00.0 PCI bridge: Cavium Networks Device 3400 (rev 01)
0000:01:00.0 PCI bridge: Texas Instruments XIO2001 PCI Express-to-PCI Bridge
0000:02:0e.0 (PCI devices behind the bridge, these are doing actual BM xfers)
0001:00:00.0 PCI bridge: Cavium Networks Device 3400 (rev 01 - this is
	     the second lane from the CPU)

pci 0000:00:00.0: Max Payload Size set to  128/ 128 (was  128), Max Read Rq  128
pci 0000:01:00.0: Max Payload Size set to  128/ 512 (was  128), Max Read Rq  128
pci 0001:00:00.0: Max Payload Size set to  128/ 128 (was  128), Max Read Rq  128

Signed-off-by: Krzysztof Ha?asa <khalasa@piap.pl>
Fixes: 498a92d42596 ("ARM: cns3xxx: pci: avoid potential stack overflow")

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Bjorn Helgaas April 21, 2016, 3:42 p.m. UTC | #1
On Mon, Mar 21, 2016 at 10:39:52AM +0100, Krzysztof Halasa wrote:
> I think this bug needs to be fixed, this way or another.
> 
> The platform in question is Cavium CNS3xxx, ARMv6.
> 
> A recent patch by Arnd Bergmann (498a92d42596 "ARM: cns3xxx: pci: avoid
> potential stack overflow") converted an explicit setting of
> PCI_EXP_DEVCTL_READRQ = 0 (i.e., max 128 bytes for bus-mastering PCIe DMA
> read request) to:
> +    pcie_bus_config = PCIE_BUS_PEER2PEER;

Any thoughts on this, Arnd?  I can (and will) look at this, but it's
a complicated area and it will take me quite a while to dig into it.

> with the following commentary:
>     "The second part is how the driver sets up the Max_Read_Request_Size
>     value for the first device/function on bus 1, i.e. the device
>     plugged directly into the PCIe root port.
>     For all I can tell, this is in fact incomplete, as it does not
>     perform the same setting on devices attached to a PCIe switch,
>     or multi-function devices.
>     The solution for this part fortunately is even easier: if we
>     just set the global pcie_bus_config variable to PCIE_BUS_PEER2PEER,
>     all PCIe devices in the system are limited to 128 byte MPS, which
>     in turn limits the MRRS to 128 bytes for all devices, and we
>     no longer even need to touch any devices."
> 
> The problem is the MRRS setting is never written to the hardware.
> I propose the following, though I'm not sure if we can do this safely,
> especially given the comments in probe.c. OTOH, this change may be
> required in other (all?) cases when the user requests
> PCIE_BUS_PEER2PEER.
> 
> On this Laguna GW-2388 the following patch fixes BM DMA with:
> 0000:00:00.0 PCI bridge: Cavium Networks Device 3400 (rev 01)
> 0000:01:00.0 PCI bridge: Texas Instruments XIO2001 PCI Express-to-PCI Bridge
> 0000:02:0e.0 (PCI devices behind the bridge, these are doing actual BM xfers)
> 0001:00:00.0 PCI bridge: Cavium Networks Device 3400 (rev 01 - this is
> 	     the second lane from the CPU)
> 
> pci 0000:00:00.0: Max Payload Size set to  128/ 128 (was  128), Max Read Rq  128
> pci 0000:01:00.0: Max Payload Size set to  128/ 512 (was  128), Max Read Rq  128
> pci 0001:00:00.0: Max Payload Size set to  128/ 128 (was  128), Max Read Rq  128
> 
> Signed-off-by: Krzysztof Ha?asa <khalasa@piap.pl>
> Fixes: 498a92d42596 ("ARM: cns3xxx: pci: avoid potential stack overflow")
> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 6d7ab9b..91713b6 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1919,7 +1919,8 @@ static void pcie_write_mrrs(struct pci_dev *dev)
>  	/* In the "safe" case, do not configure the MRRS.  There appear to be
>  	 * issues with setting MRRS to 0 on a number of devices.
>  	 */
> -	if (pcie_bus_config != PCIE_BUS_PERFORMANCE)
> +	if (pcie_bus_config != PCIE_BUS_PERFORMANCE &&
> +	    pcie_bus_config != PCIE_BUS_PEER2PEER)
>  		return;
>  
>  	/* For Max performance, the MRRS must be set to the largest supported
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 2771625..6f5088a 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -756,7 +756,7 @@ enum pcie_bus_config_types {
>  	PCIE_BUS_DEFAULT,	/* ensure MPS matches upstream bridge */
>  	PCIE_BUS_SAFE,		/* use largest MPS boot-time devices support */
>  	PCIE_BUS_PERFORMANCE,	/* use MPS and MRRS for best performance */
> -	PCIE_BUS_PEER2PEER,	/* set MPS = 128 for all devices */
> +	PCIE_BUS_PEER2PEER,	/* set MPS and MRSS to 128 for all devices */
>  };
>  
>  extern enum pcie_bus_config_types pcie_bus_config;
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas May 2, 2016, 4:53 p.m. UTC | #2
On Thu, Apr 21, 2016 at 10:42:34AM -0500, Bjorn Helgaas wrote:
> On Mon, Mar 21, 2016 at 10:39:52AM +0100, Krzysztof Halasa wrote:
> > I think this bug needs to be fixed, this way or another.
> > 
> > The platform in question is Cavium CNS3xxx, ARMv6.
> > 
> > A recent patch by Arnd Bergmann (498a92d42596 "ARM: cns3xxx: pci: avoid
> > potential stack overflow") converted an explicit setting of
> > PCI_EXP_DEVCTL_READRQ = 0 (i.e., max 128 bytes for bus-mastering PCIe DMA
> > read request) to:
> > +    pcie_bus_config = PCIE_BUS_PEER2PEER;
> 
> Any thoughts on this, Arnd?  I can (and will) look at this, but it's
> a complicated area and it will take me quite a while to dig into it.

What should we do here?  Should we revert 498a92d42596?  It appeared in
v4.4, so this has been broken for a while.

It looks like 498a92d42596 merely fixed a warning, at the expense of
breaking DMA on Cavium.  Reverting it would bring the warning back, but
that's better than broken DMA.

> > with the following commentary:
> >     "The second part is how the driver sets up the Max_Read_Request_Size
> >     value for the first device/function on bus 1, i.e. the device
> >     plugged directly into the PCIe root port.
> >     For all I can tell, this is in fact incomplete, as it does not
> >     perform the same setting on devices attached to a PCIe switch,
> >     or multi-function devices.
> >     The solution for this part fortunately is even easier: if we
> >     just set the global pcie_bus_config variable to PCIE_BUS_PEER2PEER,
> >     all PCIe devices in the system are limited to 128 byte MPS, which
> >     in turn limits the MRRS to 128 bytes for all devices, and we
> >     no longer even need to touch any devices."
> > 
> > The problem is the MRRS setting is never written to the hardware.
> > I propose the following, though I'm not sure if we can do this safely,
> > especially given the comments in probe.c. OTOH, this change may be
> > required in other (all?) cases when the user requests
> > PCIE_BUS_PEER2PEER.
> > 
> > On this Laguna GW-2388 the following patch fixes BM DMA with:
> > 0000:00:00.0 PCI bridge: Cavium Networks Device 3400 (rev 01)
> > 0000:01:00.0 PCI bridge: Texas Instruments XIO2001 PCI Express-to-PCI Bridge
> > 0000:02:0e.0 (PCI devices behind the bridge, these are doing actual BM xfers)
> > 0001:00:00.0 PCI bridge: Cavium Networks Device 3400 (rev 01 - this is
> > 	     the second lane from the CPU)
> > 
> > pci 0000:00:00.0: Max Payload Size set to  128/ 128 (was  128), Max Read Rq  128
> > pci 0000:01:00.0: Max Payload Size set to  128/ 512 (was  128), Max Read Rq  128
> > pci 0001:00:00.0: Max Payload Size set to  128/ 128 (was  128), Max Read Rq  128
> > 
> > Signed-off-by: Krzysztof Ha?asa <khalasa@piap.pl>
> > Fixes: 498a92d42596 ("ARM: cns3xxx: pci: avoid potential stack overflow")
> > 
> > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> > index 6d7ab9b..91713b6 100644
> > --- a/drivers/pci/probe.c
> > +++ b/drivers/pci/probe.c
> > @@ -1919,7 +1919,8 @@ static void pcie_write_mrrs(struct pci_dev *dev)
> >  	/* In the "safe" case, do not configure the MRRS.  There appear to be
> >  	 * issues with setting MRRS to 0 on a number of devices.
> >  	 */
> > -	if (pcie_bus_config != PCIE_BUS_PERFORMANCE)
> > +	if (pcie_bus_config != PCIE_BUS_PERFORMANCE &&
> > +	    pcie_bus_config != PCIE_BUS_PEER2PEER)
> >  		return;
> >  
> >  	/* For Max performance, the MRRS must be set to the largest supported
> > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > index 2771625..6f5088a 100644
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -756,7 +756,7 @@ enum pcie_bus_config_types {
> >  	PCIE_BUS_DEFAULT,	/* ensure MPS matches upstream bridge */
> >  	PCIE_BUS_SAFE,		/* use largest MPS boot-time devices support */
> >  	PCIE_BUS_PERFORMANCE,	/* use MPS and MRRS for best performance */
> > -	PCIE_BUS_PEER2PEER,	/* set MPS = 128 for all devices */
> > +	PCIE_BUS_PEER2PEER,	/* set MPS and MRSS to 128 for all devices */
> >  };
> >  
> >  extern enum pcie_bus_config_types pcie_bus_config;
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Krzysztof Hałasa May 4, 2016, 1:09 p.m. UTC | #3
Bjorn Helgaas <helgaas@kernel.org> writes:

> It looks like 498a92d42596 merely fixed a warning, at the expense of
> breaking DMA on Cavium.  Reverting it would bring the warning back, but
> that's better than broken DMA.

Perhaps we should change PCIE_BUS_PEER2PEER to also write MRRS anyway.

I realize the CNS3xxx patch is some sort of clever workaround, and that
PCIE_BUS_PEER2PEER (which normally comes from kernel command line
parameter "pcie_bus_peer2peer") was not exactly intended for this. But
if one asks for "peer2peer" (which means limiting transfers to 128
bytes), how could it all work if the bus mastering read requests are
not equally limited?


BTW s/MRSS/MRRS/g
Bjorn Helgaas May 4, 2016, 7:47 p.m. UTC | #4
On Wed, May 04, 2016 at 03:09:27PM +0200, Krzysztof Ha?asa wrote:
> Bjorn Helgaas <helgaas@kernel.org> writes:
> 
> > It looks like 498a92d42596 merely fixed a warning, at the expense of
> > breaking DMA on Cavium.  Reverting it would bring the warning back, but
> > that's better than broken DMA.
> 
> Perhaps we should change PCIE_BUS_PEER2PEER to also write MRRS anyway.
> 
> I realize the CNS3xxx patch is some sort of clever workaround, and that
> PCIE_BUS_PEER2PEER (which normally comes from kernel command line
> parameter "pcie_bus_peer2peer") was not exactly intended for this. But
> if one asks for "peer2peer" (which means limiting transfers to 128
> bytes), how could it all work if the bus mastering read requests are
> not equally limited?

MPS=128 means a function will never generate a TLP exceeding 128
bytes.  MRRS=128 means a function will never generate a Read Request
with a size exceeding 128 bytes.

The comment in pcie_write_mrrs() claims:

  ... the MRRS ... cannot be configured larger than the MPS the
  device or the bus can support.

I think this comment is wrong, at least from a hardware point of view.
Setting all devices to MRRS=512 and MPS=128 is a legal configuration
that means functions may receive Read Requests for up to 512 bytes,
and they will have to respond with 4 128-byte TLPs.

The spec (PCIe r3.0, 7.8.4 implementation note) says:

  The Max_Read_Request_Size mechanism allows improved control of
  bandwidth allocation in systems where quality of service (QoS) is
  important for the target applications. For example, an arbitration
  scheme based on counting Requests (and not the sizes of those
  Requests) provides imprecise bandwidth allocation when some
  Requesters use much larger sizes than others. The
  Max_Read_Request_Size mechanism can be used to force more uniform
  allocation of bandwidth, by restricting the upper size of Read
  Requests.

The Linux usage of tying it to MRRS to MPS is part of a plan to use
MRRS to restrict the TLP size in one direction and MPS to restrict TLP
size in the other.  See b03e7495a862 ("PCI: Set PCI-E Max Payload Size
on fabric").  But this is not in line with the spec intention, and I'm
not 100% convinced that it works reliably.

So I'm a little wary of tweaking our MPS/MRRS configuration without a
more extensive analysis and (hopefully) some simplification of the
code.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas May 31, 2016, 8:12 p.m. UTC | #5
On Mon, May 02, 2016 at 11:53:22AM -0500, Bjorn Helgaas wrote:
> On Thu, Apr 21, 2016 at 10:42:34AM -0500, Bjorn Helgaas wrote:
> > On Mon, Mar 21, 2016 at 10:39:52AM +0100, Krzysztof Halasa wrote:
> > > I think this bug needs to be fixed, this way or another.
> > > 
> > > The platform in question is Cavium CNS3xxx, ARMv6.
> > > 
> > > A recent patch by Arnd Bergmann (498a92d42596 "ARM: cns3xxx: pci: avoid
> > > potential stack overflow") converted an explicit setting of
> > > PCI_EXP_DEVCTL_READRQ = 0 (i.e., max 128 bytes for bus-mastering PCIe DMA
> > > read request) to:
> > > +    pcie_bus_config = PCIE_BUS_PEER2PEER;
> > 
> > Any thoughts on this, Arnd?  I can (and will) look at this, but it's
> > a complicated area and it will take me quite a while to dig into it.
> 
> What should we do here?  Should we revert 498a92d42596?  It appeared in
> v4.4, so this has been broken for a while.
> 
> It looks like 498a92d42596 merely fixed a warning, at the expense of
> breaking DMA on Cavium.  Reverting it would bring the warning back, but
> that's better than broken DMA.

I propose that we revert 498a92d42596 ("ARM: cns3xxx: pci: avoid
potential stack overflow").  I haven't had a chance to look into the
problem with it yet.

> > > with the following commentary:
> > >     "The second part is how the driver sets up the Max_Read_Request_Size
> > >     value for the first device/function on bus 1, i.e. the device
> > >     plugged directly into the PCIe root port.
> > >     For all I can tell, this is in fact incomplete, as it does not
> > >     perform the same setting on devices attached to a PCIe switch,
> > >     or multi-function devices.
> > >     The solution for this part fortunately is even easier: if we
> > >     just set the global pcie_bus_config variable to PCIE_BUS_PEER2PEER,
> > >     all PCIe devices in the system are limited to 128 byte MPS, which
> > >     in turn limits the MRRS to 128 bytes for all devices, and we
> > >     no longer even need to touch any devices."
> > > 
> > > The problem is the MRRS setting is never written to the hardware.
> > > I propose the following, though I'm not sure if we can do this safely,
> > > especially given the comments in probe.c. OTOH, this change may be
> > > required in other (all?) cases when the user requests
> > > PCIE_BUS_PEER2PEER.
> > > 
> > > On this Laguna GW-2388 the following patch fixes BM DMA with:
> > > 0000:00:00.0 PCI bridge: Cavium Networks Device 3400 (rev 01)
> > > 0000:01:00.0 PCI bridge: Texas Instruments XIO2001 PCI Express-to-PCI Bridge
> > > 0000:02:0e.0 (PCI devices behind the bridge, these are doing actual BM xfers)
> > > 0001:00:00.0 PCI bridge: Cavium Networks Device 3400 (rev 01 - this is
> > > 	     the second lane from the CPU)
> > > 
> > > pci 0000:00:00.0: Max Payload Size set to  128/ 128 (was  128), Max Read Rq  128
> > > pci 0000:01:00.0: Max Payload Size set to  128/ 512 (was  128), Max Read Rq  128
> > > pci 0001:00:00.0: Max Payload Size set to  128/ 128 (was  128), Max Read Rq  128
> > > 
> > > Signed-off-by: Krzysztof Hałasa <khalasa@piap.pl>
> > > Fixes: 498a92d42596 ("ARM: cns3xxx: pci: avoid potential stack overflow")
> > > 
> > > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> > > index 6d7ab9b..91713b6 100644
> > > --- a/drivers/pci/probe.c
> > > +++ b/drivers/pci/probe.c
> > > @@ -1919,7 +1919,8 @@ static void pcie_write_mrrs(struct pci_dev *dev)
> > >  	/* In the "safe" case, do not configure the MRRS.  There appear to be
> > >  	 * issues with setting MRRS to 0 on a number of devices.
> > >  	 */
> > > -	if (pcie_bus_config != PCIE_BUS_PERFORMANCE)
> > > +	if (pcie_bus_config != PCIE_BUS_PERFORMANCE &&
> > > +	    pcie_bus_config != PCIE_BUS_PEER2PEER)
> > >  		return;
> > >  
> > >  	/* For Max performance, the MRRS must be set to the largest supported
> > > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > > index 2771625..6f5088a 100644
> > > --- a/include/linux/pci.h
> > > +++ b/include/linux/pci.h
> > > @@ -756,7 +756,7 @@ enum pcie_bus_config_types {
> > >  	PCIE_BUS_DEFAULT,	/* ensure MPS matches upstream bridge */
> > >  	PCIE_BUS_SAFE,		/* use largest MPS boot-time devices support */
> > >  	PCIE_BUS_PERFORMANCE,	/* use MPS and MRRS for best performance */
> > > -	PCIE_BUS_PEER2PEER,	/* set MPS = 128 for all devices */
> > > +	PCIE_BUS_PEER2PEER,	/* set MPS and MRSS to 128 for all devices */
> > >  };
> > >  
> > >  extern enum pcie_bus_config_types pcie_bus_config;
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 6d7ab9b..91713b6 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1919,7 +1919,8 @@  static void pcie_write_mrrs(struct pci_dev *dev)
 	/* In the "safe" case, do not configure the MRRS.  There appear to be
 	 * issues with setting MRRS to 0 on a number of devices.
 	 */
-	if (pcie_bus_config != PCIE_BUS_PERFORMANCE)
+	if (pcie_bus_config != PCIE_BUS_PERFORMANCE &&
+	    pcie_bus_config != PCIE_BUS_PEER2PEER)
 		return;
 
 	/* For Max performance, the MRRS must be set to the largest supported
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 2771625..6f5088a 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -756,7 +756,7 @@  enum pcie_bus_config_types {
 	PCIE_BUS_DEFAULT,	/* ensure MPS matches upstream bridge */
 	PCIE_BUS_SAFE,		/* use largest MPS boot-time devices support */
 	PCIE_BUS_PERFORMANCE,	/* use MPS and MRRS for best performance */
-	PCIE_BUS_PEER2PEER,	/* set MPS = 128 for all devices */
+	PCIE_BUS_PEER2PEER,	/* set MPS and MRSS to 128 for all devices */
 };
 
 extern enum pcie_bus_config_types pcie_bus_config;