diff mbox series

PCIe: limit Max Read Request Size on i.MX to 512 bytes

Message ID m37dgp20cr.fsf@t19.piap.pl (mailing list archive)
State Superseded
Delegated to: Bjorn Helgaas
Headers show
Series PCIe: limit Max Read Request Size on i.MX to 512 bytes | expand

Commit Message

Krzysztof Hałasa Aug. 13, 2021, 8:52 a.m. UTC
DWC PCIe controller imposes limits on the Read Request Size that it can
handle. For i.MX6 family it's fixed at 512 bytes by default.

If a memory read larger than the limit is requested, the CPU responds
with Completer Abort (CA) (on i.MX6 Unsupported Request (UR) is returned
instead due to a design error).

The i.MX6 documentation states that the limit can be changed by writing
to the PCIE_PL_MRCCR0 register, however there is a fixed (and
undocumented) maximum (CX_REMOTE_RD_REQ_SIZE constant). Tests indicate
that values larger than 512 bytes don't work, though.

This patch makes the RTL8111 work on i.MX6.

Signed-off-by: Krzysztof Hałasa <khalasa@piap.pl>

Comments

Krzysztof Wilczyński Aug. 13, 2021, 10:13 a.m. UTC | #1
Hi Krzysztof,

[...]
> This patch makes the RTL8111 work on i.MX6.

Would it be possible to implement this particular MRRS fix as a quirk
only for the i.MX6 controller?  Unless this is something that we need in
the core, a quirk would be preferred over something that changes the PCI
core.

An example of such quirk might be the one currently implemented for the
Loongson controller, as per:

  https://elixir.bootlin.com/linux/v5.14-rc5/source/drivers/pci/controller/pci-loongson.c#L63

	Krzysztof
Krzysztof Hałasa Aug. 13, 2021, 12:09 p.m. UTC | #2
Krzysztof, :-)

> Would it be possible to implement this particular MRRS fix as a quirk
> only for the i.MX6 controller?  Unless this is something that we need in
> the core, a quirk would be preferred over something that changes the PCI
> core.

I have briefly considered it, but I think it would be *much* more
complicated and error-prone. It also appears that there are more
platforms which need it - the old CNS3xxx, which currently subverts the
PCIE_BUS_PEER2PEER, the loongson, keystone, maybe all DWC PCIe.
Multiplication of the "quirk" code doesn't really look good to me.

TBH I don't think of this as of a "quirk" - all systems have MRRS
limits, it just happens that these ones have their limit lower than 4096
bytes. This isn't a limitation of a particular PCIe device, this is a
common limit of the whole system.

Also I'm not exactly sure the loongson fixup is complete.
It's only done at pci-enable*() time (e.g. for devices which have bigger
MRRS after power-up), while e.g. the r8169 driver changes MRRS well
after pci-enable*().

This means it needs to stay in/below pcie_get_readrq(), and while it
could mean going to ops->write*, it would be a real mess parsing the
devices, PCIE capabilities etc.
Now it's basically a few lines in a seldom called routine in pci.c, and
the loongson case (and others) can be made simpler (and really fixed) as
well.
Rob Herring (Arm) Aug. 13, 2021, 1:45 p.m. UTC | #3
On Fri, Aug 13, 2021 at 3:52 AM Krzysztof Hałasa <khalasa@piap.pl> wrote:
>
> DWC PCIe controller imposes limits on the Read Request Size that it can
> handle. For i.MX6 family it's fixed at 512 bytes by default.
>
> If a memory read larger than the limit is requested, the CPU responds
> with Completer Abort (CA) (on i.MX6 Unsupported Request (UR) is returned
> instead due to a design error).
>
> The i.MX6 documentation states that the limit can be changed by writing
> to the PCIE_PL_MRCCR0 register, however there is a fixed (and
> undocumented) maximum (CX_REMOTE_RD_REQ_SIZE constant). Tests indicate
> that values larger than 512 bytes don't work, though.
>
> This patch makes the RTL8111 work on i.MX6.
>
> Signed-off-by: Krzysztof Hałasa <khalasa@piap.pl>
>
> diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
> index 0c473d75e625..a11ec93a8cd0 100644
> --- a/drivers/pci/Kconfig
> +++ b/drivers/pci/Kconfig
> @@ -34,6 +34,9 @@ config PCI_DOMAINS_GENERIC
>  config PCI_SYSCALL
>         bool
>
> +config NEED_PCIE_MAX_MRRS

We don't need a config option for this. It's not much code and it will
effectively always be enabled with multi-platform kernels.

Rob
Krzysztof Hałasa Aug. 13, 2021, 6:18 p.m. UTC | #4
Rob,

Rob Herring <robh@kernel.org> writes:

>> +++ b/drivers/pci/Kconfig
>> @@ -34,6 +34,9 @@ config PCI_DOMAINS_GENERIC
>>  config PCI_SYSCALL
>>         bool
>>
>> +config NEED_PCIE_MAX_MRRS
>
> We don't need a config option for this. It's not much code and it will
> effectively always be enabled with multi-platform kernels.

But... non-ARM kernels?
Then perhaps #if CONFIG_ARM?
Bjorn Helgaas Aug. 13, 2021, 7:22 p.m. UTC | #5
On Fri, Aug 13, 2021 at 02:09:51PM +0200, Krzysztof Hałasa wrote:
> Krzysztof, :-)
> 
> > Would it be possible to implement this particular MRRS fix as a quirk
> > only for the i.MX6 controller?  Unless this is something that we need in
> > the core, a quirk would be preferred over something that changes the PCI
> > core.
> 
> I have briefly considered it, but I think it would be *much* more
> complicated and error-prone. It also appears that there are more
> platforms which need it - the old CNS3xxx, which currently subverts the
> PCIE_BUS_PEER2PEER, the loongson, keystone, maybe all DWC PCIe.
> Multiplication of the "quirk" code doesn't really look good to me.
> 
> TBH I don't think of this as of a "quirk" - all systems have MRRS
> limits, it just happens that these ones have their limit lower than 4096
> bytes. This isn't a limitation of a particular PCIe device, this is a
> common limit of the whole system.

Do you have a reference for this?  I don't see anything in the PCIe
spec that suggests platforms must limit MRRS, and it seems that only
these ARM-related controllers have this issue.  If there *is* a
platform connection here, we'll need some way to discover it, e.g.,
an ACPI _DSM method or similar.

The only guidance in the spec about setting MRRS is that:

  - Software must set Max_Read_Request_Size of an
    isochronous-configured device with a value that does not exceed
    the Max_Payload_Size set for the device (PCIe r5.0, sec 6.3.4.1)

  - The Max_Read_Request_Size mechanism allows improved control of
    bandwidth allocation in systems where Quality of Service (QoS) is
    important for the target applications. For example, an arbitration
    scheme based on counting Requests (and not the sizes of those
    Requests) provides imprecise bandwidth allocation when some
    Requesters use much larger sizes than others. The
    Max_Read_Request_Size mechanism can be used to force more uniform
    allocation of bandwidth, by restricting the upper size of Read
    Requests (sec 7.5.3.4 implementation note)
Krzysztof Hałasa Aug. 16, 2021, 5:18 a.m. UTC | #6
Bjorn Helgaas <helgaas@kernel.org> writes:

>> TBH I don't think of this as of a "quirk" - all systems have MRRS
>> limits, it just happens that these ones have their limit lower than 4096
>> bytes. This isn't a limitation of a particular PCIe device, this is a
>> common limit of the whole system.
>
> Do you have a reference for this?  I don't see anything in the PCIe
> spec that suggests platforms must limit MRRS, and it seems that only
> these ARM-related controllers have this issue.

I meant there is always a limit - isn't Max_Read_Request_Size a limit?
Device Control Register (Offset 08h) Bit Location 14:12
Max_Read_Request_Size allows for max 4096 bytes, though two values are
reserved, so there is room for some easy extension.

- non-ARM (non-DWC?) systems are limited to 4096 bytes
- DWC-based systems are limited to 128, 256, 512 bytes (are there
  4096-byte ones?)

That's how I understand it, please correct me if I'm wrong.
Hongxing Zhu Aug. 16, 2021, 7:49 a.m. UTC | #7
Hi Krzysztof:
Regarding my understanding, that there should be decomposition operations if the
 Max_Read_Request_Size is larger than the Max_Payload_size specified by RC port.

The bit0 of AMBA Multiple Outbound Decomposed NP Sub-Requests Control Register(Offset:0x700 + 0x24)
 had been set to be 1b1 in default.

Note: The description of this bit.
Enable AMBA Multiple Outbound Decomposed NP Sub- Requests.
This bit when set to '0' disables the possibility of having multiple outstanding non-posted requests that
were derived from decomposition of an outbound AMBA request. See Supported AXI Burst Operations for
more details. You should not clear this register unless your application master is requesting an amount of read data
greater than Max_Read_Request_Size, and the remote device (or switch) is reordering completions that
have different tags

> -----Original Message-----
> From: khalasa@piap.pl <khalasa@piap.pl>
> Sent: Monday, August 16, 2021 1:19 PM
> To: Bjorn Helgaas <helgaas@kernel.org>
> Cc: Krzysztof Wilczyński <kw@linux.com>; Bjorn Helgaas
> <bhelgaas@google.com>; linux-pci@vger.kernel.org; Artem Lapkin
> <email2tema@gmail.com>; Neil Armstrong <narmstrong@baylibre.com>;
> Huacai Chen <chenhuacai@gmail.com>; Rob Herring <robh@kernel.org>;
> Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>; Richard Zhu
> <hongxing.zhu@nxp.com>; Lucas Stach <l.stach@pengutronix.de>;
> linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] PCIe: limit Max Read Request Size on i.MX to 512 bytes
> 
> Bjorn Helgaas <helgaas@kernel.org> writes:
> 
> >> TBH I don't think of this as of a "quirk" - all systems have MRRS
> >> limits, it just happens that these ones have their limit lower than
> >> 4096 bytes. This isn't a limitation of a particular PCIe device, this
> >> is a common limit of the whole system.
> >
> > Do you have a reference for this?  I don't see anything in the PCIe
> > spec that suggests platforms must limit MRRS, and it seems that only
> > these ARM-related controllers have this issue.
> 
> I meant there is always a limit - isn't Max_Read_Request_Size a limit?
> Device Control Register (Offset 08h) Bit Location 14:12
> Max_Read_Request_Size allows for max 4096 bytes, though two values are
> reserved, so there is room for some easy extension.
> 
> - non-ARM (non-DWC?) systems are limited to 4096 bytes
[Richard] Regarding to the descriptions listed above, I think that there should no limitations of the Max_payload_size of RC port.

> - DWC-based systems are limited to 128, 256, 512 bytes (are there
>   4096-byte ones?)
[Richard] The Max_payload_size can be configured from 128bytes to 4KB when integrate the DWC IP into the SOC.
On i.MX6 PCIe, this parameter is fixed set to 512 bytes.
BR
Richard
> 
> That's how I understand it, please correct me if I'm wrong.
> --
> Krzysztof "Chris" Hałasa
> 
> Sieć Badawcza Łukasiewicz
> Przemysłowy Instytut Automatyki i Pomiarów PIAP Al. Jerozolimskie 202,
> 02-486 Warszawa
Krzysztof Hałasa Aug. 16, 2021, 11:18 a.m. UTC | #8
Hi Richard,

Please correct me if I got something wrong:

> Regarding my understanding, that there should be decomposition operations if the
>  Max_Read_Request_Size is larger than the Max_Payload_size specified
>  by RC port.

I think this means that, for example, a memory read request (a single
short physical TLP packet on PCIe, from the peripheral device to the
CPU) can request more data than Max_Payload_size (128 bytes on i.MX6).
In such case, up to 4 "completion" physical TLP packets are returned by
the CPU (up to 512 bytes, with each individual TLP up to 128 bytes as
per Max_Payload_size).

The device can't request more than 512 bytes, though - the CPU would not
service such request.

> The bit0 of AMBA Multiple Outbound Decomposed NP Sub-Requests Control Register(Offset:0x700 + 0x24)
>  had been set to be 1b1 in default.
>
> Note: The description of this bit.
> Enable AMBA Multiple Outbound Decomposed NP Sub- Requests.
> This bit when set to '0' disables the possibility of having multiple outstanding non-posted requests that
> were derived from decomposition of an outbound AMBA request. See Supported AXI Burst Operations for
> more details.

I think the above means that - when I disable the bit in question - I
can't issue memory read requests longer than 128 bytes (max payload
size).

This is not exactly clear to me:

> You should not clear this register unless your application master is
> requesting an amount of read data greater than Max_Read_Request_Size,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Is "completing" such a request at all possible?
The device shouldn't request more data than its (not CPU's)
Max_Read_Request_Size. I. e. if 512 is written to RTL8111's
Max_Read_Request_Size, then the Realtek chip will request max 512 bytes
at a time.

> and the remote device (or switch) is reordering completions that have
> different tags

Fortunately, such completions don't seem to be reordered.

However, I'm not sure how setting a bit in the CPU registers could help
here. I think the only way *IF* the completions were reordered would be
setting MRRS = MPS (= 128 bytes on i.MX6) - so there is nothing that
could be reordered.
diff mbox series

Patch

diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index 0c473d75e625..a11ec93a8cd0 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -34,6 +34,9 @@  config PCI_DOMAINS_GENERIC
 config PCI_SYSCALL
 	bool
 
+config NEED_PCIE_MAX_MRRS
+	bool
+
 source "drivers/pci/pcie/Kconfig"
 
 config PCI_MSI
diff --git a/drivers/pci/controller/dwc/Kconfig b/drivers/pci/controller/dwc/Kconfig
index 423d35872ce4..b59a4c91cb3b 100644
--- a/drivers/pci/controller/dwc/Kconfig
+++ b/drivers/pci/controller/dwc/Kconfig
@@ -98,6 +98,7 @@  config PCI_IMX6
 	depends on ARCH_MXC || COMPILE_TEST
 	depends on PCI_MSI_IRQ_DOMAIN
 	select PCIE_DW_HOST
+	select NEED_PCIE_MAX_MRRS
 
 config PCIE_SPEAR13XX
 	bool "STMicroelectronics SPEAr PCIe controller"
diff --git a/drivers/pci/controller/dwc/pci-imx6.c b/drivers/pci/controller/dwc/pci-imx6.c
index 80fc98acf097..7a83f677a632 100644
--- a/drivers/pci/controller/dwc/pci-imx6.c
+++ b/drivers/pci/controller/dwc/pci-imx6.c
@@ -1148,6 +1148,7 @@  static int imx6_pcie_probe(struct platform_device *pdev)
 		imx6_pcie->vph = NULL;
 	}
 
+	max_pcie_mrrs = 512;
 	platform_set_drvdata(pdev, imx6_pcie);
 
 	ret = imx6_pcie_attach_pd(dev);
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index aacf575c15cf..8ed8d2e75f4c 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -112,6 +112,10 @@  enum pcie_bus_config_types pcie_bus_config = PCIE_BUS_PEER2PEER;
 enum pcie_bus_config_types pcie_bus_config = PCIE_BUS_DEFAULT;
 #endif
 
+#ifdef CONFIG_NEED_PCIE_MAX_MRRS
+u16 max_pcie_mrrs = 4096; // no limit
+#endif
+
 /*
  * The default CLS is used if arch didn't set CLS explicitly and not
  * all pci devices agree on the same value.  Arch can override either
@@ -5816,6 +5820,11 @@  int pcie_set_readrq(struct pci_dev *dev, int rq)
 			rq = mps;
 	}
 
+#ifdef CONFIG_NEED_PCIE_MAX_MRRS
+	if (rq > max_pcie_mrrs)
+		rq = max_pcie_mrrs;
+#endif
+
 	v = (ffs(rq) - 8) << 12;
 
 	ret = pcie_capability_clear_and_set_word(dev, PCI_EXP_DEVCTL,
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 540b377ca8f6..7368be024c31 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -994,6 +994,7 @@  enum pcie_bus_config_types {
 };
 
 extern enum pcie_bus_config_types pcie_bus_config;
+extern u16 max_pcie_mrrs;
 
 extern struct bus_type pci_bus_type;