diff mbox series

[v4,2/3] PCI: Enable support for 10-bit Tag during device enumeration

Message ID 20230815212043.114913-3-Smita.KoralahalliChannabasappa@amd.com (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show
Series PCI: pciehp: Add support for native AER and DPC handling on async remove | expand

Commit Message

Smita Koralahalli Aug. 15, 2023, 9:20 p.m. UTC
Enable support for PCI Express 10-bit Tag.

A requester may use 10-bit Tag only if its "10-bit Tag Requester Enable"
control bit (PCI_EXP_DEVCTL2_10BIT_TAG_REQ) is set. Enable 10-bit Tag
Requester Enable if the requester supports 10-bit Tag Requester capability
and its completer supports 10-bit Tag Completions.

Platform FW may enable 10-bit Tag Requester during boot for performance
reasons as per PCIe r6.0 sec 2.2.6.2 [1]. It states that "For platforms
where the RC supports 10-Bit Tag Completer capability, it is highly
recommended for platform firmware or operating software that configures
PCIe hierarchies to Set the 10-Bit Tag Requester Enable bit automatically
in Endpoints with 10-Bit Tag Requester capability".

And, failure to enable 10-bit Tag appropriately has led to issues reaching
to the device. The device became inaccessible and the port was not able to
be recovered without a system reset when a device with 10-bit Tag was
removed and replaced with a device that didn't support 10-bit Tag.

PCIe r6.0 sec 2.2.6.2 [1], also implies that:

* If a Requester sends a 10-Bit Tag Request to a Completer that lacks
10-Bit Completer capability, the returned Completion(s) will have Tags with
Tag[9:8] equal to 00b. Since the Requester is forbidden to generate these
Tag values for 10-Bit Tags, such Completions will be handled as Unexpected
Completions, which by default are Advisory Non-Fatal Errors. The Requester
must follow standard PCI Express error handling requirements.

* In configurations where a Requester with 10-Bit Tag Requester capability
needs to target multiple Completers, one needs to ensure that the Requester
sends 10-Bit Tag Requests only to Completers that have 10-Bit Tag Completer
capability.

Hence, ensure whether these capabilities are re-negotiated and enable them
appropriately, especially when a device is surprise removed and replaced
with a new one.

[1] PCI Express Base Specification Revision 6.0, Dec 16 2021.
    https://members.pcisig.com/wg/PCI-SIG/document/16609

Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
---
 drivers/pci/pci.c             | 59 +++++++++++++++++++++++++++++++++++
 drivers/pci/pci.h             |  1 +
 drivers/pci/probe.c           |  1 +
 include/uapi/linux/pci_regs.h |  3 ++
 4 files changed, 64 insertions(+)

Comments

Lukas Wunner Aug. 28, 2023, 9:54 a.m. UTC | #1
On Tue, Aug 15, 2023 at 09:20:42PM +0000, Smita Koralahalli wrote:
> +void pci_configure_ten_bit_tag(struct pci_dev *dev)
> +{
> +	struct pci_dev *bridge;
> +	u32 cap;
> +
> +	if (!pci_is_pcie(dev))
> +		return;
> +
> +	bridge = dev->bus->self;
> +	if (!bridge)
> +		return;

I think you need to use bridge = pcie_find_root_port(dev) because
"dev" may be further down in the hierarchy with several switches
in-between it and the Root Port.

Note that pcie_find_root_port(dev) returns NULL if !pci_is_pcie(dev),
so the check above may become unnecessary.

If pcie_find_root_port(dev) == dev, then dev itself is a Root Port,
in which case you need to bail out.


> +	/*
> +	 * According to PCIe r6.0 sec 7.5.3.15, Requester Supported can only be
> +	 * set if 10-Bit Tag Completer Supported bit is set.
> +	 */
> +	pcie_capability_read_dword(bridge, PCI_EXP_DEVCAP2, &cap);
> +	if (!(cap & PCI_EXP_DEVCAP2_10BIT_TAG_COMP))
> +		goto out;
> +
> +	if (cap & PCI_EXP_DEVCAP2_10BIT_TAG_REQ) {

Hm, if Requester Supported cannot be set unless Completer Supported is
also set, why check for Completer Supported at all?


> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2476,6 +2476,7 @@ static void pci_init_capabilities(struct pci_dev *dev)
>  	pci_pm_init(dev);		/* Power Management */
>  	pci_vpd_init(dev);		/* Vital Product Data */
>  	pci_configure_ari(dev);		/* Alternative Routing-ID Forwarding */
> +	pci_configure_ten_bit_tag(dev); /* 10-bit Tag Requester */
>  	pci_iov_init(dev);		/* Single Root I/O Virtualization */
>  	pci_ats_init(dev);		/* Address Translation Services */
>  	pci_pri_init(dev);		/* Page Request Interface */

Hm, isn't this too late to disable 10-bit tags if a hot-plugged device
doesn't support it?  There are plenty of config space reads/writes
happening before pci_configure_ten_bit_tag() and if the Root Port
has 10-bit tags enabled by BIOS because a previously unplugged
device supported it, I assume the Root Port may use 10-bit tags for
those config space accesses, despite the newly hotplugged device not
supporting them?

If so, you may indeed have to unconditionally disable 10-bit tags
upon device removal and re-enable them once a 10-bit capable device
is hotplugged.

I'm wondering what happens if there are switches between the hotplugged
device and the Root Port.  In that case, there may be further devices
in the hierarchy below the Root Port.  I assume 10-bit tags can only be
enabled if *all* devices below the Root Port support them, is that correct?

The corollary would be that if there's an unoccupied hotplug port somewhere
in the hierarchy below a Root Port, 10-bit tags cannot be enabled at all
on the Root Port.  Maybe we can leave 10-bit tags enabled on hot-removal
and only disable them on hot-add?  That wouldn't work however if TLPs
are sent to the hot-added device without operating system involvement
prior to enumeration by the operating system.  Don't CXL devices
autonomously send PM messages upstream on hot-add?

There's another quagmire:  Endpoint devices may talk to each other via
p2pdma (see drivers/pci/p2pdma.c) and if either of them doesn't support
10-bit tags, we need to disable 10-bit tags on them upon commencing
p2pdma.  We may re-enable 10-bit tags once either of the devices is
hot-removed or p2pdma between them is stopped.

Finally, PCIe r6.0 added 14-bit tag support.  It may be worth adding
10-bit tag support in a way that 14-bit tag support can easily be added
later on (or is added together with 10-bit tag support).

Thanks,

Lukas
Smita Koralahalli Sept. 12, 2023, 10:39 p.m. UTC | #2
On 8/28/2023 2:54 AM, Lukas Wunner wrote:
> On Tue, Aug 15, 2023 at 09:20:42PM +0000, Smita Koralahalli wrote:
>> +void pci_configure_ten_bit_tag(struct pci_dev *dev)
>> +{
>> +	struct pci_dev *bridge;
>> +	u32 cap;
>> +
>> +	if (!pci_is_pcie(dev))
>> +		return;
>> +
>> +	bridge = dev->bus->self;
>> +	if (!bridge)
>> +		return;
> 
> I think you need to use bridge = pcie_find_root_port(dev) because
> "dev" may be further down in the hierarchy with several switches
> in-between it and the Root Port.
> 
> Note that pcie_find_root_port(dev) returns NULL if !pci_is_pcie(dev),
> so the check above may become unnecessary.
> 
> If pcie_find_root_port(dev) == dev, then dev itself is a Root Port,
> in which case you need to bail out.

Will fix thanks!

> 
> 
>> +	/*
>> +	 * According to PCIe r6.0 sec 7.5.3.15, Requester Supported can only be
>> +	 * set if 10-Bit Tag Completer Supported bit is set.
>> +	 */
>> +	pcie_capability_read_dword(bridge, PCI_EXP_DEVCAP2, &cap);
>> +	if (!(cap & PCI_EXP_DEVCAP2_10BIT_TAG_COMP))
>> +		goto out;
>> +
>> +	if (cap & PCI_EXP_DEVCAP2_10BIT_TAG_REQ) {
> 
> Hm, if Requester Supported cannot be set unless Completer Supported is
> also set, why check for Completer Supported at all?

Makes sense to me. Will change.
> 
> 
>> --- a/drivers/pci/probe.c
>> +++ b/drivers/pci/probe.c
>> @@ -2476,6 +2476,7 @@ static void pci_init_capabilities(struct pci_dev *dev)
>>   	pci_pm_init(dev);		/* Power Management */
>>   	pci_vpd_init(dev);		/* Vital Product Data */
>>   	pci_configure_ari(dev);		/* Alternative Routing-ID Forwarding */
>> +	pci_configure_ten_bit_tag(dev); /* 10-bit Tag Requester */
>>   	pci_iov_init(dev);		/* Single Root I/O Virtualization */
>>   	pci_ats_init(dev);		/* Address Translation Services */
>>   	pci_pri_init(dev);		/* Page Request Interface */
> 
> Hm, isn't this too late to disable 10-bit tags if a hot-plugged device
> doesn't support it?  There are plenty of config space reads/writes
> happening before pci_configure_ten_bit_tag() and if the Root Port
> has 10-bit tags enabled by BIOS because a previously unplugged
> device supported it, I assume the Root Port may use 10-bit tags for
> those config space accesses, despite the newly hotplugged device not
> supporting them?
> 
> If so, you may indeed have to unconditionally disable 10-bit tags
> upon device removal and re-enable them once a 10-bit capable device
> is hotplugged.
> 
> I'm wondering what happens if there are switches between the hotplugged
> device and the Root Port.  In that case, there may be further devices
> in the hierarchy below the Root Port.  I assume 10-bit tags can only be
> enabled if *all* devices below the Root Port support them, is that correct?

You are right! I understand I missed the consideration of involving 
switches and hierarchical PCIe structures.

> 
> The corollary would be that if there's an unoccupied hotplug port somewhere
> in the hierarchy below a Root Port, 10-bit tags cannot be enabled at all
> on the Root Port.

Yes, but the BIOS might have already enabled 10-bit tags on root port 
before this hotplug port becomes unoccupied on hot-remove.

   Maybe we can leave 10-bit tags enabled on hot-removal
> and only disable them on hot-add?

Considering all the challenges, would you think we should 
unconditionally clear 10-bit tags on remove and enable them on add? 
Because the TLPs issue will exist even if we leave the tags enabled on 
removal. Disabling unconditionally would atleast resolve config space 
read/writes and p2pdma issues. What do you think?

  That wouldn't work however if TLPs
> are sent to the hot-added device without operating system involvement
> prior to enumeration by the operating system.  Don't CXL devices
> autonomously send PM messages upstream on hot-add?
> 
> There's another quagmire:  Endpoint devices may talk to each other via
> p2pdma (see drivers/pci/p2pdma.c) and if either of them doesn't support
> 10-bit tags, we need to disable 10-bit tags on them upon commencing
> p2pdma.  We may re-enable 10-bit tags once either of the devices is
> hot-removed or p2pdma between them is stopped.
> 
> Finally, PCIe r6.0 added 14-bit tag support.  It may be worth adding
> 10-bit tag support in a way that 14-bit tag support can easily be added
> later on (or is added together with 10-bit tag support).

I agree.

Thanks,
Smita

> 
> Thanks,
> 
> Lukas
diff mbox series

Patch

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 60230da957e0..7e640694fa03 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3795,6 +3795,65 @@  int pci_rebar_set_size(struct pci_dev *pdev, int bar, int size)
 	return 0;
 }
 
+/*
+ * pci_configure_ten_bit_tag - enable or disable 10-bit Tag Requester
+ * @dev: the PCI device
+ */
+void pci_configure_ten_bit_tag(struct pci_dev *dev)
+{
+	struct pci_dev *bridge;
+	u32 cap;
+
+	if (!pci_is_pcie(dev))
+		return;
+
+	bridge = dev->bus->self;
+	if (!bridge)
+		return;
+
+	/*
+	 * According to PCIe r6.0 sec 7.5.3.16, the result is undefined if
+	 * the value of this bit is changed while the Function has outstanding
+	 * Non-Posted Requests.
+	 */
+	if (!pci_wait_for_pending_transaction(dev)) {
+		pci_info(dev, "Transaction in progress, 10-bit Tag not configured properly\n");
+		return;
+	}
+
+	/*
+	 * According to PCIe r6.0 sec 7.5.3.15, Requester Supported can only be
+	 * set if 10-Bit Tag Completer Supported bit is set.
+	 */
+	pcie_capability_read_dword(bridge, PCI_EXP_DEVCAP2, &cap);
+	if (!(cap & PCI_EXP_DEVCAP2_10BIT_TAG_COMP))
+		goto out;
+
+	if (cap & PCI_EXP_DEVCAP2_10BIT_TAG_REQ) {
+		pcie_capability_read_dword(dev, PCI_EXP_DEVCAP2, &cap);
+
+		if (!(cap & PCI_EXP_DEVCAP2_10BIT_TAG_COMP))
+			goto out;
+
+		pcie_capability_set_word(bridge, PCI_EXP_DEVCTL2,
+					 PCI_EXP_DEVCTL2_10BIT_TAG_REQ);
+
+		if (cap & PCI_EXP_DEVCAP2_10BIT_TAG_REQ)
+			pcie_capability_set_word(dev, PCI_EXP_DEVCTL2,
+						 PCI_EXP_DEVCTL2_10BIT_TAG_REQ);
+		else
+			pcie_capability_clear_word(dev, PCI_EXP_DEVCTL2,
+						   PCI_EXP_DEVCTL2_10BIT_TAG_REQ);
+		return;
+	}
+
+out:
+	pcie_capability_clear_word(bridge, PCI_EXP_DEVCTL2,
+				   PCI_EXP_DEVCTL2_10BIT_TAG_REQ);
+	pcie_capability_clear_word(dev, PCI_EXP_DEVCTL2,
+				   PCI_EXP_DEVCTL2_10BIT_TAG_REQ);
+}
+
 /**
  * pci_enable_atomic_ops_to_root - enable AtomicOp requests to root port
  * @dev: the PCI device
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index a4c397434057..dee6241878fc 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -239,6 +239,7 @@  int pci_setup_device(struct pci_dev *dev);
 int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
 		    struct resource *res, unsigned int reg);
 void pci_configure_ari(struct pci_dev *dev);
+void pci_configure_ten_bit_tag(struct pci_dev *dev);
 void __pci_bus_size_bridges(struct pci_bus *bus,
 			struct list_head *realloc_head);
 void __pci_bus_assign_resources(const struct pci_bus *bus,
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 8bac3ce02609..5a3c1ec6fad6 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2476,6 +2476,7 @@  static void pci_init_capabilities(struct pci_dev *dev)
 	pci_pm_init(dev);		/* Power Management */
 	pci_vpd_init(dev);		/* Vital Product Data */
 	pci_configure_ari(dev);		/* Alternative Routing-ID Forwarding */
+	pci_configure_ten_bit_tag(dev); /* 10-bit Tag Requester */
 	pci_iov_init(dev);		/* Single Root I/O Virtualization */
 	pci_ats_init(dev);		/* Address Translation Services */
 	pci_pri_init(dev);		/* Page Request Interface */
diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index e5f558d96493..b0a41c987ac5 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -656,6 +656,8 @@ 
 #define  PCI_EXP_DEVCAP2_ATOMIC_COMP128	0x00000200 /* 128b AtomicOp completion */
 #define  PCI_EXP_DEVCAP2_LTR		0x00000800 /* Latency tolerance reporting */
 #define  PCI_EXP_DEVCAP2_OBFF_MASK	0x000c0000 /* OBFF support mechanism */
+#define  PCI_EXP_DEVCAP2_10BIT_TAG_COMP 0x00010000 /* 10-bit Tag Completer */
+#define  PCI_EXP_DEVCAP2_10BIT_TAG_REQ  0x00020000 /* 10-bit Tag Requester */
 #define  PCI_EXP_DEVCAP2_OBFF_MSG	0x00040000 /* New message signaling */
 #define  PCI_EXP_DEVCAP2_OBFF_WAKE	0x00080000 /* Re-use WAKE# for OBFF */
 #define  PCI_EXP_DEVCAP2_EE_PREFIX	0x00200000 /* End-End TLP Prefix */
@@ -668,6 +670,7 @@ 
 #define  PCI_EXP_DEVCTL2_IDO_REQ_EN	0x0100	/* Allow IDO for requests */
 #define  PCI_EXP_DEVCTL2_IDO_CMP_EN	0x0200	/* Allow IDO for completions */
 #define  PCI_EXP_DEVCTL2_LTR_EN		0x0400	/* Enable LTR mechanism */
+#define  PCI_EXP_DEVCTL2_10BIT_TAG_REQ  0x1000  /* Enable 10-bit Tag Requester */
 #define  PCI_EXP_DEVCTL2_OBFF_MSGA_EN	0x2000	/* Enable OBFF Message type A */
 #define  PCI_EXP_DEVCTL2_OBFF_MSGB_EN	0x4000	/* Enable OBFF Message type B */
 #define  PCI_EXP_DEVCTL2_OBFF_WAKE_EN	0x6000	/* OBFF using WAKE# signaling */