diff mbox

[29/37] PCI: Add pci_enable_atomic_ops_to_root

Message ID 1512792555-26042-30-git-send-email-Felix.Kuehling@amd.com (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show

Commit Message

Felix Kuehling Dec. 9, 2017, 4:09 a.m. UTC
From: Jay Cornwall <Jay.Cornwall@amd.com>

The PCIe 3.0 AtomicOp (6.15) feature allows atomic transctions to be
requested by, routed through and completed by PCIe components. Routing and
completion do not require software support. Component support for each is
detectable via the DEVCAP2 register.

AtomicOp requests are permitted only if a component's
DEVCTL2.ATOMICOP_REQUESTER_ENABLE field is set. This capability cannot be
detected but is a no-op if set on a component with no support. These
requests can only be serviced if the upstream components support AtomicOp
completion and/or routing to a component which does.

A concrete example is the AMD Fiji-class GPU, which is specified to
support AtomicOp requests, routed through a PLX 8747 switch (advertising
AtomicOp routing) to a Haswell host bridge (advertising AtomicOp
completion support). When AtomicOp requests are disabled the GPU logs
attempts to initiate requests to an MMIO register for debugging.

Add pci_enable_atomic_ops_to_root for per-device control over AtomicOp
requests. Upstream bridges are checked for AtomicOp routing capability and
the call fails if any lack this capability. The root port is checked for
AtomicOp completion capabilities and the call fails if it does not support
any. Routes to other PCIe components are not checked for AtomicOp routing
and completion capabilities.

v2: Check for AtomicOp route to root port with AtomicOp completion
v3: Style fixes
v4: Endpoint to root port only, check upstream egress blocking
v5: Rebase, use existing PCI_EXP_DEVCTL2_ATOMIC_EGRESS_BLOCK define

CC: linux-pci@vger.kernel.org
Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/pci/pci.c             | 81 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/pci.h           |  1 +
 include/uapi/linux/pci_regs.h |  2 ++
 3 files changed, 84 insertions(+)

Comments

Bjorn Helgaas Dec. 12, 2017, 11:27 p.m. UTC | #1
[+cc Ram, Michal, Ariel, Doug, Jason]

The [29/37] in the subject makes it look like this is part of a larger
series, but I can't find the rest of it on linux-pci or linux-kernel.

I don't want to merge a new interface unless there's an in-tree user
of it.  I assume the rest of the series includes a user.

On Fri, Dec 08, 2017 at 11:09:07PM -0500, Felix Kuehling wrote:
> From: Jay Cornwall <Jay.Cornwall@amd.com>
> 
> The PCIe 3.0 AtomicOp (6.15) feature allows atomic transctions to be
> requested by, routed through and completed by PCIe components. Routing and
> completion do not require software support. Component support for each is
> detectable via the DEVCAP2 register.
> 
> AtomicOp requests are permitted only if a component's
> DEVCTL2.ATOMICOP_REQUESTER_ENABLE field is set. This capability cannot be
> detected but is a no-op if set on a component with no support. These
> requests can only be serviced if the upstream components support AtomicOp
> completion and/or routing to a component which does.
> 
> A concrete example is the AMD Fiji-class GPU, which is specified to
> support AtomicOp requests, routed through a PLX 8747 switch (advertising
> AtomicOp routing) to a Haswell host bridge (advertising AtomicOp
> completion support). When AtomicOp requests are disabled the GPU logs
> attempts to initiate requests to an MMIO register for debugging.
> 
> Add pci_enable_atomic_ops_to_root for per-device control over AtomicOp
> requests. Upstream bridges are checked for AtomicOp routing capability and
> the call fails if any lack this capability. The root port is checked for
> AtomicOp completion capabilities and the call fails if it does not support
> any. Routes to other PCIe components are not checked for AtomicOp routing
> and completion capabilities.
> 
> v2: Check for AtomicOp route to root port with AtomicOp completion
> v3: Style fixes
> v4: Endpoint to root port only, check upstream egress blocking
> v5: Rebase, use existing PCI_EXP_DEVCTL2_ATOMIC_EGRESS_BLOCK define
> 
> CC: linux-pci@vger.kernel.org
> Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
> ---
>  drivers/pci/pci.c             | 81 +++++++++++++++++++++++++++++++++++++++++++
>  include/linux/pci.h           |  1 +
>  include/uapi/linux/pci_regs.h |  2 ++
>  3 files changed, 84 insertions(+)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 6078dfc..89a8bb0 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -2966,6 +2966,87 @@ bool pci_acs_path_enabled(struct pci_dev *start,
>  }
>  
>  /**
> + * pci_enable_atomic_ops_to_root - enable AtomicOp requests to root port
> + * @dev: the PCI device
> + *
> + * Return 0 if the device is capable of generating AtomicOp requests,

I don't believe this part.

You return 0 if the upstream path can route AtomicOps and the Root
Port can complete them.  But there's nothing here that's conditional
on capabilities of *dev*.

You could read back PCI_EXP_DEVCTL2 to see if
PCI_EXP_DEVCTL2_ATOMIC_REQ was writable, but even then, you can't
really tell what the device is capable of.

> + * all upstream bridges support AtomicOp routing, egress blocking is disabled
> + * on all upstream ports, and the root port supports 32-bit, 64-bit and/or
> + * 128-bit AtomicOp completion, or negative otherwise.
> + */
> +int pci_enable_atomic_ops_to_root(struct pci_dev *dev)
> +{
> +	struct pci_bus *bus = dev->bus;
> +
> +	if (!pci_is_pcie(dev))
> +		return -EINVAL;
> +
> +	switch (pci_pcie_type(dev)) {
> +	/*
> +	 * PCIe 3.0, 6.15 specifies that endpoints and root ports are permitted
> +	 * to implement AtomicOp requester capabilities.
> +	 */
> +	case PCI_EXP_TYPE_ENDPOINT:
> +	case PCI_EXP_TYPE_LEG_END:
> +	case PCI_EXP_TYPE_RC_END:
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	while (bus->parent) {
> +		struct pci_dev *bridge = bus->self;
> +		u32 cap;
> +
> +		pcie_capability_read_dword(bridge, PCI_EXP_DEVCAP2, &cap);
> +
> +		switch (pci_pcie_type(bridge)) {
> +		/*
> +		 * Upstream, downstream and root ports may implement AtomicOp
> +		 * routing capabilities. AtomicOp routing via a root port is
> +		 * not considered.
> +		 */
> +		case PCI_EXP_TYPE_UPSTREAM:
> +		case PCI_EXP_TYPE_DOWNSTREAM:
> +			if (!(cap & PCI_EXP_DEVCAP2_ATOMIC_ROUTE))
> +				return -EINVAL;
> +			break;
> +
> +		/*
> +		 * Root ports are permitted to implement AtomicOp completion
> +		 * capabilities.
> +		 */
> +		case PCI_EXP_TYPE_ROOT_PORT:
> +			if (!(cap & (PCI_EXP_DEVCAP2_ATOMIC_COMP32 |
> +				     PCI_EXP_DEVCAP2_ATOMIC_COMP64 |
> +				     PCI_EXP_DEVCAP2_ATOMIC_COMP128)))
> +				return -EINVAL;
> +			break;
> +		}

IIUC, you want to enable an endpoint, e.g., an AMD Fiji-class GPU, to
initiate AtomicOps that target system memory.  This interface
(pci_enable_atomic_ops_to_root()) doesn't specify what size operations
the driver wants to do.  If the GPU requests a 128-bit op and the Root
Port doesn't support it, I think we'll see an Unsupported Request
error.

Do you need to extend this interface so the driver can specify what
sizes it wants?

The existing code in qedr_pci_set_atomic() is very similar.  We should
make this new interface work for both places, then actually use it in
qedr_pci_set_atomic().

> +
> +		/*
> +		 * Upstream ports may block AtomicOps on egress.
> +		 */
> +		if (pci_pcie_type(bridge) == PCI_EXP_TYPE_UPSTREAM) {

pci_pcie_type() is not a reliable method for determining the function
of a switch port.  There are systems where the upstream port is
labeled as a downstream port, e.g.,
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=c8fc9339409d

> +			u32 ctl2;
> +
> +			pcie_capability_read_dword(bridge, PCI_EXP_DEVCTL2,
> +						   &ctl2);
> +			if (ctl2 & PCI_EXP_DEVCTL2_ATOMIC_EGRESS_BLOCK)
> +				return -EINVAL;
> +		}
> +
> +		bus = bus->parent;
> +	}
> +
> +	pcie_capability_set_word(dev, PCI_EXP_DEVCTL2,
> +				 PCI_EXP_DEVCTL2_ATOMIC_REQ);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL(pci_enable_atomic_ops_to_root);
> +
> +/**
>   * pci_swizzle_interrupt_pin - swizzle INTx for device behind bridge
>   * @dev: the PCI device
>   * @pin: the INTx pin (1=INTA, 2=INTB, 3=INTC, 4=INTD)
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index f4f8ee5..2a39f63 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -2062,6 +2062,7 @@ void pci_request_acs(void);
>  bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags);
>  bool pci_acs_path_enabled(struct pci_dev *start,
>  			  struct pci_dev *end, u16 acs_flags);
> +int pci_enable_atomic_ops_to_root(struct pci_dev *dev);
>  
>  #define PCI_VPD_LRDT			0x80	/* Large Resource Data Type */
>  #define PCI_VPD_LRDT_ID(x)		((x) | PCI_VPD_LRDT)
> diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
> index f8d5804..45f251a 100644
> --- a/include/uapi/linux/pci_regs.h
> +++ b/include/uapi/linux/pci_regs.h
> @@ -623,7 +623,9 @@
>  #define PCI_EXP_DEVCAP2		36	/* Device Capabilities 2 */
>  #define  PCI_EXP_DEVCAP2_ARI		0x00000020 /* Alternative Routing-ID */
>  #define  PCI_EXP_DEVCAP2_ATOMIC_ROUTE	0x00000040 /* Atomic Op routing */
> +#define  PCI_EXP_DEVCAP2_ATOMIC_COMP32	0x00000080 /* 32b AtomicOp completion */
>  #define PCI_EXP_DEVCAP2_ATOMIC_COMP64	0x00000100 /* Atomic 64-bit compare */
> +#define  PCI_EXP_DEVCAP2_ATOMIC_COMP128	0x00000200 /* 128b AtomicOp completion*/

The comments should be similar.  I think yours are better than the
original, so please change the original to

  /* 64b AtomicOp completion */

so they all match.

>  #define  PCI_EXP_DEVCAP2_LTR		0x00000800 /* Latency tolerance reporting */
>  #define  PCI_EXP_DEVCAP2_OBFF_MASK	0x000c0000 /* OBFF support mechanism */
>  #define  PCI_EXP_DEVCAP2_OBFF_MSG	0x00040000 /* New message signaling */
> -- 
> 2.7.4
>
Jason Gunthorpe Dec. 12, 2017, 11:42 p.m. UTC | #2
On Tue, Dec 12, 2017 at 05:27:07PM -0600, Bjorn Helgaas wrote:
> [+cc Ram, Michal, Ariel, Doug, Jason]
> 
> The [29/37] in the subject makes it look like this is part of a larger
> series, but I can't find the rest of it on linux-pci or linux-kernel.

Didn't find the cover letter, but the AMD patchworks captured the series..

https://patchwork.freedesktop.org/project/amd-xorg-ddx/patches/

> I don't want to merge a new interface unless there's an in-tree user
> of it.  I assume the rest of the series includes a user.

Looks like it.

I would also guess we will see users in drivers/infiniband emerge as
CPU coherent atomics are also a topic our hardware drivers will be
interested in. But I am not aware of any pending patches.

Jason
Oded Gabbay Dec. 13, 2017, 7:22 a.m. UTC | #3
On Wed, Dec 13, 2017 at 1:42 AM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> On Tue, Dec 12, 2017 at 05:27:07PM -0600, Bjorn Helgaas wrote:
>> [+cc Ram, Michal, Ariel, Doug, Jason]
>>
>> The [29/37] in the subject makes it look like this is part of a larger
>> series, but I can't find the rest of it on linux-pci or linux-kernel.
>
> Didn't find the cover letter, but the AMD patchworks captured the series..
>
> https://patchwork.freedesktop.org/project/amd-xorg-ddx/patches/

Hi,
This patchset is mainly for the amdkfd driver, which is used for
running HSA Framework on AMD's APUs and in the near future, dGPUs.
This driver has been in the kernel since 3.19.
PCIe atomics were not required for APUs because there GPU part is
integrated with the CPU and they have atomic accesses between them.

For enabling HSA on dGPUs (such as Fiji, Vega, Polaris) which connect
through PCIe, we need to have PCIe atomics support.
The patchset starts to upstream the dGPU support and one of the
pre-requisites is the patch in discussion.

>
>> I don't want to merge a new interface unless there's an in-tree user
>> of it.  I assume the rest of the series includes a user.
>
> Looks like it.
So, yes, there is a user in the kernel and there is an entire
open-source userspace framework around it, called ROCm
(https://github.com/RadeonOpenCompute/ROCm)

Oded

>
> I would also guess we will see users in drivers/infiniband emerge as
> CPU coherent atomics are also a topic our hardware drivers will be
> interested in. But I am not aware of any pending patches.
>
> Jason
Felix Kuehling Jan. 2, 2018, 11:41 p.m. UTC | #4
On 2017-12-12 06:27 PM, Bjorn Helgaas wrote:
> [+cc Ram, Michal, Ariel, Doug, Jason]
>
> The [29/37] in the subject makes it look like this is part of a larger
> series, but I can't find the rest of it on linux-pci or linux-kernel.
>
> I don't want to merge a new interface unless there's an in-tree user
> of it.  I assume the rest of the series includes a user.
>
> On Fri, Dec 08, 2017 at 11:09:07PM -0500, Felix Kuehling wrote:
[snip]
>> + * all upstream bridges support AtomicOp routing, egress blocking is disabled
>> + * on all upstream ports, and the root port supports 32-bit, 64-bit and/or
>> + * 128-bit AtomicOp completion, or negative otherwise.
>> + */
>> +int pci_enable_atomic_ops_to_root(struct pci_dev *dev)
>> +{
>> +	struct pci_bus *bus = dev->bus;
>> +
>> +	if (!pci_is_pcie(dev))
>> +		return -EINVAL;
>> +
>> +	switch (pci_pcie_type(dev)) {
>> +	/*
>> +	 * PCIe 3.0, 6.15 specifies that endpoints and root ports are permitted
>> +	 * to implement AtomicOp requester capabilities.
>> +	 */
>> +	case PCI_EXP_TYPE_ENDPOINT:
>> +	case PCI_EXP_TYPE_LEG_END:
>> +	case PCI_EXP_TYPE_RC_END:
>> +		break;
>> +	default:
>> +		return -EINVAL;
>> +	}
>> +
>> +	while (bus->parent) {
>> +		struct pci_dev *bridge = bus->self;
>> +		u32 cap;
>> +
>> +		pcie_capability_read_dword(bridge, PCI_EXP_DEVCAP2, &cap);
>> +
>> +		switch (pci_pcie_type(bridge)) {
>> +		/*
>> +		 * Upstream, downstream and root ports may implement AtomicOp
>> +		 * routing capabilities. AtomicOp routing via a root port is
>> +		 * not considered.
>> +		 */
>> +		case PCI_EXP_TYPE_UPSTREAM:
>> +		case PCI_EXP_TYPE_DOWNSTREAM:
>> +			if (!(cap & PCI_EXP_DEVCAP2_ATOMIC_ROUTE))
>> +				return -EINVAL;
>> +			break;
>> +
>> +		/*
>> +		 * Root ports are permitted to implement AtomicOp completion
>> +		 * capabilities.
>> +		 */
>> +		case PCI_EXP_TYPE_ROOT_PORT:
>> +			if (!(cap & (PCI_EXP_DEVCAP2_ATOMIC_COMP32 |
>> +				     PCI_EXP_DEVCAP2_ATOMIC_COMP64 |
>> +				     PCI_EXP_DEVCAP2_ATOMIC_COMP128)))
>> +				return -EINVAL;
>> +			break;
>> +		}
> IIUC, you want to enable an endpoint, e.g., an AMD Fiji-class GPU, to
> initiate AtomicOps that target system memory.  This interface
> (pci_enable_atomic_ops_to_root()) doesn't specify what size operations
> the driver wants to do.  If the GPU requests a 128-bit op and the Root
> Port doesn't support it, I think we'll see an Unsupported Request
> error.
>
> Do you need to extend this interface so the driver can specify what
> sizes it wants?
>
> The existing code in qedr_pci_set_atomic() is very similar.  We should
> make this new interface work for both places, then actually use it in
> qedr_pci_set_atomic().

Hi Bjorn, Doug, Ram,

I just discussed this with Jay, and he noticed that qedr_pci_set_atomic
seems to use a different criteria to find the completer for atomic
requests. Jay's function expects the root port to have a parent, which
was the case on the systems he tested. But Ram's function looks for a
bridge without a parent and checks completion capabilities on that. Jay
believes that to be a root complex, not a root port.

According to the spec, "Root ports are permitted to implement AtomicOp
completion capabilities." It talks about a root port, not a root complex.

Can you help us understand, which interpretation is correct? And how to
correctly identify the root port for checking completion capabilities?
Are there valid topologies where a root port does not have a parent?

Regards,
  Felix
Bjorn Helgaas Jan. 4, 2018, 11:40 p.m. UTC | #5
On Tue, Jan 02, 2018 at 06:41:17PM -0500, Felix Kuehling wrote:
> On 2017-12-12 06:27 PM, Bjorn Helgaas wrote:
> > [+cc Ram, Michal, Ariel, Doug, Jason]
> >
> > The [29/37] in the subject makes it look like this is part of a larger
> > series, but I can't find the rest of it on linux-pci or linux-kernel.
> >
> > I don't want to merge a new interface unless there's an in-tree user
> > of it.  I assume the rest of the series includes a user.
> >
> > On Fri, Dec 08, 2017 at 11:09:07PM -0500, Felix Kuehling wrote:
> [snip]
> >> + * all upstream bridges support AtomicOp routing, egress blocking is disabled
> >> + * on all upstream ports, and the root port supports 32-bit, 64-bit and/or
> >> + * 128-bit AtomicOp completion, or negative otherwise.
> >> + */
> >> +int pci_enable_atomic_ops_to_root(struct pci_dev *dev)
> >> +{
> >> +	struct pci_bus *bus = dev->bus;
> >> +
> >> +	if (!pci_is_pcie(dev))
> >> +		return -EINVAL;
> >> +
> >> +	switch (pci_pcie_type(dev)) {
> >> +	/*
> >> +	 * PCIe 3.0, 6.15 specifies that endpoints and root ports are permitted
> >> +	 * to implement AtomicOp requester capabilities.
> >> +	 */
> >> +	case PCI_EXP_TYPE_ENDPOINT:
> >> +	case PCI_EXP_TYPE_LEG_END:
> >> +	case PCI_EXP_TYPE_RC_END:
> >> +		break;
> >> +	default:
> >> +		return -EINVAL;
> >> +	}
> >> +
> >> +	while (bus->parent) {
> >> +		struct pci_dev *bridge = bus->self;
> >> +		u32 cap;
> >> +
> >> +		pcie_capability_read_dword(bridge, PCI_EXP_DEVCAP2, &cap);
> >> +
> >> +		switch (pci_pcie_type(bridge)) {
> >> +		/*
> >> +		 * Upstream, downstream and root ports may implement AtomicOp
> >> +		 * routing capabilities. AtomicOp routing via a root port is
> >> +		 * not considered.
> >> +		 */
> >> +		case PCI_EXP_TYPE_UPSTREAM:
> >> +		case PCI_EXP_TYPE_DOWNSTREAM:
> >> +			if (!(cap & PCI_EXP_DEVCAP2_ATOMIC_ROUTE))
> >> +				return -EINVAL;
> >> +			break;
> >> +
> >> +		/*
> >> +		 * Root ports are permitted to implement AtomicOp completion
> >> +		 * capabilities.
> >> +		 */
> >> +		case PCI_EXP_TYPE_ROOT_PORT:
> >> +			if (!(cap & (PCI_EXP_DEVCAP2_ATOMIC_COMP32 |
> >> +				     PCI_EXP_DEVCAP2_ATOMIC_COMP64 |
> >> +				     PCI_EXP_DEVCAP2_ATOMIC_COMP128)))
> >> +				return -EINVAL;
> >> +			break;
> >> +		}
> > IIUC, you want to enable an endpoint, e.g., an AMD Fiji-class GPU, to
> > initiate AtomicOps that target system memory.  This interface
> > (pci_enable_atomic_ops_to_root()) doesn't specify what size operations
> > the driver wants to do.  If the GPU requests a 128-bit op and the Root
> > Port doesn't support it, I think we'll see an Unsupported Request
> > error.
> >
> > Do you need to extend this interface so the driver can specify what
> > sizes it wants?
> >
> > The existing code in qedr_pci_set_atomic() is very similar.  We should
> > make this new interface work for both places, then actually use it in
> > qedr_pci_set_atomic().
> 
> Hi Bjorn, Doug, Ram,
> 
> I just discussed this with Jay, and he noticed that qedr_pci_set_atomic
> seems to use a different criteria to find the completer for atomic
> requests. Jay's function expects the root port to have a parent, which
> was the case on the systems he tested. But Ram's function looks for a
> bridge without a parent and checks completion capabilities on that. Jay
> believes that to be a root complex, not a root port.

By "Ram's function", I guess you mean qedr_pci_set_atomic()?

That starts with a PCIe device ("pdev"; it assumes but does not check
that this is a PCIe device), and traverses through all the bridges
leading to it.  Usually this will be:

  endpoint -> root port
  endpoint -> switch downstream port -> switch upstream port -> root port

Or there may be additional switches in the middle.  The code is
actually not quite correct because it is legal to have this:

  endpoint -> PCI-to-PCIe bridge -> conventional PCI bridge -> ...

and qedr_pci_set_atomic() will traverse up through the conventional
part of the hierarchy, where there is no PCI_EXP_DEVCAP2.

In general, a Root Port is the root of a PCIe hierarchy and there is
no parent device.  E.g., on my laptop:

  00:1c.0 Intel Root Port (bridge to [bus 02])
  00:1c.2 Intel Root Port (bridge to [bus 04])

What sort of parent do you expect?  As I mentioned, it's legal to have
a PCI/PCI-X to PCIe bridge inside a conventional PCI hierarchy, but
that's a little unusual.

> According to the spec, "Root ports are permitted to implement AtomicOp
> completion capabilities." It talks about a root port, not a root complex.
> 
> Can you help us understand, which interpretation is correct? And how to
> correctly identify the root port for checking completion capabilities?

If you start with a PCIe device and traverse upstream, you should
eventually reach a Root Port or a PCI/PCI-X to PCIe bridge.

> Are there valid topologies where a root port does not have a parent?

I don't understand this because Root Ports normally do not have
parents.

PCIe devices other than Root Ports normally have a Root Port (or
PCI/PCI-X to PCIe bridge) at the root of the PCIe hierarchy, but there
are definitely exceptions.

For example, there some systems where the Root Port is not visible to
Linux, e.g.,
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ee8bdfb6568d
On systems like that, I don't think you can safely use AtomicOps.

Bjorn
Felix Kuehling Jan. 5, 2018, 12:09 a.m. UTC | #6
Hi Bjorn,

I figured it out. The difference between the functions is whether they
use struct pci_bus * or struct pci_dev * as cursor. I found that the
functions are in fact equivalent. The last loop iteration in
pci_enable_atomic_ops_to_root is equivalent to the code after the loop
in qedr_pci_set_atomic. Both handle the root port.

I think my confusion was based on an incorrect assumption that
bridge->bus->self is the same as bridge. But brigde->bus is in fact the
parent bus.

I sent out an updated patch that addresses your comments to the previous
version. This should be general enough that it can replace
qedr_pci_set_atomic.

Regards,
  Felix


On 2018-01-04 06:40 PM, Bjorn Helgaas wrote:
> On Tue, Jan 02, 2018 at 06:41:17PM -0500, Felix Kuehling wrote:
>> On 2017-12-12 06:27 PM, Bjorn Helgaas wrote:
>>> [+cc Ram, Michal, Ariel, Doug, Jason]
>>>
>>> The [29/37] in the subject makes it look like this is part of a larger
>>> series, but I can't find the rest of it on linux-pci or linux-kernel.
>>>
>>> I don't want to merge a new interface unless there's an in-tree user
>>> of it.  I assume the rest of the series includes a user.
>>>
>>> On Fri, Dec 08, 2017 at 11:09:07PM -0500, Felix Kuehling wrote:
>> [snip]
>>>> + * all upstream bridges support AtomicOp routing, egress blocking is disabled
>>>> + * on all upstream ports, and the root port supports 32-bit, 64-bit and/or
>>>> + * 128-bit AtomicOp completion, or negative otherwise.
>>>> + */
>>>> +int pci_enable_atomic_ops_to_root(struct pci_dev *dev)
>>>> +{
>>>> +	struct pci_bus *bus = dev->bus;
>>>> +
>>>> +	if (!pci_is_pcie(dev))
>>>> +		return -EINVAL;
>>>> +
>>>> +	switch (pci_pcie_type(dev)) {
>>>> +	/*
>>>> +	 * PCIe 3.0, 6.15 specifies that endpoints and root ports are permitted
>>>> +	 * to implement AtomicOp requester capabilities.
>>>> +	 */
>>>> +	case PCI_EXP_TYPE_ENDPOINT:
>>>> +	case PCI_EXP_TYPE_LEG_END:
>>>> +	case PCI_EXP_TYPE_RC_END:
>>>> +		break;
>>>> +	default:
>>>> +		return -EINVAL;
>>>> +	}
>>>> +
>>>> +	while (bus->parent) {
>>>> +		struct pci_dev *bridge = bus->self;
>>>> +		u32 cap;
>>>> +
>>>> +		pcie_capability_read_dword(bridge, PCI_EXP_DEVCAP2, &cap);
>>>> +
>>>> +		switch (pci_pcie_type(bridge)) {
>>>> +		/*
>>>> +		 * Upstream, downstream and root ports may implement AtomicOp
>>>> +		 * routing capabilities. AtomicOp routing via a root port is
>>>> +		 * not considered.
>>>> +		 */
>>>> +		case PCI_EXP_TYPE_UPSTREAM:
>>>> +		case PCI_EXP_TYPE_DOWNSTREAM:
>>>> +			if (!(cap & PCI_EXP_DEVCAP2_ATOMIC_ROUTE))
>>>> +				return -EINVAL;
>>>> +			break;
>>>> +
>>>> +		/*
>>>> +		 * Root ports are permitted to implement AtomicOp completion
>>>> +		 * capabilities.
>>>> +		 */
>>>> +		case PCI_EXP_TYPE_ROOT_PORT:
>>>> +			if (!(cap & (PCI_EXP_DEVCAP2_ATOMIC_COMP32 |
>>>> +				     PCI_EXP_DEVCAP2_ATOMIC_COMP64 |
>>>> +				     PCI_EXP_DEVCAP2_ATOMIC_COMP128)))
>>>> +				return -EINVAL;
>>>> +			break;
>>>> +		}
>>> IIUC, you want to enable an endpoint, e.g., an AMD Fiji-class GPU, to
>>> initiate AtomicOps that target system memory.  This interface
>>> (pci_enable_atomic_ops_to_root()) doesn't specify what size operations
>>> the driver wants to do.  If the GPU requests a 128-bit op and the Root
>>> Port doesn't support it, I think we'll see an Unsupported Request
>>> error.
>>>
>>> Do you need to extend this interface so the driver can specify what
>>> sizes it wants?
>>>
>>> The existing code in qedr_pci_set_atomic() is very similar.  We should
>>> make this new interface work for both places, then actually use it in
>>> qedr_pci_set_atomic().
>> Hi Bjorn, Doug, Ram,
>>
>> I just discussed this with Jay, and he noticed that qedr_pci_set_atomic
>> seems to use a different criteria to find the completer for atomic
>> requests. Jay's function expects the root port to have a parent, which
>> was the case on the systems he tested. But Ram's function looks for a
>> bridge without a parent and checks completion capabilities on that. Jay
>> believes that to be a root complex, not a root port.
> By "Ram's function", I guess you mean qedr_pci_set_atomic()?
>
> That starts with a PCIe device ("pdev"; it assumes but does not check
> that this is a PCIe device), and traverses through all the bridges
> leading to it.  Usually this will be:
>
>   endpoint -> root port
>   endpoint -> switch downstream port -> switch upstream port -> root port
>
> Or there may be additional switches in the middle.  The code is
> actually not quite correct because it is legal to have this:
>
>   endpoint -> PCI-to-PCIe bridge -> conventional PCI bridge -> ...
>
> and qedr_pci_set_atomic() will traverse up through the conventional
> part of the hierarchy, where there is no PCI_EXP_DEVCAP2.
>
> In general, a Root Port is the root of a PCIe hierarchy and there is
> no parent device.  E.g., on my laptop:
>
>   00:1c.0 Intel Root Port (bridge to [bus 02])
>   00:1c.2 Intel Root Port (bridge to [bus 04])
>
> What sort of parent do you expect?  As I mentioned, it's legal to have
> a PCI/PCI-X to PCIe bridge inside a conventional PCI hierarchy, but
> that's a little unusual.
>
>> According to the spec, "Root ports are permitted to implement AtomicOp
>> completion capabilities." It talks about a root port, not a root complex.
>>
>> Can you help us understand, which interpretation is correct? And how to
>> correctly identify the root port for checking completion capabilities?
> If you start with a PCIe device and traverse upstream, you should
> eventually reach a Root Port or a PCI/PCI-X to PCIe bridge.
>
>> Are there valid topologies where a root port does not have a parent?
> I don't understand this because Root Ports normally do not have
> parents.
>
> PCIe devices other than Root Ports normally have a Root Port (or
> PCI/PCI-X to PCIe bridge) at the root of the PCIe hierarchy, but there
> are definitely exceptions.
>
> For example, there some systems where the Root Port is not visible to
> Linux, e.g.,
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ee8bdfb6568d
> On systems like that, I don't think you can safely use AtomicOps.
>
> Bjorn
diff mbox

Patch

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 6078dfc..89a8bb0 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2966,6 +2966,87 @@  bool pci_acs_path_enabled(struct pci_dev *start,
 }
 
 /**
+ * pci_enable_atomic_ops_to_root - enable AtomicOp requests to root port
+ * @dev: the PCI device
+ *
+ * Return 0 if the device is capable of generating AtomicOp requests,
+ * all upstream bridges support AtomicOp routing, egress blocking is disabled
+ * on all upstream ports, and the root port supports 32-bit, 64-bit and/or
+ * 128-bit AtomicOp completion, or negative otherwise.
+ */
+int pci_enable_atomic_ops_to_root(struct pci_dev *dev)
+{
+	struct pci_bus *bus = dev->bus;
+
+	if (!pci_is_pcie(dev))
+		return -EINVAL;
+
+	switch (pci_pcie_type(dev)) {
+	/*
+	 * PCIe 3.0, 6.15 specifies that endpoints and root ports are permitted
+	 * to implement AtomicOp requester capabilities.
+	 */
+	case PCI_EXP_TYPE_ENDPOINT:
+	case PCI_EXP_TYPE_LEG_END:
+	case PCI_EXP_TYPE_RC_END:
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	while (bus->parent) {
+		struct pci_dev *bridge = bus->self;
+		u32 cap;
+
+		pcie_capability_read_dword(bridge, PCI_EXP_DEVCAP2, &cap);
+
+		switch (pci_pcie_type(bridge)) {
+		/*
+		 * Upstream, downstream and root ports may implement AtomicOp
+		 * routing capabilities. AtomicOp routing via a root port is
+		 * not considered.
+		 */
+		case PCI_EXP_TYPE_UPSTREAM:
+		case PCI_EXP_TYPE_DOWNSTREAM:
+			if (!(cap & PCI_EXP_DEVCAP2_ATOMIC_ROUTE))
+				return -EINVAL;
+			break;
+
+		/*
+		 * Root ports are permitted to implement AtomicOp completion
+		 * capabilities.
+		 */
+		case PCI_EXP_TYPE_ROOT_PORT:
+			if (!(cap & (PCI_EXP_DEVCAP2_ATOMIC_COMP32 |
+				     PCI_EXP_DEVCAP2_ATOMIC_COMP64 |
+				     PCI_EXP_DEVCAP2_ATOMIC_COMP128)))
+				return -EINVAL;
+			break;
+		}
+
+		/*
+		 * Upstream ports may block AtomicOps on egress.
+		 */
+		if (pci_pcie_type(bridge) == PCI_EXP_TYPE_UPSTREAM) {
+			u32 ctl2;
+
+			pcie_capability_read_dword(bridge, PCI_EXP_DEVCTL2,
+						   &ctl2);
+			if (ctl2 & PCI_EXP_DEVCTL2_ATOMIC_EGRESS_BLOCK)
+				return -EINVAL;
+		}
+
+		bus = bus->parent;
+	}
+
+	pcie_capability_set_word(dev, PCI_EXP_DEVCTL2,
+				 PCI_EXP_DEVCTL2_ATOMIC_REQ);
+
+	return 0;
+}
+EXPORT_SYMBOL(pci_enable_atomic_ops_to_root);
+
+/**
  * pci_swizzle_interrupt_pin - swizzle INTx for device behind bridge
  * @dev: the PCI device
  * @pin: the INTx pin (1=INTA, 2=INTB, 3=INTC, 4=INTD)
diff --git a/include/linux/pci.h b/include/linux/pci.h
index f4f8ee5..2a39f63 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -2062,6 +2062,7 @@  void pci_request_acs(void);
 bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags);
 bool pci_acs_path_enabled(struct pci_dev *start,
 			  struct pci_dev *end, u16 acs_flags);
+int pci_enable_atomic_ops_to_root(struct pci_dev *dev);
 
 #define PCI_VPD_LRDT			0x80	/* Large Resource Data Type */
 #define PCI_VPD_LRDT_ID(x)		((x) | PCI_VPD_LRDT)
diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index f8d5804..45f251a 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -623,7 +623,9 @@ 
 #define PCI_EXP_DEVCAP2		36	/* Device Capabilities 2 */
 #define  PCI_EXP_DEVCAP2_ARI		0x00000020 /* Alternative Routing-ID */
 #define  PCI_EXP_DEVCAP2_ATOMIC_ROUTE	0x00000040 /* Atomic Op routing */
+#define  PCI_EXP_DEVCAP2_ATOMIC_COMP32	0x00000080 /* 32b AtomicOp completion */
 #define PCI_EXP_DEVCAP2_ATOMIC_COMP64	0x00000100 /* Atomic 64-bit compare */
+#define  PCI_EXP_DEVCAP2_ATOMIC_COMP128	0x00000200 /* 128b AtomicOp completion*/
 #define  PCI_EXP_DEVCAP2_LTR		0x00000800 /* Latency tolerance reporting */
 #define  PCI_EXP_DEVCAP2_OBFF_MASK	0x000c0000 /* OBFF support mechanism */
 #define  PCI_EXP_DEVCAP2_OBFF_MSG	0x00040000 /* New message signaling */