
[V9,2/5] PCI: Create device tree node for selected devices

Message ID: 1687368849-36722-3-git-send-email-lizhi.hou@amd.com
State: Superseded
Delegated to: Bjorn Helgaas
Series: Generate device tree node for pci devices

Commit Message

Lizhi Hou June 21, 2023, 5:34 p.m. UTC
The PCI endpoint device such as Xilinx Alveo PCI card maps the register
spaces from multiple hardware peripherals to its PCI BAR. Normally,
the PCI core discovers devices and BARs using the PCI enumeration process.
There is no infrastructure to discover the hardware peripherals that are
present in a PCI device, and which can be accessed through the PCI BARs.

For Alveo PCI card, the card firmware provides a flattened device tree to
describe the hardware peripherals on its BARs. The Alveo card driver can
load this flattened device tree and leverage device tree framework to
generate platform devices for the hardware peripherals eventually.

Apparently, the device tree framework requires a device tree node for the
PCI device. Thus, it can generate the device tree nodes for hardware
peripherals underneath. Because PCI is self discoverable bus, there might
not be a device tree node created for PCI devices. This patch is to add
support to generate device tree node for PCI devices.

Added a kernel option. When the option is turned on, the kernel will
generate device tree nodes for PCI bridges unconditionally.

Initially, the basic properties are added for the dynamically generated
device tree nodes.

Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
---
 drivers/pci/Kconfig       |  12 +++
 drivers/pci/Makefile      |   1 +
 drivers/pci/bus.c         |   2 +
 drivers/pci/of.c          |  81 +++++++++++++++-
 drivers/pci/of_property.c | 194 ++++++++++++++++++++++++++++++++++++++
 drivers/pci/pci.h         |  19 ++++
 drivers/pci/remove.c      |   1 +
 7 files changed, 309 insertions(+), 1 deletion(-)
 create mode 100644 drivers/pci/of_property.c

Comments

Bjorn Helgaas June 21, 2023, 8:22 p.m. UTC | #1
In subject, IIUC this patch does not actually create device tree nodes
for selected devices.  It looks like it:

  - Adds an of_pci_make_dev_node() *interface* that can be used to
    create this node

  - Creates such a node for *every* bridge

  - Does nothing at all for "selected devices" or the Xilinx Alveo

On Wed, Jun 21, 2023 at 10:34:06AM -0700, Lizhi Hou wrote:
> The PCI endpoint device such as Xilinx Alveo PCI card maps the register
> spaces from multiple hardware peripherals to its PCI BAR. Normally,
> the PCI core discovers devices and BARs using the PCI enumeration process.
> There is no infrastructure to discover the hardware peripherals that are
> present in a PCI device, and which can be accessed through the PCI BARs.
> 
> For Alveo PCI card, the card firmware provides a flattened device tree to
> describe the hardware peripherals on its BARs. The Alveo card driver can
> load this flattened device tree and leverage device tree framework to
> generate platform devices for the hardware peripherals eventually.

The Alveo details are relevant to the quirk patch but not to *this*
patch.

But the reason for creating a node for every bridge device *is*
relevant and should be included here, since that change affects
everybody that uses OF.

> Apparently, the device tree framework requires a device tree node for the
> PCI device. Thus, it can generate the device tree nodes for hardware
> peripherals underneath. Because PCI is self discoverable bus, there might
> not be a device tree node created for PCI devices. This patch is to add
> support to generate device tree node for PCI devices.

s/This patch is to add/Add/

> Added a kernel option. When the option is turned on, the kernel will
> generate device tree nodes for PCI bridges unconditionally.

s/Added a kernel option/Add a PCI_DYNAMIC_OF_NODES config option/
(Be specific, and say what the patch does, not what you did.)

> Initially, the basic properties are added for the dynamically generated
> device tree nodes.

Make this specific, e.g., list the specific properties added.

> +config PCI_DYNAMIC_OF_NODES
> +	bool "Create Devicetree nodes for PCI devices"
> +	depends on OF
> +	select OF_DYNAMIC
> +	help
> +	  This option enables support for generating device tree nodes for some
> +	  PCI devices. Thus, the driver of this kind can load and overlay
> +	  flattened device tree for its downstream devices.
> +
> +	  Once this option is selected, the device tree nodes will be generated
> +	  for all PCI bridges.

Is there a convention for using "devicetree" vs "device tree"?  The
help message uses both and it would be nice to only use one or the
other.

> @@ -501,8 +501,10 @@ static int of_irq_parse_pci(const struct pci_dev *pdev, struct of_phandle_args *
>  		 * to rely on this function (you ship a firmware that doesn't
>  		 * create device nodes for all PCI devices).
>  		 */
> -		if (ppnode)
> +		if (ppnode && of_property_present(ppnode, "interrupt-map"))

Maybe this deserves a comment?  The connection between "interrupt-map"
and the rest of this patch isn't obvious to me.

Also, it looks like this happens for *everybody*, regardless of
PCI_DYNAMIC_OF_NODES, which seems a little suspect.  If it's an
unrelated bug fix it should be a different patch.

>  			break;
> +		else
> +			ppnode = NULL;

> +void of_pci_make_dev_node(struct pci_dev *pdev)
> +{
> +	struct device_node *ppnode, *np = NULL;
> +	const char *pci_type = "dev";
> +	struct of_changeset *cset;
> +	const char *name;
> +	int ret;
> +
> +	/*
> +	 * If there is already a device tree node linked to this device,
> +	 * return immediately.
> +	 */
> +	if (pci_device_to_OF_node(pdev))
> +		return;
> +
> +	/* Check if there is device tree node for parent device */
> +	if (!pdev->bus->self)
> +		ppnode = pdev->bus->dev.of_node;
> +	else
> +		ppnode = pdev->bus->self->dev.of_node;
> +	if (!ppnode)
> +		return;
> +
> +	if (pci_is_bridge(pdev))
> +		pci_type = "pci";

Initialize pci_type = "dev" here instead of way up top:

  if (pci_is_bridge(pdev))
    pci_type = "pci";
  else
    pci_type = "dev";
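
The naming step under discussion can be tried outside the kernel. The following is only a minimal user-space sketch: the sketch_* names are hypothetical, and the two macros are assumed to mirror the bit layout of the kernel's PCI_SLOT() and PCI_FUNC() (device number in bits 7:3 of devfn, function number in bits 2:0).

```c
#include <stdio.h>

/* Assumed to match the kernel's PCI_SLOT()/PCI_FUNC() bit layout. */
#define SKETCH_PCI_SLOT(devfn)	(((devfn) >> 3) & 0x1f)
#define SKETCH_PCI_FUNC(devfn)	((devfn) & 0x07)

/*
 * Hypothetical stand-in for the naming step: writes "<type>@<slot>,<func>"
 * into buf and returns whatever snprintf() returns.
 */
static int sketch_node_name(char *buf, size_t len, int is_bridge,
			    unsigned int devfn)
{
	const char *pci_type;

	/* The type is chosen right next to its only use. */
	if (is_bridge)
		pci_type = "pci";
	else
		pci_type = "dev";

	return snprintf(buf, len, "%s@%x,%x", pci_type,
			SKETCH_PCI_SLOT(devfn), SKETCH_PCI_FUNC(devfn));
}
```

With these assumptions, a bridge at devfn 0xe0 is named "pci@1c,0" and a non-bridge function at devfn 0x01 is named "dev@0,1".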

> +	name = kasprintf(GFP_KERNEL, "%s@%x,%x", pci_type,
> +			 PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn));

> +static int of_pci_prop_ranges(struct pci_dev *pdev, struct of_changeset *ocs,
> +			      struct device_node *np)
> +{
> +	struct of_pci_range *rp;
> +	struct resource *res;
> +	int i = 0, j, ret;
> +	u32 flags, num;
> +	u64 val64;
> +
> +	if (pci_is_bridge(pdev)) {
> +		num = PCI_BRIDGE_RESOURCE_NUM;
> +		res = &pdev->resource[PCI_BRIDGE_RESOURCES];
> +	} else {
> +		num = PCI_STD_NUM_BARS;
> +		res = &pdev->resource[PCI_STD_RESOURCES];
> +	}
> +
> +	rp = kcalloc(num, sizeof(*rp), GFP_KERNEL);
> +	if (!rp)
> +		return -ENOMEM;
> +
> +	for (j = 0; j < num; j++) {

Initialize i = 0 here so it's connected with the use:

  for (i = 0, j = 0; j < num; ...)
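
As a generic illustration of a loop with one input index and one output index, here is a small stand-alone sketch. It is not the patch's loop: sketch_compact() and its arguments are hypothetical, and how the patch itself advances i is not shown in the quoted hunk.

```c
#include <stddef.h>

/*
 * Copy the non-zero entries of src[] to the front of dst[] and return how
 * many were kept. j visits every input slot; i is the next free output
 * slot and only advances when an entry is kept.
 */
static size_t sketch_compact(const unsigned long *src, size_t num,
			     unsigned long *dst)
{
	size_t i, j;

	for (i = 0, j = 0; j < num; j++) {
		if (!src[j])
			continue;	/* empty slot: i stays where it is */
		dst[i++] = src[j];
	}

	return i;
}
```

Initializing both indices in the for statement keeps the starting value of i next to the only place that depends on it.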

> +		if (!resource_size(&res[j]))
> +			continue;
> +
> +		if (of_pci_get_addr_flags(&res[j], &flags))
> +			continue;
> +
> +		val64 = res[j].start;
> +		of_pci_set_address(pdev, rp[i].parent_addr, val64, 0, flags,
> +				   false);
> +		if (pci_is_bridge(pdev)) {
> +			memcpy(rp[i].child_addr, rp[i].parent_addr,
> +			       sizeof(rp[i].child_addr));
> +		} else {
> +			/*
> +			 * For endpoint device, the lower 64-bits of child
> +			 * address is always zero.

For the non-OF folks (like me), can you say what the semantics of
parent_addr vs child_addr are?  I suppose maybe parent_addr is an
address on the primary side of a bridge and child_addr is the
corresponding address on the secondary side?

And PCI bridges don't perform address translation, so they are
identical?
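
For context, a device tree "ranges" entry is a (child address, parent address, size) triple, and a PCI address takes three 32-bit cells. The sketch below is a user-space approximation only: the sk_* names are hypothetical, the shift values are taken from the OF_PCI_ADDR_FIELD_* masks in this patch, and sk_set_address() is an assumption about what of_pci_set_address() does, since that function is not quoted here.

```c
#include <stdint.h>
#include <string.h>

#define SK_SPACE_MEM32	0x2	/* same value as OF_PCI_ADDR_SPACE_MEM32 */
#define SK_SS_SHIFT	24	/* address space code, bits 25:24 */
#define SK_BUS_SHIFT	16	/* bus number, bits 23:16 */
#define SK_DEV_SHIFT	11	/* device number, bits 15:11 */
#define SK_FUNC_SHIFT	8	/* function number, bits 10:8 */

struct sk_range {
	uint32_t child_addr[3];
	uint32_t parent_addr[3];
	uint32_t size[2];	/* left zero in this sketch */
};

/* Assumed packing: first cell holds the fields, then a 64-bit address. */
static void sk_set_address(uint32_t cells[3], uint32_t ss, uint32_t bus,
			   uint32_t dev, uint32_t func, uint64_t addr)
{
	cells[0] = (ss << SK_SS_SHIFT) | (bus << SK_BUS_SHIFT) |
		   (dev << SK_DEV_SHIFT) | (func << SK_FUNC_SHIFT);
	cells[1] = (uint32_t)(addr >> 32);
	cells[2] = (uint32_t)addr;
}

/*
 * Mirrors the two branches in the quoted hunk: a bridge passes addresses
 * through unchanged, so child == parent; for an endpoint the child address
 * is just the BAR index in the first cell, with the low 64 bits zero.
 */
static void sk_fill_range(struct sk_range *r, int is_bridge, uint32_t bar,
			  const uint32_t parent[3])
{
	memset(r, 0, sizeof(*r));
	memcpy(r->parent_addr, parent, sizeof(r->parent_addr));
	if (is_bridge)
		memcpy(r->child_addr, r->parent_addr, sizeof(r->child_addr));
	else
		r->child_addr[0] = bar;
}
```

Under these assumptions the endpoint case maps "BAR n, offset 0" in the child address space onto the bus address the BAR was assigned, which is what lets nodes below the endpoint be described by BAR index and offset.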

> +			 */
> +			rp[i].child_addr[0] = j;
> +		}

> +int of_pci_add_properties(struct pci_dev *pdev, struct of_changeset *ocs,
> +			  struct device_node *np)
> +{
> +	int ret = 0;
> +
> +	if (pci_is_bridge(pdev)) {
> +		ret |= of_changeset_add_prop_string(ocs, np, "device_type",
> +						    "pci");
> +	}
> +
> +	ret |= of_pci_prop_ranges(pdev, ocs, np);
> +	ret |= of_changeset_add_prop_u32(ocs, np, "#address-cells",
> +					 OF_PCI_ADDRESS_CELLS);
> +	ret |= of_changeset_add_prop_u32(ocs, np, "#size-cells",
> +					 OF_PCI_SIZE_CELLS);
> +	ret |= of_pci_prop_reg(pdev, ocs, np);
> +	ret |= of_pci_prop_compatible(pdev, ocs, np);
> +
> +	/*
> +	 * The added properties will be released when the
> +	 * changeset is destroyed.
> +	 */

I don't think it's meaningful to OR together the "negative error
values" returned by all these functions.  Presumably those are things
like -EINVAL, -ENOMEM, etc.  ORing them together is admittedly
non-zero, but yields nonsense.
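
A small stand-alone sketch of the point about ORing error codes, assuming the usual Linux errno numbering and two's complement integers. All sk_* names are hypothetical; the step functions merely stand in for the property helpers.

```c
/* Linux errno values, spelled out so the arithmetic is visible. */
#define SK_ENOENT	2
#define SK_ENOMEM	12
#define SK_EINVAL	22

typedef int (*sk_step_fn)(void);

static int sk_ok(void)         { return 0; }
static int sk_fail_inval(void) { return -SK_EINVAL; }
static int sk_fail_nomem(void) { return -SK_ENOMEM; }

/* Pattern from the patch: run every step and OR the results together. */
static int sk_run_ored(const sk_step_fn *steps, int n)
{
	int i, ret = 0;

	for (i = 0; i < n; i++)
		ret |= steps[i]();
	return ret;
}

/* Alternative: stop at the first failure and return it unchanged. */
static int sk_run_checked(const sk_step_fn *steps, int n)
{
	int i, ret;

	for (i = 0; i < n; i++) {
		ret = steps[i]();
		if (ret)
			return ret;
	}
	return 0;
}
```

With the steps { sk_ok, sk_fail_inval, sk_fail_nomem }, the ORed version returns -22 | -12, which is -2, i.e. -ENOENT, an error none of the steps reported. The checked version returns -EINVAL from the first failing step.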

> +	return ret;

> +static inline void
> +of_pci_make_dev_node(struct pci_dev *pdev)
> +{
> +}
> +
> +static inline void
> +of_pci_remove_node(struct pci_dev *pdev)
> +{
> +}

Pull these functions all onto one line, like other similar stubs in
this file.
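
A sketch of what the one-line form could look like. The forward declaration of struct pci_dev is only there so the fragment compiles on its own; in the kernel the type comes from <linux/pci.h>.

```c
struct pci_dev;	/* opaque here; defined in <linux/pci.h> in the kernel */

/* Empty stubs for builds without the option, one declaration per line. */
static inline void of_pci_make_dev_node(struct pci_dev *pdev) { }
static inline void of_pci_remove_node(struct pci_dev *pdev) { }
```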

> +#endif /* CONFIG_PCI_DYNAMIC_OF_NODES */

Unnecessary comment since this is all 10 lines.
Bjorn Helgaas June 21, 2023, 8:27 p.m. UTC | #2
On Wed, Jun 21, 2023 at 03:22:33PM -0500, Bjorn Helgaas wrote:
> In subject, IIUC this patch does not actually create device tree nodes
> for selected devices.  It looks like it:
> 
>   - Adds an of_pci_make_dev_node() *interface* that can be used to
>     create this node
> 
>   - Creates such a node for *every* bridge
> 
>   - Does nothing at all for "selected devices" or the Xilinx Alveo

I forgot: with these comments addressed:

Acked-by: Bjorn Helgaas <bhelgaas@google.com>

Lizhi Hou June 26, 2023, 5:34 p.m. UTC | #3
On 6/21/23 13:22, Bjorn Helgaas wrote:
> In subject, IIUC this patch does not actually create device tree nodes
> for selected devices.  It looks like it:
>
>    - Adds an of_pci_make_dev_node() *interface* that can be used to
>      create this node
>
>    - Creates such a node for *every* bridge
>
>    - Does nothing at all for "selected devices" or the Xilinx Alveo
>
> On Wed, Jun 21, 2023 at 10:34:06AM -0700, Lizhi Hou wrote:
>> The PCI endpoint device such as Xilinx Alveo PCI card maps the register
>> spaces from multiple hardware peripherals to its PCI BAR. Normally,
>> the PCI core discovers devices and BARs using the PCI enumeration process.
>> There is no infrastructure to discover the hardware peripherals that are
>> present in a PCI device, and which can be accessed through the PCI BARs.
>>
>> For Alveo PCI card, the card firmware provides a flattened device tree to
>> describe the hardware peripherals on its BARs. The Alveo card driver can
>> load this flattened device tree and leverage device tree framework to
>> generate platform devices for the hardware peripherals eventually.
> The Alveo details are relevant to the quirk patch but not to *this*
> patch.
>
> But the reason for creating a node for every bridge device *is*
> relevant and should be included here, since that change affects
> everybody that uses OF.
>
>> Apparently, the device tree framework requires a device tree node for the
>> PCI device. Thus, it can generate the device tree nodes for hardware
>> peripherals underneath. Because PCI is self discoverable bus, there might
>> not be a device tree node created for PCI devices. This patch is to add
>> support to generate device tree node for PCI devices.
> s/This patch is to add/Add/
>
>> Added a kernel option. When the option is turned on, the kernel will
>> generate device tree nodes for PCI bridges unconditionally.
> s/Added a kernel option/Add a PCI_DYNAMIC_OF_NODES config option/
> (Be specific, and say what the patch does, not what you did.)
>
>> Initially, the basic properties are added for the dynamically generated
>> device tree nodes.
> Make this specific, e.g., list the specific properties added.

I rewrote the description as below. Does it look better?

     PCI: Create device tree node for bridge

     The PCI endpoint device such as Xilinx Alveo PCI card maps the register
     spaces from multiple hardware peripherals to its PCI BAR. Normally,
     the PCI core discovers devices and BARs using the PCI enumeration process.
     There is no infrastructure to discover the hardware peripherals that are
     present in a PCI device, and which can be accessed through the PCI BARs.

     Apparently, the device tree framework requires a device tree node for the
     PCI device. Thus, it can generate the device tree nodes for hardware
     peripherals underneath. Because PCI is self discoverable bus, there might
     not be a device tree node created for PCI devices. Furthermore, if the PCI
     device is hot pluggable, when it is plugged in, the device tree nodes for
     its parent bridges are required. Add support to generate device tree node
     for PCI bridges.

     Added an of_pci_make_dev_node() interface that can be used to create
     device tree node for PCI devices.

     Added a PCI_DYNAMIC_OF_NODES config option. When the option is turned on,
     the kernel will generate device tree nodes for PCI bridges unconditionally.

     Initially, the basic properties are added for the dynamically generated
     device tree nodes which include #address-cells, #size-cells, device_type,
     compatible, ranges, reg.

>
>> +config PCI_DYNAMIC_OF_NODES
>> +	bool "Create Devicetree nodes for PCI devices"
>> +	depends on OF
>> +	select OF_DYNAMIC
>> +	help
>> +	  This option enables support for generating device tree nodes for some
>> +	  PCI devices. Thus, the driver of this kind can load and overlay
>> +	  flattened device tree for its downstream devices.
>> +
>> +	  Once this option is selected, the device tree nodes will be generated
>> +	  for all PCI bridges.
> Is there a convention for using "devicetree" vs "device tree"?  The
> help message uses both and it would be nice to only use one or the
> other.
Ok. Will use "device tree".
>
>> @@ -501,8 +501,10 @@ static int of_irq_parse_pci(const struct pci_dev *pdev, struct of_phandle_args *
>>   		 * to rely on this function (you ship a firmware that doesn't
>>   		 * create device nodes for all PCI devices).
>>   		 */
>> -		if (ppnode)
>> +		if (ppnode && of_property_present(ppnode, "interrupt-map"))
> Maybe this deserves a comment?  The connection between "interrupt-map"
> and the rest of this patch isn't obvious to me.
>
> Also, it looks like this happens for *everybody*, regardless of
> PCI_DYNAMIC_OF_NODES, which seems a little suspect.  If it's an
> unrelated bug fix it should be a different patch.

This is not a bug fix. The check distinguishes device tree nodes 
automatically created for PCI bridges by this patch from those 
created by a DT-based system. With this patch, device tree nodes are 
created for PCI bridges, so ppnode here will be non-NULL and we would 
break out of the loop. To keep using 
pci_swizzle_interrupt_pin(), a check for “interrupt-map” on ppnode is 
added here.


After thinking about this more, using “interrupt-map” property may not 
be correct for the cases where ppnode is not dynamically generated and 
it does not have “interrupt-map”. So, I would introduce a new property 
“dynamic” for pci bridge nodes generated dynamically. And change the 
code to: if (ppnode && of_property_present(ppnode, "dynamic")).


Does this make sense?
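
The effect of the extra check can be modelled in user space. This is a simplified sketch, not the code of of_irq_parse_pci(): the sk_* names and the struct are hypothetical, the host bridge case is reduced to "no upstream device", and sk_swizzle() is assumed to match the arithmetic of pci_swizzle_interrupt_pin() without its ARI special case.

```c
#include <stdbool.h>
#include <stddef.h>

struct sk_dev {
	unsigned int devfn;
	const struct sk_dev *upstream;	/* bridge above, NULL at the top */
	bool has_node;			/* a device tree node exists */
	bool has_interrupt_map;		/* that node has "interrupt-map" */
};

/* Rotate the INTx pin (1..4) by the device number, as bridges do. */
static unsigned int sk_swizzle(unsigned int devfn, unsigned int pin)
{
	unsigned int slot = (devfn >> 3) & 0x1f;

	return (((pin - 1) + slot) % 4) + 1;
}

/*
 * Walk upstream and return the pin as seen where the walk stops. With
 * require_map == false any upstream node stops the walk (the test before
 * this patch); with require_map == true only a node that also carries
 * "interrupt-map" does (the test in this patch).
 */
static unsigned int sk_resolve_pin(const struct sk_dev *dev, unsigned int pin,
				   bool require_map)
{
	while (dev->upstream) {
		const struct sk_dev *br = dev->upstream;

		if (br->has_node && (!require_map || br->has_interrupt_map))
			break;
		pin = sk_swizzle(dev->devfn, pin);
		dev = br;
	}
	return pin;
}
```

In this model, take an endpoint in slot 2 below a bridge whose node was generated without "interrupt-map", which in turn sits below a root port that has one. INTA stays pin 1 when any node stops the walk, but is swizzled to pin 3 when the walk continues up to the root port.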
>
>>   			break;
>> +		else
>> +			ppnode = NULL;
>> +void of_pci_make_dev_node(struct pci_dev *pdev)
>> +{
>> +	struct device_node *ppnode, *np = NULL;
>> +	const char *pci_type = "dev";
>> +	struct of_changeset *cset;
>> +	const char *name;
>> +	int ret;
>> +
>> +	/*
>> +	 * If there is already a device tree node linked to this device,
>> +	 * return immediately.
>> +	 */
>> +	if (pci_device_to_OF_node(pdev))
>> +		return;
>> +
>> +	/* Check if there is device tree node for parent device */
>> +	if (!pdev->bus->self)
>> +		ppnode = pdev->bus->dev.of_node;
>> +	else
>> +		ppnode = pdev->bus->self->dev.of_node;
>> +	if (!ppnode)
>> +		return;
>> +
>> +	if (pci_is_bridge(pdev))
>> +		pci_type = "pci";
> Initialize pci_type = "dev" here instead of way up top:
>
>    if (pci_is_bridge(pdev))
>      pci_type = "pci";
>    else
>      pci_type = "dev";
sure.
>
>> +	name = kasprintf(GFP_KERNEL, "%s@%x,%x", pci_type,
>> +			 PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn));
>> +static int of_pci_prop_ranges(struct pci_dev *pdev, struct of_changeset *ocs,
>> +			      struct device_node *np)
>> +{
>> +	struct of_pci_range *rp;
>> +	struct resource *res;
>> +	int i = 0, j, ret;
>> +	u32 flags, num;
>> +	u64 val64;
>> +
>> +	if (pci_is_bridge(pdev)) {
>> +		num = PCI_BRIDGE_RESOURCE_NUM;
>> +		res = &pdev->resource[PCI_BRIDGE_RESOURCES];
>> +	} else {
>> +		num = PCI_STD_NUM_BARS;
>> +		res = &pdev->resource[PCI_STD_RESOURCES];
>> +	}
>> +
>> +	rp = kcalloc(num, sizeof(*rp), GFP_KERNEL);
>> +	if (!rp)
>> +		return -ENOMEM;
>> +
>> +	for (j = 0; j < num; j++) {
> Initialize i = 0 here so it's connected with the use:
>
>    for (i = 0, j = 0; j < num; ...)
ok.
>
>> +		if (!resource_size(&res[j]))
>> +			continue;
>> +
>> +		if (of_pci_get_addr_flags(&res[j], &flags))
>> +			continue;
>> +
>> +		val64 = res[j].start;
>> +		of_pci_set_address(pdev, rp[i].parent_addr, val64, 0, flags,
>> +				   false);
>> +		if (pci_is_bridge(pdev)) {
>> +			memcpy(rp[i].child_addr, rp[i].parent_addr,
>> +			       sizeof(rp[i].child_addr));
>> +		} else {
>> +			/*
>> +			 * For endpoint device, the lower 64-bits of child
>> +			 * address is always zero.
> For the non-OF folks (like me), can you say what the semantics of
> parent_addr vs child_addr are?  I suppose maybe parent_addr is an
> address on the primary side of a bridge and child_addr is the
> corresponding address on the secondary side?
>
> And PCI bridges don't perform address translation, so they are
> identical?
I will add more comments here.
>
>> +			 */
>> +			rp[i].child_addr[0] = j;
>> +		}
>> +int of_pci_add_properties(struct pci_dev *pdev, struct of_changeset *ocs,
>> +			  struct device_node *np)
>> +{
>> +	int ret = 0;
>> +
>> +	if (pci_is_bridge(pdev)) {
>> +		ret |= of_changeset_add_prop_string(ocs, np, "device_type",
>> +						    "pci");
>> +	}
>> +
>> +	ret |= of_pci_prop_ranges(pdev, ocs, np);
>> +	ret |= of_changeset_add_prop_u32(ocs, np, "#address-cells",
>> +					 OF_PCI_ADDRESS_CELLS);
>> +	ret |= of_changeset_add_prop_u32(ocs, np, "#size-cells",
>> +					 OF_PCI_SIZE_CELLS);
>> +	ret |= of_pci_prop_reg(pdev, ocs, np);
>> +	ret |= of_pci_prop_compatible(pdev, ocs, np);
>> +
>> +	/*
>> +	 * The added properties will be released when the
>> +	 * changeset is destroyed.
>> +	 */
> I don't think it's meaningful to OR together the "negative error
> values" returned by all these functions.  Presumably those are things
> like -EINVAL, -ENOMEM, etc.  ORing them together is admittedly
> non-zero, but yields nonsense.
ok. I will return for each failure.
>
>> +	return ret;
>> +static inline void
>> +of_pci_make_dev_node(struct pci_dev *pdev)
>> +{
>> +}
>> +
>> +static inline void
>> +of_pci_remove_node(struct pci_dev *pdev)
>> +{
>> +}
> Pull these functions all onto one line, like other similar stubs in
> this file.
Sure.
>
>> +#endif /* CONFIG_PCI_DYNAMIC_OF_NODES */
> Unnecessary comment since this is all 10 lines.

Will remove it.

Thanks,

Lizhi
Bjorn Helgaas June 26, 2023, 6:11 p.m. UTC | #4
On Mon, Jun 26, 2023 at 10:34:05AM -0700, Lizhi Hou wrote:
> On 6/21/23 13:22, Bjorn Helgaas wrote:

>     Added an of_pci_make_dev_node() interface that can be used to create
>     device tree node for PCI devices.
> 
>     Added a PCI_DYNAMIC_OF_NODES config option. When the option is turned on,
>     the kernel will generate device tree nodes for PCI bridges unconditionally.
> 
>     Initially, the basic properties are added for the dynamically generated
>     device tree nodes which include #address-cells, #size-cells, device_type,
>     compatible, ranges, reg.

s/Added/Add/ (twice, mentioned before).

The commit log should say what the *patch* does, not what *you* did.

> > > @@ -501,8 +501,10 @@ static int of_irq_parse_pci(const struct pci_dev *pdev, struct of_phandle_args *
> > >   		 * to rely on this function (you ship a firmware that doesn't
> > >   		 * create device nodes for all PCI devices).
> > >   		 */
> > > -		if (ppnode)
> > > +		if (ppnode && of_property_present(ppnode, "interrupt-map"))
> >
> > Maybe this deserves a comment?  The connection between "interrupt-map"
> > and the rest of this patch isn't obvious to me.
> > 
> > Also, it looks like this happens for *everybody*, regardless of
> > PCI_DYNAMIC_OF_NODES, which seems a little suspect.  If it's an
> > unrelated bug fix it should be a different patch.
> 
> This is not a bug fix. The check distinguishes device tree nodes
> automatically created for PCI bridges by this patch from those created by a
> DT-based system. With this patch, device tree nodes are created for PCI
> bridges, so ppnode here will be non-NULL and we would break out of the
> loop. To keep using pci_swizzle_interrupt_pin(), a check for
> “interrupt-map” on ppnode is added here.
> 
> After thinking about this more, using “interrupt-map” property may not be
> correct for the cases where ppnode is not dynamically generated and it does
> not have “interrupt-map”. So, I would introduce a new property “dynamic” for
> pci bridge nodes generated dynamically. And change the code to: if (ppnode
> && of_property_present(ppnode, "dynamic")).
> 
> Does this make sense?

Makes a lot more sense to me than relying on some unrelated and
undocumented property.  Probably still would benefit from an #ifdef.

Rob might have an opinion on whether "dynamic" makes sense from a
DT perspective.

Bjorn

Patch

diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index 9309f2469b41..24c3107c68cc 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -193,6 +193,18 @@  config PCI_HYPERV
 	  The PCI device frontend driver allows the kernel to import arbitrary
 	  PCI devices from a PCI backend to support PCI driver domains.
 
+config PCI_DYNAMIC_OF_NODES
+	bool "Create Devicetree nodes for PCI devices"
+	depends on OF
+	select OF_DYNAMIC
+	help
+	  This option enables support for generating device tree nodes for some
+	  PCI devices. Thus, the driver of this kind can load and overlay
+	  flattened device tree for its downstream devices.
+
+	  Once this option is selected, the device tree nodes will be generated
+	  for all PCI bridges.
+
 choice
 	prompt "PCI Express hierarchy optimization setting"
 	default PCIE_BUS_DEFAULT
diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
index 2680e4c92f0a..cc8b4e01e29d 100644
--- a/drivers/pci/Makefile
+++ b/drivers/pci/Makefile
@@ -32,6 +32,7 @@  obj-$(CONFIG_PCI_P2PDMA)	+= p2pdma.o
 obj-$(CONFIG_XEN_PCIDEV_FRONTEND) += xen-pcifront.o
 obj-$(CONFIG_VGA_ARB)		+= vgaarb.o
 obj-$(CONFIG_PCI_DOE)		+= doe.o
+obj-$(CONFIG_PCI_DYNAMIC_OF_NODES) += of_property.o
 
 # Endpoint library must be initialized before its users
 obj-$(CONFIG_PCI_ENDPOINT)	+= endpoint/
diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 5bc81cc0a2de..ab7d06cd0099 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -340,6 +340,8 @@  void pci_bus_add_device(struct pci_dev *dev)
 	 */
 	pcibios_bus_add_device(dev);
 	pci_fixup_device(pci_fixup_final, dev);
+	if (pci_is_bridge(dev))
+		of_pci_make_dev_node(dev);
 	pci_create_sysfs_dev_files(dev);
 	pci_proc_attach_device(dev);
 	pci_bridge_d3_update(dev);
diff --git a/drivers/pci/of.c b/drivers/pci/of.c
index 2c25f4fa0225..01818b27a8da 100644
--- a/drivers/pci/of.c
+++ b/drivers/pci/of.c
@@ -501,8 +501,10 @@  static int of_irq_parse_pci(const struct pci_dev *pdev, struct of_phandle_args *
 		 * to rely on this function (you ship a firmware that doesn't
 		 * create device nodes for all PCI devices).
 		 */
-		if (ppnode)
+		if (ppnode && of_property_present(ppnode, "interrupt-map"))
 			break;
+		else
+			ppnode = NULL;
 
 		/*
 		 * We can only get here if we hit a P2P bridge with no node;
@@ -617,6 +619,83 @@  int devm_of_pci_bridge_init(struct device *dev, struct pci_host_bridge *bridge)
 	return pci_parse_request_of_pci_ranges(dev, bridge);
 }
 
+#if IS_ENABLED(CONFIG_PCI_DYNAMIC_OF_NODES)
+
+void of_pci_remove_node(struct pci_dev *pdev)
+{
+	struct device_node *np;
+
+	np = pci_device_to_OF_node(pdev);
+	if (!np || !of_node_check_flag(np, OF_DYNAMIC))
+		return;
+	pdev->dev.of_node = NULL;
+
+	of_changeset_revert(np->data);
+	of_changeset_destroy(np->data);
+	of_node_put(np);
+}
+
+void of_pci_make_dev_node(struct pci_dev *pdev)
+{
+	struct device_node *ppnode, *np = NULL;
+	const char *pci_type = "dev";
+	struct of_changeset *cset;
+	const char *name;
+	int ret;
+
+	/*
+	 * If there is already a device tree node linked to this device,
+	 * return immediately.
+	 */
+	if (pci_device_to_OF_node(pdev))
+		return;
+
+	/* Check if there is device tree node for parent device */
+	if (!pdev->bus->self)
+		ppnode = pdev->bus->dev.of_node;
+	else
+		ppnode = pdev->bus->self->dev.of_node;
+	if (!ppnode)
+		return;
+
+	if (pci_is_bridge(pdev))
+		pci_type = "pci";
+
+	name = kasprintf(GFP_KERNEL, "%s@%x,%x", pci_type,
+			 PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn));
+	if (!name)
+		return;
+
+	cset = kmalloc(sizeof(*cset), GFP_KERNEL);
+	if (!cset)
+		goto failed;
+	of_changeset_init(cset);
+
+	np = of_changeset_create_node(ppnode, name, cset);
+	if (!np)
+		goto failed;
+	np->data = cset;
+
+	ret = of_pci_add_properties(pdev, cset, np);
+	if (ret)
+		goto failed;
+
+	ret = of_changeset_apply(cset);
+	if (ret)
+		goto failed;
+
+	pdev->dev.of_node = np;
+	kfree(name);
+
+	return;
+
+failed:
+	if (np)
+		of_node_put(np);
+	kfree(name);
+}
+#endif
+
 #endif /* CONFIG_PCI */
 
 /**
diff --git a/drivers/pci/of_property.c b/drivers/pci/of_property.c
new file mode 100644
index 000000000000..a48d3a5a3685
--- /dev/null
+++ b/drivers/pci/of_property.c
@@ -0,0 +1,194 @@ 
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Copyright (C) 2022, Advanced Micro Devices, Inc.
+ */
+
+#include <linux/pci.h>
+#include <linux/of.h>
+#include <linux/bitfield.h>
+#include <linux/bits.h>
+#include "pci.h"
+
+#define OF_PCI_ADDRESS_CELLS		3
+#define OF_PCI_SIZE_CELLS		2
+
+struct of_pci_addr_pair {
+	u32		phys_addr[OF_PCI_ADDRESS_CELLS];
+	u32		size[OF_PCI_SIZE_CELLS];
+};
+
+struct of_pci_range {
+	u32		child_addr[OF_PCI_ADDRESS_CELLS];
+	u32		parent_addr[OF_PCI_ADDRESS_CELLS];
+	u32		size[OF_PCI_SIZE_CELLS];
+};
+
+#define OF_PCI_ADDR_SPACE_IO		0x1
+#define OF_PCI_ADDR_SPACE_MEM32		0x2
+#define OF_PCI_ADDR_SPACE_MEM64		0x3
+
+#define OF_PCI_ADDR_FIELD_NONRELOC	BIT(31)
+#define OF_PCI_ADDR_FIELD_SS		GENMASK(25, 24)
+#define OF_PCI_ADDR_FIELD_PREFETCH	BIT(30)
+#define OF_PCI_ADDR_FIELD_BUS		GENMASK(23, 16)
+#define OF_PCI_ADDR_FIELD_DEV		GENMASK(15, 11)
+#define OF_PCI_ADDR_FIELD_FUNC		GENMASK(10, 8)
+#define OF_PCI_ADDR_FIELD_REG		GENMASK(7, 0)
+
+enum of_pci_prop_compatible {
+	PROP_COMPAT_PCI_VVVV_DDDD,
+	PROP_COMPAT_PCICLASS_CCSSPP,
+	PROP_COMPAT_PCICLASS_CCSS,
+	PROP_COMPAT_NUM,
+};
+
+static void of_pci_set_address(struct pci_dev *pdev, u32 *prop, u64 addr,
+			       u32 reg_num, u32 flags, bool reloc)
+{
+	prop[0] = FIELD_PREP(OF_PCI_ADDR_FIELD_BUS, pdev->bus->number) |
+		FIELD_PREP(OF_PCI_ADDR_FIELD_DEV, PCI_SLOT(pdev->devfn)) |
+		FIELD_PREP(OF_PCI_ADDR_FIELD_FUNC, PCI_FUNC(pdev->devfn));
+	prop[0] |= flags | reg_num;
+	if (!reloc) {
+		prop[0] |= OF_PCI_ADDR_FIELD_NONRELOC;
+		prop[1] = upper_32_bits(addr);
+		prop[2] = lower_32_bits(addr);
+	}
+}
+
+static int of_pci_get_addr_flags(struct resource *res, u32 *flags)
+{
+	u32 ss;
+
+	if (res->flags & IORESOURCE_IO)
+		ss = OF_PCI_ADDR_SPACE_IO;
+	else if (res->flags & IORESOURCE_MEM_64)
+		ss = OF_PCI_ADDR_SPACE_MEM64;
+	else if (res->flags & IORESOURCE_MEM)
+		ss = OF_PCI_ADDR_SPACE_MEM32;
+	else
+		return -EINVAL;
+
+	*flags = 0;
+	if (res->flags & IORESOURCE_PREFETCH)
+		*flags |= OF_PCI_ADDR_FIELD_PREFETCH;
+
+	*flags |= FIELD_PREP(OF_PCI_ADDR_FIELD_SS, ss);
+
+	return 0;
+}
+
+static int of_pci_prop_ranges(struct pci_dev *pdev, struct of_changeset *ocs,
+			      struct device_node *np)
+{
+	struct of_pci_range *rp;
+	struct resource *res;
+	int i = 0, j, ret;
+	u32 flags, num;
+	u64 val64;
+
+	if (pci_is_bridge(pdev)) {
+		num = PCI_BRIDGE_RESOURCE_NUM;
+		res = &pdev->resource[PCI_BRIDGE_RESOURCES];
+	} else {
+		num = PCI_STD_NUM_BARS;
+		res = &pdev->resource[PCI_STD_RESOURCES];
+	}
+
+	rp = kcalloc(num, sizeof(*rp), GFP_KERNEL);
+	if (!rp)
+		return -ENOMEM;
+
+	for (j = 0; j < num; j++) {
+		if (!resource_size(&res[j]))
+			continue;
+
+		if (of_pci_get_addr_flags(&res[j], &flags))
+			continue;
+
+		val64 = res[j].start;
+		of_pci_set_address(pdev, rp[i].parent_addr, val64, 0, flags,
+				   false);
+		if (pci_is_bridge(pdev)) {
+			memcpy(rp[i].child_addr, rp[i].parent_addr,
+			       sizeof(rp[i].child_addr));
+		} else {
+			/*
+			 * For an endpoint device, the lower 64 bits of the
+			 * child address are always zero.
+			 */
+			rp[i].child_addr[0] = j;
+		}
+
+		val64 = resource_size(&res[j]);
+		rp[i].size[0] = upper_32_bits(val64);
+		rp[i].size[1] = lower_32_bits(val64);
+
+		i++;
+	}
+
+	ret = of_changeset_add_prop_u32_array(ocs, np, "ranges", (u32 *)rp,
+					      i * sizeof(*rp) / sizeof(u32));
+	kfree(rp);
+
+	return ret;
+}
+
+static int of_pci_prop_reg(struct pci_dev *pdev, struct of_changeset *ocs,
+			   struct device_node *np)
+{
+	struct of_pci_addr_pair reg = { 0 };
+
+	/* configuration space */
+	of_pci_set_address(pdev, reg.phys_addr, 0, 0, 0, true);
+
+	return of_changeset_add_prop_u32_array(ocs, np, "reg", (u32 *)&reg,
+					       sizeof(reg) / sizeof(u32));
+}
+
+static int of_pci_prop_compatible(struct pci_dev *pdev,
+				  struct of_changeset *ocs,
+				  struct device_node *np)
+{
+	const char *compat_strs[PROP_COMPAT_NUM] = { 0 };
+	int i, ret;
+
+	compat_strs[PROP_COMPAT_PCI_VVVV_DDDD] =
+		kasprintf(GFP_KERNEL, "pci%x,%x", pdev->vendor, pdev->device);
+	compat_strs[PROP_COMPAT_PCICLASS_CCSSPP] =
+		kasprintf(GFP_KERNEL, "pciclass,%06x", pdev->class);
+	compat_strs[PROP_COMPAT_PCICLASS_CCSS] =
+		kasprintf(GFP_KERNEL, "pciclass,%04x", pdev->class >> 8);
+
+	ret = of_changeset_add_prop_string_array(ocs, np, "compatible",
+						 compat_strs, PROP_COMPAT_NUM);
+	for (i = 0; i < PROP_COMPAT_NUM; i++)
+		kfree(compat_strs[i]);
+
+	return ret;
+}
+
+int of_pci_add_properties(struct pci_dev *pdev, struct of_changeset *ocs,
+			  struct device_node *np)
+{
+	int ret = 0;
+
+	if (pci_is_bridge(pdev)) {
+		ret |= of_changeset_add_prop_string(ocs, np, "device_type",
+						    "pci");
+	}
+
+	ret |= of_pci_prop_ranges(pdev, ocs, np);
+	ret |= of_changeset_add_prop_u32(ocs, np, "#address-cells",
+					 OF_PCI_ADDRESS_CELLS);
+	ret |= of_changeset_add_prop_u32(ocs, np, "#size-cells",
+					 OF_PCI_SIZE_CELLS);
+	ret |= of_pci_prop_reg(pdev, ocs, np);
+	ret |= of_pci_prop_compatible(pdev, ocs, np);
+
+	/*
+	 * The added properties will be released when the
+	 * changeset is destroyed.
+	 */
+	return ret;
+}
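
The first address cell written by of_pci_set_address() packs the bus, device, function, space code and flag bits into one 32-bit word. As a rough userspace sketch of that packing (not part of the patch), the snippet below rebuilds the same layout with plain shifts. The helper name pci_phys_hi() and the PHYS_HI_* and SS_* names are hypothetical and exist only for this illustration; the bit positions are taken from the OF_PCI_ADDR_FIELD_* masks above.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions copied from the OF_PCI_ADDR_FIELD_* masks in the patch. */
#define PHYS_HI_NONRELOC	(UINT32_C(1) << 31)
#define PHYS_HI_PREFETCH	(UINT32_C(1) << 30)
#define PHYS_HI_SS_SHIFT	24	/* space code, bits 25:24 */
#define PHYS_HI_BUS_SHIFT	16	/* bus number, bits 23:16 */
#define PHYS_HI_DEV_SHIFT	11	/* device number, bits 15:11 */
#define PHYS_HI_FUNC_SHIFT	8	/* function number, bits 10:8 */

/* Space codes, same values as OF_PCI_ADDR_SPACE_* (0 means config space). */
enum { SS_CONFIG = 0, SS_IO = 1, SS_MEM32 = 2, SS_MEM64 = 3 };

/* Hypothetical helper: build the first address cell for one device. */
static inline uint32_t pci_phys_hi(uint8_t bus, uint8_t dev, uint8_t func,
				   uint32_t ss, int prefetch, int nonreloc)
{
	uint32_t hi;

	/* Device is a 5-bit field, function 3 bits, space code 2 bits. */
	assert(dev < 32 && func < 8 && ss < 4);

	hi = ((uint32_t)bus << PHYS_HI_BUS_SHIFT) |
	     ((uint32_t)dev << PHYS_HI_DEV_SHIFT) |
	     ((uint32_t)func << PHYS_HI_FUNC_SHIFT) |
	     (ss << PHYS_HI_SS_SHIFT);
	if (prefetch)
		hi |= PHYS_HI_PREFETCH;
	if (nonreloc)
		hi |= PHYS_HI_NONRELOC;

	return hi;
}
```

With this sketch, a prefetchable 64-bit memory window on bus 1, device 2, function 0 with the non-relocatable bit set gives 0xc3011000, and the "reg" entry for device 3, function 1 on bus 0 (config space, relocatable, no flags) gives 0x00001900.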
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 2475098f6518..82cc2b35ff6d 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -678,6 +678,25 @@  static inline int devm_of_pci_bridge_init(struct device *dev, struct pci_host_br
 
 #endif /* CONFIG_OF */
 
+struct of_changeset;
+
+#ifdef CONFIG_PCI_DYNAMIC_OF_NODES
+void of_pci_make_dev_node(struct pci_dev *pdev);
+void of_pci_remove_node(struct pci_dev *pdev);
+int of_pci_add_properties(struct pci_dev *pdev, struct of_changeset *ocs,
+			  struct device_node *np);
+#else
+static inline void
+of_pci_make_dev_node(struct pci_dev *pdev)
+{
+}
+
+static inline void
+of_pci_remove_node(struct pci_dev *pdev)
+{
+}
+#endif /* CONFIG_PCI_DYNAMIC_OF_NODES */
+
 #ifdef CONFIG_PCIEAER
 void pci_no_aer(void);
 void pci_aer_init(struct pci_dev *dev);
diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c
index d68aee29386b..d749ea8250d6 100644
--- a/drivers/pci/remove.c
+++ b/drivers/pci/remove.c
@@ -22,6 +22,7 @@  static void pci_stop_dev(struct pci_dev *dev)
 		device_release_driver(&dev->dev);
 		pci_proc_detach_device(dev);
 		pci_remove_sysfs_dev_files(dev);
+		of_pci_remove_node(dev);
 
 		pci_dev_assign_added(dev, false);
 	}
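
For reference, the node name chosen in of_pci_make_dev_node() and the three "compatible" strings built in of_pci_prop_compatible() can be previewed in userspace with the same format strings. This is only a sketch under that assumption: the helper names below are hypothetical, and snprintf() stands in for the kernel's kasprintf().

```c
#include <stdint.h>
#include <stdio.h>

/* Node name: "pci@<slot>,<func>" for bridges, "dev@<slot>,<func>" otherwise. */
static inline int pci_node_name(char *buf, size_t len, int is_bridge,
				unsigned int slot, unsigned int func)
{
	return snprintf(buf, len, "%s@%x,%x", is_bridge ? "pci" : "dev",
			slot, func);
}

/* "pciVVVV,DDDD" form; hex digits are not zero-padded, as in the patch. */
static inline int pci_compat_vendor_device(char *buf, size_t len,
					   uint16_t vendor, uint16_t device)
{
	return snprintf(buf, len, "pci%x,%x", (unsigned int)vendor,
			(unsigned int)device);
}

/* "pciclass,CCSSPP" form from the 24-bit class code. */
static inline int pci_compat_class(char *buf, size_t len, uint32_t class_code)
{
	return snprintf(buf, len, "pciclass,%06x", (unsigned int)class_code);
}

/* "pciclass,CCSS" form: the class code without the programming interface. */
static inline int pci_compat_class_short(char *buf, size_t len,
					 uint32_t class_code)
{
	return snprintf(buf, len, "pciclass,%04x",
			(unsigned int)(class_code >> 8));
}
```

A PCI-to-PCI bridge (class code 0x060400) at slot 0x1c, function 2 would be named "pci@1c,2" and carry "pciclass,060400" and "pciclass,0604" among its compatible strings.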