diff mbox series

cxl: Fix cxl_endpoint_get_perf_coordinate() support for RCH

Message ID 20240426224913.1027420-1-dave.jiang@intel.com
State Accepted
Commit 5d211c7090590033581175d6405ae40917ca3a06
Headers show
Series cxl: Fix cxl_endpoint_get_perf_coordinate() support for RCH | expand

Commit Message

Dave Jiang April 26, 2024, 10:47 p.m. UTC
Robert reported the following when booting a CXL host with Restricted CXL
Host (RCH) topology:
 [   39.815379] cxl_acpi ACPI0017:00: not a cxl_port device
 [   39.827123] WARNING: CPU: 46 PID: 1754 at drivers/cxl/core/port.c:592 to_cxl_port+0x56/0x70 [cxl_core]

... plus some related subsequent NULL pointer dereference:

 [   40.718708] BUG: kernel NULL pointer dereference, address: 00000000000002d8

The iterator to walk the PCIe path did not account for RCH topology.
However RCH does not support hotplug and the memory exported by the
Restricted CXL Device (RCD) should be covered by HMAT and therefore no
access_coordinate is needed. Add check to see if the endpoint device is
RCD and skip calculation.

Also add a call to cxl_endpoint_get_perf_coordinates() in cxl_test in order
to exercise the topology iterator. The dev_is_pci() check added is to help
with this test and should be harmless for normal operation.

Reported-by: Robert Richter <rrichter@amd.com>
Closes: https://lore.kernel.org/all/Ziv8GfSMSbvlBB0h@rric.localdomain/
Fixes: 592780b8391f ("cxl: Fix retrieving of access_coordinates in PCIe path")
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---

Hi Robert,
Can you please try this patch and see if it addresses the issue you saw
on your RCH platform? Thanks!

 drivers/cxl/core/port.c      | 15 ++++++++++++++-
 tools/testing/cxl/test/cxl.c |  3 +++
 2 files changed, 17 insertions(+), 1 deletion(-)

Comments

Dan Williams April 27, 2024, 12:10 a.m. UTC | #1
Dave Jiang wrote:
> Robert reported the following when booting a CXL host with Restricted CXL
> Host (RCH) topology:
>  [   39.815379] cxl_acpi ACPI0017:00: not a cxl_port device
>  [   39.827123] WARNING: CPU: 46 PID: 1754 at drivers/cxl/core/port.c:592 to_cxl_port+0x56/0x70 [cxl_core]
> 
> ... plus some related subsequent NULL pointer dereference:
> 
>  [   40.718708] BUG: kernel NULL pointer dereference, address: 00000000000002d8
> 
> The iterator to walk the PCIe path did not account for RCH topology.
> However RCH does not support hotplug and the memory exported by the
> Restricted CXL Device (RCD) should be covered by HMAT and therefore no
> access_coordinate is needed. Add check to see if the endpoint device is
> RCD and skip calculation.
> 
> Also add a call to cxl_endpoint_get_perf_coordinates() in cxl_test in order
> to exercise the topology iterator. The dev_is_pci() check added is to help
> with this test and should be harmless for normal operation.
> 
> Reported-by: Robert Richter <rrichter@amd.com>
> Closes: https://lore.kernel.org/all/Ziv8GfSMSbvlBB0h@rric.localdomain/
> Fixes: 592780b8391f ("cxl: Fix retrieving of access_coordinates in PCIe path")
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> ---
> 
> Hi Robert,
> Can you please try this patch and see if it addresses the issue you saw
> on your RCH platform? Thanks!

Definitely wait for that test result, but as for the patch and the
approach it looks good to me. It matches other RCH-topology-walk skips
like the one in devm_cxl_enumerate_ports().

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

Now, that said, if someone can think of why the driver needs to do
dynamic performance enumeration for RCH (i.e. a platform BIOS does not
populate HMAT), they should speak up now.
Robert Richter April 29, 2024, noon UTC | #2
On 26.04.24 15:47:56, Dave Jiang wrote:
> Robert reported the following when booting a CXL host with Restricted CXL
> Host (RCH) topology:
>  [   39.815379] cxl_acpi ACPI0017:00: not a cxl_port device
>  [   39.827123] WARNING: CPU: 46 PID: 1754 at drivers/cxl/core/port.c:592 to_cxl_port+0x56/0x70 [cxl_core]
> 
> ... plus some related subsequent NULL pointer dereference:
> 
>  [   40.718708] BUG: kernel NULL pointer dereference, address: 00000000000002d8
> 
> The iterator to walk the PCIe path did not account for RCH topology.
> However RCH does not support hotplug and the memory exported by the
> Restricted CXL Device (RCD) should be covered by HMAT and therefore no
> access_coordinate is needed. Add check to see if the endpoint device is
> RCD and skip calculation.
> 
> Also add a call to cxl_endpoint_get_perf_coordinates() in cxl_test in order
> to exercise the topology iterator. The dev_is_pci() check added is to help
> with this test and should be harmless for normal operation.
> 
> Reported-by: Robert Richter <rrichter@amd.com>
> Closes: https://lore.kernel.org/all/Ziv8GfSMSbvlBB0h@rric.localdomain/
> Fixes: 592780b8391f ("cxl: Fix retrieving of access_coordinates in PCIe path")
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>

This patch fixes the issue.

Tested-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Robert Richter <rrichter@amd.com>

But see below for a question...

> ---
> 
> Hi Robert,
> Can you please try this patch and see if it addresses the issue you saw
> on your RCH platform? Thanks!
> 
>  drivers/cxl/core/port.c      | 15 ++++++++++++++-
>  tools/testing/cxl/test/cxl.c |  3 +++
>  2 files changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 762783bb091a..887ed6e358fb 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -2184,6 +2184,7 @@ static bool parent_port_is_cxl_root(struct cxl_port *port)
>  int cxl_endpoint_get_perf_coordinates(struct cxl_port *port,
>  				      struct access_coordinate *coord)
>  {
> +	struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport_dev);
>  	struct access_coordinate c[] = {
>  		{
>  			.read_bandwidth = UINT_MAX,
> @@ -2197,12 +2198,20 @@ int cxl_endpoint_get_perf_coordinates(struct cxl_port *port,
>  	struct cxl_port *iter = port;
>  	struct cxl_dport *dport;
>  	struct pci_dev *pdev;
> +	struct device *dev;
>  	unsigned int bw;
>  	bool is_cxl_root;
>  
>  	if (!is_cxl_endpoint(port))
>  		return -EINVAL;
>  
> +	/*
> +	 * Skip calculation for RCD. Expectation is HMAT already covers RCD case
> +	 * since RCH does not support hotplug.
> +	 */
> +	if (cxlmd->cxlds->rcd)
> +		return 0;
> +
>  	/*
>  	 * Exit the loop when the parent port of the current iter port is cxl
>  	 * root. The iterative loop starts at the endpoint and gathers the
> @@ -2232,8 +2241,12 @@ int cxl_endpoint_get_perf_coordinates(struct cxl_port *port,
>  		return -EINVAL;
>  	cxl_coordinates_combine(c, c, dport->coord);
>  
> +	dev = port->uport_dev->parent;
> +	if (!dev_is_pci(dev))
> +		return -ENODEV;
> +
>  	/* Get the calculated PCI paths bandwidth */
> -	pdev = to_pci_dev(port->uport_dev->parent);
> +	pdev = to_pci_dev(dev);
>  	bw = pcie_bandwidth_available(pdev, NULL, NULL, NULL);
>  	if (bw == 0)
>  		return -ENXIO;
> diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
> index 61c69297e797..72e2ce58e1dc 100644
> --- a/tools/testing/cxl/test/cxl.c
> +++ b/tools/testing/cxl/test/cxl.c
> @@ -1001,6 +1001,7 @@ static void mock_cxl_endpoint_parse_cdat(struct cxl_port *port)
>  	struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport_dev);
>  	struct cxl_dev_state *cxlds = cxlmd->cxlds;
>  	struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
> +	struct access_coordinate ep_c[ACCESS_COORDINATE_MAX];
>  	struct range pmem_range = {
>  		.start = cxlds->pmem_res.start,
>  		.end = cxlds->pmem_res.end,
> @@ -1020,6 +1021,8 @@ static void mock_cxl_endpoint_parse_cdat(struct cxl_port *port)
>  		dpa_perf_setup(port, &pmem_range, &mds->pmem_perf);
>  
>  	cxl_memdev_update_perf(cxlmd);
> +
> +	cxl_endpoint_get_perf_coordinates(port, ep_c);

I don't see what this is for as ep_c is unused later? The only reason
is for error checking to see if that throws some kernel message in the
logs but return code is unused.

Thanks,

-Robert

>  }
>  
>  static struct cxl_mock_ops cxl_mock_ops = {
> -- 
> 2.44.0
>
Robert Richter April 29, 2024, 12:23 p.m. UTC | #3
On 26.04.24 17:10:07, Dan Williams wrote:
> Now, that said, if someone can think of why the driver needs to do
> dynamic performance enumeration for RCH (i.e. a platform BIOS does not
> populate HMAT), they should speak up now.

I wouldn't stick with that forever as you never know, but for now
disabling it for RCDs looks good to me to fix this, esp. considering
we are already at -rc6.

E.g I found in cxl-3.1, 9.11.6 RCD Discovery:

"""
In the future, a
CXL-aware OS may extract this information directly from the device via
Table Access DOE.
"""

Thanks,

-Robert
Dave Jiang April 29, 2024, 3:56 p.m. UTC | #4
On 4/29/24 5:00 AM, Robert Richter wrote:
> On 26.04.24 15:47:56, Dave Jiang wrote:
>> Robert reported the following when booting a CXL host with Restricted CXL
>> Host (RCH) topology:
>>  [   39.815379] cxl_acpi ACPI0017:00: not a cxl_port device
>>  [   39.827123] WARNING: CPU: 46 PID: 1754 at drivers/cxl/core/port.c:592 to_cxl_port+0x56/0x70 [cxl_core]
>>
>> ... plus some related subsequent NULL pointer dereference:
>>
>>  [   40.718708] BUG: kernel NULL pointer dereference, address: 00000000000002d8
>>
>> The iterator to walk the PCIe path did not account for RCH topology.
>> However RCH does not support hotplug and the memory exported by the
>> Restricted CXL Device (RCD) should be covered by HMAT and therefore no
>> access_coordinate is needed. Add check to see if the endpoint device is
>> RCD and skip calculation.
>>
>> Also add a call to cxl_endpoint_get_perf_coordinates() in cxl_test in order
>> to exercise the topology iterator. The dev_is_pci() check added is to help
>> with this test and should be harmless for normal operation.
>>
>> Reported-by: Robert Richter <rrichter@amd.com>
>> Closes: https://lore.kernel.org/all/Ziv8GfSMSbvlBB0h@rric.localdomain/
>> Fixes: 592780b8391f ("cxl: Fix retrieving of access_coordinates in PCIe path")
>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> 
> This patch fixes the issue.
> 
> Tested-by: Robert Richter <rrichter@amd.com>
> Reviewed-by: Robert Richter <rrichter@amd.com>

Thank you Robert! I'll get this queued for rc7. 
> 
> But see below for a question...
> 
>> ---
>>
>> Hi Robert,
>> Can you please try this patch and see if it addresses the issue you saw
>> on your RCH platform? Thanks!
>>
>>  drivers/cxl/core/port.c      | 15 ++++++++++++++-
>>  tools/testing/cxl/test/cxl.c |  3 +++
>>  2 files changed, 17 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
>> index 762783bb091a..887ed6e358fb 100644
>> --- a/drivers/cxl/core/port.c
>> +++ b/drivers/cxl/core/port.c
>> @@ -2184,6 +2184,7 @@ static bool parent_port_is_cxl_root(struct cxl_port *port)
>>  int cxl_endpoint_get_perf_coordinates(struct cxl_port *port,
>>  				      struct access_coordinate *coord)
>>  {
>> +	struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport_dev);
>>  	struct access_coordinate c[] = {
>>  		{
>>  			.read_bandwidth = UINT_MAX,
>> @@ -2197,12 +2198,20 @@ int cxl_endpoint_get_perf_coordinates(struct cxl_port *port,
>>  	struct cxl_port *iter = port;
>>  	struct cxl_dport *dport;
>>  	struct pci_dev *pdev;
>> +	struct device *dev;
>>  	unsigned int bw;
>>  	bool is_cxl_root;
>>  
>>  	if (!is_cxl_endpoint(port))
>>  		return -EINVAL;
>>  
>> +	/*
>> +	 * Skip calculation for RCD. Expectation is HMAT already covers RCD case
>> +	 * since RCH does not support hotplug.
>> +	 */
>> +	if (cxlmd->cxlds->rcd)
>> +		return 0;
>> +
>>  	/*
>>  	 * Exit the loop when the parent port of the current iter port is cxl
>>  	 * root. The iterative loop starts at the endpoint and gathers the
>> @@ -2232,8 +2241,12 @@ int cxl_endpoint_get_perf_coordinates(struct cxl_port *port,
>>  		return -EINVAL;
>>  	cxl_coordinates_combine(c, c, dport->coord);
>>  
>> +	dev = port->uport_dev->parent;
>> +	if (!dev_is_pci(dev))
>> +		return -ENODEV;
>> +
>>  	/* Get the calculated PCI paths bandwidth */
>> -	pdev = to_pci_dev(port->uport_dev->parent);
>> +	pdev = to_pci_dev(dev);
>>  	bw = pcie_bandwidth_available(pdev, NULL, NULL, NULL);
>>  	if (bw == 0)
>>  		return -ENXIO;
>> diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
>> index 61c69297e797..72e2ce58e1dc 100644
>> --- a/tools/testing/cxl/test/cxl.c
>> +++ b/tools/testing/cxl/test/cxl.c
>> @@ -1001,6 +1001,7 @@ static void mock_cxl_endpoint_parse_cdat(struct cxl_port *port)
>>  	struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport_dev);
>>  	struct cxl_dev_state *cxlds = cxlmd->cxlds;
>>  	struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
>> +	struct access_coordinate ep_c[ACCESS_COORDINATE_MAX];
>>  	struct range pmem_range = {
>>  		.start = cxlds->pmem_res.start,
>>  		.end = cxlds->pmem_res.end,
>> @@ -1020,6 +1021,8 @@ static void mock_cxl_endpoint_parse_cdat(struct cxl_port *port)
>>  		dpa_perf_setup(port, &pmem_range, &mds->pmem_perf);
>>  
>>  	cxl_memdev_update_perf(cxlmd);
>> +
>> +	cxl_endpoint_get_perf_coordinates(port, ep_c);
> 
> I don't see what this is for as ep_c is unused later? The only reason
> is for error checking to see if that throws some kernel message in the
> logs but return code is unused.

Right. The results are thrown away. The call is there to specifically test the iterator and making sure we don't crash or fail. It has no other purpose. I looked into plumbing this function for usage in cxl-test but found it not possible to override the PCI API function that retrieves bandwidth. So for now we'll just have it just test the topology iterator. I'll add a comment. 
> 
> Thanks,
> 
> -Robert
> 
>>  }
>>  
>>  static struct cxl_mock_ops cxl_mock_ops = {
>> -- 
>> 2.44.0
>>
Jonathan Cameron April 30, 2024, 4:54 p.m. UTC | #5
On Mon, 29 Apr 2024 14:23:16 +0200
Robert Richter <rrichter@amd.com> wrote:

> On 26.04.24 17:10:07, Dan Williams wrote:
> > Now, that said, if someone can think of why the driver needs to do
> > dynamic performance enumeration for RCH (i.e. a platform BIOS does not
> > populate HMAT), they should speak up now.  

It would have to selectively populate HMAT as we'd need the
Generic Port distances anyway.  That is an odd corner indeed.

Agreed on disabling it until someone shouts.

> 
> I wouldn't stick with that forever as you never know, but for now
> disabling it for RCDs looks good to me to fix this, esp. considering
> we are already at -rc6.
> 
> E.g I found in cxl-3.1, 9.11.6 RCD Discovery:
> 
> """
> In the future, a
> CXL-aware OS may extract this information directly from the device via
> Table Access DOE.
> """
> 
> Thanks,
> 
> -Robert
diff mbox series

Patch

diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 762783bb091a..887ed6e358fb 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -2184,6 +2184,7 @@  static bool parent_port_is_cxl_root(struct cxl_port *port)
 int cxl_endpoint_get_perf_coordinates(struct cxl_port *port,
 				      struct access_coordinate *coord)
 {
+	struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport_dev);
 	struct access_coordinate c[] = {
 		{
 			.read_bandwidth = UINT_MAX,
@@ -2197,12 +2198,20 @@  int cxl_endpoint_get_perf_coordinates(struct cxl_port *port,
 	struct cxl_port *iter = port;
 	struct cxl_dport *dport;
 	struct pci_dev *pdev;
+	struct device *dev;
 	unsigned int bw;
 	bool is_cxl_root;
 
 	if (!is_cxl_endpoint(port))
 		return -EINVAL;
 
+	/*
+	 * Skip calculation for RCD. Expectation is HMAT already covers RCD case
+	 * since RCH does not support hotplug.
+	 */
+	if (cxlmd->cxlds->rcd)
+		return 0;
+
 	/*
 	 * Exit the loop when the parent port of the current iter port is cxl
 	 * root. The iterative loop starts at the endpoint and gathers the
@@ -2232,8 +2241,12 @@  int cxl_endpoint_get_perf_coordinates(struct cxl_port *port,
 		return -EINVAL;
 	cxl_coordinates_combine(c, c, dport->coord);
 
+	dev = port->uport_dev->parent;
+	if (!dev_is_pci(dev))
+		return -ENODEV;
+
 	/* Get the calculated PCI paths bandwidth */
-	pdev = to_pci_dev(port->uport_dev->parent);
+	pdev = to_pci_dev(dev);
 	bw = pcie_bandwidth_available(pdev, NULL, NULL, NULL);
 	if (bw == 0)
 		return -ENXIO;
diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
index 61c69297e797..72e2ce58e1dc 100644
--- a/tools/testing/cxl/test/cxl.c
+++ b/tools/testing/cxl/test/cxl.c
@@ -1001,6 +1001,7 @@  static void mock_cxl_endpoint_parse_cdat(struct cxl_port *port)
 	struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport_dev);
 	struct cxl_dev_state *cxlds = cxlmd->cxlds;
 	struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
+	struct access_coordinate ep_c[ACCESS_COORDINATE_MAX];
 	struct range pmem_range = {
 		.start = cxlds->pmem_res.start,
 		.end = cxlds->pmem_res.end,
@@ -1020,6 +1021,8 @@  static void mock_cxl_endpoint_parse_cdat(struct cxl_port *port)
 		dpa_perf_setup(port, &pmem_range, &mds->pmem_perf);
 
 	cxl_memdev_update_perf(cxlmd);
+
+	cxl_endpoint_get_perf_coordinates(port, ep_c);
 }
 
 static struct cxl_mock_ops cxl_mock_ops = {