diff mbox series

[2/2] cxl: Add checks to access_coordinate calculation to fail missing data

Message ID 20240229002542.634982-2-dave.jiang@intel.com
State Superseded
Headers show
Series [1/2] cxl: Remove checking of iter in cxl_endpoint_get_perf_coordinates() | expand

Commit Message

Dave Jiang Feb. 29, 2024, 12:25 a.m. UTC
Jonathan noted that when the coordinates for host bridge and switches
can be 0s if no actual data are retrieved and the calculation continues.
The resulting number would be inaccurate. Add checks to ensure that the
calculation would complete only if the numbers are valid.

Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
 drivers/cxl/core/port.c | 31 +++++++++++++++++++++++++++----
 1 file changed, 27 insertions(+), 4 deletions(-)

Comments

Dan Williams Feb. 29, 2024, 12:35 a.m. UTC | #1
Dave Jiang wrote:
> Jonathan noted that when the coordinates for host bridge and switches
> can be 0s if no actual data are retrieved and the calculation continues.
> The resulting number would be inaccurate. Add checks to ensure that the
> calculation would complete only if the numbers are valid.

Similar comment as patch1. This smells like a fix, is this an urgent
thing to get into v6.8, i.e. most configurations are busted without
this, or is a nice to have fixup for a QEMU effect that may or may not
show up in physical systems?
Dave Jiang Feb. 29, 2024, 12:39 a.m. UTC | #2
On 2/28/24 5:35 PM, Dan Williams wrote:
> Dave Jiang wrote:
>> Jonathan noted that when the coordinates for host bridge and switches
>> can be 0s if no actual data are retrieved and the calculation continues.
>> The resulting number would be inaccurate. Add checks to ensure that the
>> calculation would complete only if the numbers are valid.
> 
> Similar comment as patch1. This smells like a fix, is this an urgent
> thing to get into v6.8, i.e. most configurations are busted without
> this, or is a nice to have fixup for a QEMU effect that may or may not
> show up in physical systems?

This would only be an issue if the switches do not supply a proper CDAT and/or if BIOS does not provide proper ACPI tables. I think it is only experienced on QEMU currently. I'll add a fix tag if you think it should be a fix.
Dan Williams Feb. 29, 2024, 12:44 a.m. UTC | #3
Dave Jiang wrote:
> 
> 
> On 2/28/24 5:35 PM, Dan Williams wrote:
> > Dave Jiang wrote:
> >> Jonathan noted that when the coordinates for host bridge and switches
> >> can be 0s if no actual data are retrieved and the calculation continues.
> >> The resulting number would be inaccurate. Add checks to ensure that the
> >> calculation would complete only if the numbers are valid.
> > 
> > Similar comment as patch1. This smells like a fix, is this an urgent
> > thing to get into v6.8, i.e. most configurations are busted without
> > this, or is a nice to have fixup for a QEMU effect that may or may not
> > show up in physical systems?
> 
> This would only be an issue if the switches do not supply a proper
> CDAT and/or if BIOS does not provide proper ACPI tables. I think it is
> only experienced on QEMU currently. I'll add a fix tag if you think it
> should be a fix.

No, but please add that relevant clarification of impact to the
changelog. It would have made it clear this is a nice to have thing for
v6.9, nothing to worry about for v6.8.
Jonathan Cameron Feb. 29, 2024, 5:25 p.m. UTC | #4
On Wed, 28 Feb 2024 16:44:29 -0800
Dan Williams <dan.j.williams@intel.com> wrote:

> Dave Jiang wrote:
> > 
> > 
> > On 2/28/24 5:35 PM, Dan Williams wrote:  
> > > Dave Jiang wrote:  
> > >> Jonathan noted that when the coordinates for host bridge and switches
> > >> can be 0s if no actual data are retrieved and the calculation continues.
> > >> The resulting number would be inaccurate. Add checks to ensure that the
> > >> calculation would complete only if the numbers are valid.  
> > > 
> > > Similar comment as patch1. This smells like a fix, is this an urgent
> > > thing to get into v6.8, i.e. most configurations are busted without
> > > this, or is a nice to have fixup for a QEMU effect that may or may not
> > > show up in physical systems?  
> > 
> > This would only be an issue if the switches do not supply a proper
> > CDAT and/or if BIOS does not provide proper ACPI tables. I think it is
> > only experienced on QEMU currently. I'll add a fix tag if you think it
> > should be a fix.  
> 
> No, but please add that relevant clarification of impact to the
> changelog. It would have made it clear this is a nice to have thing for
> v6.9, nothing to worry about for v6.8.

Not seen in wild, but it might show up with a BIOS that reported
CXL root ports via Generic Ports (via a PCI handle in the SRAT entry)
rather than CXL Host Bridge (via an ACPI Handle int he SRAT entry).
The code doesn't yet support this case (which is fine) but it should
cleanly not support it rather than giving wrong numbers. There
are other fun corners like embedded CXL switches in the host where
the reporting will also be more 'creative'. I expect we will see
these eventually but not in first generation or two.

I 'think' this would be a reasonable system choice as there is nothing
to say that a CXL hostbridge doesn't have a bunch of root ports
(potentially on different dies) that have different host initiator
to port exit latency and bandwidth characteristics.

Whilst no known hardware. I don't think this is a FW bug case only.

Not a rush job though, fixes tag + 6.9 will be fine.

Jonathan
Jonathan Cameron Feb. 29, 2024, 5:44 p.m. UTC | #5
On Wed, 28 Feb 2024 17:25:42 -0700
Dave Jiang <dave.jiang@intel.com> wrote:

> Jonathan noted that when the coordinates for host bridge and switches
> can be 0s if no actual data are retrieved and the calculation continues.
> The resulting number would be inaccurate. Add checks to ensure that the
> calculation would complete only if the numbers are valid.
> 
> Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>

Hi Dave,

Whilst I think the fix is right, it is getting hard to read. Maybe
a rethink is needed on how that iteration works?

> ---
>  drivers/cxl/core/port.c | 31 +++++++++++++++++++++++++++----
>  1 file changed, 27 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index e1d30a885700..2c82fe24b789 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -2110,6 +2110,20 @@ static void combine_coordinates(struct access_coordinate *c1,
>  		c1->read_latency += c2->read_latency;
>  }
>  
> +static bool coordinates_invalid(struct access_coordinate *c)
> +{
> +	if (!c->read_bandwidth && !c->write_bandwidth &&
> +	    !c->read_latency && !c->write_latency)
> +		return true;
> +
> +	return false;
> +}
> +
> +static bool parent_port_is_cxl_root(struct cxl_port *port)
> +{
> +	return is_cxl_root(to_cxl_port(port->dev.parent));
> +}
> +
>  /**
>   * cxl_endpoint_get_perf_coordinates - Retrieve performance numbers stored in dports
>   *				   of CXL path
> @@ -2142,16 +2156,25 @@ int cxl_endpoint_get_perf_coordinates(struct cxl_port *port,
>  	 * port each iteration. If the parent is cxl root then there is
>  	 * nothing to gather.
>  	 */
> -	while (!is_cxl_root(to_cxl_port(iter->dev.parent))) {
> -		combine_coordinates(&c, &dport->sw_coord);
> +	while (!parent_port_is_cxl_root(iter)) {
> +		iter = to_cxl_port(iter->dev.parent);
> +
> +		/* There's no CDAT for the host bridge, so skip if so. */

Comment refers to skipping whereas code is 'doing more' for the other case
so this is confusing to me.

The inverse of this only occurs on the last iteration I think.

Possibly a do / while instead of a while will do it.
I'm far from confident though as all the levels of look up have
me too confused.


	do {
		if (coordinates_invalid(&dport->sw_coord))
			return -EINVAL;

		combine_coordinates(&c, &dport->sw_coord);
		iter = to_cxl_port(iter->dev.parent);
  		dport = iter->parent_dport;
	} while (!parent_port_is_cxl_root(iter));
	/* Do final link updates */
	c.write_latency += dport->link_latency;
	c.read_latency += dport->link_latency;

> +		if (!parent_port_is_cxl_root(iter)) {
> +			if (coordinates_invalid(&dport->sw_coord))
> +				return -EINVAL;
> +
> +			combine_coordinates(&c, &dport->sw_coord);
> +		}
> +
>  		c.write_latency += dport->link_latency;
>  		c.read_latency += dport->link_latency;
> -
> -		iter = to_cxl_port(iter->dev.parent);
>  		dport = iter->parent_dport;
>  	}
>  
>  	/* Augment with the generic port (host bridge) perf data */
> +	if (coordinates_invalid(&dport->hb_coord))
> +		return -EINVAL;
>  	combine_coordinates(&c, &dport->hb_coord);
>  
>  	/* Get the calculated PCI paths bandwidth */
Dave Jiang March 5, 2024, 10:36 p.m. UTC | #6
On 2/29/24 10:44 AM, Jonathan Cameron wrote:
> On Wed, 28 Feb 2024 17:25:42 -0700
> Dave Jiang <dave.jiang@intel.com> wrote:
> 
>> Jonathan noted that when the coordinates for host bridge and switches
>> can be 0s if no actual data are retrieved and the calculation continues.
>> The resulting number would be inaccurate. Add checks to ensure that the
>> calculation would complete only if the numbers are valid.
>>
>> Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> 
> Hi Dave,
> 
> Whilst I think the fix is right, it is getting hard to read. Maybe
> a rethink is needed on how that iteration works?
> 
>> ---
>>  drivers/cxl/core/port.c | 31 +++++++++++++++++++++++++++----
>>  1 file changed, 27 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
>> index e1d30a885700..2c82fe24b789 100644
>> --- a/drivers/cxl/core/port.c
>> +++ b/drivers/cxl/core/port.c
>> @@ -2110,6 +2110,20 @@ static void combine_coordinates(struct access_coordinate *c1,
>>  		c1->read_latency += c2->read_latency;
>>  }
>>  
>> +static bool coordinates_invalid(struct access_coordinate *c)
>> +{
>> +	if (!c->read_bandwidth && !c->write_bandwidth &&
>> +	    !c->read_latency && !c->write_latency)
>> +		return true;
>> +
>> +	return false;
>> +}
>> +
>> +static bool parent_port_is_cxl_root(struct cxl_port *port)
>> +{
>> +	return is_cxl_root(to_cxl_port(port->dev.parent));
>> +}
>> +
>>  /**
>>   * cxl_endpoint_get_perf_coordinates - Retrieve performance numbers stored in dports
>>   *				   of CXL path
>> @@ -2142,16 +2156,25 @@ int cxl_endpoint_get_perf_coordinates(struct cxl_port *port,
>>  	 * port each iteration. If the parent is cxl root then there is
>>  	 * nothing to gather.
>>  	 */
>> -	while (!is_cxl_root(to_cxl_port(iter->dev.parent))) {
>> -		combine_coordinates(&c, &dport->sw_coord);
>> +	while (!parent_port_is_cxl_root(iter)) {
>> +		iter = to_cxl_port(iter->dev.parent);
>> +
>> +		/* There's no CDAT for the host bridge, so skip if so. */
> 
> Comment refers to skipping whereas code is 'doing more' for the other case
> so this is confusing to me.
> 
> The inverse of this only occurs on the last iteration I think.
> 
> Possibly a do / while instead of a while will do it.
> I'm far from confident though as all the levels of look up have
> me too confused.

So this is somewhat tricky. For example:
devices/pci0000:35/0000:35:01.0/0000:37:00.0/mem5

In this case the endpoint is attached to the host bridge without any switches. The endpoint is 0000:37:00.0 and the host bridge down stream port is 0000:35:01.0. In this instance there is no switch and therefore switch CDAT, but there is a valid dport. So we would skip the dport->sw_coord. However, we do need to pick up the link_latency between endpoint and downstream port. So we spend 1 iteration in the loop and skips the dport->sw_coord.

devices/pci0000:bf/0000:bf:00.0/0000:c0:00.0/0000:c1:00.0/0000:c2:00.0/mem8

Now in this case there's a CXL switch in between. So in first iteration, we pick up the dport->sw_coord. And in second iteration, we skip the dport->sw_coord. However, we would be adding two link_latency. For the link between 0000:c2:00.0 (endpoint) and 0000:c1:00.0 (switch downstream port), and the link between 0000:c0:00.0 (switch upstream port) and 0000:bf:00.0 (host bridge down stream port). So therefore we can't put the sum of link_latency outside of the loop.

Not sure how much better this is:

	do {
		struct cxl_port *parent_port = to_cxl_port(iter->dev.parent);

		dport = iter->parent_dport;
		if (!parent_port_is_cxl_root(parent_port)) {
			if (coordinates_invalid(&dport->sw_coord))
				return -EINVAL;

			combine_coordinates(&c, &dport->sw_coord);
		}

		c.write_latency += dport->link_latency;
		c.read_latency += dport->link_latency;
		iter = to_cxl_port(iter->dev.parent);
	} while (!parent_port_is_cxl_root(iter));

or:

	do {
		struct cxl_port *parent_port = to_cxl_port(iter->dev.parent);

		dport = iter->parent_dport;
		c.write_latency += dport->link_latency;
		c.read_latency += dport->link_latency;

		if (parent_port_is_cxl_root(parent_port))
			break;

		if (coordinates_invalid(&dport->sw_coord))
			return -EINVAL;

		combine_coordinates(&c, &dport->sw_coord);
		iter = to_cxl_port(iter->dev.parent);
	} while (!parent_port_is_cxl_root(iter));


> 
> 
> 	do {
> 		if (coordinates_invalid(&dport->sw_coord))
> 			return -EINVAL;
> 
> 		combine_coordinates(&c, &dport->sw_coord);
> 		iter = to_cxl_port(iter->dev.parent);
>   		dport = iter->parent_dport;
> 	} while (!parent_port_is_cxl_root(iter));
> 	/* Do final link updates */
> 	c.write_latency += dport->link_latency;
> 	c.read_latency += dport->link_latency;
> 
>> +		if (!parent_port_is_cxl_root(iter)) {
>> +			if (coordinates_invalid(&dport->sw_coord))
>> +				return -EINVAL;
>> +
>> +			combine_coordinates(&c, &dport->sw_coord);
>> +		}
>> +
>>  		c.write_latency += dport->link_latency;
>>  		c.read_latency += dport->link_latency;
>> -
>> -		iter = to_cxl_port(iter->dev.parent);
>>  		dport = iter->parent_dport;
>>  	}
>>  
>>  	/* Augment with the generic port (host bridge) perf data */
>> +	if (coordinates_invalid(&dport->hb_coord))
>> +		return -EINVAL;
>>  	combine_coordinates(&c, &dport->hb_coord);
>>  
>>  	/* Get the calculated PCI paths bandwidth */
>
Dave Jiang March 6, 2024, 12:18 a.m. UTC | #7
On 3/5/24 3:36 PM, Dave Jiang wrote:
> 
> 
> On 2/29/24 10:44 AM, Jonathan Cameron wrote:
>> On Wed, 28 Feb 2024 17:25:42 -0700
>> Dave Jiang <dave.jiang@intel.com> wrote:
>>
>>> Jonathan noted that when the coordinates for host bridge and switches
>>> can be 0s if no actual data are retrieved and the calculation continues.
>>> The resulting number would be inaccurate. Add checks to ensure that the
>>> calculation would complete only if the numbers are valid.
>>>
>>> Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
>>
>> Hi Dave,
>>
>> Whilst I think the fix is right, it is getting hard to read. Maybe
>> a rethink is needed on how that iteration works?
>>
>>> ---
>>>  drivers/cxl/core/port.c | 31 +++++++++++++++++++++++++++----
>>>  1 file changed, 27 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
>>> index e1d30a885700..2c82fe24b789 100644
>>> --- a/drivers/cxl/core/port.c
>>> +++ b/drivers/cxl/core/port.c
>>> @@ -2110,6 +2110,20 @@ static void combine_coordinates(struct access_coordinate *c1,
>>>  		c1->read_latency += c2->read_latency;
>>>  }
>>>  
>>> +static bool coordinates_invalid(struct access_coordinate *c)
>>> +{
>>> +	if (!c->read_bandwidth && !c->write_bandwidth &&
>>> +	    !c->read_latency && !c->write_latency)
>>> +		return true;
>>> +
>>> +	return false;
>>> +}
>>> +
>>> +static bool parent_port_is_cxl_root(struct cxl_port *port)
>>> +{
>>> +	return is_cxl_root(to_cxl_port(port->dev.parent));
>>> +}
>>> +
>>>  /**
>>>   * cxl_endpoint_get_perf_coordinates - Retrieve performance numbers stored in dports
>>>   *				   of CXL path
>>> @@ -2142,16 +2156,25 @@ int cxl_endpoint_get_perf_coordinates(struct cxl_port *port,
>>>  	 * port each iteration. If the parent is cxl root then there is
>>>  	 * nothing to gather.
>>>  	 */
>>> -	while (!is_cxl_root(to_cxl_port(iter->dev.parent))) {
>>> -		combine_coordinates(&c, &dport->sw_coord);
>>> +	while (!parent_port_is_cxl_root(iter)) {
>>> +		iter = to_cxl_port(iter->dev.parent);
>>> +
>>> +		/* There's no CDAT for the host bridge, so skip if so. */
>>
>> Comment refers to skipping whereas code is 'doing more' for the other case
>> so this is confusing to me.
>>
>> The inverse of this only occurs on the last iteration I think.
>>
>> Possibly a do / while instead of a while will do it.
>> I'm far from confident though as all the levels of look up have
>> me too confused.
> 
> So this is somewhat tricky. For example:
> devices/pci0000:35/0000:35:01.0/0000:37:00.0/mem5
> 
> In this case the endpoint is attached to the host bridge without any switches. The endpoint is 0000:37:00.0 and the host bridge down stream port is 0000:35:01.0. In this instance there is no switch and therefore switch CDAT, but there is a valid dport. So we would skip the dport->sw_coord. However, we do need to pick up the link_latency between endpoint and downstream port. So we spend 1 iteration in the loop and skips the dport->sw_coord.
> 
> devices/pci0000:bf/0000:bf:00.0/0000:c0:00.0/0000:c1:00.0/0000:c2:00.0/mem8
> 
> Now in this case there's a CXL switch in between. So in first iteration, we pick up the dport->sw_coord. And in second iteration, we skip the dport->sw_coord. However, we would be adding two link_latency. For the link between 0000:c2:00.0 (endpoint) and 0000:c1:00.0 (switch downstream port), and the link between 0000:c0:00.0 (switch upstream port) and 0000:bf:00.0 (host bridge down stream port). So therefore we can't put the sum of link_latency outside of the loop.
> 
> Not sure how much better this is:
> 
> 	do {
> 		struct cxl_port *parent_port = to_cxl_port(iter->dev.parent);
> 
> 		dport = iter->parent_dport;
> 		if (!parent_port_is_cxl_root(parent_port)) {
> 			if (coordinates_invalid(&dport->sw_coord))
> 				return -EINVAL;
> 
> 			combine_coordinates(&c, &dport->sw_coord);
> 		}
> 
> 		c.write_latency += dport->link_latency;
> 		c.read_latency += dport->link_latency;
> 		iter = to_cxl_port(iter->dev.parent);
> 	} while (!parent_port_is_cxl_root(iter));
> 
> or:
> 
> 	do {
> 		struct cxl_port *parent_port = to_cxl_port(iter->dev.parent);
> 
> 		dport = iter->parent_dport;
> 		c.write_latency += dport->link_latency;
> 		c.read_latency += dport->link_latency;
> 
> 		if (parent_port_is_cxl_root(parent_port))
> 			break;
> 
> 		if (coordinates_invalid(&dport->sw_coord))
> 			return -EINVAL;
> 
> 		combine_coordinates(&c, &dport->sw_coord);
> 		iter = to_cxl_port(iter->dev.parent);
> 	} while (!parent_port_is_cxl_root(iter));

Actually, I think if we just make it dport->coord instead of dport->sw_coord and dport->hb_coord, we can remove the check and everything should work out properly. 

> 
> 
>>
>>
>> 	do {
>> 		if (coordinates_invalid(&dport->sw_coord))
>> 			return -EINVAL;
>>
>> 		combine_coordinates(&c, &dport->sw_coord);
>> 		iter = to_cxl_port(iter->dev.parent);
>>   		dport = iter->parent_dport;
>> 	} while (!parent_port_is_cxl_root(iter));
>> 	/* Do final link updates */
>> 	c.write_latency += dport->link_latency;
>> 	c.read_latency += dport->link_latency;
>>
>>> +		if (!parent_port_is_cxl_root(iter)) {
>>> +			if (coordinates_invalid(&dport->sw_coord))
>>> +				return -EINVAL;
>>> +
>>> +			combine_coordinates(&c, &dport->sw_coord);
>>> +		}
>>> +
>>>  		c.write_latency += dport->link_latency;
>>>  		c.read_latency += dport->link_latency;
>>> -
>>> -		iter = to_cxl_port(iter->dev.parent);
>>>  		dport = iter->parent_dport;
>>>  	}
>>>  
>>>  	/* Augment with the generic port (host bridge) perf data */
>>> +	if (coordinates_invalid(&dport->hb_coord))
>>> +		return -EINVAL;
>>>  	combine_coordinates(&c, &dport->hb_coord);
>>>  
>>>  	/* Get the calculated PCI paths bandwidth */
>>
>
diff mbox series

Patch

diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index e1d30a885700..2c82fe24b789 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -2110,6 +2110,20 @@  static void combine_coordinates(struct access_coordinate *c1,
 		c1->read_latency += c2->read_latency;
 }
 
+static bool coordinates_invalid(struct access_coordinate *c)
+{
+	if (!c->read_bandwidth && !c->write_bandwidth &&
+	    !c->read_latency && !c->write_latency)
+		return true;
+
+	return false;
+}
+
+static bool parent_port_is_cxl_root(struct cxl_port *port)
+{
+	return is_cxl_root(to_cxl_port(port->dev.parent));
+}
+
 /**
  * cxl_endpoint_get_perf_coordinates - Retrieve performance numbers stored in dports
  *				   of CXL path
@@ -2142,16 +2156,25 @@  int cxl_endpoint_get_perf_coordinates(struct cxl_port *port,
 	 * port each iteration. If the parent is cxl root then there is
 	 * nothing to gather.
 	 */
-	while (!is_cxl_root(to_cxl_port(iter->dev.parent))) {
-		combine_coordinates(&c, &dport->sw_coord);
+	while (!parent_port_is_cxl_root(iter)) {
+		iter = to_cxl_port(iter->dev.parent);
+
+		/* There's no CDAT for the host bridge, so skip if so. */
+		if (!parent_port_is_cxl_root(iter)) {
+			if (coordinates_invalid(&dport->sw_coord))
+				return -EINVAL;
+
+			combine_coordinates(&c, &dport->sw_coord);
+		}
+
 		c.write_latency += dport->link_latency;
 		c.read_latency += dport->link_latency;
-
-		iter = to_cxl_port(iter->dev.parent);
 		dport = iter->parent_dport;
 	}
 
 	/* Augment with the generic port (host bridge) perf data */
+	if (coordinates_invalid(&dport->hb_coord))
+		return -EINVAL;
 	combine_coordinates(&c, &dport->hb_coord);
 
 	/* Get the calculated PCI paths bandwidth */