[v3,3/3] cxl: Add checks to access_coordinate calculation to fail missing data

Message ID 20240306175204.1906538-3-dave.jiang@intel.com
State Superseded
Series [v3,1/3] cxl: Remove checking of iter in cxl_endpoint_get_perf_coordinates()

Commit Message

Dave Jiang March 6, 2024, 5:52 p.m. UTC
Jonathan noted that the coordinates for host bridges and switches can be
0s if no actual data are retrieved, yet the calculation continues anyway.
The resulting number would be inaccurate. Add checks to ensure that the
calculation completes only if the numbers are valid.

While not seen in the wild, the issue may show up with a BIOS that reports
CXL root ports via Generic Ports (via a PCI handle in the SRAT entry).

Fixes: 14a6960b3e92 ("cxl: Add helper function that calculate performance data for downstream ports")
Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
v3:
- Simplify iteration loop in cxl_endpoint_get_perf_coordinates(). (Jonathan)
---
 drivers/cxl/core/port.c | 33 +++++++++++++++++++++++----------
 1 file changed, 23 insertions(+), 10 deletions(-)

Comments

Jonathan Cameron March 7, 2024, 12:59 p.m. UTC | #1
On Wed, 6 Mar 2024 10:52:04 -0700
Dave Jiang <dave.jiang@intel.com> wrote:

> Jonathan noted that the coordinates for host bridges and switches can be
> 0s if no actual data are retrieved, yet the calculation continues anyway.
> The resulting number would be inaccurate. Add checks to ensure that the
> calculation completes only if the numbers are valid.
> 
> While not seen in the wild, the issue may show up with a BIOS that reports
> CXL root ports via Generic Ports (via a PCI handle in the SRAT entry).
> 
> Fixes: 14a6960b3e92 ("cxl: Add helper function that calculate performance data for downstream ports")
> Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>

Looks nice and clean now. So, subject to resolution of the philosophy
question inline...

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> ---
> v3:
> - Simplify iteration loop in cxl_endpoint_get_perf_coordinates(). (Jonathan)
> ---
>  drivers/cxl/core/port.c | 33 +++++++++++++++++++++++----------
>  1 file changed, 23 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 6fa273677963..baaaeddae775 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -2110,6 +2110,20 @@ static void combine_coordinates(struct access_coordinate *c1,
>  		c1->read_latency += c2->read_latency;
>  }
>  
> +static bool coordinates_invalid(struct access_coordinate *c)
> +{
> +	if (!c->read_bandwidth && !c->write_bandwidth &&
> +	    !c->read_latency && !c->write_latency)
> +		return true;
I'm not sure about the logic here. I agree that in theory it's possible to
have only one coord presented, but do we actually want to support that?
This is one of those cases where perhaps Linux should insist on sanity even
if the specifications do not.

I'd demand them all and flip to coordinates_valid() perhaps?

	return c->read_bandwidth && c->write_bandwidth &&
	       c->read_latency && c->write_latency;

Maybe there is a dubious argument that a host might not provide bandwidth
to the HB if it knows it is way bigger than anything beyond that point...

Doubtful..
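
To make the difference between the two checks concrete, here is a minimal
user-space sketch. The struct is a stand-in with the four fields used in
this patch, not the kernel definition, and the sample values in the usage
note are made up. coordinates_invalid() follows the v3 logic (reject only
when every field is zero); coordinates_valid() is the stricter variant
suggested above (accept only when every field is set).

```c
#include <stdbool.h>

/* Stand-in for the kernel's struct access_coordinate (assumption). */
struct access_coordinate {
	unsigned int read_bandwidth;
	unsigned int write_bandwidth;
	unsigned int read_latency;
	unsigned int write_latency;
};

/* v3 check: invalid only when all four fields are zero. */
static bool coordinates_invalid(const struct access_coordinate *c)
{
	return !c->read_bandwidth && !c->write_bandwidth &&
	       !c->read_latency && !c->write_latency;
}

/* Stricter variant from review: valid only when all four fields are set. */
static bool coordinates_valid(const struct access_coordinate *c)
{
	return c->read_bandwidth && c->write_bandwidth &&
	       c->read_latency && c->write_latency;
}
```

A coordinate with latency but no bandwidth, e.g.
{ .read_latency = 100, .write_latency = 120 }, passes the v3 check
(coordinates_invalid() returns false) but is rejected by
coordinates_valid(). That partially populated case is the one the
philosophy question is about.
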
> +
> +	return false;
> +}
> +
> +static bool parent_port_is_cxl_root(struct cxl_port *port)
> +{
> +	return is_cxl_root(to_cxl_port(port->dev.parent));
> +}
> +
>  /**
>   * cxl_endpoint_get_perf_coordinates - Retrieve performance numbers stored in dports
>   *				   of CXL path
> @@ -2133,23 +2147,22 @@ int cxl_endpoint_get_perf_coordinates(struct cxl_port *port,
>  	if (!is_cxl_endpoint(port))
>  		return -EINVAL;
>  
> -	dport = iter->parent_dport;
> -
>  	/*
> -	 * Exit the loop when the parent port of the current port is cxl root.
> -	 * The iterative loop starts at the endpoint and gathers the
> -	 * latency of the CXL link from the current iter to the next downstream
> -	 * port each iteration. If the parent is cxl root then there is
> -	 * nothing to gather.
> +	 * Exit the loop when the parent port of the current iter port is cxl
> +	 * root. The iterative loop starts at the endpoint and gathers the
> +	 * latency of the CXL link from the current device/port to the connected
> +	 * downstream port each iteration.
>  	 */
> -	while (!is_cxl_root(to_cxl_port(iter->dev.parent))) {
> +	do {
> +		dport = iter->parent_dport;
> +		if (coordinates_invalid(&dport->coord))
> +			return -EINVAL;
>  		combine_coordinates(&c, &dport->coord);
>  		c.write_latency += dport->link_latency;
>  		c.read_latency += dport->link_latency;
>  
>  		iter = to_cxl_port(iter->dev.parent);
> -		dport = iter->parent_dport;
> -	}
> +	} while (!parent_port_is_cxl_root(iter));
>  
>  	/* Get the calculated PCI paths bandwidth */
>  	pdev = to_pci_dev(port->uport_dev->parent);
Patch

diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 6fa273677963..baaaeddae775 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -2110,6 +2110,20 @@  static void combine_coordinates(struct access_coordinate *c1,
 		c1->read_latency += c2->read_latency;
 }
 
+static bool coordinates_invalid(struct access_coordinate *c)
+{
+	if (!c->read_bandwidth && !c->write_bandwidth &&
+	    !c->read_latency && !c->write_latency)
+		return true;
+
+	return false;
+}
+
+static bool parent_port_is_cxl_root(struct cxl_port *port)
+{
+	return is_cxl_root(to_cxl_port(port->dev.parent));
+}
+
 /**
  * cxl_endpoint_get_perf_coordinates - Retrieve performance numbers stored in dports
  *				   of CXL path
@@ -2133,23 +2147,22 @@  int cxl_endpoint_get_perf_coordinates(struct cxl_port *port,
 	if (!is_cxl_endpoint(port))
 		return -EINVAL;
 
-	dport = iter->parent_dport;
-
 	/*
-	 * Exit the loop when the parent port of the current port is cxl root.
-	 * The iterative loop starts at the endpoint and gathers the
-	 * latency of the CXL link from the current iter to the next downstream
-	 * port each iteration. If the parent is cxl root then there is
-	 * nothing to gather.
+	 * Exit the loop when the parent port of the current iter port is cxl
+	 * root. The iterative loop starts at the endpoint and gathers the
+	 * latency of the CXL link from the current device/port to the connected
+	 * downstream port each iteration.
 	 */
-	while (!is_cxl_root(to_cxl_port(iter->dev.parent))) {
+	do {
+		dport = iter->parent_dport;
+		if (coordinates_invalid(&dport->coord))
+			return -EINVAL;
 		combine_coordinates(&c, &dport->coord);
 		c.write_latency += dport->link_latency;
 		c.read_latency += dport->link_latency;
 
 		iter = to_cxl_port(iter->dev.parent);
-		dport = iter->parent_dport;
-	}
+	} while (!parent_port_is_cxl_root(iter));
 
 	/* Get the calculated PCI paths bandwidth */
 	pdev = to_pci_dev(port->uport_dev->parent);
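
For readers who want to experiment with the loop shape outside the kernel,
the following is a rough user-space model of the reworked iteration. It is
a sketch under assumptions, not the driver code: struct hop, min_uint() and
path_coordinates() are hypothetical names, the path is flattened into an
array ordered from the endpoint upward, and the combine rule (bandwidth is
the minimum of the non-zero values, latency is summed) is assumed from the
visible tail of combine_coordinates().

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the kernel's struct access_coordinate (assumption). */
struct access_coordinate {
	unsigned int read_bandwidth;
	unsigned int write_bandwidth;
	unsigned int read_latency;
	unsigned int write_latency;
};

/* Hypothetical model of one step up the path: a dport and its link. */
struct hop {
	struct access_coordinate coord;	/* data stored for the dport */
	unsigned int link_latency;	/* latency of the link below it */
};

static bool coordinates_invalid(const struct access_coordinate *c)
{
	return !c->read_bandwidth && !c->write_bandwidth &&
	       !c->read_latency && !c->write_latency;
}

static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Assumed rule: keep the smallest non-zero bandwidth, add up latencies. */
static void combine_coordinates(struct access_coordinate *c1,
				const struct access_coordinate *c2)
{
	if (c2->write_bandwidth)
		c1->write_bandwidth = min_uint(c1->write_bandwidth,
					       c2->write_bandwidth);
	c1->write_latency += c2->write_latency;

	if (c2->read_bandwidth)
		c1->read_bandwidth = min_uint(c1->read_bandwidth,
					      c2->read_bandwidth);
	c1->read_latency += c2->read_latency;
}

/* Walk hops[0..n-1]; fail with -EINVAL on the first hop that has no data. */
static int path_coordinates(const struct hop *hops, size_t n,
			    struct access_coordinate *out)
{
	struct access_coordinate c = {
		.read_bandwidth = UINT_MAX,
		.write_bandwidth = UINT_MAX,
	};
	size_t i = 0;

	if (!n)
		return -EINVAL;

	do {
		const struct hop *h = &hops[i];

		if (coordinates_invalid(&h->coord))
			return -EINVAL;
		combine_coordinates(&c, &h->coord);
		c.write_latency += h->link_latency;
		c.read_latency += h->link_latency;
	} while (++i < n);

	*out = c;
	return 0;
}
```

As in the patch, the validity check runs before anything is accumulated, so
a hop with all-zero coordinates aborts the walk instead of silently
contributing zero latency to the total.
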