diff mbox series

cxl/region: Match auto-discovered region decoders by HPA range

Message ID 20230804213004.1669658-1-alison.schofield@intel.com
State Superseded
Headers show
Series cxl/region: Match auto-discovered region decoders by HPA range | expand

Commit Message

Alison Schofield Aug. 4, 2023, 9:30 p.m. UTC
From: Alison Schofield <alison.schofield@intel.com>

Today, when the region driver attaches a region to a port, it
selects the ports next available decoder to program.

With the addition of auto-discovered regions, a port decoder has
already been programmed, so grabbing the next available decoder
can be a mismatch when there is more than one region using the
port. Match on the port HPA range for auto-discovered regions.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 
 drivers/cxl/core/region.c | 31 ++++++++++++++++++++++++++++---
 1 file changed, 28 insertions(+), 3 deletions(-)


base-commit: fe77cc2e5a6a7c85f5c6ef8a39d7694ffc7f41c9

Comments

Dan Williams Aug. 4, 2023, 10:13 p.m. UTC | #1
alison.schofield@ wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> Today, when the region driver attaches a region to a port, it
> selects the ports next available decoder to program.

A small nit: s/Today/Currently/

> With the addition of auto-discovered regions, a port decoder has
> already been programmed, so grabbing the next available decoder
> can be a mismatch when there is more than one region using the
> port. Match on the port HPA range for auto-discovered regions.

This patch looks correct to me, just a couple questions beloe.

It would be great if it was accompanied by a cxl_test scenario that
tested the failing case, but barring that it would be good to have some
logs from a scenario where people can notice if this fix applies to
their failure.

> Signed-off-by: Alison Schofield <alison.schofield@intel.com>

I think this wants a "Fixes" tag:

Fixes: a32320b71f08 ("cxl/region: Add region autodiscovery")

> ---
>  
>  drivers/cxl/core/region.c | 31 ++++++++++++++++++++++++++++---
>  1 file changed, 28 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index e115ba382e04..8bfec7a96975 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -717,13 +717,37 @@ static int match_free_decoder(struct device *dev, void *data)
>  	return 0;
>  }
>  
> +static int match_auto_decoder(struct device *dev, void *data)
> +{
> +	struct cxl_endpoint_decoder *cxled = data;
> +	struct cxl_decoder *cxld;
> +
> +	if (!is_switch_decoder(dev))
> +		return 0;

Is this check needed? Endpoint decoders should also match by range.
Maybe make it explicit like:

    if (cxld == &cxled->cxld)
    	return 0;

...where it is obvious no further checks are needed, but I think that
also goes away with the change proposal below:


> +
> +	cxld = to_cxl_decoder(dev);
> +
> +	if (!range_contains(&cxld->hpa_range, &cxled->cxld.hpa_range))
> +		return 0;

Hmm, shouldn't it be identical and no bigger?

if (cxld->hpa_range != cxled->cxld.hpa_range)

> +
> +	if (!cxld->region)
> +		return 1;

Interesting, I am trying to think through the implications of failing
here. That would only happen if the port had been setup previously with
a different region for the same address range? How would that happen?

It feel like it should be:

	if (cxld->region) {
		dev_WARN(...)
		return 0;
	}

	return 1;

> +
> +	return 0;
> +}
> +
>  static struct cxl_decoder *cxl_region_find_decoder(struct cxl_port *port,
> -						   struct cxl_region *cxlr)
> +						   struct cxl_region *cxlr,
> +						   struct cxl_endpoint_decoder *cxled)
>  {
>  	struct device *dev;
>  	int id = 0;
>  
> -	dev = device_find_child(&port->dev, &id, match_free_decoder);
> +	if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags))
> +		dev = device_find_child(&port->dev, cxled, match_auto_decoder);
> +	else
> +		dev = device_find_child(&port->dev, &id, match_free_decoder);
> +
>  	if (!dev)
>  		return NULL;
>  	/*
> @@ -839,7 +863,8 @@ static int cxl_rr_alloc_decoder(struct cxl_port *port, struct cxl_region *cxlr,
>  	if (port == cxled_to_port(cxled))
>  		cxld = &cxled->cxld;
>  	else
> -		cxld = cxl_region_find_decoder(port, cxlr);
> +		cxld = cxl_region_find_decoder(port, cxlr, cxled);

It looks like the cxled is only used to convey the range. Maybe just get
that from cxlr->params->res and not add another parameter here? Of
course that would then change the suggestions above where you can not
compare 'struct range' instances directly.
Alison Schofield Aug. 16, 2023, 12:13 a.m. UTC | #2
On Fri, Aug 04, 2023 at 03:13:58PM -0700, Dan Williams wrote:
> alison.schofield@ wrote:
> > From: Alison Schofield <alison.schofield@intel.com>
> > 
> > Today, when the region driver attaches a region to a port, it
> > selects the ports next available decoder to program.
> 
> A small nit: s/Today/Currently/
Got it.

> 
> > With the addition of auto-discovered regions, a port decoder has
> > already been programmed, so grabbing the next available decoder
> > can be a mismatch when there is more than one region using the
> > port. Match on the port HPA range for auto-discovered regions.
> 
> This patch looks correct to me, just a couple questions beloe.
> 
> It would be great if it was accompanied by a cxl_test scenario that
> tested the failing case, but barring that it would be good to have some
> logs from a scenario where people can notice if this fix applies to
> their failure.

Cxl_test:

The cxl_test regression test for this fix is to setup 2 regions for
auto-detection on the same port and load/reload enough times that
the HPA violation occurs when the fix is not in place. I'll see
about adding that in a v2 of this patch, and an ndctl PATCH for the
regression/unit test.

Dmesg logs:
The footprint of this failure is only visible with CXL DEBUG enabled:

[] cxl_core:alloc_region_ref:754: cxl region0: endpoint9: HPA order violation region0:[mem 0x14780000000-0x1478fffffff flags 0x200] vs [mem 0x880000000-0x185fffffff flags 0x200]
[] cxl_core:cxl_port_attach_region:972: cxl region0: endpoint9: failed to allocate region reference

When CXL DEBUG is not enabled, there is no failure message. The region
just never materializes. With this patch, I hope the HPA order violation
is only a dev_dbg() level message again.

Makes me wonder:

This case aside, the 'opportunistic' approach to region assembly doesn't
offer a place to make any summary statements about the auto discovery
results.

In cxl_endpoint_probe() we get here with CXL_DECODER_STATE_AUTO :

>>	/*
>>	 * Now that all endpoint decoders are successfully enumerated, try to
>>	 * assemble regions from committed decoders
>>	 */
>>	device_for_each_child(&port->dev, root, discover_region);

Do we need to do some expected/found work here?

> 
> > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> 
> I think this wants a "Fixes" tag:
> 
> Fixes: a32320b71f08 ("cxl/region: Add region autodiscovery")
> 
Got it.

> > ---
> >  
> >  drivers/cxl/core/region.c | 31 ++++++++++++++++++++++++++++---
> >  1 file changed, 28 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> > index e115ba382e04..8bfec7a96975 100644
> > --- a/drivers/cxl/core/region.c
> > +++ b/drivers/cxl/core/region.c
> > @@ -717,13 +717,37 @@ static int match_free_decoder(struct device *dev, void *data)
> >  	return 0;
> >  }
> >  
> > +static int match_auto_decoder(struct device *dev, void *data)
> > +{
> > +	struct cxl_endpoint_decoder *cxled = data;
> > +	struct cxl_decoder *cxld;
> > +
> > +	if (!is_switch_decoder(dev))
> > +		return 0;
> 
> Is this check needed? Endpoint decoders should also match by range.
> Maybe make it explicit like:
> 
>     if (cxld == &cxled->cxld)
>     	return 0;
> 
> ...where it is obvious no further checks are needed, but I think that
> also goes away with the change proposal below:
> 
> 
> > +
> > +	cxld = to_cxl_decoder(dev);
> > +
> > +	if (!range_contains(&cxld->hpa_range, &cxled->cxld.hpa_range))
> > +		return 0;
> 
> Hmm, shouldn't it be identical and no bigger?
> 
> if (cxld->hpa_range != cxled->cxld.hpa_range)
> 
> > +
> > +	if (!cxld->region)
> > +		return 1;
> 
> Interesting, I am trying to think through the implications of failing
> here. That would only happen if the port had been setup previously with
> a different region for the same address range? How would that happen?
> 
> It feel like it should be:
> 
> 	if (cxld->region) {
> 		dev_WARN(...)
> 		return 0;
> 	}
> 
> 	return 1;
>

Understood. I picked up checks from the existing match_free_decoder() too
willy nilly.

> > +
> > +	return 0;
> > +}
> > +
> >  static struct cxl_decoder *cxl_region_find_decoder(struct cxl_port *port,
> > -						   struct cxl_region *cxlr)
> > +						   struct cxl_region *cxlr,
> > +						   struct cxl_endpoint_decoder *cxled)
> >  {
> >  	struct device *dev;
> >  	int id = 0;
> >  
> > -	dev = device_find_child(&port->dev, &id, match_free_decoder);
> > +	if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags))
> > +		dev = device_find_child(&port->dev, cxled, match_auto_decoder);
> > +	else
> > +		dev = device_find_child(&port->dev, &id, match_free_decoder);
> > +
> >  	if (!dev)
> >  		return NULL;
> >  	/*
> > @@ -839,7 +863,8 @@ static int cxl_rr_alloc_decoder(struct cxl_port *port, struct cxl_region *cxlr,
> >  	if (port == cxled_to_port(cxled))
> >  		cxld = &cxled->cxld;
> >  	else
> > -		cxld = cxl_region_find_decoder(port, cxlr);
> > +		cxld = cxl_region_find_decoder(port, cxlr, cxled);
> 
> It looks like the cxled is only used to convey the range. Maybe just get
> that from cxlr->params->res and not add another parameter here? Of
> course that would then change the suggestions above where you can not
> compare 'struct range' instances directly.

Got it. Will compare the res->starts and res->ends directly and use
the cxlr->params.

Thanks Dan!
Dan Williams Aug. 16, 2023, 1:43 a.m. UTC | #3
Alison Schofield wrote:
> On Fri, Aug 04, 2023 at 03:13:58PM -0700, Dan Williams wrote:
> > alison.schofield@ wrote:
> > > From: Alison Schofield <alison.schofield@intel.com>
> > > 
> > > Today, when the region driver attaches a region to a port, it
> > > selects the ports next available decoder to program.
> > 
> > A small nit: s/Today/Currently/
> Got it.
> 
> > 
> > > With the addition of auto-discovered regions, a port decoder has
> > > already been programmed, so grabbing the next available decoder
> > > can be a mismatch when there is more than one region using the
> > > port. Match on the port HPA range for auto-discovered regions.
> > 
> > This patch looks correct to me, just a couple questions beloe.
> > 
> > It would be great if it was accompanied by a cxl_test scenario that
> > tested the failing case, but barring that it would be good to have some
> > logs from a scenario where people can notice if this fix applies to
> > their failure.
> 
> Cxl_test:
> 
> The cxl_test regression test for this fix is to setup 2 regions for
> auto-detection on the same port and load/reload enough times that
> the HPA violation occurs when the fix is not in place. I'll see
> about adding that in a v2 of this patch, and an ndctl PATCH for the
> regression/unit test.
> 
> Dmesg logs:
> The footprint of this failure is only visible with CXL DEBUG enabled:
> 
> [] cxl_core:alloc_region_ref:754: cxl region0: endpoint9: HPA order violation region0:[mem 0x14780000000-0x1478fffffff flags 0x200] vs [mem 0x880000000-0x185fffffff flags 0x200]
> [] cxl_core:cxl_port_attach_region:972: cxl region0: endpoint9: failed to allocate region reference
> 
> When CXL DEBUG is not enabled, there is no failure message. The region
> just never materializes. With this patch, I hope the HPA order violation
> is only a dev_dbg() level message again.
> 
> Makes me wonder:
> 
> This case aside, the 'opportunistic' approach to region assembly doesn't
> offer a place to make any summary statements about the auto discovery
> results.
> 
> In cxl_endpoint_probe() we get here with CXL_DECODER_STATE_AUTO :
> 
> >>	/*
> >>	 * Now that all endpoint decoders are successfully enumerated, try to
> >>	 * assemble regions from committed decoders
> >>	 */
> >>	device_for_each_child(&port->dev, root, discover_region);
> 
> Do we need to do some expected/found work here?

In the PMEM label case the CXL core could do that because it would have
an idea of what is supposed to be there. Otherwise, this code path can
not tell the difference between regions that were partially setup before
a kexec happened, or regions that were fully setup by the BIOS ahead of
time.

It might be worthwhile to report on System RAM that intersects a CXL
Window and expect a region to show up, but between BIOS quirks and bugs
it is difficult to make crisp "expected" statements here.
diff mbox series

Patch

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index e115ba382e04..8bfec7a96975 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -717,13 +717,37 @@  static int match_free_decoder(struct device *dev, void *data)
 	return 0;
 }
 
+static int match_auto_decoder(struct device *dev, void *data)
+{
+	struct cxl_endpoint_decoder *cxled = data;
+	struct cxl_decoder *cxld;
+
+	if (!is_switch_decoder(dev))
+		return 0;
+
+	cxld = to_cxl_decoder(dev);
+
+	if (!range_contains(&cxld->hpa_range, &cxled->cxld.hpa_range))
+		return 0;
+
+	if (!cxld->region)
+		return 1;
+
+	return 0;
+}
+
 static struct cxl_decoder *cxl_region_find_decoder(struct cxl_port *port,
-						   struct cxl_region *cxlr)
+						   struct cxl_region *cxlr,
+						   struct cxl_endpoint_decoder *cxled)
 {
 	struct device *dev;
 	int id = 0;
 
-	dev = device_find_child(&port->dev, &id, match_free_decoder);
+	if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags))
+		dev = device_find_child(&port->dev, cxled, match_auto_decoder);
+	else
+		dev = device_find_child(&port->dev, &id, match_free_decoder);
+
 	if (!dev)
 		return NULL;
 	/*
@@ -839,7 +863,8 @@  static int cxl_rr_alloc_decoder(struct cxl_port *port, struct cxl_region *cxlr,
 	if (port == cxled_to_port(cxled))
 		cxld = &cxled->cxld;
 	else
-		cxld = cxl_region_find_decoder(port, cxlr);
+		cxld = cxl_region_find_decoder(port, cxlr, cxled);
+
 	if (!cxld) {
 		dev_dbg(&cxlr->dev, "%s: no decoder available\n",
 			dev_name(&port->dev));