diff mbox series

cxl/region: don't try to cleanup after cxl_region_setup_targets() fails

Message ID 169696311899.1171696.7812961484055097837.stgit@bgt-140510-bm03.eng.stellus.in
State Superseded
Series cxl/region: don't try to cleanup after cxl_region_setup_targets() fails

Commit Message

Jim Harris Oct. 10, 2023, 6:38 p.m. UTC
Patch 5e42bcbc ("cxl/region: decrement ->nr_targets on error in
cxl_region_attach()") tried to avoid 'eiw' initialization errors when
->nr_targets exceeded 16, by just decrementing ->nr_targets when
cxl_region_setup_targets() failed. Patch 86987c76 ("cxl/region: Cleanup
target list on attach error") extended that cleanup to also clear
cxled->pos and p->targets[pos].

The initialization error was incidentally fixed separately by patch
8d4285425 ("cxl/region: Fix port setup uninitialized variable warnings")
which was merged a few days after 5e42bcbc.

But now the original cleanup when cxl_region_setup_targets() fails
prevents endpoint and switch decoder resources from being reused:

1) the cleanup does not set the decoder's region to NULL, which results
   in future dpa_size_store() calls returning -EBUSY
2) the decoder is not properly freed, which results in future commit
   errors associated with the upstream switch

Now that the initialization errors were fixed separately, the proper
cleanup for this case is to just return immediately. Then the resources
associated with this target get cleaned up as normal when the failed
region is deleted.

Tested by trying to create an invalid region for a 2 switch * 2 endpoint
topology, and then following up with creating a valid region.

Signed-off-by: Jim Harris <jim.harris@samsung.com>
---
 drivers/cxl/core/region.c |    8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)
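
The first failure mode in the commit message (a stale decoder-to-region
pointer making later resizes return -EBUSY) can be sketched with a toy
userspace model. Everything prefixed model_ is hypothetical and is not a
kernel API. The sketch assumes that region deletion finds its decoders by
walking the target list, and that a DPA resize is refused while the decoder
still points at a region. It does not model the switch decoder side.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_MAX_TARGETS 16

struct model_region;

/* Hypothetical stand-in for an endpoint decoder. */
struct model_decoder {
	struct model_region *region; /* plays the role of cxled->cxld.region */
	int pos;                     /* plays the role of cxled->pos */
};

/* Hypothetical stand-in for the region's target bookkeeping. */
struct model_region {
	struct model_decoder *targets[MODEL_MAX_TARGETS];
	int nr_targets;
};

/* Attach: record the decoder in the region and mark it as owned. */
static void model_attach(struct model_region *r, struct model_decoder *d)
{
	int pos = r->nr_targets;

	assert(pos < MODEL_MAX_TARGETS);
	d->region = r;
	d->pos = pos;
	r->targets[pos] = d;
	r->nr_targets++;
}

/* Mirrors the removed err_decrement label: region-side bookkeeping only. */
static void model_err_decrement(struct model_region *r, struct model_decoder *d)
{
	r->nr_targets--;
	r->targets[d->pos] = NULL;
	d->pos = -1;
	/* d->region is left set, which is the problem being modelled. */
}

/* Assumption: deletion releases only decoders still in the target list. */
static void model_region_delete(struct model_region *r)
{
	for (int i = 0; i < MODEL_MAX_TARGETS; i++) {
		struct model_decoder *d = r->targets[i];

		if (!d)
			continue;
		d->region = NULL;
		d->pos = -1;
		r->targets[i] = NULL;
	}
	r->nr_targets = 0;
}

/* Assumption: a resize is refused while the decoder belongs to a region. */
static int model_dpa_resize(const struct model_decoder *d)
{
	return d->region ? -EBUSY : 0;
}

/* Attach, pretend setup failed, delete the region, then try to resize. */
static int model_scenario(int run_err_decrement)
{
	struct model_region r = { .nr_targets = 0 };
	struct model_decoder d = { .region = NULL, .pos = -1 };

	model_attach(&r, &d);
	if (run_err_decrement)
		model_err_decrement(&r, &d);
	model_region_delete(&r);
	return model_dpa_resize(&d);
}
```

In this model, model_scenario(1) leaves the decoder orphaned because the
partial cleanup hid it from the later delete, while model_scenario(0)
(return immediately, clean up on delete) leaves it reusable.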

Comments

Dan Carpenter Oct. 11, 2023, 2:04 p.m. UTC | #1
On Tue, Oct 10, 2023 at 06:38:39PM +0000, Jim Harris wrote:
> Patch 5e42bcbc ("cxl/region: decrement ->nr_targets on error in
> cxl_region_attach()") tried to avoid 'eiw' initialization errors when
> ->nr_targets exceeded 16, by just decrementing ->nr_targets when
> cxl_region_setup_targets() failed.

I mean that's what I wrote but I'm fairly sure that I was concerned about
->nr_targets getting incremented to an invalid value.

drivers/cxl/core/region.c
  1746          p->targets[pos] = cxled;
                ^^^^^^^^^^^^^^^
This array has CXL_DECODER_MAX_INTERLEAVE (16) elements.

  1747          cxled->pos = pos;
  1748          p->nr_targets++;
  1749  
  1750          if (p->nr_targets == p->interleave_ways) {
                                     ^^^^^^^^^^^^^^^^^^
This is how many we want, but it's capped at 16 so we don't go over.
Like I guess we add one at a time until we hit the max and then when we
get everything added

  1751                  rc = cxl_region_setup_targets(cxlr);

Then we register stuff.

So if we decrement and try to attach another region then my idea was
that it would write over the last element in the array.  But if we don't
have the decrement and we try to attach another region it will go beyond
the end of the array.

  1752                  if (rc)
  1753                          goto err_decrement;
  1754                  p->state = CXL_CONFIG_ACTIVE;
  1755          }
  1756  
  1757          cxled->cxld.interleave_ways = p->interleave_ways;
  1758          cxled->cxld.interleave_granularity = p->interleave_granularity;
  1759          cxled->cxld.hpa_range = (struct range) {
  1760                  .start = p->res->start,
  1761                  .end = p->res->end,
  1762          };
  1763  
  1764          return 0;
  1765  
  1766  err_decrement:
  1767          p->nr_targets--;
  1768          cxled->pos = -1;
  1769          p->targets[pos] = NULL;
  1770          return rc;
  1771  }

But I was just going from static analysis and code review and not
testing and obviously you have tested this.  A simple fix for my
concern would be to do this:

regards,
dan carpenter

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 6d63b8798c29..5948c4a01745 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -1649,6 +1649,11 @@ static int cxl_region_attach(struct cxl_region *cxlr,
 		return -ENODEV;
 	}
 
+	if (p->nr_targets >= p->interleave_ways) {
+		dev_dbg(&cxlr->dev, "%s too many regions\n", dev_name(&cxled->cxld.dev));
+		return -EINVAL;
+	}
+
 	/* all full of members, or interleave config not established? */
 	if (p->state > CXL_CONFIG_INTERLEAVE_ACTIVE) {
 		dev_dbg(&cxlr->dev, "region already active\n");
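
The effect of the proposed check can be illustrated with a small userspace
sketch. The model_ names are hypothetical, and indexing the array by
nr_targets is a simplification of the real function, which stores at a
caller-supplied position. The sketch also assumes interleave_ways was
already validated to be at most 16.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_MAX_INTERLEAVE 16 /* same size as CXL_DECODER_MAX_INTERLEAVE */

/* Hypothetical stand-in for the region parameters touched by the check. */
struct model_params {
	const void *targets[MODEL_MAX_INTERLEAVE];
	int nr_targets;
	int interleave_ways; /* assumed to be validated as <= 16 elsewhere */
};

/* Refuse the attach before touching the array once the region is full. */
static int model_guarded_attach(struct model_params *p, const void *cxled)
{
	if (p->nr_targets >= p->interleave_ways)
		return -EINVAL;
	p->targets[p->nr_targets] = cxled;
	p->nr_targets++;
	return 0;
}

/* Try 'attempts' attaches and report how many were accepted. */
static int model_count_accepted(int ways, int attempts)
{
	struct model_params p = { .nr_targets = 0, .interleave_ways = ways };
	int dummy = 0, accepted = 0;

	assert(ways >= 0 && ways <= MODEL_MAX_INTERLEAVE);
	for (int i = 0; i < attempts; i++)
		if (model_guarded_attach(&p, &dummy) == 0)
			accepted++;
	return accepted;
}
```

Because the guard runs before the store, extra attach attempts are rejected
with -EINVAL and nr_targets never exceeds interleave_ways, so no error-path
decrement is needed to keep the index in bounds.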

Jim Harris Oct. 11, 2023, 2:31 p.m. UTC | #2
> On Oct 11, 2023, at 7:04 AM, Dan Carpenter <dan.carpenter@linaro.org> wrote:
> 
> On Tue, Oct 10, 2023 at 06:38:39PM +0000, Jim Harris wrote:
>> Patch 5e42bcbc ("cxl/region: decrement ->nr_targets on error in
>> cxl_region_attach()") tried to avoid 'eiw' initialization errors when
>> ->nr_targets exceeded 16, by just decrementing ->nr_targets when
>> cxl_region_setup_targets() failed.
> 
> I mean that's what I wrote but I'm fairly sure that I was concerned about
> ->nr_targets getting incremented to an invalid value.
> 
> drivers/cxl/core/region.c
>  1746          p->targets[pos] = cxled;
>                ^^^^^^^^^^^^^^^
> This array has CXL_DECODER_MAX_INTERLEAVE (16) elements.

Agreed, we need to guard against the array overflow too.

> 
> But I was just going from static analysis and code review and not
> testing and obviously you have tested this.  A simple fix for my
> concern would be to do this:
> 
> regards,
> dan carpenter
> 
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 6d63b8798c29..5948c4a01745 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -1649,6 +1649,11 @@ static int cxl_region_attach(struct cxl_region *cxlr,
> return -ENODEV;
> }
> 
> + if (p->nr_targets >= p->interleave_ways) {
> + dev_dbg(&cxlr->dev, "%s too many regions\n", dev_name(&cxled->cxld.dev));
> + return -EINVAL;
> + }
> +
> /* all full of members, or interleave config not established? */
> if (p->state > CXL_CONFIG_INTERLEAVE_ACTIVE) {
> dev_dbg(&cxlr->dev, "region already active\n");

I’ll push a v2.

I had to convince myself that we didn’t also need a comparison against
CXL_DECODER_MAX_INTERLEAVE. But interleave_ways_store() will fail with
a value > 16 via the ways_to_eiw() call, so the p->interleave_ways check
is sufficient.
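
For reference, the validation relied on above can be sketched in userspace
as follows. The rejection of values above 16 is the property stated in this
thread. The remaining arithmetic is written from memory of the helper in
drivers/cxl/cxl.h and should be treated as an assumption. The model_ names
are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Userspace replacement for the kernel's is_power_of_2(). */
static int model_is_power_of_2(unsigned int n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Userspace replacement for the kernel's ilog2(), for n >= 1. */
static unsigned int model_ilog2(unsigned int n)
{
	unsigned int log = 0;

	while (n >>= 1)
		log++;
	return log;
}

/* Sketch of ways_to_eiw(): encode interleave ways, rejecting bad counts. */
static int model_ways_to_eiw(unsigned int ways, uint8_t *eiw)
{
	if (ways > 16)
		return -EINVAL;
	if (model_is_power_of_2(ways)) {
		*eiw = (uint8_t)model_ilog2(ways);
		return 0;
	}
	/* Assumed: remaining valid counts are 3 times a power of two. */
	if (ways % 3)
		return -EINVAL;
	ways /= 3;
	if (!model_is_power_of_2(ways))
		return -EINVAL;
	*eiw = (uint8_t)(model_ilog2(ways) + 8);
	return 0;
}

/* Returns 1 if a store of 'ways' would be accepted in this model. */
static int model_ways_accepted(unsigned int ways)
{
	uint8_t eiw = 0;
	int rc = model_ways_to_eiw(ways, &eiw);

	assert(rc != 0 || eiw <= 10);
	return rc == 0;
}
```

Since no value above 16 can pass this check, a stored interleave_ways is
already bounded by the size of the targets array, which is why comparing
nr_targets against interleave_ways alone is enough.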

Patch

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 6d63b8798c29..315ca1640e06 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -1750,7 +1750,7 @@  static int cxl_region_attach(struct cxl_region *cxlr,
 	if (p->nr_targets == p->interleave_ways) {
 		rc = cxl_region_setup_targets(cxlr);
 		if (rc)
-			goto err_decrement;
+			return rc;
 		p->state = CXL_CONFIG_ACTIVE;
 	}
 
@@ -1762,12 +1762,6 @@  static int cxl_region_attach(struct cxl_region *cxlr,
 	};
 
 	return 0;
-
-err_decrement:
-	p->nr_targets--;
-	cxled->pos = -1;
-	p->targets[pos] = NULL;
-	return rc;
 }
 
 static int cxl_region_detach(struct cxl_endpoint_decoder *cxled)