Message ID | 165603880411.551046.9204694225111844300.stgit@dwillia2-xfh (mailing list archive)
---|---
State | Superseded
Series | CXL PMEM Region Provisioning
On Thu, 23 Jun 2022 19:46:44 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> In preparation for provisioning CXL regions, add accounting for the DPA
> space consumed by existing regions / decoders. Recall, a CXL region is a
> memory range comprised of one or more endpoint devices contributing a
> mapping of their DPA into HPA space through a decoder.
>
> Record the DPA ranges covered by committed decoders at initial probe of
> endpoint ports relative to a per-device resource tree of the DPA type
> (pmem or volatile-ram).
>
> The cxl_dpa_rwsem semaphore is introduced to globally synchronize DPA
> state across all endpoints and their decoders at once. The vast majority
> of DPA operations are reads as region creation is expected to be as rare
> as disk partitioning and volume creation. The device_lock() for this
> synchronization is specifically avoided for concern of entangling with
> sysfs attribute removal.
>
> Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/cxl/core/hdm.c | 148 ++++++++++++++++++++++++++++++++++++++++++++----
>  drivers/cxl/cxl.h      |   2 +
>  drivers/cxl/cxlmem.h   |  13 ++++
>  3 files changed, 152 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index c940a4911fee..daae6e533146 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -7,6 +7,8 @@
>  #include "cxlmem.h"
>  #include "core.h"
>
> +static DECLARE_RWSEM(cxl_dpa_rwsem);

I've not checked many files, but pci.c has equivalent static defines after
the DOC: entry, so for consistency move this below that?

> +
>  /**
>   * DOC: cxl core hdm
>   *
> @@ -128,10 +130,108 @@ struct cxl_hdm *devm_cxl_setup_hdm(struct cxl_port *port)
>  }
>  EXPORT_SYMBOL_NS_GPL(devm_cxl_setup_hdm, CXL);
>
> +/*
> + * Must be called in a context that synchronizes against this decoder's
> + * port ->remove() callback (like an endpoint decoder sysfs attribute)
> + */
> +static void cxl_dpa_release(void *cxled);
> +static void __cxl_dpa_release(struct cxl_endpoint_decoder *cxled, bool remove_action)
> +{
> +        struct cxl_port *port = cxled_to_port(cxled);
> +        struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> +        struct cxl_dev_state *cxlds = cxlmd->cxlds;
> +        struct resource *res = cxled->dpa_res;
> +
> +        lockdep_assert_held_write(&cxl_dpa_rwsem);
> +
> +        if (remove_action)
> +                devm_remove_action(&port->dev, cxl_dpa_release, cxled);

This code organization is more surprising than I'd like. Why not move this
to a wrapper that, like devm_kfree() and similar, does the free now and
removes the action from the devm list?
static void __cxl_dpa_release(struct cxl_endpoint_decoder *cxled)
{
        struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
        struct cxl_dev_state *cxlds = cxlmd->cxlds;
        struct resource *res = cxled->dpa_res;

        if (cxled->skip)
                __release_region(&cxlds->dpa_res, res->start - cxled->skip,
                                 cxled->skip);
        cxled->skip = 0;
        __release_region(&cxlds->dpa_res, res->start, resource_size(res));
        cxled->dpa_res = NULL;
}

static void cxl_dpa_release(void *cxled);

/* possibly add some underscores to this name to indicate it's special
 * in when you can safely call it */
static void devm_cxl_dpa_release(struct cxl_endpoint_decoder *cxled)
{
        struct cxl_port *port = cxled_to_port(cxled);

        lockdep_assert_held_write(&cxl_dpa_rwsem);
        devm_remove_action(&port->dev, cxl_dpa_release, cxled);
        __cxl_dpa_release(cxled);
}

static void cxl_dpa_release(void *cxled)
{
        down_write(&cxl_dpa_rwsem);
        __cxl_dpa_release(cxled);
        up_write(&cxl_dpa_rwsem);
}

> +
> +        if (cxled->skip)
> +                __release_region(&cxlds->dpa_res, res->start - cxled->skip,
> +                                 cxled->skip);
> +        cxled->skip = 0;
> +        __release_region(&cxlds->dpa_res, res->start, resource_size(res));
> +        cxled->dpa_res = NULL;
> +}
> +
> +static void cxl_dpa_release(void *cxled)
> +{
> +        down_write(&cxl_dpa_rwsem);
> +        __cxl_dpa_release(cxled, false);
> +        up_write(&cxl_dpa_rwsem);
> +}
> +
> +static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> +                             resource_size_t base, resource_size_t len,
> +                             resource_size_t skip)
> +{
> +        struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> +        struct cxl_port *port = cxled_to_port(cxled);
> +        struct cxl_dev_state *cxlds = cxlmd->cxlds;
> +        struct device *dev = &port->dev;
> +        struct resource *res;
> +
> +        lockdep_assert_held_write(&cxl_dpa_rwsem);
> +
> +        if (!len)
> +                return 0;
> +
> +        if (cxled->dpa_res) {
> +                dev_dbg(dev, "decoder%d.%d: existing allocation %pr assigned\n",
> +                        port->id, cxled->cxld.id, cxled->dpa_res);
> +                return -EBUSY;
> +        }
> +
> +        if (skip) {
> +                res = __request_region(&cxlds->dpa_res, base - skip, skip,
> +                                       dev_name(dev), 0);

An interface that uses a backwards definition of skip, as what to skip
before the base parameter, is a little odd; can we rename the base
parameter to something like 'current_top' and then have
base = current_top + skip? The current_top naming is not great though...

> +                if (!res) {
> +                        dev_dbg(dev,
> +                                "decoder%d.%d: failed to reserve skip space\n",
> +                                port->id, cxled->cxld.id);
> +                        return -EBUSY;
> +                }
> +        }
> +        res = __request_region(&cxlds->dpa_res, base, len, dev_name(dev), 0);
> +        if (!res) {
> +                dev_dbg(dev, "decoder%d.%d: failed to reserve allocation\n",
> +                        port->id, cxled->cxld.id);
> +                if (skip)
> +                        __release_region(&cxlds->dpa_res, base - skip, skip);
> +                return -EBUSY;
> +        }
> +        cxled->dpa_res = res;
> +        cxled->skip = skip;
> +
> +        return 0;
> +}
> +

...
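[For reference, a minimal sketch of the devm_kfree()-style idiom Jonathan
is suggesting, assuming a hypothetical foo object; the foo_*() names are
stand-ins, not CXL driver API. The point is that the early-release helper
both unregisters the pending devm action and runs the teardown now:

/*
 * Sketch only: register teardown as a devm action at setup, and give
 * callers a devm_kfree()-like way to release early.
 */
#include <linux/device.h>
#include <linux/slab.h>

struct foo {
        int id;
};

static void foo_teardown(struct foo *f)
{
        kfree(f);
}

/* devm action: runs automatically when the device is unbound */
static void foo_release(void *data)
{
        foo_teardown(data);
}

static int foo_setup(struct device *dev, struct foo *f)
{
        /* arm automatic teardown at unbind time */
        return devm_add_action_or_reset(dev, foo_release, f);
}

/*
 * devm_kfree()-style early release: do the teardown now and remove the
 * pending action so it cannot run a second time at unbind.
 */
static void devm_foo_release(struct device *dev, struct foo *f)
{
        devm_remove_action(dev, foo_release, f);
        foo_teardown(f);
}

This is the shape the thread converges on for cxl_dpa_release() /
devm_cxl_dpa_release() below.]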
Jonathan Cameron wrote:
> On Thu, 23 Jun 2022 19:46:44 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
>
> > In preparation for provisioning CXL regions, add accounting for the DPA
> > space consumed by existing regions / decoders. Recall, a CXL region is a
> > memory range comprised of one or more endpoint devices contributing a
> > mapping of their DPA into HPA space through a decoder.
> >
> > Record the DPA ranges covered by committed decoders at initial probe of
> > endpoint ports relative to a per-device resource tree of the DPA type
> > (pmem or volatile-ram).
> >
> > The cxl_dpa_rwsem semaphore is introduced to globally synchronize DPA
> > state across all endpoints and their decoders at once. The vast majority
> > of DPA operations are reads as region creation is expected to be as rare
> > as disk partitioning and volume creation. The device_lock() for this
> > synchronization is specifically avoided for concern of entangling with
> > sysfs attribute removal.
> >
> > Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
> > Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > ---
> >  drivers/cxl/core/hdm.c | 148 ++++++++++++++++++++++++++++++++++++++++++++----
> >  drivers/cxl/cxl.h      |   2 +
> >  drivers/cxl/cxlmem.h   |  13 ++++
> >  3 files changed, 152 insertions(+), 11 deletions(-)
> >
> > diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> > index c940a4911fee..daae6e533146 100644
> > --- a/drivers/cxl/core/hdm.c
> > +++ b/drivers/cxl/core/hdm.c
> > @@ -7,6 +7,8 @@
> >  #include "cxlmem.h"
> >  #include "core.h"
> >
> > +static DECLARE_RWSEM(cxl_dpa_rwsem);
>
> I've not checked many files, but pci.c has equivalent static defines after
> the DOC: entry, so for consistency move this below that?

ok.

> > +
> >  /**
> >   * DOC: cxl core hdm
> >   *
> > @@ -128,10 +130,108 @@ struct cxl_hdm *devm_cxl_setup_hdm(struct cxl_port *port)
> >  }
> >  EXPORT_SYMBOL_NS_GPL(devm_cxl_setup_hdm, CXL);
> >
> > +/*
> > + * Must be called in a context that synchronizes against this decoder's
> > + * port ->remove() callback (like an endpoint decoder sysfs attribute)
> > + */
> > +static void cxl_dpa_release(void *cxled);
> > +static void __cxl_dpa_release(struct cxl_endpoint_decoder *cxled, bool remove_action)
> > +{
> > +        struct cxl_port *port = cxled_to_port(cxled);
> > +        struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> > +        struct cxl_dev_state *cxlds = cxlmd->cxlds;
> > +        struct resource *res = cxled->dpa_res;
> > +
> > +        lockdep_assert_held_write(&cxl_dpa_rwsem);
> > +
> > +        if (remove_action)
> > +                devm_remove_action(&port->dev, cxl_dpa_release, cxled);
>
> This code organization is more surprising than I'd like. Why not move this
> to a wrapper that, like devm_kfree() and similar, does the free now and
> removes the action from the devm list?

True. I see how this got here incrementally, but this end state can
definitely now be fixed up to be more devm idiomatic.
> static void __cxl_dpa_release(struct cxl_endpoint_decoder *cxled)
> {
>         struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
>         struct cxl_dev_state *cxlds = cxlmd->cxlds;
>         struct resource *res = cxled->dpa_res;
>
>         if (cxled->skip)
>                 __release_region(&cxlds->dpa_res, res->start - cxled->skip,
>                                  cxled->skip);
>         cxled->skip = 0;
>         __release_region(&cxlds->dpa_res, res->start, resource_size(res));
>         cxled->dpa_res = NULL;
> }
>
> static void cxl_dpa_release(void *cxled);
>
> /* possibly add some underscores to this name to indicate it's special
>  * in when you can safely call it */
> static void devm_cxl_dpa_release(struct cxl_endpoint_decoder *cxled)
> {
>         struct cxl_port *port = cxled_to_port(cxled);
>
>         lockdep_assert_held_write(&cxl_dpa_rwsem);
>         devm_remove_action(&port->dev, cxl_dpa_release, cxled);
>         __cxl_dpa_release(cxled);
> }
>
> static void cxl_dpa_release(void *cxled)
> {
>         down_write(&cxl_dpa_rwsem);
>         __cxl_dpa_release(cxled);
>         up_write(&cxl_dpa_rwsem);
> }
>
> > +
> > +        if (cxled->skip)
> > +                __release_region(&cxlds->dpa_res, res->start - cxled->skip,
> > +                                 cxled->skip);
> > +        cxled->skip = 0;
> > +        __release_region(&cxlds->dpa_res, res->start, resource_size(res));
> > +        cxled->dpa_res = NULL;
> > +}
> > +
> > +static void cxl_dpa_release(void *cxled)
> > +{
> > +        down_write(&cxl_dpa_rwsem);
> > +        __cxl_dpa_release(cxled, false);
> > +        up_write(&cxl_dpa_rwsem);
> > +}
> > +
> > +static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> > +                             resource_size_t base, resource_size_t len,
> > +                             resource_size_t skip)
> > +{
> > +        struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> > +        struct cxl_port *port = cxled_to_port(cxled);
> > +        struct cxl_dev_state *cxlds = cxlmd->cxlds;
> > +        struct device *dev = &port->dev;
> > +        struct resource *res;
> > +
> > +        lockdep_assert_held_write(&cxl_dpa_rwsem);
> > +
> > +        if (!len)
> > +                return 0;
> > +
> > +        if (cxled->dpa_res) {
> > +                dev_dbg(dev, "decoder%d.%d: existing allocation %pr assigned\n",
> > +                        port->id, cxled->cxld.id, cxled->dpa_res);
> > +                return -EBUSY;
> > +        }
> > +
> > +        if (skip) {
> > +                res = __request_region(&cxlds->dpa_res, base - skip, skip,
> > +                                       dev_name(dev), 0);
>
> An interface that uses a backwards definition of skip, as what to skip
> before the base parameter, is a little odd; can we rename the base
> parameter to something like 'current_top' and then have
> base = current_top + skip? The current_top naming is not great though...

How about just naming it "skipped" instead of "skip"? As the parameter is
how many bytes were skipped to allow a new allocation to start at base.
...

> > > +static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> > > +                             resource_size_t base, resource_size_t len,
> > > +                             resource_size_t skip)
> > > +{
> > > +        struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> > > +        struct cxl_port *port = cxled_to_port(cxled);
> > > +        struct cxl_dev_state *cxlds = cxlmd->cxlds;
> > > +        struct device *dev = &port->dev;
> > > +        struct resource *res;
> > > +
> > > +        lockdep_assert_held_write(&cxl_dpa_rwsem);
> > > +
> > > +        if (!len)
> > > +                return 0;
> > > +
> > > +        if (cxled->dpa_res) {
> > > +                dev_dbg(dev, "decoder%d.%d: existing allocation %pr assigned\n",
> > > +                        port->id, cxled->cxld.id, cxled->dpa_res);
> > > +                return -EBUSY;
> > > +        }
> > > +
> > > +        if (skip) {
> > > +                res = __request_region(&cxlds->dpa_res, base - skip, skip,
> > > +                                       dev_name(dev), 0);
> >
> > An interface that uses a backwards definition of skip, as what to skip
> > before the base parameter, is a little odd; can we rename the base
> > parameter to something like 'current_top' and then have
> > base = current_top + skip? The current_top naming is not great though...
>
> How about just naming it "skipped" instead of "skip"? As the parameter is
> how many bytes were skipped to allow a new allocation to start at base.

Works for me (guessing you long since went with this, given how far behind
I am!)

Thanks,

Jonathan
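[To make the agreed "skipped" semantics concrete, a toy user-space
illustration; this is not driver code and the DPA values are invented.
The skipped hole [base - skipped, base - 1] is claimed on behalf of the
decoder first, then the allocation proper [base, base + len - 1]:

#include <stdio.h>

typedef unsigned long long u64;

/* Reserve [base - skipped, base - 1], then [base, base + len - 1]. */
static void show_dpa_reserve(u64 base, u64 len, u64 skipped)
{
        if (skipped)
                printf("skip  range: %#llx - %#llx\n",
                       base - skipped, base - 1);
        printf("alloc range: %#llx - %#llx\n", base, base + len - 1);
}

int main(void)
{
        /*
         * Say decoder0.0 already owns [0, 0xfffffff]; decoder0.1 skips
         * 256MiB of free DPA and then maps 256MiB starting at base.
         */
        show_dpa_reserve(0x20000000, 0x10000000, 0x10000000);
        return 0;
}

The "skipped" name reads naturally here: it is the amount of DPA passed
over, and reserved alongside the allocation, so that base can land where
the committed decoder says it does.]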
diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index c940a4911fee..daae6e533146 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -7,6 +7,8 @@
 #include "cxlmem.h"
 #include "core.h"
 
+static DECLARE_RWSEM(cxl_dpa_rwsem);
+
 /**
  * DOC: cxl core hdm
  *
@@ -128,10 +130,108 @@ struct cxl_hdm *devm_cxl_setup_hdm(struct cxl_port *port)
 }
 EXPORT_SYMBOL_NS_GPL(devm_cxl_setup_hdm, CXL);
 
+/*
+ * Must be called in a context that synchronizes against this decoder's
+ * port ->remove() callback (like an endpoint decoder sysfs attribute)
+ */
+static void cxl_dpa_release(void *cxled);
+static void __cxl_dpa_release(struct cxl_endpoint_decoder *cxled, bool remove_action)
+{
+        struct cxl_port *port = cxled_to_port(cxled);
+        struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
+        struct cxl_dev_state *cxlds = cxlmd->cxlds;
+        struct resource *res = cxled->dpa_res;
+
+        lockdep_assert_held_write(&cxl_dpa_rwsem);
+
+        if (remove_action)
+                devm_remove_action(&port->dev, cxl_dpa_release, cxled);
+
+        if (cxled->skip)
+                __release_region(&cxlds->dpa_res, res->start - cxled->skip,
+                                 cxled->skip);
+        cxled->skip = 0;
+        __release_region(&cxlds->dpa_res, res->start, resource_size(res));
+        cxled->dpa_res = NULL;
+}
+
+static void cxl_dpa_release(void *cxled)
+{
+        down_write(&cxl_dpa_rwsem);
+        __cxl_dpa_release(cxled, false);
+        up_write(&cxl_dpa_rwsem);
+}
+
+static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
+                             resource_size_t base, resource_size_t len,
+                             resource_size_t skip)
+{
+        struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
+        struct cxl_port *port = cxled_to_port(cxled);
+        struct cxl_dev_state *cxlds = cxlmd->cxlds;
+        struct device *dev = &port->dev;
+        struct resource *res;
+
+        lockdep_assert_held_write(&cxl_dpa_rwsem);
+
+        if (!len)
+                return 0;
+
+        if (cxled->dpa_res) {
+                dev_dbg(dev, "decoder%d.%d: existing allocation %pr assigned\n",
+                        port->id, cxled->cxld.id, cxled->dpa_res);
+                return -EBUSY;
+        }
+
+        if (skip) {
+                res = __request_region(&cxlds->dpa_res, base - skip, skip,
+                                       dev_name(dev), 0);
+                if (!res) {
+                        dev_dbg(dev,
+                                "decoder%d.%d: failed to reserve skip space\n",
+                                port->id, cxled->cxld.id);
+                        return -EBUSY;
+                }
+        }
+        res = __request_region(&cxlds->dpa_res, base, len, dev_name(dev), 0);
+        if (!res) {
+                dev_dbg(dev, "decoder%d.%d: failed to reserve allocation\n",
+                        port->id, cxled->cxld.id);
+                if (skip)
+                        __release_region(&cxlds->dpa_res, base - skip, skip);
+                return -EBUSY;
+        }
+        cxled->dpa_res = res;
+        cxled->skip = skip;
+
+        return 0;
+}
+
+static int cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
+                           resource_size_t base, resource_size_t len,
+                           resource_size_t skip)
+{
+        struct cxl_port *port = cxled_to_port(cxled);
+        int rc;
+
+        down_write(&cxl_dpa_rwsem);
+        rc = __cxl_dpa_reserve(cxled, base, len, skip);
+        up_write(&cxl_dpa_rwsem);
+
+        if (rc)
+                return rc;
+
+        return devm_add_action_or_reset(&port->dev, cxl_dpa_release, cxled);
+}
+
 static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
-                            int *target_map, void __iomem *hdm, int which)
+                            int *target_map, void __iomem *hdm, int which,
+                            u64 *dpa_base)
 {
-        u64 size, base;
+        struct cxl_endpoint_decoder *cxled = NULL;
+        u64 size, base, skip, dpa_size;
+        bool committed;
+        u32 remainder;
         int i, rc;
         u32 ctrl;
         union {
@@ -139,11 +239,15 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
                 unsigned char target_id[8];
         } target_list;
 
+        if (is_endpoint_decoder(&cxld->dev))
+                cxled = to_cxl_endpoint_decoder(&cxld->dev);
+
         ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(which));
         base = ioread64_hi_lo(hdm + CXL_HDM_DECODER0_BASE_LOW_OFFSET(which));
         size = ioread64_hi_lo(hdm + CXL_HDM_DECODER0_SIZE_LOW_OFFSET(which));
+        committed = !!(ctrl & CXL_HDM_DECODER0_CTRL_COMMITTED);
 
-        if (!(ctrl & CXL_HDM_DECODER0_CTRL_COMMITTED))
+        if (!committed)
                 size = 0;
 
         if (base == U64_MAX || size == U64_MAX) {
                 dev_warn(&port->dev, "decoder%d.%d: Invalid resource range\n",
@@ -156,8 +260,8 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
                 .end = base + size - 1,
         };
 
-        /* switch decoders are always enabled if committed */
-        if (ctrl & CXL_HDM_DECODER0_CTRL_COMMITTED) {
+        /* decoders are enabled if committed */
+        if (committed) {
                 cxld->flags |= CXL_DECODER_F_ENABLE;
                 if (ctrl & CXL_HDM_DECODER0_CTRL_LOCK)
                         cxld->flags |= CXL_DECODER_F_LOCK;
@@ -180,14 +284,35 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
         else
                 cxld->target_type = CXL_DECODER_ACCELERATOR;
 
-        if (is_endpoint_decoder(&cxld->dev))
+        if (!cxled) {
+                target_list.value =
+                        ioread64_hi_lo(hdm + CXL_HDM_DECODER0_TL_LOW(which));
+                for (i = 0; i < cxld->interleave_ways; i++)
+                        target_map[i] = target_list.target_id[i];
                 return 0;
+        }
 
-        target_list.value =
-                ioread64_hi_lo(hdm + CXL_HDM_DECODER0_TL_LOW(which));
-        for (i = 0; i < cxld->interleave_ways; i++)
-                target_map[i] = target_list.target_id[i];
+        if (!committed)
+                return 0;
 
+        dpa_size = div_u64_rem(size, cxld->interleave_ways, &remainder);
+        if (remainder) {
+                dev_err(&port->dev,
+                        "decoder%d.%d: invalid committed configuration size: %#llx ways: %d\n",
+                        port->id, cxld->id, size, cxld->interleave_ways);
+                return -ENXIO;
+        }
+
+        skip = ioread64_hi_lo(hdm + CXL_HDM_DECODER0_SKIP_LOW(which));
+        rc = cxl_dpa_reserve(cxled, *dpa_base + skip, dpa_size, skip);
+        if (rc) {
+                dev_err(&port->dev,
+                        "decoder%d.%d: Failed to reserve DPA range %#llx - %#llx (%d)\n",
+                        port->id, cxld->id, *dpa_base,
+                        *dpa_base + dpa_size + skip - 1, rc);
+                return rc;
+        }
+        *dpa_base += dpa_size + skip;
 
         return 0;
 }
@@ -200,6 +325,7 @@ int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm)
         void __iomem *hdm = cxlhdm->regs.hdm_decoder;
         struct cxl_port *port = cxlhdm->port;
         int i, committed;
+        u64 dpa_base = 0;
         u32 ctrl;
 
         /*
@@ -247,7 +373,7 @@ int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm)
                 return PTR_ERR(cxld);
         }
 
-        rc = init_hdm_decoder(port, cxld, target_map, hdm, i);
+        rc = init_hdm_decoder(port, cxld, target_map, hdm, i, &dpa_base);
         if (rc) {
                 put_device(&cxld->dev);
                 return rc;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 579f2d802396..6832d6d70548 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -56,6 +56,8 @@
 #define   CXL_HDM_DECODER0_CTRL_TYPE BIT(12)
 #define CXL_HDM_DECODER0_TL_LOW(i) (0x20 * (i) + 0x24)
 #define CXL_HDM_DECODER0_TL_HIGH(i) (0x20 * (i) + 0x28)
+#define CXL_HDM_DECODER0_SKIP_LOW(i) CXL_HDM_DECODER0_TL_LOW(i)
+#define CXL_HDM_DECODER0_SKIP_HIGH(i) CXL_HDM_DECODER0_TL_HIGH(i)
 
 static inline int cxl_hdm_decoder_count(u32 cap_hdr)
 {
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index a9609d40643f..b4e5ed9eabc9 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -50,6 +50,19 @@ static inline struct cxl_memdev *to_cxl_memdev(struct device *dev)
         return container_of(dev, struct cxl_memdev, dev);
 }
 
+static inline struct cxl_port *cxled_to_port(struct cxl_endpoint_decoder *cxled)
+{
+        return to_cxl_port(cxled->cxld.dev.parent);
+}
+
+static inline struct cxl_memdev *
+cxled_to_memdev(struct cxl_endpoint_decoder *cxled)
+{
+        struct cxl_port *port = to_cxl_port(cxled->cxld.dev.parent);
+
+        return to_cxl_memdev(port->uport);
+}
+
 bool is_cxl_memdev(struct device *dev);
 
 static inline bool is_cxl_endpoint(struct cxl_port *port)
 {
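[As a rough mental model of how the patch threads *dpa_base through
devm_cxl_enumerate_decoders(): each committed endpoint decoder consumes
skip plus size / interleave_ways bytes of DPA, and the next decoder's
range begins after that. The toy program below, not driver code and with
an invented decoder layout, mimics that accumulation:

#include <stdio.h>

int main(void)
{
        unsigned long long dpa_base = 0;
        struct { unsigned long long dpa_size, skip; } decoders[] = {
                { 0x10000000, 0x0 },        /* decoder0.0: volatile, no hole */
                { 0x10000000, 0x10000000 }, /* decoder0.1: pmem after a 256MiB hole */
        };

        for (int i = 0; i < 2; i++) {
                unsigned long long base = dpa_base + decoders[i].skip;

                printf("decoder0.%d: DPA %#llx - %#llx (skipped %#llx)\n",
                       i, base, base + decoders[i].dpa_size - 1,
                       decoders[i].skip);
                dpa_base += decoders[i].dpa_size + decoders[i].skip;
        }
        return 0;
}

This mirrors cxl_dpa_reserve(cxled, *dpa_base + skip, dpa_size, skip)
followed by *dpa_base += dpa_size + skip in init_hdm_decoder().]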