mbox series

[00/46] CXL PMEM Region Provisioning

Message ID 165603869943.551046.3498980330327696732.stgit@dwillia2-xfh (mailing list archive)
Headers show
Series CXL PMEM Region Provisioning | expand

Message

Dan Williams June 24, 2022, 2:45 a.m. UTC
tl;dr: 46 patches is way too many patches to review in one sitting. Jump
to the PATCH SUMMARY below to find a subset of interest to jump into.

The series is also posted on the 'preview' branch [1]. Note that branch
rebases, the tip of that branch at time of posting is:

7e5ad5cb1580 cxl/region: Introduce cxl_pmem_region objects

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=preview

---

Until the CXL 2.0 definition arrived there was little reason for OS
drivers to care about CXL memory expanders. Similar to DDR they just
implemented a physical address range that was described to the OS by
platform firmware (EFI Memory Map + ACPI SRAT/SLIT/HMAT etc). The CXL
2.0 definition adds support for PMEM, hotplug, switch topologies, and
device-interleaving which exceeds the limits of what can be reasonably
abstracted by EFI + ACPI mechanisms. As a result, Linux needs a native
capability to provision new CXL regions.

The term "region" is the same term that originated in the LIBNVDIMM
implementation to describe a host physical / system physical address
range. For PMEM a region is a persistent memory range that can be
further sub-divided into namespaces. For CXL there are three
classifications of regions:
- PMEM: set up by CXL native tooling and persisted in CXL region labels

- RAM: set up dynamically by CXL native tooling after hotplug events, or
  leftover capacity not mapped by platform firmware. Any persistent
  configuration would come from set up scripts / configuration files in
  usersapce.

- System RAM: set up by platform firmware and described by EFI + ACPI
  metadata, these regions are static.

For now, these patches implement just PMEM regions without region label
support. Note though that the infrastructure routines like
cxl_region_attach() and cxl_region_setup_targets() are building blocks
for region-label support, provisioning RAM regions, and enumerating
System RAM regions.

The general flow for provisioning a CXL region is to:
- Find a device or set of devices with available device-physical-address
  (DPA) capacity

- Find a platform CXL window that has free capacity to map a new region
  and that is able to target the devices in the previous step.

- Allocate DPA according to the CXL specification rules of sequential
  enabling of decoders by id and when a device hosts multiple decoders
  make sure that lower-id decoders map lower HPA and higher-id decoders
  map higher HPA.

- Assign endpoint decoders to a region and validate that the switching
  topology supports the requested configuration. Recall that
  interleaving is governed by modulo or xormap math that constrains which
  device can support which positions in a given region interleave.

- Program all the decoders an all endpoints and participating switches
  to bring the new address range online.

Once the range is online then existing drivers like LIBNVDIMM or
device-dax can manage the memory range as if the ACPI BIOS had conveyed
its parameters at boot.

This patch kit is the result of significant amounts of path finding work
[2] and long discussions with Ben. Thank you Ben for all that work!
Where the patches in this kit go in a different design direction than
the RFC, the authorship is changed and a Co-developed-by is added mainly
so I get blamed for the bad decisions and not Ben. The major updates
from that last posting are:

- all CXL resources are reflected in full in iomem_resource

- host-physical-address (HPA) range allocation moves to a
  devm_request_free_mem_region() derivative

- locking moves to two global rwsems, one for DPA / endpoint decoders
  and one for HPA / regions.

- the existing port scanning path is augmented to cache more topology
  information rather than recreate it at region creation time

[2]: https://lore.kernel.org/r/20220413183720.2444089-1-ben.widawsky@intel.com

PATCH SUMMARY

If you want to jump straight to the meat of the new infrastructure start
reading at patch 34.

- Patch 34 through 42 is the bulk of the new infrastructure that is
  needed to stand up a new region regardless of whether it is PMEM, or
  RAM.

- Patch 33 is a new core facility for allocating physical address space.
  It is a straightforward extension of devm_request_free_mem_region().

- Patch 9 uses insert_resource_expand_to_fit() to inform the new
  allocator mentioned above about which address ranges are busy / free.

- Patch 46 is the support that takes a CXL PMEM region and turns it into
  a LIBNVDIMM region. Patches 43-45 are just prep work for patch 46.

- Patch 16 - 20 is the infrastructure to mangage DPA capacity, including
  enumerating the DPA that platform firmware may have already allocated
  to a System RAM region. They also enable DPA allocations to be manipulated
  separate from the case when the decoder is assigned to a given region.
  This separation of allocation and region assignment is necessary for
  enumerating regions from region labels where labels within and across
  devices may disagree. Userspace in that situation may need to jump in
  and sort out the allocation conflicts.

- Patches 21 - 24 are updates to cxl_test to put this new implementation
  through its paces with a x8 device region creation test. Recall that
  cxl_test is a way to ship canned CXL configurations in the kernel
  alongside new CXL subsystem code to supplement testing that can be done
  with real devices or QEMU emulation. Note cxl_test just implements
  device topology and ABI, it does not test the PCI-related aspects of
  the implementation.

- Patches 25 - 29 are enhancements to the port enumeration code to cache
  and improve the lookup of topology metadata that is relevant for
  region provisioning.

- Patches 30 - 32 are some straightforward pre-work for exporting
  decoder settings via sysfs.

- Patch 1 - 8, 10 - 15 are some miscellaneous fixes and refactorings
  that should be straightforward to review.

[PATCH 01/46] tools/testing/cxl: Fix cxl_hdm_decode_init() calling convention
[PATCH 02/46] cxl/port: Keep port->uport valid for the entire life of a port
[PATCH 03/46] cxl/hdm: Use local hdm variable
[PATCH 04/46] cxl/core: Rename ->decoder_range ->hpa_range
[PATCH 05/46] cxl/core: Drop ->platform_res attribute for root decoders
[PATCH 06/46] cxl/core: Drop is_cxl_decoder()
[PATCH 07/46] cxl: Introduce cxl_to_{ways,granularity}
[PATCH 08/46] cxl/core: Define a 'struct cxl_switch_decoder'
[PATCH 09/46] cxl/acpi: Track CXL resources in iomem_resource
[PATCH 10/46] cxl/core: Define a 'struct cxl_root_decoder' for tracking CXL window resources
[PATCH 11/46] cxl/core: Define a 'struct cxl_endpoint_decoder' for tracking DPA resources
[PATCH 12/46] cxl/mem: Convert partition-info to resources
[PATCH 13/46] cxl/hdm: Require all decoders to be enumerated
[PATCH 14/46] cxl/hdm: Enumerate allocated DPA
[PATCH 15/46] cxl/Documentation: List attribute permissions
[PATCH 16/46] cxl/hdm: Add 'mode' attribute to decoder objects
[PATCH 17/46] cxl/hdm: Track next decoder to allocate
[PATCH 18/46] cxl/hdm: Add support for allocating DPA to an endpoint decoder
[PATCH 19/46] cxl/debug: Move debugfs init to cxl_core_init()
[PATCH 20/46] cxl/mem: Add a debugfs version of 'iomem' for DPA, 'dpamem'
[PATCH 21/46] tools/testing/cxl: Move cxl_test resources to the top of memory
[PATCH 22/46] tools/testing/cxl: Expand CFMWS windows
[PATCH 23/46] tools/testing/cxl: Add partition support
[PATCH 24/46] tools/testing/cxl: Fix decoder default state
[PATCH 25/46] cxl/port: Record dport in endpoint references
[PATCH 26/46] cxl/port: Record parent dport when adding ports
[PATCH 27/46] cxl/port: Move 'cxl_ep' references to an xarray per port
[PATCH 28/46] cxl/port: Move dport tracking to an xarray
[PATCH 29/46] cxl/port: Cache CXL host bridge data
[PATCH 30/46] cxl/hdm: Add sysfs attributes for interleave ways + granularity
[PATCH 31/46] cxl/hdm: Initialize decoder type for memory expander devices
[PATCH 32/46] cxl/mem: Enumerate port targets before adding endpoints
[PATCH 33/46] resource: Introduce alloc_free_mem_region()
[PATCH 34/46] cxl/region: Add region creation support
[PATCH 35/46] cxl/region: Add a 'uuid' attribute
[PATCH 36/46] cxl/region: Add interleave ways attribute
[PATCH 37/46] cxl/region: Allocate host physical address (HPA) capacity to new regions
[PATCH 38/46] cxl/region: Enable the assignment of endpoint decoders to regions
[PATCH 39/46] cxl/acpi: Add a host-bridge index lookup mechanism
[PATCH 40/46] cxl/region: Attach endpoint decoders
[PATCH 41/46] cxl/region: Program target lists
[PATCH 42/46] cxl/hdm: Commit decoder state to hardware
[PATCH 43/46] cxl/region: Add region driver boiler plate
[PATCH 44/46] cxl/pmem: Delete unused nvdimm attribute
[PATCH 45/46] cxl/pmem: Fix offline_nvdimm_bus() to offline by bridge
[PATCH 46/46] cxl/region: Introduce cxl_pmem_region objects

---

 Documentation/ABI/testing/sysfs-bus-cxl         |  271 +++
 Documentation/driver-api/cxl/memory-devices.rst |   11 
 drivers/cxl/Kconfig                             |    8 
 drivers/cxl/acpi.c                              |  198 ++-
 drivers/cxl/core/Makefile                       |    1 
 drivers/cxl/core/core.h                         |   52 +
 drivers/cxl/core/hdm.c                          |  663 ++++++++
 drivers/cxl/core/mbox.c                         |   95 +
 drivers/cxl/core/memdev.c                       |    4 
 drivers/cxl/core/pci.c                          |    8 
 drivers/cxl/core/pmem.c                         |    4 
 drivers/cxl/core/port.c                         |  678 ++++++---
 drivers/cxl/core/region.c                       | 1797 +++++++++++++++++++++++
 drivers/cxl/cxl.h                               |  294 +++-
 drivers/cxl/cxlmem.h                            |   39 
 drivers/cxl/mem.c                               |   49 -
 drivers/cxl/pci.c                               |    2 
 drivers/cxl/pmem.c                              |  256 +++
 drivers/nvdimm/region_devs.c                    |   28 
 include/linux/ioport.h                          |    2 
 include/linux/libnvdimm.h                       |    5 
 kernel/resource.c                               |  181 ++
 mm/Kconfig                                      |    5 
 tools/testing/cxl/Kbuild                        |    1 
 tools/testing/cxl/test/cxl.c                    |  123 +-
 tools/testing/cxl/test/mem.c                    |   53 -
 tools/testing/cxl/test/mock.c                   |    8 
 27 files changed, 4300 insertions(+), 536 deletions(-)
 create mode 100644 drivers/cxl/core/region.c

base-commit: f50974eee5c4a5de1e4f1a3d873099f170df25f8

Comments

Jonathan Cameron June 24, 2022, 3:13 p.m. UTC | #1
On Thu, 23 Jun 2022 19:45:00 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> tl;dr: 46 patches is way too many patches to review in one sitting. Jump
> to the PATCH SUMMARY below to find a subset of interest to jump into.
> 
> The series is also posted on the 'preview' branch [1]. Note that branch
> rebases, the tip of that branch at time of posting is:
> 
> 7e5ad5cb1580 cxl/region: Introduce cxl_pmem_region objects
> 
> [1]: https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=preview

Via a W=1 build some docs are out of sync with parameter names.
I'm lazy so I'll leave finding the right patch to you ;)
drivers/cxl/core/region.c:1490: warning: Function parameter or member 'type' not described in 'devm_cxl_add_region'
drivers/cxl/core/region.c:1719: warning: Function parameter or member 'cxlr' not described in 'devm_cxl_add_pmem_region'
drivers/cxl/core/region.c:1719: warning: Excess function parameter 'host' description in 'devm_cxl_add_pmem_region'

whilst here, docs for generic_nvdimm_flush() need updating to reflect
generic getting added to the name in 2019...

Jonathan
Dan Williams June 24, 2022, 3:32 p.m. UTC | #2
Jonathan Cameron wrote:
> On Thu, 23 Jun 2022 19:45:00 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > tl;dr: 46 patches is way too many patches to review in one sitting. Jump
> > to the PATCH SUMMARY below to find a subset of interest to jump into.
> > 
> > The series is also posted on the 'preview' branch [1]. Note that branch
> > rebases, the tip of that branch at time of posting is:
> > 
> > 7e5ad5cb1580 cxl/region: Introduce cxl_pmem_region objects
> > 
> > [1]: https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=preview
> 
> Via a W=1 build some docs are out of sync with parameter names.
> I'm lazy so I'll leave finding the right patch to you ;)
> drivers/cxl/core/region.c:1490: warning: Function parameter or member 'type' not described in 'devm_cxl_add_region'

Added:

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index f2a0ead20ca7..f5ca4f811463 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -84,6 +84,7 @@ static struct cxl_region *cxl_region_alloc(struct cxl_root_decoder *cxlrd, int i
  * @cxlrd: root decoder
  * @id: memregion id to create
  * @mode: mode for the endpoint decoders of this region
+ * @type: select whether this is an expander or accelerator (type-2 or type-3)
  *
  * This is the second step of region initialization. Regions exist within an
  * address space which is mapped by a @cxlrd.

...to patch 34.

> drivers/cxl/core/region.c:1719: warning: Function parameter or member 'cxlr' not described in 'devm_cxl_add_pmem_region'
> drivers/cxl/core/region.c:1719: warning: Excess function parameter 'host' description in 'devm_cxl_add_pmem_region'

Added:

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 808148eef557..fa209fb649f7 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -1711,8 +1711,8 @@ static void cxlr_pmem_unregister(void *dev)
 }
 
 /**
- * devm_cxl_add_pmem_region() - add a cxl_region to nd_region bridge
- * @host: same host as @cxlmd
+ * devm_cxl_add_pmem_region() - add a cxl_region-to-nd_region bridge
+ * @cxlr: parent CXL region for this pmem region bridge device
  *
  * Return: 0 on success negative error code on failure.
  */

...to patch 46.

> whilst here, docs for generic_nvdimm_flush() need updating to reflect
> generic getting added to the name in 2019...

Sure, but not in this series.
Alison Schofield June 28, 2022, 3:12 a.m. UTC | #3
-snipped everything

These are commit message typos followed by one tidy-up request.

[PATCH 00/46] CXL PMEM Region Provisioning
s/usersapce/userspace
s/mangage/manage

[PATCH 09/46] cxl/acpi: Track CXL resources in iomem_resource
s/accurracy/accuracy

[PATCH 11/46] cxl/core: Define a 'struct cxl_endpoint_decoder' for tracking DPA resources
s/platfom/platforma

[PATCH 14/46] cxl/hdm: Enumerate allocated DPA
s/provisioining/provisioning
s/comrpised/comprised
s/volaltile-ram/volatile-ram

[PATCH 23/46] tools/testing/cxl: Add partition support
s/mecahinisms/mechanisms

[PATCH 25/46] cxl/port: Record dport in endpoint references
s/endoint/endpoint

[PATCH 30/46] cxl/hdm: Add sysfs attributes for interleave ways + granularity
s/userpace/userspace
s/resonsible/responsible

[PATCH 35/46] cxl/region: Add a 'uuid' attribute
s/is operation/its operation

[PATCH 42/46] cxl/hdm: Commit decoder state to hardware
s/base-addres/base-address
s/intereleave/interleave

How about shortening the commit messages of Patch 10 & 11? They make my
git pretty one liner output ugly.
Dan Williams June 28, 2022, 3:34 a.m. UTC | #4
Alison Schofield wrote:
> 
> -snipped everything
> 
> These are commit message typos followed by one tidy-up request.
> 
> [PATCH 00/46] CXL PMEM Region Provisioning
> s/usersapce/userspace
> s/mangage/manage
> 
> [PATCH 09/46] cxl/acpi: Track CXL resources in iomem_resource
> s/accurracy/accuracy
> 
> [PATCH 11/46] cxl/core: Define a 'struct cxl_endpoint_decoder' for tracking DPA resources
> s/platfom/platforma
> 
> [PATCH 14/46] cxl/hdm: Enumerate allocated DPA
> s/provisioining/provisioning
> s/comrpised/comprised
> s/volaltile-ram/volatile-ram
> 
> [PATCH 23/46] tools/testing/cxl: Add partition support
> s/mecahinisms/mechanisms
> 
> [PATCH 25/46] cxl/port: Record dport in endpoint references
> s/endoint/endpoint
> 
> [PATCH 30/46] cxl/hdm: Add sysfs attributes for interleave ways + granularity
> s/userpace/userspace
> s/resonsible/responsible
> 
> [PATCH 35/46] cxl/region: Add a 'uuid' attribute
> s/is operation/its operation
> 
> [PATCH 42/46] cxl/hdm: Commit decoder state to hardware
> s/base-addres/base-address
> s/intereleave/interleave

Thanks!

Wonder why my checkpatch run elided those.

> How about shortening the commit messages of Patch 10 & 11? They make my
> git pretty one liner output ugly.

I'll think about it if the whole series ends up needing a resend, but
changing subjects does confuse b4 version tracking.
Alison Schofield July 2, 2022, 2:26 a.m. UTC | #5
On Thu, Jun 23, 2022 at 07:45:00PM -0700, Dan Williams wrote:
> tl;dr: 46 patches is way too many patches to review in one sitting. Jump
> to the PATCH SUMMARY below to find a subset of interest to jump into.
> 
> The series is also posted on the 'preview' branch [1]. Note that branch
> rebases, the tip of that branch at time of posting is:
> 
> 7e5ad5cb1580 cxl/region: Introduce cxl_pmem_region objects
> 
> [1]: https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=preview
>
Dan,

I'm seeing these smatch reports while working off of the preview branch.
Perhaps 0-day has already sent these reports aligned to patches. 

drivers/cxl/core/port.c:1482 cxl_decoder_alloc() warn: is 'alloc' large enough for 'struct cxl_root_decoder'? 0

drivers/cxl/core/port.c:1515 cxl_decoder_alloc() error: potentially dereferencing uninitialized 'cxld'.

drivers/cxl/core/hdm.c:457 cxld_set_interleave() error: uninitialized symbol 'eig'.
drivers/cxl/core/hdm.c:458 cxld_set_interleave() error: uninitialized symbol 'eiw'.
drivers/cxl/core/region.c:192 cxl_region_decode_commit() error: uninitialized symbol 'rc'.
drivers/cxl/core/region.c:201 cxl_region_decode_commit() error: uninitialized symbol 'rc'.
drivers/cxl/core/region.c:443 alloc_hpa() error: uninitialized symbol 'res'.
drivers/cxl/core/region.c:964 cxl_port_setup_targets() error: uninitialized symbol 'peig'.
drivers/cxl/core/region.c:964 cxl_port_setup_targets() error: uninitialized symbol 'peiw'.
drivers/cxl/core/region.c:964 cxl_port_setup_targets() error: uninitialized symbol 'eiw'.
drivers/cxl/core/region.c:968 cxl_port_setup_targets() error: uninitialized symbol 'peiw'.
drivers/cxl/core/region.c:969 cxl_port_setup_targets() error: uninitialized symbol 'peig'.
drivers/cxl/core/region.c:1557 create_pmem_region_store() warn: unsigned 'rc' is never less than zero.

> ---
snip
>