Message ID | 20250413-dcd-type2-upstream-v9-0-1d4911a0b365@intel.com (mailing list archive) |
---|---|
Series | DCD: Add support for Dynamic Capacity Devices (DCD) |
On Sun, Apr 13, 2025 at 05:52:08PM -0500, Ira Weiny wrote:
> A git tree of this series can be found here:
>
> https://github.com/weiny2/linux-kernel/tree/dcd-v6-2025-04-13
>
> This is now based on 6.15-rc2.
>
> Due to the stagnation of solid requirements for users of DCD I do not
> plan to rev this work in Q2 of 2025 and possibly beyond.
>
> It is anticipated that this will support at least the initial
> implementation of DCD devices, if and when they appear in the ecosystem.
> The patch set should be reviewed with the limited set of functionality in
> mind. Additional functionality can be added as devices support them.
>
> It is strongly encouraged for individuals or companies wishing to bring
> DCD devices to market review this set with the customer use cases they
> have in mind.

Hi Ira,
thanks for sending it out.

I have not got a chance to check the code or test it extensively.

I tried to test one specific case and hit an issue.

I tried to add some DC extents to the extent list on the device when the
VM is launched by hacking qemu like below,

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 87fa308495..4049fc8dd9 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -826,6 +826,11 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
     QTAILQ_INIT(&ct3d->dc.extents);
     QTAILQ_INIT(&ct3d->dc.extents_pending);
 
+    cxl_insert_extent_to_extent_list(&ct3d->dc.extents, 0,
+                                     CXL_CAPACITY_MULTIPLIER, NULL, 0);
+    ct3d->dc.total_extent_count = 1;
+    ct3_set_region_block_backed(ct3d, 0, CXL_CAPACITY_MULTIPLIER);
+
     return true;
 }

Then after the VM is launched, I tried to create a DC region with
command: cxl create-region -m mem0 -d decoder0.0 -s 1G -t
dynamic_ram_a.

It works fine. As you can see below, the region is created and the
extent is showing correctly.

root@debian:~# cxl list -r region0 -N
[
  {
    "region":"region0",
    "resource":79725330432,
    "size":1073741824,
    "interleave_ways":1,
    "interleave_granularity":256,
    "decode_state":"commit",
    "extents":[
      {
        "offset":0,
        "length":268435456,
        "uuid":"00000000-0000-0000-0000-000000000000"
      }
    ]
  }
]

However, after that, I tried to create a dax device as below, it failed.

root@debian:~# daxctl create-device -r region0 -v
libdaxctl: __dax_regions_init: no dax regions found via: /sys/class/dax
error creating devices: No such device or address
created 0 devices
root@debian:~#

root@debian:~# ls /sys/class/dax
ls: cannot access '/sys/class/dax': No such file or directory

The dmesg shows that the really_probe function returns early because
resources are present before probing, as below,

[ 1745.505068] cxl_core:devm_cxl_add_dax_region:3251: cxl_region region0: region0: register dax_region0
[ 1745.506063] cxl_pci:__cxl_pci_mbox_send_cmd:263: cxl_pci 0000:0d:00.0: Sending command: 0x4801
[ 1745.506953] cxl_pci:cxl_pci_mbox_wait_for_doorbell:74: cxl_pci 0000:0d:00.0: Doorbell wait took 0ms
[ 1745.507911] cxl_core:__cxl_process_extent_list:1802: cxl_pci 0000:0d:00.0: Got extent list 0-0 of 1 generation Num:0
[ 1745.508958] cxl_core:__cxl_process_extent_list:1815: cxl_pci 0000:0d:00.0: Processing extent 0/1
[ 1745.509843] cxl_core:cxl_validate_extent:975: cxl_pci 0000:0d:00.0: DC extent DPA [range 0x0000000000000000-0x000000000fffffff] (DCR:[range 0x0000000000000000-0x000000007fffffff])(00000000-0000-0000-0000-000000000000)
[ 1745.511748] cxl_core:__cxl_dpa_to_region:2869: cxl decoder2.0: dpa:0x0 mapped in region:region0
[ 1745.512626] cxl_core:cxl_add_extent:460: cxl decoder2.0: Checking ED ([mem 0x00000000-0x3fffffff flags 0x80000200]) for extent [range 0x0000000000000000-0x000000000fffffff]
[ 1745.514143] cxl_core:cxl_add_extent:492: cxl decoder2.0: Add extent [range 0x0000000000000000-0x000000000fffffff] (00000000-0000-0000-0000-000000000000)
[ 1745.515485] cxl_core:online_region_extent:176: extent0.0: region extent HPA [range 0x0000000000000000-0x000000000fffffff]
[ 1745.516576] cxl_core:cxlr_notify_extent:285: cxl dax_region0: Trying notify: type 0 HPA [range 0x0000000000000000-0x000000000fffffff]
[ 1745.517768] cxl_core:cxl_bus_probe:2087: cxl_region region0: probe: 0
[ 1745.524984] cxl dax_region0: Resources present before probing

btw, I hit the same issue with the previous version also.

Fan

>
> Series info
> ===========
>
> This series has 2 parts:
>
> Patch 1-17: Core DCD support
> Patch 18-19: cxl_test support
>
> Background
> ==========
>
> A Dynamic Capacity Device (DCD) (CXL 3.1 sec 9.13.3) is a CXL memory
> device that allows memory capacity within a region to change
> dynamically without the need for resetting the device, reconfiguring
> HDM decoders, or reconfiguring software DAX regions.
>
> One of the biggest anticipated use cases for Dynamic Capacity is to
> allow hosts to dynamically add or remove memory from a host within a
> data center without physically changing the per-host attached memory nor
> rebooting the host.
>
> The general flow for the addition or removal of memory is to have an
> orchestrator coordinate the use of the memory. Generally there are 5
> actors in such a system, the Orchestrator, Fabric Manager, the Logical
> device, the Host Kernel, and a Host User.
>
> An example work flow is shown below.
>
> Orchestrator     FM        Device     Host Kernel      Host User
>
>     |             |           |            |               |
>     |-------------- Create region ------------------------>|
>     |             |           |            |               |
>     |             |           |            |<-- Create ----|
>     |             |           |            |    Region     |
>     |             |           |            |(dynamic_ram_a)|
>     |<------------- Signal done ---------------------------|
>     |             |           |            |               |
>     |-- Add ----->|-- Add --->|--- Add --->|               |
>     |  Capacity   |  Extent   |   Extent   |               |
>     |             |           |            |               |
>     |             |<- Accept -|<- Accept  -|               |
>     |             |   Extent  |   Extent   |               |
>     |             |           |            |<- Create ---->|
>     |             |           |            |   DAX dev     |-- Use memory
>     |             |           |            |               |   |
>     |             |           |            |               |   |
>     |             |           |            |<- Release ----| <-+
>     |             |           |            |   DAX dev     |
>     |             |           |            |               |
>     |<------------- Signal done ---------------------------|
>     |             |           |            |               |
>     |-- Remove -->|- Release->|- Release ->|               |
>     |  Capacity   |  Extent   |   Extent   |               |
>     |             |           |            |               |
>     |             |<- Release-|<- Release -|               |
>     |             |   Extent  |   Extent   |               |
>     |             |           |            |               |
>     |-- Add ----->|-- Add --->|--- Add --->|               |
>     |  Capacity   |  Extent   |   Extent   |               |
>     |             |           |            |               |
>     |             |<- Accept -|<- Accept  -|               |
>     |             |   Extent  |   Extent   |               |
>     |             |           |            |<- Create -----|
>     |             |           |            |   DAX dev     |-- Use memory
>     |             |           |            |               |   |
>     |             |           |            |<- Release ----| <-+
>     |             |           |            |   DAX dev     |
>     |<------------- Signal done ---------------------------|
>     |             |           |            |               |
>     |-- Remove -->|- Release->|- Release ->|               |
>     |  Capacity   |  Extent   |   Extent   |               |
>     |             |           |            |               |
>     |             |<- Release-|<- Release -|               |
>     |             |   Extent  |   Extent   |               |
>     |             |           |            |               |
>     |-- Add ----->|-- Add --->|--- Add --->|               |
>     |  Capacity   |  Extent   |   Extent   |               |
>     |             |           |            |<- Create -----|
>     |             |           |            |   DAX dev     |-- Use memory
>     |             |           |            |               |   |
>     |-- Remove -->|- Release->|- Release ->|               |   |
>     |  Capacity   |  Extent   |   Extent   |               |   |
>     |             |           |            |               |   |
>     |             |           |     (Release Ignored)      |   |
>     |             |           |            |               |   |
>     |             |           |            |<- Release ----| <-+
>     |             |           |            |   DAX dev     |
>     |<------------- Signal done ---------------------------|
>     |             |           |            |               |
>     |             |- Release->|- Release ->|               |
>     |             |  Extent   |   Extent   |               |
>     |             |           |            |               |
>     |             |<- Release-|<- Release -|               |
>     |             |   Extent  |   Extent   |               |
>     |             |           |            |<- Destroy ----|
>     |             |           |            |    Region     |
>     |             |           |            |               |
>
> Implementation
> ==============
>
> This series requires the creation of regions and DAX devices to be
> closely synchronized with the Orchestrator and Fabric Manager. The host
> kernel will reject extents if a region is not yet created. It also
> ignores extent release if memory is in use (DAX device created). These
> synchronizations are not anticipated to be an issue with real
> applications.
>
> Only a single dynamic ram partition is supported (dynamic_ram_a). The
> requirements, use cases, and existence of actual hardware devices to
> support more than one DC partition is unknown at this time. So a less
> complex implementation was chosen.
>
> In order to allow for capacity to be added and removed a new concept of
> a sparse DAX region is introduced. A sparse DAX region may have 0 or
> more bytes of available space. The total space depends on the number
> and size of the extents which have been added.
>
> It is anticipated that users of the memory will carefully coordinate the
> surfacing of capacity with the creation of DAX devices which use that
> capacity. Therefore, the allocation of the memory to DAX devices does
> not allow for specific associations between DAX device and extent. This
> keeps allocations of DAX devices similar to existing DAX region
> behavior.
>
> To keep the DAX memory allocation aligned with the existing DAX devices
> which do not have tags, extents are not allowed to have tags in this
> implementation. Future support for tags can be added when real use
> cases surface.
>
> Great care was taken to keep the extent tracking simple. Some xarray's
> needed to be added but extra software objects are kept to a minimum.
>
> Region extents are tracked as sub-devices of the DAX region. This
> ensures that region destruction cleans up all extent allocations
> properly.
>
> The major functionality of this series includes:
>
> - Getting the dynamic capacity (DC) configuration information from cxl
>   devices
>
> - Configuring a DC partition found in hardware.
>
> - Enhancing the CXL and DAX regions for dynamic capacity support
>         a. Maintain a logical separation between hardware extents and
>            software managed extents. This provides an abstraction
>            between the layers and should allow for interleaving in the
>            future
>
> - Get existing hardware extent lists for endpoint decoders upon region
>   creation.
>
> - Respond to DC capacity events and adjust available region memory.
>         a. Add capacity Events
>         b. Release capacity events
>
> - Host response for add capacity
>         a. do not accept the extent if:
>                 If the region does not exist
>                 or an error occurs realizing the extent
>         b. If the region does exist
>                 realize a DAX region extent with 1:1 mapping (no
>                 interleave yet)
>         c. Support the event more bit by processing a list of extents
>            marked with the more bit together before setting up a
>            response.
>
> - Host response for remove capacity
>         a. If no DAX device references the extent; release the extent
>         b. If a reference does exist, ignore the request.
>            (Require FM to issue release again.)
>         c. Release extents flagged with the 'more' bit individually as
>            the specification allows for the asynchronous release of
>            memory and the implementation is simplified by doing so.
>
> - Modify DAX device creation/resize to account for extents within a
>   sparse DAX region
>
> - Trace Dynamic Capacity events for debugging
>
> - Add cxl-test infrastructure to allow for faster unit testing
>   (See new ndctl branch for cxl-dcd.sh test[1])
>
> - Only support 0 value extent tags
>
> Fan Ni's upstream of Qemu DCD was used for testing.
>
> Remaining work:
>
>         1) Allow mapping to specific extents (perhaps based on
>            label/tag)
>            1a) devise region size reporting based on tags
>         2) Interleave support
>
> Possible additional work depending on requirements:
>
>         1) Accept a new extent which extends (but overlaps) already
>            accepted extent(s)
>         2) Rework DAX device interfaces, memfd has been explored a bit
>         3) Support more than 1 DC partition
>
> [1] https://github.com/weiny2/ndctl/tree/dcd-region3-2025-04-13
>
> ---
> Changes in v9:
> - djbw: pare down support to only a single DC partition
> - djbw: adjust to the new core partition processing which aligns with
>   new type2 work.
> - iweiny: address smaller comments from v8
> - iweiny: rebase off of 6.15-rc1
> - Link to v8: https://patch.msgid.link/20241210-dcd-type2-upstream-v8-0-812852504400@intel.com
>
> ---
> Ira Weiny (19):
>       cxl/mbox: Flag support for Dynamic Capacity Devices (DCD)
>       cxl/mem: Read dynamic capacity configuration from the device
>       cxl/cdat: Gather DSMAS data for DCD partitions
>       cxl/core: Enforce partition order/simplify partition calls
>       cxl/mem: Expose dynamic ram A partition in sysfs
>       cxl/port: Add 'dynamic_ram_a' to endpoint decoder mode
>       cxl/region: Add sparse DAX region support
>       cxl/events: Split event msgnum configuration from irq setup
>       cxl/pci: Factor out interrupt policy check
>       cxl/mem: Configure dynamic capacity interrupts
>       cxl/core: Return endpoint decoder information from region search
>       cxl/extent: Process dynamic partition events and realize region extents
>       cxl/region/extent: Expose region extent information in sysfs
>       dax/bus: Factor out dev dax resize logic
>       dax/region: Create resources on sparse DAX regions
>       cxl/region: Read existing extents on region creation
>       cxl/mem: Trace Dynamic capacity Event Record
>       tools/testing/cxl: Make event logs dynamic
>       tools/testing/cxl: Add DC Regions to mock mem data
>
>  Documentation/ABI/testing/sysfs-bus-cxl |  100 ++-
>  drivers/cxl/core/Makefile               |    2 +-
>  drivers/cxl/core/cdat.c                 |   11 +
>  drivers/cxl/core/core.h                 |   33 +-
>  drivers/cxl/core/extent.c               |  495 +++++++++++++++
>  drivers/cxl/core/hdm.c                  |   13 +-
>  drivers/cxl/core/mbox.c                 |  632 ++++++++++++++++++-
>  drivers/cxl/core/memdev.c               |   87 ++-
>  drivers/cxl/core/port.c                 |    5 +
>  drivers/cxl/core/region.c               |   76 ++-
>  drivers/cxl/core/trace.h                |   65 ++
>  drivers/cxl/cxl.h                       |   61 +-
>  drivers/cxl/cxlmem.h                    |  134 +++-
>  drivers/cxl/mem.c                       |    2 +-
>  drivers/cxl/pci.c                       |  115 +++-
>  drivers/dax/bus.c                       |  356 +++++++++--
>  drivers/dax/bus.h                       |    4 +-
>  drivers/dax/cxl.c                       |   71 ++-
>  drivers/dax/dax-private.h               |   40 ++
>  drivers/dax/hmem/hmem.c                 |    2 +-
>  drivers/dax/pmem.c                      |    2 +-
>  include/cxl/event.h                     |   31 +
>  include/linux/ioport.h                  |    3 +
>  tools/testing/cxl/Kbuild                |    3 +-
>  tools/testing/cxl/test/mem.c            | 1021 +++++++++++++++++++++++++++----
>  25 files changed, 3102 insertions(+), 262 deletions(-)
> ---
> base-commit: 8ffd015db85fea3e15a77027fda6c02ced4d2444
> change-id: 20230604-dcd-type2-upstream-0cd15f6216fd
>
> Best regards,
> --
> Ira Weiny <ira.weiny@intel.com>
>
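The host policy described in the quoted cover letter (reject an extent when no region exists, ignore a release while a DAX device still references the extent, and let the usable space of a sparse region be the sum of its accepted extents) can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the series' code: the struct and function names, the fixed `MAX_EXTENTS` table, and the choice to reject overlapping extents are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_EXTENTS 16 /* hypothetical limit, for this sketch only */

struct extent {
    uint64_t offset; /* byte offset within the region */
    uint64_t length; /* length in bytes */
    unsigned refs;   /* DAX devices currently using this extent */
    bool valid;      /* slot holds an accepted extent */
};

struct sparse_region {
    bool exists;     /* has the host created the region yet? */
    uint64_t size;   /* region size in bytes */
    struct extent extents[MAX_EXTENTS];
};

/* Add capacity: return true if the extent is accepted. */
bool region_add_extent(struct sparse_region *r, uint64_t offset, uint64_t length)
{
    struct extent *slot = NULL;

    assert(r != NULL);
    if (!r->exists)
        return false; /* no region yet: reject */
    if (length == 0 || offset > r->size || length > r->size - offset)
        return false; /* outside the region: reject */

    for (size_t i = 0; i < MAX_EXTENTS; i++) {
        struct extent *e = &r->extents[i];

        if (!e->valid) {
            if (!slot)
                slot = e; /* remember the first free slot */
            continue;
        }
        /* Assumption: overlapping an accepted extent is rejected. */
        if (offset < e->offset + e->length && e->offset < offset + length)
            return false;
    }
    if (!slot)
        return false; /* table full in this sketch */

    *slot = (struct extent){ .offset = offset, .length = length, .valid = true };
    return true;
}

/* Release capacity: return true if released, false if the request is ignored. */
bool region_release_extent(struct sparse_region *r, uint64_t offset, uint64_t length)
{
    assert(r != NULL);
    for (size_t i = 0; i < MAX_EXTENTS; i++) {
        struct extent *e = &r->extents[i];

        if (!e->valid || e->offset != offset || e->length != length)
            continue;
        if (e->refs > 0)
            return false; /* in use by a DAX device: ignore the request */
        e->valid = false;
        return true;
    }
    return false; /* unknown extent */
}

/* Capacity currently surfaced: the sum of accepted extent lengths. */
uint64_t region_total_capacity(const struct sparse_region *r)
{
    uint64_t total = 0;

    assert(r != NULL);
    for (size_t i = 0; i < MAX_EXTENTS; i++)
        if (r->extents[i].valid)
            total += r->extents[i].length;
    return total;
}
```

With a 1 GiB region and a single 256 MiB extent at offset 0, as in the test case above, `region_total_capacity()` would report 256 MiB, and a release would be ignored for as long as `refs` is non-zero.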
On Sun, 13 Apr 2025 17:52:08 -0500 Ira Weiny <ira.weiny@intel.com> wrote:
> A git tree of this series can be found here:
>
> https://github.com/weiny2/linux-kernel/tree/dcd-v6-2025-04-13
>
> This is now based on 6.15-rc2.

Hi Ira,

Firstly thanks for the update and your hard work driving this forwards.

>
> Due to the stagnation of solid requirements for users of DCD I do not
> plan to rev this work in Q2 of 2025 and possibly beyond.

Hopefully there will be limited need to make changes (it looks pretty
good to me - we'll run a bunch of tests though which I haven't done yet).
I do have reason to want this code upstream and it is now simple enough
that I hope it is not controversial. Let's discuss path forwards on the
sync call tomorrow as I'm sure I'm not the only one.

If needed I'm fine picking up the baton to keep this moving forwards
(I'm even more happy to let someone else step up though!)

To me we don't need to answer the question of whether we fully understand
requirements, or whether this support covers them, but rather to ask
if anyone has requirements that are not sensible to satisfy with additional
work building on this?

I'm not aware of any such blocker. For the things I care about the
path forwards looks fine (particularly tagged capacity and sharing).

>
> It is anticipated that this will support at least the initial
> implementation of DCD devices, if and when they appear in the ecosystem.
> The patch set should be reviewed with the limited set of functionality in
> mind. Additional functionality can be added as devices support them.

Personally I think that's a chicken and egg problem but fully understand
the desire to keep things simple in the short term. Getting initial DCD
support in will help reduce the response (that I frequently hear) of
'the ecosystem isn't ready, let's leave that for a generation'.

>
> It is strongly encouraged for individuals or companies wishing to bring
> DCD devices to market review this set with the customer use cases they
> have in mind.
>

Absolutely. I can't share anything about devices at this time but you
can read whatever you want into my willingness to help get this (and a
bunch of things built on top of it) over the line.

> Remaining work:
>
>         1) Allow mapping to specific extents (perhaps based on
>            label/tag)
>            1a) devise region size reporting based on tags
>         2) Interleave support

I'd maybe label these as 'additional possible future features'.
Personally I'm doubtful that hardware interleave of DCD is a short
term feature and it definitely doesn't have to be there for this to be
useful.

Tags will matter but that is a 'next step' that this series does not
seem to hinder.

>
> Possible additional work depending on requirements:
>
>         1) Accept a new extent which extends (but overlaps) already
>            accepted extent(s)
>         2) Rework DAX device interfaces, memfd has been explored a bit
>         3) Support more than 1 DC partition
>
> [1] https://github.com/weiny2/ndctl/tree/dcd-region3-2025-04-13

Thanks,

Jonathan
Fan Ni wrote:
> On Sun, Apr 13, 2025 at 05:52:08PM -0500, Ira Weiny wrote:
> > A git tree of this series can be found here:
> >
> > https://github.com/weiny2/linux-kernel/tree/dcd-v6-2025-04-13
> >
> > This is now based on 6.15-rc2.
> >
> > Due to the stagnation of solid requirements for users of DCD I do not
> > plan to rev this work in Q2 of 2025 and possibly beyond.
> >
> > It is anticipated that this will support at least the initial
> > implementation of DCD devices, if and when they appear in the ecosystem.
> > The patch set should be reviewed with the limited set of functionality in
> > mind. Additional functionality can be added as devices support them.
> >
> > It is strongly encouraged for individuals or companies wishing to bring
> > DCD devices to market review this set with the customer use cases they
> > have in mind.
>
> Hi Ira,
> thanks for sending it out.
>
> I have not got a chance to check the code or test it extensively.
>
> I tried to test one specific case and hit an issue.
>
> I tried to add some DC extents to the extent list on the device when the
> VM is launched by hacking qemu like below,
>
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index 87fa308495..4049fc8dd9 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -826,6 +826,11 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
>      QTAILQ_INIT(&ct3d->dc.extents);
>      QTAILQ_INIT(&ct3d->dc.extents_pending);
>
> +    cxl_insert_extent_to_extent_list(&ct3d->dc.extents, 0,
> +                                     CXL_CAPACITY_MULTIPLIER, NULL, 0);
> +    ct3d->dc.total_extent_count = 1;
> +    ct3_set_region_block_backed(ct3d, 0, CXL_CAPACITY_MULTIPLIER);
> +
>      return true;
>  }
>
>
> Then after the VM is launched, I tried to create a DC region with
> command: cxl create-region -m mem0 -d decoder0.0 -s 1G -t
> dynamic_ram_a.
>
> It works fine. As you can see below, the region is created and the
> extent is showing correctly.
>
> root@debian:~# cxl list -r region0 -N
> [
>   {
>     "region":"region0",
>     "resource":79725330432,
>     "size":1073741824,
>     "interleave_ways":1,
>     "interleave_granularity":256,
>     "decode_state":"commit",
>     "extents":[
>       {
>         "offset":0,
>         "length":268435456,
>         "uuid":"00000000-0000-0000-0000-000000000000"
>       }
>     ]
>   }
> ]
>
>
> However, after that, I tried to create a dax device as below, it failed.
>
> root@debian:~# daxctl create-device -r region0 -v
> libdaxctl: __dax_regions_init: no dax regions found via: /sys/class/dax
> error creating devices: No such device or address
> created 0 devices
> root@debian:~#
>
> root@debian:~# ls /sys/class/dax
> ls: cannot access '/sys/class/dax': No such file or directory

Have you updated daxctl with cxl-cli?

I was confused by this lack of /sys/class/dax and checked with Vishal. He
says this is legacy.

I have /sys/bus/dax and that works fine for me with the latest daxctl
built from the ndctl code I sent out:

https://github.com/weiny2/ndctl/tree/dcd-region3-2025-04-13

Could you build and use the executables from that version?

Ira

>
> The dmesg shows that the really_probe function returns early because
> resources are present before probing, as below,
>
> [ 1745.505068] cxl_core:devm_cxl_add_dax_region:3251: cxl_region region0: region0: register dax_region0
> [ 1745.506063] cxl_pci:__cxl_pci_mbox_send_cmd:263: cxl_pci 0000:0d:00.0: Sending command: 0x4801
> [ 1745.506953] cxl_pci:cxl_pci_mbox_wait_for_doorbell:74: cxl_pci 0000:0d:00.0: Doorbell wait took 0ms
> [ 1745.507911] cxl_core:__cxl_process_extent_list:1802: cxl_pci 0000:0d:00.0: Got extent list 0-0 of 1 generation Num:0
> [ 1745.508958] cxl_core:__cxl_process_extent_list:1815: cxl_pci 0000:0d:00.0: Processing extent 0/1
> [ 1745.509843] cxl_core:cxl_validate_extent:975: cxl_pci 0000:0d:00.0: DC extent DPA [range 0x0000000000000000-0x000000000fffffff] (DCR:[range 0x0000000000000000-0x000000007fffffff])(00000000-0000-0000-0000-000000000000)
> [ 1745.511748] cxl_core:__cxl_dpa_to_region:2869: cxl decoder2.0: dpa:0x0 mapped in region:region0
> [ 1745.512626] cxl_core:cxl_add_extent:460: cxl decoder2.0: Checking ED ([mem 0x00000000-0x3fffffff flags 0x80000200]) for extent [range 0x0000000000000000-0x000000000fffffff]
> [ 1745.514143] cxl_core:cxl_add_extent:492: cxl decoder2.0: Add extent [range 0x0000000000000000-0x000000000fffffff] (00000000-0000-0000-0000-000000000000)
> [ 1745.515485] cxl_core:online_region_extent:176: extent0.0: region extent HPA [range 0x0000000000000000-0x000000000fffffff]
> [ 1745.516576] cxl_core:cxlr_notify_extent:285: cxl dax_region0: Trying notify: type 0 HPA [range 0x0000000000000000-0x000000000fffffff]
> [ 1745.517768] cxl_core:cxl_bus_probe:2087: cxl_region region0: probe: 0
> [ 1745.524984] cxl dax_region0: Resources present before probing
>
>
> btw, I hit the same issue with the previous version also.
>
> Fan

[snip]
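The ranges in the quoted dmesg output are inclusive, so the extent [0x0-0x0fffffff] is 0x10000000 bytes long, which matches the "length":268435456 (256 MiB) reported by cxl list, and it sits inside both the DCR range [0x0-0x7fffffff] and the endpoint decoder range [0x0-0x3fffffff]. A minimal sketch of that arithmetic follows; the `incl_range` type and helper names are made up for illustration and are not the kernel's helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Inclusive address range, as printed in the log: [start-end]. */
struct incl_range {
    uint64_t start;
    uint64_t end;
};

/* Length in bytes of an inclusive range (not valid for the full 64-bit span). */
uint64_t incl_range_len(struct incl_range r)
{
    assert(r.end >= r.start);
    return r.end - r.start + 1;
}

/* True if 'inner' lies entirely within 'outer'. */
bool incl_range_contains(struct incl_range outer, struct incl_range inner)
{
    return inner.start >= outer.start && inner.end <= outer.end;
}
```

Feeding in the three ranges from the log gives a 256 MiB extent contained in a 1 GiB decoder range, so the extent itself looks valid; the failure reported above happens later, at DAX region probe time.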
On Mon, Apr 14, 2025 at 09:37:02PM -0500, Ira Weiny wrote:
> Fan Ni wrote:
> > On Sun, Apr 13, 2025 at 05:52:08PM -0500, Ira Weiny wrote:
> > > A git tree of this series can be found here:
> > >
> > > https://github.com/weiny2/linux-kernel/tree/dcd-v6-2025-04-13
> > >
> > > This is now based on 6.15-rc2.
> > >
> > > Due to the stagnation of solid requirements for users of DCD I do not
> > > plan to rev this work in Q2 of 2025 and possibly beyond.
> > >
> > > It is anticipated that this will support at least the initial
> > > implementation of DCD devices, if and when they appear in the ecosystem.
> > > The patch set should be reviewed with the limited set of functionality in
> > > mind. Additional functionality can be added as devices support them.
> > >
> > > It is strongly encouraged for individuals or companies wishing to bring
> > > DCD devices to market review this set with the customer use cases they
> > > have in mind.
> >
> > Hi Ira,
> > thanks for sending it out.
> >
> > I have not got a chance to check the code or test it extensively.
> >
> > I tried to test one specific case and hit an issue.
> >
> > I tried to add some DC extents to the extent list on the device when the
> > VM is launched by hacking qemu like below,
> >
> > diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> > index 87fa308495..4049fc8dd9 100644
> > --- a/hw/mem/cxl_type3.c
> > +++ b/hw/mem/cxl_type3.c
> > @@ -826,6 +826,11 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
> >      QTAILQ_INIT(&ct3d->dc.extents);
> >      QTAILQ_INIT(&ct3d->dc.extents_pending);
> >
> > +    cxl_insert_extent_to_extent_list(&ct3d->dc.extents, 0,
> > +                                     CXL_CAPACITY_MULTIPLIER, NULL, 0);
> > +    ct3d->dc.total_extent_count = 1;
> > +    ct3_set_region_block_backed(ct3d, 0, CXL_CAPACITY_MULTIPLIER);
> > +
> >      return true;
> >  }
> >
> >
> > Then after the VM is launched, I tried to create a DC region with
> > command: cxl create-region -m mem0 -d decoder0.0 -s 1G -t
> > dynamic_ram_a.
> >
> > It works fine. As you can see below, the region is created and the
> > extent is showing correctly.
> >
> > root@debian:~# cxl list -r region0 -N
> > [
> >   {
> >     "region":"region0",
> >     "resource":79725330432,
> >     "size":1073741824,
> >     "interleave_ways":1,
> >     "interleave_granularity":256,
> >     "decode_state":"commit",
> >     "extents":[
> >       {
> >         "offset":0,
> >         "length":268435456,
> >         "uuid":"00000000-0000-0000-0000-000000000000"
> >       }
> >     ]
> >   }
> > ]
> >
> >
> > However, after that, I tried to create a dax device as below, it failed.
> >
> > root@debian:~# daxctl create-device -r region0 -v
> > libdaxctl: __dax_regions_init: no dax regions found via: /sys/class/dax
> > error creating devices: No such device or address
> > created 0 devices
> > root@debian:~#
> >
> > root@debian:~# ls /sys/class/dax
> > ls: cannot access '/sys/class/dax': No such file or directory
>
> Have you updated daxctl with cxl-cli?
>
> I was confused by this lack of /sys/class/dax and checked with Vishal. He
> says this is legacy.
>
> I have /sys/bus/dax and that works fine for me with the latest daxctl
> built from the ndctl code I sent out:
>
> https://github.com/weiny2/ndctl/tree/dcd-region3-2025-04-13
>
> Could you build and use the executables from that version?
>
> Ira

That is my setup.

root@debian:~# cxl list -r region0 -N
[
  {
    "region":"region0",
    "resource":79725330432,
    "size":2147483648,
    "interleave_ways":1,
    "interleave_granularity":256,
    "decode_state":"commit",
    "extents":[
      {
        "offset":0,
        "length":268435456,
        "uuid":"00000000-0000-0000-0000-000000000000"
      }
    ]
  }
]
root@debian:~# cd ndctl/
root@debian:~/ndctl# git branch
* dcd-region3-2025-04-13
root@debian:~/ndctl# ./build/daxctl/daxctl create-device -r region0 -v
libdaxctl: __dax_regions_init: no dax regions found via: /sys/class/dax
error creating devices: No such device or address
created 0 devices
root@debian:~/ndctl# cat .git/config
[core]
        repositoryformatversion = 0
        filemode = true
        bare = false
        logallrefupdates = true
[remote "origin"]
        url = https://github.com/weiny2/ndctl.git
        fetch = +refs/heads/dcd-region3-2025-04-13:refs/remotes/origin/dcd-region3-2025-04-13
[branch "dcd-region3-2025-04-13"]
        remote = origin
        merge = refs/heads/dcd-region3-2025-04-13

Fan

>
> >
> > The dmesg shows that the really_probe function returns early because
> > resources are present before probing, as below,
> >
> > [ 1745.505068] cxl_core:devm_cxl_add_dax_region:3251: cxl_region region0: region0: register dax_region0
> > [ 1745.506063] cxl_pci:__cxl_pci_mbox_send_cmd:263: cxl_pci 0000:0d:00.0: Sending command: 0x4801
> > [ 1745.506953] cxl_pci:cxl_pci_mbox_wait_for_doorbell:74: cxl_pci 0000:0d:00.0: Doorbell wait took 0ms
> > [ 1745.507911] cxl_core:__cxl_process_extent_list:1802: cxl_pci 0000:0d:00.0: Got extent list 0-0 of 1 generation Num:0
> > [ 1745.508958] cxl_core:__cxl_process_extent_list:1815: cxl_pci 0000:0d:00.0: Processing extent 0/1
> > [ 1745.509843] cxl_core:cxl_validate_extent:975: cxl_pci 0000:0d:00.0: DC extent DPA [range 0x0000000000000000-0x000000000fffffff] (DCR:[range 0x0000000000000000-0x000000007fffffff])(00000000-0000-0000-0000-000000000000)
> > [ 1745.511748] cxl_core:__cxl_dpa_to_region:2869: cxl decoder2.0: dpa:0x0 mapped in region:region0
> > [ 1745.512626] cxl_core:cxl_add_extent:460: cxl decoder2.0: Checking ED ([mem 0x00000000-0x3fffffff flags 0x80000200]) for extent [range 0x0000000000000000-0x000000000fffffff]
> > [ 1745.514143] cxl_core:cxl_add_extent:492: cxl decoder2.0: Add extent [range 0x0000000000000000-0x000000000fffffff] (00000000-0000-0000-0000-000000000000)
> > [ 1745.515485] cxl_core:online_region_extent:176: extent0.0: region extent HPA [range 0x0000000000000000-0x000000000fffffff]
> > [ 1745.516576] cxl_core:cxlr_notify_extent:285: cxl dax_region0: Trying notify: type 0 HPA [range 0x0000000000000000-0x000000000fffffff]
> > [ 1745.517768] cxl_core:cxl_bus_probe:2087: cxl_region region0: probe: 0
> > [ 1745.524984] cxl dax_region0: Resources present before probing
> >
> >
> > btw, I hit the same issue with the previous version also.
> >
> > Fan
>
> [snip]
Ira Weiny wrote:
[..]
> > However, after that, I tried to create a dax device as below, it failed.
> >
> > root@debian:~# daxctl create-device -r region0 -v
> > libdaxctl: __dax_regions_init: no dax regions found via: /sys/class/dax

Note that /sys/class/dax support was removed from the kernel back in
v5.17:

    83762cb5c7c4 dax: Kill DEV_DAX_PMEM_COMPAT

daxctl still supports pre-v5.17 kernels and always checks both subsystem
types. This is a debug message just confirming that it is running on a
new kernel, see dax_regions_init() in daxctl.

> > error creating devices: No such device or address
> > created 0 devices
> > root@debian:~#
> >
> > root@debian:~# ls /sys/class/dax
> > ls: cannot access '/sys/class/dax': No such file or directory
>
> Have you updated daxctl with cxl-cli?
>
> I was confused by this lack of /sys/class/dax and checked with Vishal. He
> says this is legacy.
>
> I have /sys/bus/dax and that works fine for me with the latest daxctl
> built from the ndctl code I sent out:
>
> https://github.com/weiny2/ndctl/tree/dcd-region3-2025-04-13
>
> Could you build and use the executables from that version?

The same debug message still exists in that version and will fire every
time when debug is enabled.
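The behaviour described in the message above, probing both the current bus layout and the legacy class layout and emitting only a debug note for whichever is absent, can be illustrated with a short standalone sketch. This is not the daxctl implementation: the function names, the enum, and the configurable sysfs root are hypothetical and exist only so the example can be exercised outside a real /sys tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

enum dax_layout {
    DAX_LAYOUT_NONE  = 0,
    DAX_LAYOUT_BUS   = 1 << 0, /* <root>/bus/dax, current kernels */
    DAX_LAYOUT_CLASS = 1 << 1, /* <root>/class/dax, legacy kernels */
};

/* Return true if <root>/<subdir> exists and is a directory. */
bool sysfs_dir_exists(const char *root, const char *subdir)
{
    char path[4096];
    struct stat st;
    int n;

    assert(root != NULL && subdir != NULL);
    n = snprintf(path, sizeof(path), "%s/%s", root, subdir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return false; /* path would have been truncated */
    return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}

/* Probe both layouts and return a bitmask of the ones that are present. */
int probe_dax_layouts(const char *sysfs_root)
{
    static const struct { const char *subdir; int flag; } layouts[] = {
        { "bus/dax", DAX_LAYOUT_BUS },
        { "class/dax", DAX_LAYOUT_CLASS },
    };
    int found = DAX_LAYOUT_NONE;

    for (size_t i = 0; i < sizeof(layouts) / sizeof(layouts[0]); i++) {
        if (sysfs_dir_exists(sysfs_root, layouts[i].subdir))
            found |= layouts[i].flag;
        else /* informational only: a missing layout is not an error by itself */
            fprintf(stderr, "debug: no dax regions found via: %s/%s\n",
                    sysfs_root, layouts[i].subdir);
    }
    return found;
}
```

On a post-v5.17 kernel, calling `probe_dax_layouts("/sys")` would be expected to report only the bus layout and print the debug note for class/dax, which is why that message alone does not explain the create-device failure.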
Jonathan Cameron wrote:
[..]
> To me we don't need to answer the question of whether we fully understand
> requirements, or whether this support covers them, but rather to ask
> if anyone has requirements that are not sensible to satisfy with additional
> work building on this?

Wearing only my upstream kernel development hat, the question for
merging is "what is the end user visible impact of merging this?". As
long as DCD remains in proof-of-concept mode then leave the code out of
tree until it is ready to graduate past that point.

Same held for HDM-D support which was an out-of-tree POC until
Alejandro arrived with the SFC consumer.

DCD is joined by HDM-DB (awaiting an endpoint) and CXL Error Isolation
(awaiting a production consumer) as solutions that have time to validate
that the ecosystem is indeed graduating to consume them. There was no
"chicken-egg" paradox for the ecosystem to deliver base
static-memory-expander CXL support.

The ongoing failure to get productive engagement on just how ruthlessly
simple the implementation could be and still meet planned usages
continues to give the impression that Linux is way out in front of
hardware here. Uncomfortably so.
On Mon, 14 Apr 2025 21:50:31 -0700 Dan Williams <dan.j.williams@intel.com> wrote:
> Jonathan Cameron wrote:
> [..]
> > To me we don't need to answer the question of whether we fully understand
> > requirements, or whether this support covers them, but rather to ask
> > if anyone has requirements that are not sensible to satisfy with additional
> > work building on this?
>
> Wearing only my upstream kernel development hat, the question for
> merging is "what is the end user visible impact of merging this?". As
> long as DCD remains in proof-of-concept mode then leave the code out of
> tree until it is ready to graduate past that point.

Hi Dan,

Seems like we'll have to disagree on this. The only thing I can
therefore do is help to keep this patch set in a 'ready to go' state.

I would ask that people review it with that in mind so that we can
merge it the day someone is willing to announce a product which
is a lot more about marketing decisions than anything technical.
Note that will be far too late for distro cycles so distro folk
may have to pick up the fork (which they will hate).

Hopefully that 'fork' will provide a base on which we can build
the next set of key features.

>
> Same held for HDM-D support which was an out-of-tree POC until
> Alejandro arrived with the SFC consumer.

Obviously I can't comment on status of that hardware!

>
> DCD is joined by HDM-DB (awaiting an endpoint) and CXL Error Isolation
> (awaiting a production consumer) as solutions that have time to validate
> that the ecosystem is indeed graduating to consume them.

Those I'm fine with waiting on, though obviously others may not be!

> There was no
> "chicken-egg" paradox for the ecosystem to deliver base
> static-memory-expander CXL support.

That is (at least partly) because the ecosystem for those was initially BIOS
only. That's not true for DCD. So people built devices on basis they didn't
need any kernel support. Lots of disadvantages to that but it's what happened.
As a side note, I'd much rather that path had never been there as it is
continuing to make a mess for Gregory and others.

>
> The ongoing failure to get productive engagement on just how ruthlessly
> simple the implementation could be and still meet planned usages
> continues to give the impression that Linux is way out in front of
> hardware here. Uncomfortably so.

I'll keep pushing for others to engage with this.

I also have on my list writing a document on the future of DCD and
proposing at least one way to add all features on that roadmap. A major
intent of that being to show that there is no blocker to what we have
here. I.e. we can extend it in a logical fashion to exactly what is
needed.

Reality is I cannot say anything about unannounced products. Whilst some
companies will talk about stuff well ahead of hardware being ready for
customers we do not do that (normally we announce long after customers
have it.) Hence it seems I have no way to get this upstream other than
hope someone else has a more flexible policy.

Jonathan
Jonathan Cameron wrote:
> On Mon, 14 Apr 2025 21:50:31 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
>
> > Jonathan Cameron wrote:
> > [..]
> > > To me we don't need to answer the question of whether we fully understand
> > > requirements, or whether this support covers them, but rather to ask
> > > if anyone has requirements that are not sensible to satisfy with additional
> > > work building on this?
> >
> > Wearing only my upstream kernel development hat, the question for
> > merging is "what is the end user visible impact of merging this?". As
> > long as DCD remains in proof-of-concept mode then leave the code out of
> > tree until it is ready to graduate past that point.
>
> Hi Dan,
>
> Seems like we'll have to disagree on this. The only thing I can
> therefore do is help to keep this patch set in a 'ready to go' state.
>
> I would ask that people review it with that in mind so that we can
> merge it the day someone is willing to announce a product which
> is a lot more about marketing decisions than anything technical.
> Note that will be far too late for distro cycles so distro folk
> may have to pick up the fork (which they will hate).

This is overstated. Distros say "no" to supporting even *shipping*
hardware when there is insufficient customer pull through. If none of
the distros' customers can get their hands on DCD hardware that
contraindicates merge and distro intercept decisions.

> Hopefully that 'fork' will provide a base on which we can build
> the next set of key features.

They are only key features when the adoption approaches inevitability.
The LSF/MM discussions around the ongoing challenges of managing
disparate performance memory pools still has me uneasy about whether
Linux yet has the right ABI in hand for dedicated-memory.

What folks seem to want is an anon-only memory provider that does not
ever leak into kernel allocations, and optionally a filesystem
abstraction to provide file backed allocation of dedicated memory. What
they do not want is to teach their applications anything beyond
"malloc()" for anon.

[..]
> That is (at least partly) because the ecosystem for those was initially BIOS
> only. That's not true for DCD. So people built devices on basis they didn't
> need any kernel support. Lots of disadvantages to that but it's what happened.
> As a side note, I'd much rather that path had never been there as it is
> continuing to make a mess for Gregory and others.

The mess is driven by insufficient communication between platform
firmware implementations and Linux expectations. That is a tractable
problem.