Message ID | 20240816-dcd-type2-upstream-v2-0-20189a10ad7d@intel.com |
---|---|
Series | DCD: Add support for Dynamic Capacity Devices (DCD) |
Please ignore this series __and__ the RESEND.

The series did not get sent properly.  Something went wrong with my smtp
server in the middle.

[PATCH v2 22/25] cxl/region: Read existing extents on region creation
CRITICAL: Error running /usr/bin/msmtp -i: msmtp: cannot locate host smtpauth.intel.com: No address associated with hostname
msmtp: could not send mail (account default from /home/iweiny/.msmtprc)

Then I used b4 --resend v2.  But glossed over the fact that it was going
to do something very bad and send a very old version.

https://lore.kernel.org/all/20240816-dcd-type2-upstream-v2-0-b4044aadf2bd@intel.com/

So please ignore that too.  :-(

At this point I'm going to send v3.  <fingers crossed>

Ira

Ira Weiny wrote:
> A git tree of this series can be found here:
>
> https://github.com/weiny2/linux-kernel/tree/dcd-v4-2024-08-15
>
> This series requires the CXL memory notifier lock change:
>
> https://lore.kernel.org/all/20240814-fix-notifiers-v2-1-6bab38192c7c@intel.com/
>
> Background
> ==========
>
> A Dynamic Capacity Device (DCD) (CXL 3.1 sec 9.13.3) is a CXL memory
> device that allows memory capacity within a region to change
> dynamically without the need for resetting the device, reconfiguring
> HDM decoders, or reconfiguring software DAX regions.
>
> One of the biggest use cases for Dynamic Capacity is to allow hosts to
> share memory dynamically within a data center without increasing the
> per-host attached memory.
>
> The general flow for the addition or removal of memory is to have an
> orchestrator coordinate the use of the memory.  Generally there are 5
> actors in such a system, the Orchestrator, Fabric Manager, the Logical
> device, the Host Kernel, and a Host User.
>
> Typical work flows are shown below.
>
> Orchestrator     FM        Device     Host Kernel     Host User
>
>     |             |           |            |              |
>     |-------------- Create region ----------------------->|
>     |             |           |            |              |
>     |             |           |            |<-- Create ---|
>     |             |           |            |    Region    |
>     |<------------- Signal done --------------------------|
>     |             |           |            |              |
>     |-- Add ----->|-- Add --->|--- Add --->|              |
>     |  Capacity   |  Extent   |   Extent   |              |
>     |             |           |            |              |
>     |             |<- Accept -|<- Accept  -|              |
>     |             |   Extent  |   Extent   |              |
>     |             |           |            |<- Create --->|
>     |             |           |            |   DAX dev    |-- Use memory
>     |             |           |            |              |   |
>     |             |           |            |              |   |
>     |             |           |            |<- Release ---| <-+
>     |             |           |            |   DAX dev    |
>     |             |           |            |              |
>     |<------------- Signal done --------------------------|
>     |             |           |            |              |
>     |-- Remove -->|- Release->|- Release ->|              |
>     |  Capacity   |  Extent   |   Extent   |              |
>     |             |           |            |              |
>     |             |<- Release-|<- Release -|              |
>     |             |   Extent  |   Extent   |              |
>     |             |           |            |              |
>     |-- Add ----->|-- Add --->|--- Add --->|              |
>     |  Capacity   |  Extent   |   Extent   |              |
>     |             |           |            |              |
>     |             |<- Accept -|<- Accept  -|              |
>     |             |   Extent  |   Extent   |              |
>     |             |           |            |<- Create ----|
>     |             |           |            |   DAX dev    |-- Use memory
>     |             |           |            |              |   |
>     |             |           |            |<- Release ---| <-+
>     |             |           |            |   DAX dev    |
>     |<------------- Signal done --------------------------|
>     |             |           |            |              |
>     |-- Remove -->|- Release->|- Release ->|              |
>     |  Capacity   |  Extent   |   Extent   |              |
>     |             |           |            |              |
>     |             |<- Release-|<- Release -|              |
>     |             |   Extent  |   Extent   |              |
>     |             |           |            |              |
>     |-- Add ----->|-- Add --->|--- Add --->|              |
>     |  Capacity   |  Extent   |   Extent   |              |
>     |             |           |            |<- Create ----|
>     |             |           |            |   DAX dev    |-- Use memory
>     |             |           |            |              |   |
>     |-- Remove -->|- Release->|- Release ->|              |   |
>     |  Capacity   |  Extent   |   Extent   |              |   |
>     |             |           |            |              |   |
>     |             |           |    (Release Ignored)      |   |
>     |             |           |            |              |   |
>     |             |           |            |<- Release ---| <-+
>     |             |           |            |   DAX dev    |
>     |<------------- Signal done --------------------------|
>     |             |           |            |              |
>     |             |- Release->|- Release ->|              |
>     |             |  Extent   |   Extent   |              |
>     |             |           |            |              |
>     |             |<- Release-|<- Release -|              |
>     |             |   Extent  |   Extent   |              |
>     |             |           |            |<- Destroy ---|
>     |             |           |            |    Region    |
>     |             |           |            |              |
>
> Previous versions of this series[0] resulted in architectural comments
> as well as confusion on the architecture based on the organization of
> the patch series itself.
>
> This version has reordered the patches to clarify the architecture.
> It also streamlines extent handling more.
>
> The series still requires the creation of regions and DAX devices to be
> synchronized with the Orchestrator and Fabric Manager.  The host kernel
> will reject an add extent event if the region is not created yet.  It
> will also ignore a release if the DAX device is created and referencing
> an extent.
>
> These synchronizations are not anticipated to be an issue with real
> applications.
>
> In order to allow for capacity to be added and removed a new concept of
> a sparse DAX region is introduced.  A sparse DAX region may have 0 or
> more bytes of available space.  The total space depends on the number
> and size of the extents which have been added.
>
> Initially it is anticipated that users of the memory will carefully
> coordinate the surfacing of additional capacity with the creation of DAX
> devices which use that capacity.  Therefore, the allocation of the
> memory to DAX devices does not allow for specific associations between
> DAX device and extent.  This keeps allocations very similar to existing
> DAX region behavior.
>
> Great care was taken to keep the extent tracking simple.  Some xarrays
> needed to be added but extra software objects were kept to a minimum.
>
> Region extents continue to be tracked as sub-devices of the DAX region.
> This ensures that region destruction cleans up all extent allocations
> properly.
>
> Due to these major changes all reviews were removed from the larger
> patches.  A few of the straightforward patches have kept the tags.
>
> In summary the major functionality of this series includes:
>
> - Getting the dynamic capacity (DC) configuration information from cxl
>   devices
>
> - Configuring the DC partitions reported by hardware
>
> - Enhancing the CXL and DAX regions for dynamic capacity support
>         a. Maintain a logical separation between hardware extents and
>            software managed region extents.  This provides an
>            abstraction between the layers and should allow for
>            interleaving in the future
>
> - Get hardware extent lists for endpoint decoders upon
>   region creation.
>
> - Adjust extent/region memory available on the following events.
>         a. Add capacity events
>         b. Release capacity events
>
> - Host response for add capacity
>         a. Do not accept the extent if:
>                 the region does not exist
>                 or an error occurs realizing the extent
>         b. If the region does exist
>                 realize a DAX region extent with 1:1 mapping (no
>                 interleave yet)
>         c. Support the more bit by processing a list of extents marked
>            with the more bit together before setting up a response.
>
> - Host response for remove capacity
>         a. If no DAX device references the extent, release the extent
>         b. If a reference does exist, ignore the request.
>            (Require FM to issue release again.)
>
> - Modify DAX device creation/resize to account for extents within a
>   sparse DAX region
>
> - Trace Dynamic Capacity events for debugging
>
> - Add cxl-test infrastructure to allow for faster unit testing
>   (See new ndctl branch for cxl-dcd.sh test[1])
>
> Fan Ni's upstream of QEMU DCD was used for testing.
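
The host response rules in the list above can be sketched as a small
userspace C model.  This is only an illustration under stated
assumptions: every name in it (dc_response, extent_state, handle_add,
handle_release, dc_example) is hypothetical and not taken from the
series, and the more bit handling and the 1:1 region extent mapping are
left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the host's decision, not the driver's types. */
enum dc_response {
	DC_ACCEPT,	/* add: extent realized as a region extent */
	DC_REJECT,	/* add: extent not accepted */
	DC_RELEASE,	/* release: extent given back to the device */
	DC_IGNORE,	/* release: request dropped, FM must issue it again */
};

struct extent_state {
	bool region_exists;	/* region was created before the event */
	bool realize_failed;	/* error while realizing the extent */
	unsigned int dax_refs;	/* DAX devices referencing the extent */
};

/* Add capacity: reject without a region or on a realize error. */
static enum dc_response handle_add(const struct extent_state *s)
{
	if (!s->region_exists || s->realize_failed)
		return DC_REJECT;
	return DC_ACCEPT;
}

/* Release capacity: only release extents no DAX device references. */
static enum dc_response handle_release(const struct extent_state *s)
{
	return s->dax_refs ? DC_IGNORE : DC_RELEASE;
}

/* Walks the last work flow in the diagram: release while in use. */
static void dc_example(void)
{
	struct extent_state s = { .region_exists = true };

	assert(handle_add(&s) == DC_ACCEPT);
	s.dax_refs = 1;					/* DAX device created */
	assert(handle_release(&s) == DC_IGNORE);	/* (Release Ignored) */
	s.dax_refs = 0;					/* DAX device released */
	assert(handle_release(&s) == DC_RELEASE);	/* release issued again */
}
```

Keeping the two decisions as pure functions of a small state struct is
just a convenience for the sketch; it makes the synchronization
requirement visible (an add before region creation is always rejected)
without modelling any of the real device or mailbox plumbing.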
>
> Remaining work:
>
>         1) Integrate the QoS work from Dave Jiang
>         2) Interleave support
>
> Possible additional work depending on requirements:
>
>         1) Allow mapping to specific extents (perhaps based on
>            label/tag)
>         2) Release extents when DAX devices are released if a release
>            was previously seen from the device
>         3) Accept a new extent which extends (but overlaps) an existing
>            extent(s)
>         4) Rework DAX device interfaces, memfd has been explored a bit
>
> [0] v1: https://lore.kernel.org/all/20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com/
> [1] https://github.com/weiny2/ndctl/tree/dcd-region2-2024-08-15
>
> ---
> Major changes:
> - Jonathan: support the more bit
> - djbw: Allow more than 1 region per DC partition
> - All: Address the many comments on the series.
> - iweiny: rebase
> - iweiny: Rework the series to make it easier to review and understand
>   the flow
> - Link to v1: https://lore.kernel.org/r/20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com
>
> ---
> Ira Weiny (13):
>       range: Add range_overlaps()
>       printk: Add print format (%par) for struct range
>       dax: Document dax dev range tuple
>       cxl/pci: Delay event buffer allocation
>       cxl/region: Refactor common create region code
>       cxl/events: Split event msgnum configuration from irq setup
>       cxl/pci: Factor out interrupt policy check
>       cxl/core: Return endpoint decoder information from region search
>       dax/bus: Factor out dev dax resize logic
>       dax/region: Create resources on sparse DAX regions
>       cxl/region: Read existing extents on region creation
>       tools/testing/cxl: Make event logs dynamic
>       tools/testing/cxl: Add DC Regions to mock mem data
>
> Navneet Singh (12):
>       cxl/mbox: Flag support for Dynamic Capacity Devices (DCD)
>       cxl/mem: Read dynamic capacity configuration from the device
>       cxl/core: Separate region mode from decoder mode
>       cxl/region: Add dynamic capacity decoder and region modes
>       cxl/hdm: Add dynamic capacity size support to endpoint decoders
>       cxl/port: Add endpoint decoder DC mode support to sysfs
>       cxl/mem: Expose DCD partition capabilities in sysfs
>       cxl/region: Add sparse DAX region support
>       cxl/mem: Configure dynamic capacity interrupts
>       cxl/extent: Process DCD events and realize region extents
>       cxl/region/extent: Expose region extent information in sysfs
>       cxl/mem: Trace Dynamic capacity Event Record
>
>  Documentation/ABI/testing/sysfs-bus-cxl   |  68 ++-
>  Documentation/core-api/printk-formats.rst |  14 +
>  drivers/cxl/core/Makefile                 |   2 +-
>  drivers/cxl/core/core.h                   |  33 +-
>  drivers/cxl/core/extent.c                 | 467 ++++++++++++++
>  drivers/cxl/core/hdm.c                    | 206 ++++++-
>  drivers/cxl/core/mbox.c                   | 578 +++++++++++++++++-
>  drivers/cxl/core/memdev.c                 | 101 ++-
>  drivers/cxl/core/port.c                   |  13 +-
>  drivers/cxl/core/region.c                 | 173 ++++--
>  drivers/cxl/core/trace.h                  |  65 ++
>  drivers/cxl/cxl.h                         | 122 +++-
>  drivers/cxl/cxlmem.h                      | 128 +++-
>  drivers/cxl/pci.c                         | 123 +++-
>  drivers/dax/bus.c                         | 352 +++++++++--
>  drivers/dax/bus.h                         |   4 +-
>  drivers/dax/cxl.c                         |  73 ++-
>  drivers/dax/dax-private.h                 |  39 +-
>  drivers/dax/hmem/hmem.c                   |   2 +-
>  drivers/dax/pmem.c                        |   2 +-
>  fs/btrfs/ordered-data.c                   |  10 +-
>  include/linux/cxl-event.h                 |  32 +
>  include/linux/range.h                     |   7 +
>  lib/vsprintf.c                            |  37 ++
>  tools/testing/cxl/Kbuild                  |   3 +-
>  tools/testing/cxl/test/mem.c              | 981 ++++++++++++++++++++++++++----
>  26 files changed, 3327 insertions(+), 308 deletions(-)
> ---
> base-commit: 3cef9316df4cda21b5bf25e4230221b02050dfa1
> change-id: 20230604-dcd-type2-upstream-0cd15f6216fd
>
> Best regards,
> --
> Ira Weiny <ira.weiny@intel.com>
>
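
Two ideas from the cover letter lend themselves to a short userspace
illustration: the range_overlaps() helper named in the first patch, and
the statement that the total space of a sparse DAX region depends on the
number and size of the extents added.  The sketch below is not code from
the series.  struct range and range_len() are redefined locally and only
assumed to follow the inclusive start/end convention of
include/linux/range.h, the overlap test is the standard interval check
rather than a copy of the patch, and sparse_region_size() is a
hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive [start, end] range; assumed to match include/linux/range.h. */
struct range {
	uint64_t start;
	uint64_t end;
};

/* Number of bytes covered by an inclusive range. */
static uint64_t range_len(const struct range *r)
{
	return r->end - r->start + 1;
}

/* Two inclusive ranges overlap when each starts at or before the other's end. */
static bool range_overlaps(const struct range *r1, const struct range *r2)
{
	return r1->start <= r2->end && r2->start <= r1->end;
}

/* Hypothetical: total bytes in a sparse region is the sum of its extents. */
static uint64_t sparse_region_size(const struct range *extents, size_t n)
{
	uint64_t total = 0;

	assert(extents || n == 0);
	for (size_t i = 0; i < n; i++)
		total += range_len(&extents[i]);
	return total;
}
```

With no extents the sum is zero, which corresponds to the "0 or more
bytes" wording for a sparse DAX region before any capacity is added.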