Message ID | 20240223143723.1574-1-shiju.jose@huawei.com (mailing list archive) |
---|---|
Headers | show |
Series | memory: scrub: introduce subsystem + CXL/ACPI-RAS2 drivers | expand |
On Fri, Feb 23, 2024 at 10:37:11PM +0800, shiju.jose@huawei.com wrote: > CXL patrol scrub and DDR5 ECS and ACPI RAS2 HW based memory > patrol scrub features are added as use cases for the scrub > subsystem to expose the scrub controls to the user. WTH is a RAS2 driver and where is the RAS1 driver?
On Fri, 23 Feb 2024 16:42:51 +0100 Borislav Petkov <bp@alien8.de> wrote: > On Fri, Feb 23, 2024 at 10:37:11PM +0800, shiju.jose@huawei.com wrote: > > CXL patrol scrub and DDR5 ECS and ACPI RAS2 HW based memory > > patrol scrub features are added as use cases for the scrub > > subsystem to expose the scrub controls to the user. > > WTH is a RAS2 driver and where is the RAS1 driver? > Odd ACPI naming. RASF (which is very limited) was replaced by RAS2. Both are ACPI tables that describe the Platform Communication Channels (mailbox) + provide a definition of the protocol that is sent over it in order to control memory scrub. RAS2 has some other uses this driver isn't touching on. So in short, it's an ACPI spec defined interface for controlling Memory Scrub engines. https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#acpi-ras2-feature-table-ras2 Jonathan
On Fri, Feb 23, 2024 at 04:25:51PM +0000, Jonathan Cameron wrote: > So in short, it's an ACPI spec defined interface for controlling > Memory Scrub engines. Ok, let's call it acpi_ras2 then so that it is clear. We shouldn't be perpetuating the ACPI poor naming in the kernel. This way at least it is namespaced and says exactly where it is coming from. Thx.
[AMD Official Use Only - General] BTW, what is the encoding (units) of the 8-bit "scrub rate"? (the ACPI spec does not seem to define that detail) Thanks, Leo. -----Original Message----- From: Jonathan Cameron <Jonathan.Cameron@Huawei.com> Sent: Friday, February 23, 2024 11:26 AM To: Borislav Petkov <bp@alien8.de> Cc: shiju.jose@huawei.com; linux-cxl@vger.kernel.org; linux-acpi@vger.kernel.org; linux-mm@kvack.org; dan.j.williams@intel.com; dave@stgolabs.net; dave.jiang@intel.com; alison.schofield@intel.com; vishal.l.verma@intel.com; ira.weiny@intel.com; linux-edac@vger.kernel.org; linux-kernel@vger.kernel.org; david@redhat.com; Sridharan, Vilas <Vilas.Sridharan@amd.com>; Duran, Leo <leo.duran@amd.com>; Ghannam, Yazen <Yazen.Ghannam@amd.com>; rientjes@google.com; jiaqiyan@google.com; tony.luck@intel.com; Grimm, Jon <Jon.Grimm@amd.com>; dave.hansen@linux.intel.com; rafael@kernel.org; lenb@kernel.org; naoya.horiguchi@nec.com; james.morse@arm.com; jthoughton@google.com; somasundaram.a@hpe.com; erdemaktas@google.com; pgonda@google.com; duenwen@google.com; mike.malvestuto@intel.com; gthelen@google.com; wschwartz@amperecomputing.com; dferguson@amperecomputing.com; tanxiaofei@huawei.com; prime.zeng@hisilicon.com; kangkang.shen@futurewei.com; wanghuiqiang@huawei.com; linuxarm@huawei.com Subject: Re: [RFC PATCH v7 00/12] memory: scrub: introduce subsystem + CXL/ACPI-RAS2 drivers On Fri, 23 Feb 2024 16:42:51 +0100 Borislav Petkov <bp@alien8.de> wrote: > On Fri, Feb 23, 2024 at 10:37:11PM +0800, shiju.jose@huawei.com wrote: > > CXL patrol scrub and DDR5 ECS and ACPI RAS2 HW based memory patrol > > scrub features are added as use cases for the scrub subsystem to > > expose the scrub controls to the user. > > WTH is a RAS2 driver and where is the RAS1 driver? > Odd ACPI naming. RASF (which is very limited) was replaced by RAS2. Both are ACPI tables that describe the Platform Communication Channels (mailbox) + provide a definition of the protocol that is sent over it in order to control memory scrub. RAS2 has some other uses this driver isn't touching on. So in short, it's an ACPI spec defined interface for controlling Memory Scrub engines. https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#acpi-ras2-feature-table-ras2 Jonathan
> RAS2 HW based memory patrol scrub needs RAS2 PCC interfaces > and ACPI RAS2 driver for communication b/w kernel and firmware. > ACPI RAS2 Driver adds platform device, for each memory feature, > which binds to the RAS2 memory driver. > Memory RAS2 driver registers with the memory scrub subsystem to > expose the RAS2 scrub controls to the user. Hi Shiju, Thanks for this work. This has been very useful for us, as we've been using it to test a RAS2 implementation here at Ampere Computing. In general, the pieces implementing RAS2 are useful and functional. We did, however, encounter a few places where we had to make some bug fixes. I'll reply to specific patches with changes I can recommend. Also, there are a few implementation choices that I will have questions about. Here is an outline containing the generalized comments I plan on making within/near relevant patches: 1) Found a bug in ras2.c in how the pcc_desc_list pointer is incremented. 2) Executing a RAS2 command seems unnecessary for feature detection. specifically, in the functions ras2_is_patrol_scrub_support() and ras2_get_patrol_scrub_params() 3) Consider adding more error detection/reporting in ras2_check_pcc_chan() 4) Concerns with multiple scrub devices attempting to map the same PCC channel. -Daniel
Hi Daniel, Thanks for the feedbacks. >-----Original Message----- >From: Daniel Ferguson <danielf@os.amperecomputing.com> >Sent: 28 March 2024 23:40 >To: Shiju Jose <shiju.jose@huawei.com>; linux-cxl@vger.kernel.org; linux- >acpi@vger.kernel.org; linux-mm@kvack.org; dan.j.williams@intel.com; >dave@stgolabs.net; Jonathan Cameron <jonathan.cameron@huawei.com>; >dave.jiang@intel.com; alison.schofield@intel.com; vishal.l.verma@intel.com; >ira.weiny@intel.com >Cc: linux-edac@vger.kernel.org; linux-kernel@vger.kernel.org; >david@redhat.com; Vilas.Sridharan@amd.com; leo.duran@amd.com; >Yazen.Ghannam@amd.com; rientjes@google.com; jiaqiyan@google.com; >tony.luck@intel.com; Jon.Grimm@amd.com; dave.hansen@linux.intel.com; >rafael@kernel.org; lenb@kernel.org; naoya.horiguchi@nec.com; >james.morse@arm.com; jthoughton@google.com; somasundaram.a@hpe.com; >erdemaktas@google.com; pgonda@google.com; duenwen@google.com; >mike.malvestuto@intel.com; gthelen@google.com; >wschwartz@amperecomputing.com; dferguson@amperecomputing.com; >tanxiaofei <tanxiaofei@huawei.com>; Zengtao (B) <prime.zeng@hisilicon.com>; >kangkang.shen@futurewei.com; wanghuiqiang <wanghuiqiang@huawei.com>; >Linuxarm <linuxarm@huawei.com>; wbs@os.amperecomputing.com >Subject: Re: [RFC PATCH v7 00/12] memory: scrub: introduce subsystem + >CXL/ACPI-RAS2 drivers > >> RAS2 HW based memory patrol scrub needs RAS2 PCC interfaces and ACPI >> RAS2 driver for communication b/w kernel and firmware. >> ACPI RAS2 Driver adds platform device, for each memory feature, which >> binds to the RAS2 memory driver. >> Memory RAS2 driver registers with the memory scrub subsystem to expose >> the RAS2 scrub controls to the user. > >Hi Shiju, >Thanks for this work. This has been very useful for us, as we've been using it to >test a RAS2 implementation here at Ampere Computing. > >In general, the pieces implementing RAS2 are useful and functional. We did, >however, encounter a few places where we had to make some bug fixes. >I'll reply to specific patches with changes I can recommend. Also, there are a >few implementation choices that I will have questions about. > >Here is an outline containing the generalized comments I plan on making >within/near relevant patches: >1) Found a bug in ras2.c in how the pcc_desc_list pointer is incremented. This bug was fixed in the local v8 code. >2) Executing a RAS2 command seems unnecessary for feature detection. >specifically, in the functions ras2_is_patrol_scrub_support() and >ras2_get_patrol_scrub_params() I will check and modify in the next version. >3) Consider adding more error detection/reporting in ras2_check_pcc_chan() Sure. >4) Concerns with multiple scrub devices attempting to map the same PCC >channel. I will reply in the patch. > >-Daniel > Thanks, Shiju
From: Shiju Jose <shiju.jose@huawei.com> 'Previously known as: cxl: Add support for CXL feature commands, CXL device patrol scrub control and DDR5 ECS control features' https://lore.kernel.org/lkml/20240215111455.1462-1-shiju.jose@huawei.com/ Introduce generic memory scrub subsystem which allows user to control underlying memory scrubbers in the system via the sysfs scrub control interface. Memory scrub is a feature where an ECC engine reads data from each memory media location, corrects with an ECC if necessary and writes the corrected data back to the same memory media location. More details can be found in Reference [1]. CXL patrol scrub and DDR5 ECS and ACPI RAS2 HW based memory patrol scrub features are added as use cases for the scrub subsystem to expose the scrub controls to the user. CXL device patrol scrub and DDR5 ECS features needs support for the CXL feature mail box commands. CXL device scrub driver registers with the memory scrub subsystem to expose the scrub controls for CXL device patrol and ECS scrubs to the user. RAS2 HW based memory patrol scrub needs RAS2 PCC interfaces and ACPI RAS2 driver for communication b/w kernel and firmware. ACPI RAS2 Driver adds platform device, for each memory feature, which binds to the RAS2 memory driver. Memory RAS2 driver registers with the memory scrub subsystem to expose the RAS2 scrub controls to the user. Series adds, 1. scrub subsystem driver supports configuring memory scrubs in the system. 2. support for CXL feature mailbox commands. 3. CXL device scrub driver supporting patrol scrub control and ECS control features. 4. register CXL device patrol scrub and ECS with scrub subsystem. 5. common library for RAS2 PCC interfaces. 6. ACPI RAS2 driver for ACPI RAS2 feature table (RAS2). 7. memory RAS2 driver and registers with scrub subsystem. The QEMU series to support the CXL specific scrub features is available here, https://lore.kernel.org/qemu-devel/20240223085902.1549-1-shiju.jose@huawei.com/ References: 1. Discussions on kernel support of memory error detection and patrol scrubber can be found here. https://lore.kernel.org/all/20221103155029.2451105-1-jiaqiyan@google.com/ 2. Discussions on RASF: https://lore.kernel.org/lkml/20230915172818.761-1-shiju.jose@huawei.com/#r https://patchwork.kernel.org/project/linux-arm-kernel/patch/CS1PR84MB0038718F49DBC0FF03919E1184390@CS1PR84MB0038.NAMPRD84.PROD.OUTLOOK.COM/ Changes v6 -> v7: 1. Main changes for comments from Jonathan. 1.1. CXL - Changes for deal with small mail box and supporting multipart feature data transfers. - Provide more specific parameters to mbox supported/get/set features interface functions. - kvmalloc -> kmalloc in CXL scrub mem allocation for feature commands. - Changed the way using __free(kfree) - Removed readback and verify for setting CXL scrub patrol and ECS parameters. Could be added later if needed. - In is_visible() callback functions for scrub control sysfs attrs changed to writeback the default attribute mode value instead of setting per attrs. - Add documentation for sysfs interfaces for CXL ECS scrub control. 1.2. RAS2 - In rasf common code, rename rasf to ras2 because RASF seems obselete. - Replace pr_* with dev_* log function calls from ACPI RAS2 and memory RAS2 drivers. - In rasf common code, rename rasf to ras2. - Removed including unnecessary .h file from memory RAS2 driver. - In is_visible() callback functions for scrub control sysfs attrs changed to writeback the default attribute mode value instead of setting per attribute. 2. Changes for comments from Fan. - Add debug message if cxl patrol scrub and ecs init function calls fail. 3. Updated cover letter for feedback from Dan Williams. v5 -> v6: 1. Changes for comments from Davidlohr, Thanks. - Update CXL feature code based on spec 3.1. - attrb -> attr - Use enums with default counting. 2. Rebased to the latest kernel. v4 -> v5: 1. Following are the main changes made based on the feedback from Dan Williams on v4. 1.1. In the scrub subsystem the common scrub control attributes are statically defined instead of dynamically created. 1.2. Add scrub subsystem support externally defined attribute group. Add CXL ECS driver define ECS specific attribute group and pass to the scrub subsystem. 1.3. Move cxl_mem_ecs_init() to cxl/core/region.c so that the CXL region_id is used in the registration with the scrub subsystem. 1.4. Add previously posted RASF common and RAS2 patches to this scrub series. 2. Add support for the 'enable_background_scrub' attribute for RAS2, on request from Bill Schwartz(wschwartz@amperecomputing.com). v3 -> v4: 1. Fixes for the warnings/errors reported by kernel test robot. 2. Add support for reading the 'enable' attribute of CXL patrol scrub. Changes v2 -> v3: 1. Changes for comments from Davidlohr, Thanks. - Updated cxl scrub kconfig - removed usage of the flag is_support_feature from the function cxl_mem_get_supported_feature_entry(). - corrected spelling error. - removed unnecessary debug message. - removed export feature commands to the userspace. 2. Possible fix for the warnings/errors reported by kernel test robot. 3. Add documentation for the common scrub configure atrributes. v1 -> v2: 1. Changes for comments from Dave Jiang, Thanks. - Split patches. - reversed xmas tree declarations. - declared flags as enums. - removed few unnecessary variable initializations. - replaced PTR_ERR_OR_ZERO() with IS_ERR() and PTR_ERR(). - add auto clean declarations. - replaced while loop with for loop. - Removed allocation from cxl_get_supported_features() and cxl_get_feature() and make change to take allocated memory pointer from the caller. - replaced if/else with switch case. - replaced sprintf() with sysfs_emit() in 2 places. - replaced goto label with return in few functions. 2. removed unused code for supported attributes from ecs. 3. Included following common patch for scrub configure driver to this series. "memory: scrub: Add scrub driver supports configuring memory scrubbers in the system" A Somasundaram (1): ACPI:RAS2: Add common library for RAS2 PCC interfaces Shiju Jose (11): cxl/mbox: Add GET_SUPPORTED_FEATURES mailbox command cxl/mbox: Add GET_FEATURE mailbox command cxl/mbox: Add SET_FEATURE mailbox command cxl/memscrub: Add CXL device patrol scrub control feature cxl/memscrub: Add CXL device ECS control feature memory: scrub: Add scrub subsystem driver supports configuring memory scrubs in the system cxl/memscrub: Register CXL device patrol scrub with scrub subsystem driver cxl/memscrub: Register CXL device ECS with scrub subsystem driver ACPICA: ACPI 6.5: Add support for RAS2 table ACPI:RAS2: Add driver for ACPI RAS2 feature table (RAS2) memory: RAS2: Add memory RAS2 driver .../ABI/testing/sysfs-class-cxl-ecs-configure | 79 ++ .../ABI/testing/sysfs-class-scrub-configure | 91 ++ drivers/acpi/Kconfig | 14 + drivers/acpi/Makefile | 1 + drivers/acpi/ras2_acpi.c | 97 ++ drivers/acpi/ras2_acpi_common.c | 272 +++++ drivers/cxl/Kconfig | 21 + drivers/cxl/core/Makefile | 1 + drivers/cxl/core/mbox.c | 143 +++ drivers/cxl/core/memscrub.c | 954 ++++++++++++++++++ drivers/cxl/core/region.c | 3 + drivers/cxl/cxlmem.h | 124 +++ drivers/cxl/pci.c | 4 + drivers/memory/Kconfig | 15 + drivers/memory/Makefile | 3 + drivers/memory/ras2.c | 364 +++++++ drivers/memory/ras2_common.c | 282 ++++++ drivers/memory/scrub/Kconfig | 11 + drivers/memory/scrub/Makefile | 6 + drivers/memory/scrub/memory-scrub.c | 369 +++++++ include/acpi/actbl2.h | 137 +++ include/acpi/ras2_acpi.h | 59 ++ include/memory/memory-scrub.h | 79 ++ include/memory/ras2.h | 88 ++ 24 files changed, 3217 insertions(+) create mode 100644 Documentation/ABI/testing/sysfs-class-cxl-ecs-configure create mode 100644 Documentation/ABI/testing/sysfs-class-scrub-configure create mode 100755 drivers/acpi/ras2_acpi.c create mode 100755 drivers/acpi/ras2_acpi_common.c create mode 100644 drivers/cxl/core/memscrub.c create mode 100644 drivers/memory/ras2.c create mode 100644 drivers/memory/ras2_common.c create mode 100644 drivers/memory/scrub/Kconfig create mode 100644 drivers/memory/scrub/Makefile create mode 100755 drivers/memory/scrub/memory-scrub.c create mode 100644 include/acpi/ras2_acpi.h create mode 100755 include/memory/memory-scrub.h create mode 100755 include/memory/ras2.h