[RFC,v6,00/12] cxl: Add support for CXL feature commands, CXL device patrol scrub control and DDR5 ECS control features

Message ID 20240215111455.1462-1-shiju.jose@huawei.com

Message

Shiju Jose Feb. 15, 2024, 11:14 a.m. UTC
From: Shiju Jose <shiju.jose@huawei.com>

1. Add support for CXL feature mailbox commands.
2. Add a CXL device scrub driver supporting the patrol scrub control and ECS
   control features.
3. Add a scrub subsystem driver that supports configuring memory scrubbers in
   the system.
4. Register the CXL device patrol scrub and ECS controls with the scrub subsystem.
5. Add a common library for the RASF and RAS2 PCC interfaces.
6. Add a driver for the ACPI RAS2 feature table (RAS2).
7. Add a memory RAS2 driver and register it with the scrub subsystem.

The QEMU series supporting the CXL-specific features is available here:
https://lore.kernel.org/qemu-devel/20240215110146.1444-1-shiju.jose@huawei.com/T/#t
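
For reviewers who want a feel for what the feature commands look like at a
call site, below is a rough sketch. It is not code from the patches: the
payload layout and opcode value are my approximation of the CXL r3.1 Get
Feature command and should be checked against the spec, and
cxl_get_feature_example() is a made-up caller. Only struct cxl_mbox_cmd and
cxl_internal_send_cmd() are existing drivers/cxl/core/mbox.c interfaces.

/*
 * Illustrative sketch only; payload layout, opcode value and the caller
 * below are approximations/placeholders, not the series' actual code.
 */
#include <linux/types.h>
#include <linux/uuid.h>
#include "cxlmem.h"	/* struct cxl_memdev_state, struct cxl_mbox_cmd */

#define CXL_MBOX_OP_GET_FEATURE_EXAMPLE	0x0501	/* Features: Get Feature */

struct cxl_mbox_get_feat_in_example {
	uuid_t uuid;		/* feature UUID from Get Supported Features */
	__le16 offset;		/* offset into the feature data */
	__le16 count;		/* bytes to read */
	u8 selection;		/* 0: current, 1: default, 2: saved (approx.) */
} __packed;

static int cxl_get_feature_example(struct cxl_memdev_state *mds,
				   const uuid_t *feat_uuid,
				   void *out, u16 out_size)
{
	struct cxl_mbox_get_feat_in_example pi = {
		.count = cpu_to_le16(out_size),
		.selection = 0,			/* current value */
	};
	struct cxl_mbox_cmd mbox_cmd;

	uuid_copy(&pi.uuid, feat_uuid);
	mbox_cmd = (struct cxl_mbox_cmd) {
		.opcode = CXL_MBOX_OP_GET_FEATURE_EXAMPLE,
		.payload_in = &pi,
		.size_in = sizeof(pi),
		.payload_out = out,
		.size_out = out_size,
	};

	return cxl_internal_send_cmd(mds, &mbox_cmd);
}

The series itself adds helpers for this in cxl/core/mbox.c so the memscrub
code never builds raw payloads; the sketch is only meant to show the shape of
the mailbox plumbing being added.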

Changes
v5 -> v6:
1. Changes addressing comments from Davidlohr, thanks.
 - Updated the CXL feature code based on the CXL 3.1 spec.
 - Renamed attrb -> attr.
 - Used enums with default counting.
2. Rebased onto the recent kernel.

v4 -> v5:
1. Following are the main changes made based on the feedback from Dan Williams on v4.
1.1. In the scrub subsystem the common scrub control attributes are statically defined
     instead of dynamically created.
1.2. Add scrub subsystem support for externally defined attribute groups.
     The CXL ECS driver defines an ECS-specific attribute group and passes
     it to the scrub subsystem (a rough sketch follows this changelog block).
1.3. Move cxl_mem_ecs_init() to cxl/core/region.c so that the CXL region_id
     is used in the registration with the scrub subsystem.
1.4. Add the previously posted RASF common and RAS2 patches to this scrub series.

2. Add support for the 'enable_background_scrub' attribute for RAS2,
   on request from Bill Schwartz (wschwartz@amperecomputing.com).
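
To illustrate item 1.2 above, the sort of thing the ECS driver can now do
looks roughly like the sketch below. Names such as cxl_ecs_context_example,
threshold and the registration step are placeholders, not the identifiers
used in the patches; only the attribute-group plumbing itself is the
standard kernel sysfs API.

#include <linux/device.h>
#include <linux/sysfs.h>
#include <linux/kstrtox.h>

struct cxl_ecs_context_example {	/* placeholder driver context */
	unsigned int threshold;
};

static ssize_t threshold_show(struct device *dev,
			      struct device_attribute *attr, char *buf)
{
	struct cxl_ecs_context_example *ctx = dev_get_drvdata(dev);

	return sysfs_emit(buf, "%u\n", ctx->threshold);
}

static ssize_t threshold_store(struct device *dev,
			       struct device_attribute *attr,
			       const char *buf, size_t len)
{
	struct cxl_ecs_context_example *ctx = dev_get_drvdata(dev);
	unsigned int val;
	int ret;

	ret = kstrtouint(buf, 0, &val);
	if (ret)
		return ret;

	ctx->threshold = val;	/* a real driver would issue a Set Feature here */
	return len;
}
static DEVICE_ATTR_RW(threshold);

static struct attribute *cxl_ecs_attrs[] = {
	&dev_attr_threshold.attr,
	NULL,
};
ATTRIBUTE_GROUPS(cxl_ecs);

/*
 * cxl_ecs_groups is then handed to the scrub core at registration time,
 * alongside the common attributes the core defines statically.
 */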

v3 -> v4:
1. Fixes for the warnings/errors reported by kernel test robot.
2. Add support for reading the 'enable' attribute of CXL patrol scrub.

v2 -> v3:
1. Changes addressing comments from Davidlohr, thanks.
 - Updated cxl scrub kconfig
 - removed usage of the flag is_support_feature from
   the function cxl_mem_get_supported_feature_entry().
 - corrected spelling error.
 - removed unnecessary debug message.
 - removed export feature commands to the userspace.
2. Possible fix for the warnings/errors reported by kernel
   test robot.
3. Add documentation for the common scrub configuration attributes.

v1 -> v2:
1. Changes addressing comments from Dave Jiang, thanks.
 - Split patches.
 - used reverse Xmas tree ordering for declarations.
 - declared flags as enums.
 - removed a few unnecessary variable initializations.
 - replaced PTR_ERR_OR_ZERO() with IS_ERR() and PTR_ERR().
 - added auto-cleanup declarations.
 - replaced a while loop with a for loop.
 - removed the allocations from cxl_get_supported_features() and
   cxl_get_feature() and changed them to take an allocated memory
   pointer from the caller.
 - replaced if/else with a switch statement.
 - replaced sprintf() with sysfs_emit() in 2 places.
 - replaced a goto label with a return in a few functions.
2. Removed unused code for supported attributes from ECS.
3. Included the following common patch for the scrub configuration driver
   in this series:
   "memory: scrub: Add scrub driver supports configuring memory scrubbers
    in the system"

A Somasundaram (1):
  ACPI:RASF: Add common library for RASF and RAS2 PCC interfaces

Shiju Jose (11):
  cxl/mbox: Add GET_SUPPORTED_FEATURES mailbox command
  cxl/mbox: Add GET_FEATURE mailbox command
  cxl/mbox: Add SET_FEATURE mailbox command
  cxl/memscrub: Add CXL device patrol scrub control feature
  cxl/memscrub: Add CXL device ECS control feature
  memory: scrub: Add scrub subsystem driver supports configuring memory
    scrubs in the system
  cxl/memscrub: Register CXL device patrol scrub with scrub configure
    driver
  cxl/memscrub: Register CXL device ECS with scrub configure driver
  ACPICA: ACPI 6.5: Add support for RAS2 table
  ACPI:RAS2: Add driver for ACPI RAS2 feature table (RAS2)
  memory: RAS2: Add memory RAS2 driver

 .../ABI/testing/sysfs-class-scrub-configure   |   91 ++
 drivers/acpi/Kconfig                          |   15 +
 drivers/acpi/Makefile                         |    1 +
 drivers/acpi/ras2_acpi.c                      |   97 ++
 drivers/acpi/rasf_acpi_common.c               |  272 +++++
 drivers/cxl/Kconfig                           |   23 +
 drivers/cxl/core/Makefile                     |    1 +
 drivers/cxl/core/mbox.c                       |   59 +
 drivers/cxl/core/memscrub.c                   | 1009 +++++++++++++++++
 drivers/cxl/core/region.c                     |    1 +
 drivers/cxl/cxlmem.h                          |  123 ++
 drivers/cxl/pci.c                             |    5 +
 drivers/memory/Kconfig                        |   15 +
 drivers/memory/Makefile                       |    3 +
 drivers/memory/ras2.c                         |  354 ++++++
 drivers/memory/rasf_common.c                  |  269 +++++
 drivers/memory/scrub/Kconfig                  |   11 +
 drivers/memory/scrub/Makefile                 |    6 +
 drivers/memory/scrub/memory-scrub.c           |  367 ++++++
 include/acpi/actbl2.h                         |  137 +++
 include/acpi/rasf_acpi.h                      |   58 +
 include/memory/memory-scrub.h                 |   78 ++
 include/memory/rasf.h                         |   88 ++
 23 files changed, 3083 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-class-scrub-configure
 create mode 100755 drivers/acpi/ras2_acpi.c
 create mode 100755 drivers/acpi/rasf_acpi_common.c
 create mode 100644 drivers/cxl/core/memscrub.c
 create mode 100644 drivers/memory/ras2.c
 create mode 100644 drivers/memory/rasf_common.c
 create mode 100644 drivers/memory/scrub/Kconfig
 create mode 100644 drivers/memory/scrub/Makefile
 create mode 100755 drivers/memory/scrub/memory-scrub.c
 create mode 100644 include/acpi/rasf_acpi.h
 create mode 100755 include/memory/memory-scrub.h
 create mode 100755 include/memory/rasf.h

Comments

Dan Williams Feb. 22, 2024, 12:20 a.m. UTC | #1
shiju.jose@ wrote:
> From: Shiju Jose <shiju.jose@huawei.com>
> 
> 1. Add support for CXL feature mailbox commands.
> 2. Add CXL device scrub driver supporting patrol scrub control and ECS
> control features.
> 3. Add scrub subsystem driver supports configuring memory scrubs in the system.
> 4. Register CXL device patrol scrub and ECS with scrub subsystem.
> 5. Add common library for RASF and RAS2 PCC interfaces.
> 6. Add driver for ACPI RAS2 feature table (RAS2).
> 7. Add memory RAS2 driver and register with scrub subsystem.

I stepped away from this patch set to focus on the changes that landed
for v6.8 and the follow-on regression fixups. Now that v6.8 CXL work has
quieted down and I circle back to this set for v6.9, I find the lack of
story in this cover letter to be unsettling. As a reviewer I should not
have to put together the story on why Linux should care about this
feature and independently build up the maintenance-burden vs benefit
tradeoff analysis.

Maybe it is self-evident to others, but for me there is little in these
changelogs besides "mechanism exists, enable it". There are plenty of
platform or device mechanisms that get specified that Linux does not
enable for one reason or another.

The cover letter needs to answer why it matters, and what are the
tradeoffs. Mind you, in my submissions I do not always get this right in
the cover letter [1], but hopefully at least one of the patches tells
the story [2].

In other words, imagine you are writing the pull request to Linus or
someone else with limited time who needs to make a risk decision on a
pull request with a diffstat of:

    23 files changed, 3083 insertions(+)

...where the easiest decision is to just decline. As is, these
changelogs are not close to tipping the scale to "accept".

[sidebar: how did this manage to implement a new subsystem with 2
consumers (CXL + ACPI), without modifying a single existing line? Zero
deletions? That is either an indication that Linux perfectly anticipated
this future use case (unlikely), or more work needs to be done to digest
and integrate these concepts into existing code paths]

One of the first questions for me is why CXL and RAS2 as the first
consumers and not NVDIMM-ARS and/or RASF Patrol Scrub? Part of the
maintenance burden tradeoff is providing a migration path for legacy on
the way to adding the new thing. If old scrub implementations could be
deprecated / deleted on the way to supporting new scrub use cases that
becomes interesting.

[1]: http://lore.kernel.org/r/20240208220909.GA975234@bhelgaas
[2]: http://lore.kernel.org/r/20240208221305.GA975512@bhelgaas
Shiju Jose Feb. 23, 2024, 12:16 p.m. UTC | #2
Hi Dan,

Thanks for the feedback.

Please find reply inline.

>-----Original Message-----
>From: Dan Williams <dan.j.williams@intel.com>
>Sent: 22 February 2024 00:21
>To: Shiju Jose <shiju.jose@huawei.com>; linux-cxl@vger.kernel.org; linux-
>acpi@vger.kernel.org; linux-mm@kvack.org; dan.j.williams@intel.com;
>dave@stgolabs.net; Jonathan Cameron <jonathan.cameron@huawei.com>;
>dave.jiang@intel.com; alison.schofield@intel.com; vishal.l.verma@intel.com;
>ira.weiny@intel.com
>Cc: linux-edac@vger.kernel.org; linux-kernel@vger.kernel.org;
>david@redhat.com; Vilas.Sridharan@amd.com; leo.duran@amd.com;
>Yazen.Ghannam@amd.com; rientjes@google.com; jiaqiyan@google.com;
>tony.luck@intel.com; Jon.Grimm@amd.com; dave.hansen@linux.intel.com;
>rafael@kernel.org; lenb@kernel.org; naoya.horiguchi@nec.com;
>james.morse@arm.com; jthoughton@google.com; somasundaram.a@hpe.com;
>erdemaktas@google.com; pgonda@google.com; duenwen@google.com;
>mike.malvestuto@intel.com; gthelen@google.com;
>wschwartz@amperecomputing.com; dferguson@amperecomputing.com;
>tanxiaofei <tanxiaofei@huawei.com>; Zengtao (B) <prime.zeng@hisilicon.com>;
>kangkang.shen@futurewei.com; wanghuiqiang <wanghuiqiang@huawei.com>;
>Linuxarm <linuxarm@huawei.com>; Shiju Jose <shiju.jose@huawei.com>
>Subject: RE: [RFC PATCH v6 00/12] cxl: Add support for CXL feature commands,
>CXL device patrol scrub control and DDR5 ECS control features
>
>shiju.jose@ wrote:
>> From: Shiju Jose <shiju.jose@huawei.com>
>>
>> 1. Add support for CXL feature mailbox commands.
>> 2. Add CXL device scrub driver supporting patrol scrub control and ECS
>> control features.
>> 3. Add scrub subsystem driver supports configuring memory scrubs in the
>system.
>> 4. Register CXL device patrol scrub and ECS with scrub subsystem.
>> 5. Add common library for RASF and RAS2 PCC interfaces.
>> 6. Add driver for ACPI RAS2 feature table (RAS2).
>> 7. Add memory RAS2 driver and register with scrub subsystem.
>
>I stepped away from this patch set to focus on the changes that landed for v6.8
>and the follow-on regression fixups. Now that v6.8 CXL work has quieted down
>and I circle back to this set for v6.9 I find the lack of story in this cover letter to
>be unsettling. As a reviewer I should not have to put together the story on why
>Linux should care about this feature and independently build up the
>maintainence-burden vs benefit tradeoff analysis.
I will add more details to the cover letter.
 
>
>Maybe it is self evident to others, but for me there is little in these changelogs
>besides "mechanism exists, enable it". There are plenty of platform or device
>mechanisms that get specified that Linux does not enable for one reason or
>another.
>
>The cover letter needs to answer why it matters, and what are the tradeoffs.
>Mind you, in my submissions I do not always get this right in the cover letter [1],
>but hopefully at least one of the patches tells the story [2].
>
>In other words, imagine you are writing the pull request to Linus or someone
>else with limited time who needs to make a risk decision on a pull request with a
>diffstat of:
>
>    23 files changed, 3083 insertions(+)
>
>...where the easiest decision is to just decline. As is, these changelogs are not
>close to tipping the scale to "accept".
>
>[sidebar: how did this manage to implement a new subsystem with 2 consumers
>(CXL + ACPI), without modifying a single existing line? Zero deletions? That is
>either an indication that Linux perfectly anticipated this future use case
>(unlikely), or more work needs to be done to digest an integrate these concepts
>into existing code paths]
>
>One of the first questions for me is why CXL and RAS2 as the first consumers and
>not NVDIMM-ARS and/or RASF Patrol Scrub? Part of the maintenance burden
We don't personally care about NVDIMMs but would welcome drivers from others.

Regarding RASF patrol scrub no one cared about it as it's useless and
any new implementation should be RAS2.
Previous discussions in the community about RASF and scrub can be found here:
https://lore.kernel.org/lkml/20230915172818.761-1-shiju.jose@huawei.com/#r
and some old ones,
https://patchwork.kernel.org/project/linux-arm-kernel/patch/CS1PR84MB0038718F49DBC0FF03919E1184390@CS1PR84MB0038.NAMPRD84.PROD.OUTLOOK.COM/

https://lore.kernel.org/all/20221103155029.2451105-1-jiaqiyan@google.com/

>tradeoff is providing a migration path for legacy on the way to adding the new
>thing. If old scrub implementations could be deprecated / deleted on the way to
>supporting new scrub use cases that becomes interesting.
>
>[1]: http://lore.kernel.org/r/20240208220909.GA975234@bhelgaas
>[2]: http://lore.kernel.org/r/20240208221305.GA975512@bhelgaas

Thanks,
Shiju
Dan Williams Feb. 23, 2024, 7:42 p.m. UTC | #3
Shiju Jose wrote:
> Hi Dan,
> 
> Thanks for the feedback.
> 
> Please find reply inline.
> 
> >
> >shiju.jose@ wrote:
> >> From: Shiju Jose <shiju.jose@huawei.com>
> >>
> >> 1. Add support for CXL feature mailbox commands.
> >> 2. Add CXL device scrub driver supporting patrol scrub control and ECS
> >> control features.
> >> 3. Add scrub subsystem driver supports configuring memory scrubs in the
> >system.
> >> 4. Register CXL device patrol scrub and ECS with scrub subsystem.
> >> 5. Add common library for RASF and RAS2 PCC interfaces.
> >> 6. Add driver for ACPI RAS2 feature table (RAS2).
> >> 7. Add memory RAS2 driver and register with scrub subsystem.
> >
> >I stepped away from this patch set to focus on the changes that landed for v6.8
> >and the follow-on regression fixups. Now that v6.8 CXL work has quieted down
> >and I circle back to this set for v6.9 I find the lack of story in this cover letter to
> >be unsettling. As a reviewer I should not have to put together the story on why
> >Linux should care about this feature and independently build up the
> >maintainence-burden vs benefit tradeoff analysis.
> I will add more details to the cover letter.
>  
> >
> >Maybe it is self evident to others, but for me there is little in these changelogs
> >besides "mechanism exists, enable it". There are plenty of platform or device
> >mechanisms that get specified that Linux does not enable for one reason or
> >another.
> >
> >The cover letter needs to answer why it matters, and what are the tradeoffs.
> >Mind you, in my submissions I do not always get this right in the cover letter [1],
> >but hopefully at least one of the patches tells the story [2].
> >
> >In other words, imagine you are writing the pull request to Linus or someone
> >else with limited time who needs to make a risk decision on a pull request with a
> >diffstat of:
> >
> >    23 files changed, 3083 insertions(+)
> >
> >...where the easiest decision is to just decline. As is, these changelogs are not
> >close to tipping the scale to "accept".
> >
> >[sidebar: how did this manage to implement a new subsystem with 2 consumers
> >(CXL + ACPI), without modifying a single existing line? Zero deletions? That is
> >either an indication that Linux perfectly anticipated this future use case
> >(unlikely), or more work needs to be done to digest an integrate these concepts
> >into existing code paths]
> >
> >One of the first questions for me is why CXL and RAS2 as the first consumers and
> >not NVDIMM-ARS and/or RASF Patrol Scrub? Part of the maintenance burden
> We don't personally care about NVDIMMS but would welcome drivers from others.

Upstream would also welcome consideration of maintenance-burden
reduction before piling on; at least include *some* consideration of the
implications, versus this response that comes off as "that's somebody else's
problem".

> Regarding RASF patrol scrub no one cared about it as it's useless and
> any new implementation should be RAS2.

The assertion that "RASF patrol scrub no one cared about it as it's
useless and any new implementation should be RAS2" needs evidence.

For example, what platforms are going to ship with RAS2 support, and what
are the implications of Linux not having RAS2 scrub support in a month,
or in a year? There are parts of the ACPI spec that have never been
implemented; what is the evidence that RAS2 is not going to suffer the
same fate as RASF? There are parts of the CXL specification that have
never been implemented in mass-market products.

> Previous discussions in the community about RASF and scrub could be find here.
> https://lore.kernel.org/lkml/20230915172818.761-1-shiju.jose@huawei.com/#r
> and some old ones,
> https://patchwork.kernel.org/project/linux-arm-kernel/patch/CS1PR84MB0038718F49DBC0FF03919E1184390@CS1PR84MB0038.NAMPRD84.PROD.OUTLOOK.COM/
> 

Do not make people hunt for old discussions; if there are useful points
in that discussion that make the case for the patch set, include those in
the next submission. Don't make people hunt for the latest state of the
story.

> https://lore.kernel.org/all/20221103155029.2451105-1-jiaqiyan@google.com/

Yes, now that is a useful changelog. Thank you for highlighting it;
please follow its example.
Jonathan Cameron Feb. 26, 2024, 10:29 a.m. UTC | #4
On Fri, 23 Feb 2024 11:42:24 -0800
Dan Williams <dan.j.williams@intel.com> wrote:

> Shiju Jose wrote:
> > Hi Dan,
> > 
> > Thanks for the feedback.
> > 
> > Please find reply inline.
> >   
> > >
> > >shiju.jose@ wrote:  
> > >> From: Shiju Jose <shiju.jose@huawei.com>
> > >>
> > >> 1. Add support for CXL feature mailbox commands.
> > >> 2. Add CXL device scrub driver supporting patrol scrub control and ECS
> > >> control features.
> > >> 3. Add scrub subsystem driver supports configuring memory scrubs in the  
> > >system.  
> > >> 4. Register CXL device patrol scrub and ECS with scrub subsystem.
> > >> 5. Add common library for RASF and RAS2 PCC interfaces.
> > >> 6. Add driver for ACPI RAS2 feature table (RAS2).
> > >> 7. Add memory RAS2 driver and register with scrub subsystem.  
> > >
> > >I stepped away from this patch set to focus on the changes that landed for v6.8
> > >and the follow-on regression fixups. Now that v6.8 CXL work has quieted down
> > >and I circle back to this set for v6.9 I find the lack of story in this cover letter to
> > >be unsettling. As a reviewer I should not have to put together the story on why
> > >Linux should care about this feature and independently build up the
> > >maintainence-burden vs benefit tradeoff analysis.  
> > I will add more details to the cover letter.
> >    
> > >
> > >Maybe it is self evident to others, but for me there is little in these changelogs
> > >besides "mechanism exists, enable it". There are plenty of platform or device
> > >mechanisms that get specified that Linux does not enable for one reason or
> > >another.
> > >
> > >The cover letter needs to answer why it matters, and what are the tradeoffs.
> > >Mind you, in my submissions I do not always get this right in the cover letter [1],
> > >but hopefully at least one of the patches tells the story [2].
> > >
> > >In other words, imagine you are writing the pull request to Linus or someone
> > >else with limited time who needs to make a risk decision on a pull request with a
> > >diffstat of:
> > >
> > >    23 files changed, 3083 insertions(+)
> > >
> > >...where the easiest decision is to just decline. As is, these changelogs are not
> > >close to tipping the scale to "accept".
> > >
> > >[sidebar: how did this manage to implement a new subsystem with 2 consumers
> > >(CXL + ACPI), without modifying a single existing line? Zero deletions? That is
> > >either an indication that Linux perfectly anticipated this future use case
> > >(unlikely), or more work needs to be done to digest an integrate these concepts
> > >into existing code paths]
> > >
> > >One of the first questions for me is why CXL and RAS2 as the first consumers and
> > >not NVDIMM-ARS and/or RASF Patrol Scrub? Part of the maintenance burden  
> > We don't personally care about NVDIMMS but would welcome drivers from others.  
> 
> Upstream would also welcome consideration of maintenance burden
> reduction before piling on, at least include *some* consideration of the
> implications vs this response that comes off as "that's somebody else's
> problem".

We can do analysis of whether the interfaces are suitable etc. but
have no access to test hardware or emulation. I guess I can hack something
together easily enough. Today ndctl has some support. Interestingly the model
is different from typical volatile scrubbing as it's all on demand - that
could easily be wrapped up in a software scrub scheduler, but we'd need
input from you and other Intel people on how this is actually used.

The use model is a lot less obvious than autonomous scrubbers - I assume because
the persistence means you need to do this rarely if at all (though ARS does
support scrubbing volatile memory on NVDIMMs).

So initial conclusion is it would need a few more controls or it needs
some software handling of scan scheduling to map it to the interface type
that is common to CXL and RAS2 scrub controls.

The intent of the comment was to keep the scope somewhat confined, and to
invite others to get involved, not to rule out doing some lightweight
analysis of whether this feature would work for another potential user
which we weren't even aware of until you mentioned it (thanks!).

> 
> > Regarding RASF patrol scrub no one cared about it as it's useless and
> > any new implementation should be RAS2.  
> 
> The assertion that "RASF patrol scrub no one cared about it as it's
> useless and any new implementation should be RAS2" needs evidence.
> 
> For example, what platforms are going to ship with RAS2 support, what
> are the implications of Linux not having RAS2 scrub support in a month,
> or in year? There are parts of the ACPI spec that have never been
> implemented what is the evidence that RAS2 is not going to suffer the
> same fate as RASF? 

From discussions with various firmware folk we have a chicken-and-egg
situation on RAS2. They will stick to their custom solutions unless there is
plausible support in Linux for it - so right now it's a question mark
on roadmaps. Trying to get rid of that question mark is why Shiju and I
started looking at this in the first place. To get rid of that question
mark we don't necessarily need to have it upstream, but we do need
to be able to make the argument that there will be a solution ready
soon after they release the BIOS image.  (Some distros will take years
to catch up though.)

If anyone else can speak up on this point, please do. Discussions and
feedback communicated to Shiju and me off list aren't going to
convince people :(
Negatives are perhaps easier to give than positives, given this is seen as
a potential feature for future platforms and so may be confidential.

> There are parts of the CXL specification that have
> never been implemented in mass market products.

Obviously I can't talk about who was involved in this feature
in its definition, but I have strong confidence it will get implemented
for reasons I can point at on a public list.
a) There will be scrubbing on devices.
b) It will need control (evidence for this is the BIOS controls mentioned below
   for equivalent main memory).
c) Hotplug means that control must be done by an OS driver (or via very fiddly
   pre-hotplug hacks that I think we can all agree should not be necessary
   and aren't even an option on all platforms).
d) No one likes custom solutions.
This isn't a fancy feature with a high level of complexity, which helps.

Today, for main memory, there is the option of leaving it to BIOS parameters.
A quick Google search gave me some examples (to make sure they are public):
Dell: PowerEdge R640 BIOS and UEFI Reference Guide
  - Memory patrol scrub - Sets the memory patrol scrub frequency.
HP UEFI System Utilities for HPE ProLiant Gen 11 Servers
  - Enabling or disabling patrol scrub
Spec list of flags for Lenovo systems (tells you that turning patrol scrub
   off is a good idea ;)
Huawei Kunpeng 920 RAS config menu
  - Active Scrub, Active Scrub interval, etc.

> 
> > Previous discussions in the community about RASF and scrub could be find here.
> > https://lore.kernel.org/lkml/20230915172818.761-1-shiju.jose@huawei.com/#r
> > and some old ones,
> > https://patchwork.kernel.org/project/linux-arm-kernel/patch/CS1PR84MB0038718F49DBC0FF03919E1184390@CS1PR84MB0038.NAMPRD84.PROD.OUTLOOK.COM/
> >   
> 
> Do not make people hunt for old discussions, if there are useful points
> in that discussion that make the case for the patch set include those in
> the next submission, don't make people hunt for the latest state of the
> story.

Sure, more of an essay is needed along with the links, given we are talking
about the views of others.

Quick summary from a reread of the linked threads:
AMD has not implemented RASF/RAS2 yet - they were looking at it last year, but
worried about the inflexibility of the RAS2 spec today. They were looking at
some spec changes to improve this, plus other functions to be added to RAS2.
I agree with it being limited, but think extending it with backwards
compatibility isn't a problem (and ACPI spec rules in theory guarantee
it won't break).  I'm keen on working with the current version
so that we can ensure the ABI design for CXL encompasses it.

Intel folk were cc'd but had not said anything on that thread; however, Tony
Luck did comment in Jiaqi Yan's software scrubbing discussion linked below.
He observed that a hardware implementation can be complex if doing
range-based scrubbing due to interleave etc. RAS2 and CXL both sidestep this
somewhat by making it someone else's problem. In RAS2 the firmware gets
to program multiple scrubbers to cover the range requested. In CXL,
for now, this leaves the problem to userspace, but we can definitely
consider a region interface if it makes sense.

I'd also like to see inputs from a wider range of systems folk and other
CPU companies.  How easy this is to implement is heavily dependent on
what entity in your system is responsible for this sort of runtime
service, and that varies a lot.

> 
> > https://lore.kernel.org/all/20221103155029.2451105-1-jiaqiyan@google.com/  
> 
> Yes, now that is a useful changelog, thank you for highlighting it,
> please follow its example.

It's not a changelog as such but an RFC in text-only form.
However, there is indeed lots of good info in there.

Jonathan
John Groves Feb. 28, 2024, 10:26 p.m. UTC | #5
On 24/02/23 11:42AM, Dan Williams wrote:
> Shiju Jose wrote:
> > Hi Dan,
> > 
> > Thanks for the feedback.
> > 
> > Please find reply inline.
> > 
> > >
> > >shiju.jose@ wrote:
> > >> From: Shiju Jose <shiju.jose@huawei.com>
> > >>
> > >> 1. Add support for CXL feature mailbox commands.
> > >> 2. Add CXL device scrub driver supporting patrol scrub control and ECS
> > >> control features.
> > >> 3. Add scrub subsystem driver supports configuring memory scrubs in the
> > >system.
> > >> 4. Register CXL device patrol scrub and ECS with scrub subsystem.
> > >> 5. Add common library for RASF and RAS2 PCC interfaces.
> > >> 6. Add driver for ACPI RAS2 feature table (RAS2).
> > >> 7. Add memory RAS2 driver and register with scrub subsystem.
> > >
> > >I stepped away from this patch set to focus on the changes that landed for v6.8
> > >and the follow-on regression fixups. Now that v6.8 CXL work has quieted down
> > >and I circle back to this set for v6.9 I find the lack of story in this cover letter to
> > >be unsettling. As a reviewer I should not have to put together the story on why
> > >Linux should care about this feature and independently build up the
> > >maintainence-burden vs benefit tradeoff analysis.
> > I will add more details to the cover letter.
> >  
> > >
> > >Maybe it is self evident to others, but for me there is little in these changelogs
> > >besides "mechanism exists, enable it". There are plenty of platform or device
> > >mechanisms that get specified that Linux does not enable for one reason or
> > >another.
> > >
> > >The cover letter needs to answer why it matters, and what are the tradeoffs.
> > >Mind you, in my submissions I do not always get this right in the cover letter [1],
> > >but hopefully at least one of the patches tells the story [2].
> > >
> > >In other words, imagine you are writing the pull request to Linus or someone
> > >else with limited time who needs to make a risk decision on a pull request with a
> > >diffstat of:
> > >
> > >    23 files changed, 3083 insertions(+)
> > >
> > >...where the easiest decision is to just decline. As is, these changelogs are not
> > >close to tipping the scale to "accept".
> > >
> > >[sidebar: how did this manage to implement a new subsystem with 2 consumers
> > >(CXL + ACPI), without modifying a single existing line? Zero deletions? That is
> > >either an indication that Linux perfectly anticipated this future use case
> > >(unlikely), or more work needs to be done to digest an integrate these concepts
> > >into existing code paths]
> > >
> > >One of the first questions for me is why CXL and RAS2 as the first consumers and
> > >not NVDIMM-ARS and/or RASF Patrol Scrub? Part of the maintenance burden
> > We don't personally care about NVDIMMS but would welcome drivers from others.
> 
> Upstream would also welcome consideration of maintenance burden
> reduction before piling on, at least include *some* consideration of the
> implications vs this response that comes off as "that's somebody else's
> problem".
> 
> > Regarding RASF patrol scrub no one cared about it as it's useless and
> > any new implementation should be RAS2.
> 
> The assertion that "RASF patrol scrub no one cared about it as it's
> useless and any new implementation should be RAS2" needs evidence.
> 
> For example, what platforms are going to ship with RAS2 support, what
> are the implications of Linux not having RAS2 scrub support in a month,
> or in year? There are parts of the ACPI spec that have never been
> implemented what is the evidence that RAS2 is not going to suffer the
> same fate as RASF? There are parts of the CXL specification that have
> never been implemented in mass market products.
> 
> > Previous discussions in the community about RASF and scrub could be find here.
> > https://lore.kernel.org/lkml/20230915172818.761-1-shiju.jose@huawei.com/#r
> > and some old ones,
> > https://patchwork.kernel.org/project/linux-arm-kernel/patch/CS1PR84MB0038718F49DBC0FF03919E1184390@CS1PR84MB0038.NAMPRD84.PROD.OUTLOOK.COM/
> > 
> 
> Do not make people hunt for old discussions, if there are useful points
> in that discussion that make the case for the patch set include those in
> the next submission, don't make people hunt for the latest state of the
> story.
> 
> > https://lore.kernel.org/all/20221103155029.2451105-1-jiaqiyan@google.com/
> 
> Yes, now that is a useful changelog, thank you for highlighting it,
> please follow its example.

Just a comment that is not directed at the implementation details: at Micron we
see demand for the scrub control feature, so we do hope to see this support
go in sooner rather than later.

Regards,
John
Dan Williams Feb. 29, 2024, 7:51 p.m. UTC | #6
Jonathan Cameron wrote:

Thanks for taking the time Jonathan, this really helps.

[..]
> We can do analysis of whether the interfaces are suitable etc but
> have no access to test hardware or emulation. I guess I can hack something
> together easily enough. Today ndctl has some support. Interestingly the model
> is different from typical volatile scrubbing as it's all on demand - that
> could be easily wrapped up in a software scrub scheduler though, but we'd need
> input from you and other Intel people on how this is actually used. 
> 
> The use model is a lot less obvious than autonomous scrubbers - I assume because
> the persistence means you need to do this rarely if at all (though ARS does
> support scrubbing volatile memory on nvdimms)
> 
> So initial conclusion is it would need a few more controls or it needs
> some software handling of scan scheduling to map it to the interface type
> that is common to CXL and RAS2 scrub controls.
> 
> Intent of the comment was to keep scope somewhat confined, and to 
> invite others to get involved, not to rule out doing some light weight
> analysis of whether this feature would work for another potential user
> which we weren't even aware of until you mentioned it (thanks!).

Ok, Fair enough.

> > > Regarding RASF patrol scrub no one cared about it as it's useless and
> > > any new implementation should be RAS2.  
> > 
> > The assertion that "RASF patrol scrub no one cared about it as it's
> > useless and any new implementation should be RAS2" needs evidence.
> > 
> > For example, what platforms are going to ship with RAS2 support, what
> > are the implications of Linux not having RAS2 scrub support in a month,
> > or in year? There are parts of the ACPI spec that have never been
> > implemented what is the evidence that RAS2 is not going to suffer the
> > same fate as RASF? 
> 
> From discussions with various firmware folk we have a chicken and egg
> situation on RAS2. They will stick to their custom solutions unless there is
> plausible support in Linux for it - so right now it's a question mark
> on roadmaps. Trying to get rid of that question mark is why Shiju and I
> started looking at this in the first place. To get rid of that question
> mark we don't necessarily need to have it upstream, but we do need
> to be able to make the argument that there will be a solution ready
> soon after they release the BIOS image.  (Some distros will take years
> to catch up though).
> 
> If anyone else an speak up on this point please do. Discussions and
> feedback communicated to Shiju and I off list aren't going to
> convince people :(
> Negatives perhaps easier to give than positives given this is seen as
> a potential feature for future platforms so may be confidential.

So one of the observations from efforts like RAS API [1] is that CXL is
defining mechanisms that others are using for non-CXL use cases. I.e.
a CXL-like mailbox that supports events is a generic transport that can
be used for many RAS scenarios, not just CXL endpoints. It supplants
building new ACPI interfaces for these things because the expectation is
that an OS just repurposes its CXL Type-3 infrastructure to also drive
event collection for RAS API-compliant devices in the topology.

[1]: https://www.opencompute.org/w/index.php?title=RAS_API_Workstream

So when considering whether Linux should build support for ACPI RASF,
ACPI RAS2, and/or Open Compute RAS API, it is worthwhile to ask if one
of those can supplant the others.

Speaking only for myself with my Linux kernel maintainer hat on, I am
much more attracted to proposals like RAS API where native drivers can
be deployed vs ACPI which brings ACPI static definition baggage and a
3rd component to manage. RAS API is kernel driver + device-firmware
while I assume ACPI RAS* is kernel ACPI driver + BIOS firmware +
device-firmware.

In other words, this patch proposal enables both CXL memscrub and ACPI
RAS2 memscrub. It asserts that nobody cares about ACPI RASF memscrub,
and your clarification asserts that RAS2 is basically dead until Linux
adopts it. So then the question becomes why should Linux breathe air into
the ACPI RAS2 memscrub proposal when initiatives like RAS API exist?

The RAS API example seems to indicate that one way to get scrub support
for non-CXL memory controllers would be to reuse CXL memscrub
infrastructure. In a world where there is a kernel mechanism to understand
CXL-like scrub mechanisms, why not nudge the industry in that direction
instead of continuing to build new and different ACPI mechanisms?

> > There are parts of the CXL specification that have
> > never been implemented in mass market products.
> 
> Obviously can't talk about who was involved in this feature
> in it's definition, but I have strong confidence it will get implemented
> for reasons I can point at on a public list. 
> a) There will be scrubbing on devices.
> b) It will need control (evidence for this is the BIOS controls mentioned below
>    for equivalent main memory).
> c) Hotplug means that control must be done by OS driver (or via very fiddly
>    pre hotplug hacks that I think we can all agree should not be necessary
>    and aren't even an option on all platforms)
> d) No one likes custom solutions.
> This isn't a fancy feature with a high level of complexity which helps.

That does help, it would help even more if the maintenance burden of CXL
scrub precludes needing to carry the burden of other implementations.

[..]
> > 
> > > Previous discussions in the community about RASF and scrub could be find here.
> > > https://lore.kernel.org/lkml/20230915172818.761-1-shiju.jose@huawei.com/#r
> > > and some old ones,
> > > https://patchwork.kernel.org/project/linux-arm-kernel/patch/CS1PR84MB0038718F49DBC0FF03919E1184390@CS1PR84MB0038.NAMPRD84.PROD.OUTLOOK.COM/
> > >   
> > 
> > Do not make people hunt for old discussions, if there are useful points
> > in that discussion that make the case for the patch set include those in
> > the next submission, don't make people hunt for the latest state of the
> > story.
> 
> Sure, more of an essay needed along with links given we are talking
> about the views of others.
> 
> Quick summary from a reread of the linked threads.
> AMD not implemented RASF/RAS2 yet - looking at it last year, but worried
> about inflexibility of RAS2 spec today. They were looking at some spec
> changes to improve this + other functions to be added to RAS2.
> I agree with it being limited, but think extending with backwards
> compatibility isn't a problem (and ACPI spec rules in theory guarantee
> it won't break).  I'm keen on working with current version
> so that we can ensure the ABI design for CXL encompasses it.
> 
> Intel folk were cc'd but not said anything on that thread, but Tony Luck
> did comment in Jiaqi Yan's software scrubbing discussion linked below.
> He observed that a hardware implementation can be complex if doing range
> based scrubbing due to interleave etc. RAS2 and CXL both side step this
> somewhat by making it someone elses problem. In RAS2 the firmware gets
> to program multiple scrubbers to cover the range requested. In CXL
> for now this leaves the problem for userspace, but we can definitely
> consider a region interface if it makes sense.
> 
> I'd also like to see inputs from a wider range of systems folk + other
> CPU companies.  How easy this is to implement is heavily dependent on
> what entity in your system is responsible for this sort of runtime
> service and that varies a lot.

This answers my main question of whether RAS2 is a done deal with
shipping platforms making it awkward for Linux to *not* support RAS2, or
if this is the start of an industry conversation that wants some Linux
ecosystem feedback. It sounds more like the latter.

> > > https://lore.kernel.org/all/20221103155029.2451105-1-jiaqiyan@google.com/  
> > 
> > Yes, now that is a useful changelog, thank you for highlighting it,
> > please follow its example.
> 
> It's not a changelog as such but a RFC in text only form.
> However indeed lots of good info in there.
> 
> Jonathan

Thanks again for taking the time Jonathan.
Luck, Tony Feb. 29, 2024, 8:41 p.m. UTC | #7
> Obviously can't talk about who was involved in this feature
> in it's definition, but I have strong confidence it will get implemented
> for reasons I can point at on a public list. 
> a) There will be scrubbing on devices.
> b) It will need control (evidence for this is the BIOS controls mentioned below
>    for equivalent main memory).
> c) Hotplug means that control must be done by OS driver (or via very fiddly
>    pre hotplug hacks that I think we can all agree should not be necessary
>    and aren't even an option on all platforms)
> d) No one likes custom solutions.
> This isn't a fancy feature with a high level of complexity which helps.

But how will users know what are appropriate scrubbing
parameters for these devices?

Car analogy: Fuel injection systems on internal combustion engines
have tweakable controls. But no auto manufacturer wires them up to
a user-accessible dashboard control.

Back to computers:

I'd expect the OEMs that produce memory devices to set appropriate
scrubbing rates based on their internal knowledge of the components
used in construction.

What is the use case where some user would need to override these
parameters and scrub at a faster/slower rate than that set by the
manufacturer?

-Tony
Jonathan Cameron March 1, 2024, 1:19 p.m. UTC | #8
On Thu, 29 Feb 2024 12:41:53 -0800
Tony Luck <tony.luck@intel.com> wrote:

> > Obviously can't talk about who was involved in this feature
> > in it's definition, but I have strong confidence it will get implemented
> > for reasons I can point at on a public list. 
> > a) There will be scrubbing on devices.
> > b) It will need control (evidence for this is the BIOS controls mentioned below
> >    for equivalent main memory).
> > c) Hotplug means that control must be done by OS driver (or via very fiddly
> >    pre hotplug hacks that I think we can all agree should not be necessary
> >    and aren't even an option on all platforms)
> > d) No one likes custom solutions.
> > This isn't a fancy feature with a high level of complexity which helps.  

Hi Tony,
> 
> But how will users know what are appropriate scrubbing
> parameters for these devices?
> 
> Car analogy: Fuel injection systems on internal combustion engines
> have tweakable controls. But no auto manufacturer wires them up to
> a user accessible dashboad control.

Good analogy - I believe performance-tuning 3rd parties will change
them for you. So the controls are used - albeit not by every user.

> 
> Back to computers:
> 
> I'd expect the OEMs that produce memory devices to set appropriate
> scrubbing rates based on their internal knowledge of the components
> used in construction.

Absolutely agree that they will set a default / baseline value,
but the reality is that 'everyone' (for the first few OEMs I googled)
exposes tuning controls in their shipping BIOS menus to configure
this because there are users who want to change it.  I'd expect
them to clamp the minimum scrub frequency to something that avoids
them getting hardware returned en masse for reliability, and the
maximum at whatever ensures the perf is good enough that they sell
hardware in the first place.  I'd also expect a BIOS menu to
allow cloud hosts etc. to turn off exposing RAS2 or similar.

> 
> What is the use case where some user would need to override these
> parameters and scrub and a faster/slower rate than that set by the
> manufacturer?

It's a performance vs reliability trade-off.  If your larger-scale
architecture (many servers) requires a few nodes to be super stable, you
will pay pretty much any cost to keep them running. If a single node
failure makes little or no difference, you'll happily crank this down
(same with refresh) in order to save some power / get a small
performance lift.  Or if you care about latency tails more than
reliability, you'll turn this off.

For comedy value, some BIOS guides point out that leaving scrub on may
affect performance benchmarking. Obviously not a good data point, but
a hint at the sort of market that cares.  Same market that buys cheaper
RAM knowing they are going to have more system crashes.

There is probably a description gap. That might be a paperwork
question as part of system specification.
What is the relationship between scrub rate and error rate under particular
styles of workload (because you get a free scrub whenever you access
the memory)?  The RAM DIMMs themselves could in theory provide inputs,
but the workload dependence makes this hard. Probably fall back on a
test-and-tune loop over very long runs - single-bit error rates used to
detect when we are getting below a level people are happy with, for
instance.
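
As a rough worked example (my numbers, purely illustrative, not from any
datasheet): a 1 TiB node scrubbed once every 24 hours only needs around
12 MiB/s of background read bandwidth on average, while a 1 hour cycle needs
roughly 300 MiB/s. That is the kind of trade-off the cycle-time style
controls let you make (the CXL patrol scrub feature, for instance, expresses
this as a scrub cycle in hours), and the error-rate feedback loop above is
what would tell you which end of that range a given system needs.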

With the fancier units that can be supported, you can play "more reliable
memory" games by scanning subsets of the memory more frequently.

Though it was about a kernel daemon doing scrub, Jiaqi's RFC document here
https://lore.kernel.org/all/20221103155029.2451105-1-jiaqiyan@google.com/
provided justification for on-demand scrub - some interesting stuff in the
bit on hardware patrol scrubbing.  I see you commented in that thread on the
complexity of hardware solutions.

- Cheap memory makes this all more important.
- Need for configuration of how fast and when depending on system state.
- Lack of flexibility of what is scanned (RAS2 provides some by association
  with NUMA node + option to request particular ranges, CXL provides
  per-endpoint controls).

There are some gaps in hardware scrubbers, but offloading this problem is
definitely attractive.

So my understanding is there is demand to tune this but it won't be exposed
on every system.

Jonathan

 
> 
> -Tony
Jonathan Cameron March 1, 2024, 2:41 p.m. UTC | #9
> 
> > > > Regarding RASF patrol scrub no one cared about it as it's useless and
> > > > any new implementation should be RAS2.    
> > > 
> > > The assertion that "RASF patrol scrub no one cared about it as it's
> > > useless and any new implementation should be RAS2" needs evidence.
> > > 
> > > For example, what platforms are going to ship with RAS2 support, what
> > > are the implications of Linux not having RAS2 scrub support in a month,
> > > or in year? There are parts of the ACPI spec that have never been
> > > implemented what is the evidence that RAS2 is not going to suffer the
> > > same fate as RASF?   
> > 
> > From discussions with various firmware folk we have a chicken and egg
> > situation on RAS2. They will stick to their custom solutions unless there is
> > plausible support in Linux for it - so right now it's a question mark
> > on roadmaps. Trying to get rid of that question mark is why Shiju and I
> > started looking at this in the first place. To get rid of that question
> > mark we don't necessarily need to have it upstream, but we do need
> > to be able to make the argument that there will be a solution ready
> > soon after they release the BIOS image.  (Some distros will take years
> > to catch up though).
> > 
> > If anyone else an speak up on this point please do. Discussions and
> > feedback communicated to Shiju and I off list aren't going to
> > convince people :(
> > Negatives perhaps easier to give than positives given this is seen as
> > a potential feature for future platforms so may be confidential.  
> 
> So one of the observations from efforts like RAS API [1] is that CXL is
> definining mechanisms that others are using for non-CXL use cases. I.e.
> a CXL-like mailbox that supports events is a generic transport that can
> be used for many RAS scenarios not just CXL endpoints. It supplants
> building new ACPI interfaces for these things because the expectation is
> that an OS just repurposes its CXL Type-3 infrastructure to also drive
> event collection for RAS API compliant devices in the topology.
> 
> [1]: https://www.opencompute.org/w/index.php?title=RAS_API_Workstream
> 
> So when considering whether Linux should build support for ACPI RASF,
> ACPI RAS2, and / or Open Compute RAS API it is worthwile to ask if one
> of those can supplant the others.

RAS API is certainly interesting but the bit of the discussion
that matters here will equally apply to CXL RAS controls as of today
(will ship before OCP) and Open Compute's RAS API (sometime in the future).

The subsystem presented here was to address the "show us your code" that
was the inevitable feedback if we'd gone for a discussion-document-style RFC.

What really matters here is whether a common ABI is necessary and what
it looks like - not even the infrastructure, just whether it's sysfs and
what the controls are. Sure, there is less code if it all looks like the
CXL Get Feature, but not that much less. Plus, I'm hoping we'll also end
up sharing with the various embedded device solutions out there today.

I notice a few familiar names in the meeting recordings. Anyone want
to provide a summary of the overlap etc. and the likely end result?
I scan-read the docs and caught up with some meetings at high speed,
but that's not the same as day-to-day involvement in the spec development.
Maybe the lesson to take away from this is that a more general interface is
needed, incorporating scrub control (which at this stage is probably
just a name change!)

I see that patrol scrub is on the RAS actions list which is great.

> 
> Speaking only for myself with my Linux kernel maintainer hat on, I am
> much more attracted to proposals like RAS API where native drivers can
> be deployed vs ACPI which brings ACPI static definition baggage and a
> 3rd component to manage. RAS API is kernel driver + device-firmware
> while I assume ACPI RAS* is kernel ACPI driver + BIOS firmware +
> device-firmware.

Not really. The only thing needed from BIOS firmware is a static table
for the OS describing where to find the hardware (RAS2 is a header plus one
or more pointers to PCCT table entries that tell you where the mailbox(es)
(PCC channels) are, their interrupts, etc.). It's all of 48 bytes of
static data to parse.

This could have been done in the DSDT (where you will find other PCC channels,
as many methods can use them under the hood to chat to firmware, plus there
are some other users where they are the only option), but my guess is the
assumption is that RAS might be needed before the AML interpreter is up, so
it's a static table.

A PCC channel is the ACPI spec's standard mailbox design (well, several
options for how to do it, but given the code is upstream and in use
for other purposes, no new maintenance burden for us :)
PCC channels can be shared resources handling multiple protocols.
They are used for various other things where the OS needs
to talk to firmware and have been upstream for a while.


ACPI driver --------<PCC Mailbox>---> Device Firmware
vs
RAS API Driver-----<CXL Mailbox>----> Device Firmware
or
CXL Driver --------<CXL Mailbox>----> Device Firmware
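
To make the top path above concrete, the ACPI side boils down to something
like the sketch below. acpi_get_table()/acpi_put_table() and
pcc_mbox_request_channel() are existing kernel interfaces; the RAS2 table
layout shown is an assumed stand-in for the structure the ACPICA patch in
this series adds, so treat those field names as placeholders.

#include <linux/acpi.h>
#include <linux/mailbox_client.h>
#include <acpi/pcc.h>

/* Assumed layout for illustration only; the real definition is the RAS2
 * table structure added to include/acpi/actbl2.h by this series. */
struct ras2_table_example {
	struct acpi_table_header header;
	u16 reserved;
	u16 num_pcc_descs;
	struct {
		u8 pcc_id;		/* PCC subspace index in the PCCT */
		u8 reserved[2];
		u8 feature_type;	/* e.g. patrol scrub */
		u32 instance;
	} __packed descs[];
} __packed;

static int ras2_request_channel_example(struct mbox_client *cl)
{
	struct acpi_table_header *tbl;
	struct ras2_table_example *ras2;
	struct pcc_mbox_chan *pchan;
	acpi_status status;

	status = acpi_get_table("RAS2", 0, &tbl);
	if (ACPI_FAILURE(status))
		return -ENODEV;
	ras2 = (struct ras2_table_example *)tbl;

	if (!ras2->num_pcc_descs) {
		acpi_put_table(tbl);
		return -ENODEV;
	}

	/* The PCC channel is the standard ACPI mailbox already used by
	 * CPPC and friends, so no new transport code is involved. */
	pchan = pcc_mbox_request_channel(cl, ras2->descs[0].pcc_id);
	acpi_put_table(tbl);
	if (IS_ERR(pchan))
		return PTR_ERR(pchan);

	/* scrub commands then go to firmware via pchan->mchan */
	return 0;
}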

The new complexity (much like the CXL solution) lies in the
control protocol sent over the mailbox (which is pretty simple!)

Some of the complexity in the driver is left over from the earlier
version doing both RASF and RAS2, so we'll flatten that layering out
and it'll be even simpler in the next RFC, and perhaps not hint at
false complexity or maintenance burden.

The only significant burden I really see from incorporating RAS2
is the need for an interface that works for both (very similar)
configuration control sets.  Once that is defined we need to support
the ABI forever anyway, so maybe it's a sysfs attribute of extra ABI to
support in the current design?


> 
> In other words, this patch proposal enables both CXL memscrub and ACPI
> RAS2 memscrub. It asserts that nobody cares about ACPI RASF memscrub,
> and your clarification asserts that RAS2 is basically dead until Linux
> adopts it. So then the question becomes why should Linux breath air into
> the ACPI RAS2 memscrub proposal when initiatives like RAS API exist?

A fair question and one where I'm looking for inputs from others.

However I think you may be assuming a lot more than is actually
involved in the RAS2 approach - see below.

> 
> The RAS API example seems to indicate that one way to get scrub support
> for non-CXL memory controllers would be to reuse CXL memscrub
> infrastructure. In a world where there is kernel mechanism to understand
> CXL-like scrub mechanisms, why not nudge the industry in that direction
> instead of continuing to build new and different ACPI mechanisms?

There may be some shared elements of course (and it seems the RAS API
stuff has several sets of proposals for interfacing approaches), but ultimately
a RAS API element still hangs off something that isn't a CXL device, so
still demands some common infrastructure (e.g. a class or similar) or
we are going to find the RAS tools buried under a bunch of different individual
drivers.
1) Maybe shared for system components (maybe not from some of the diagrams!)
   But likely 1 interface per socket. Probably PCI, but maybe platform devices
   (I'd not be surprised to see a PCC channel type added for this mailbox)
   /sys/bus/pci/devices/pcixxx/rasstuff/etc
2) CXL devices say /sys/bus/cxl/devices/mem0/rasstuff/etc.
3) Other system components such as random PCI drivers.

Like other cases of common infrastructure, I'd argue for a nice class with
the devices parentage linking back to the underlying EP driver.
/sys/class/ras/ras0 parent ->   /sys/bus/cxl/devices/mem0/
/sys/class/ras/ras1 parent ->   /sys/bus/pci/device/pcixxx/ RAS API device.
etc

Same as if we had a bunch of devices that happened to have an LED on them
and wanted common userspace controls so registered with /sys/class/led
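
A very rough sketch of that shape, purely for illustration (the "ras" class
and device naming are placeholders, not the ABI proposed in this series;
class_create()/device_create() are used with their signatures from recent
kernels):

#include <linux/device.h>
#include <linux/kdev_t.h>
#include <linux/err.h>

static struct class *ras_class;

/*
 * Illustrative only: register one scrub control instance, parented to
 * whatever endpoint (CXL memdev, RAS API PCI function, platform device)
 * actually provides the scrub engine.
 */
static struct device *ras_scrub_register_sketch(struct device *parent, int id)
{
	if (!ras_class) {
		ras_class = class_create("ras");	/* /sys/class/ras */
		if (IS_ERR(ras_class))
			return ERR_CAST(ras_class);
	}

	/*
	 * Creates /sys/class/ras/ras<id> with a 'device' link back to the
	 * parent, e.g. /sys/class/ras/ras0/device -> .../cxl/devices/mem0
	 */
	return device_create(ras_class, parent, MKDEV(0, 0), NULL,
			     "ras%d", id);
}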

So to me RAS API looks like another user of this proposal that indeed
shares a bunch of common code with the CXL driver stack (hopefully they'll
move to PCI MMPT from the current definition based on the CXL 2.0 mailbox so the
discoverability isn't CXL spec based).  (I may not have the latest version of course!)

> 
> > > There are parts of the CXL specification that have
> > > never been implemented in mass market products.  
> > 
> > Obviously can't talk about who was involved in this feature
> > in its definition, but I have strong confidence it will get implemented
> > for reasons I can point at on a public list. 
> > a) There will be scrubbing on devices.
> > b) It will need control (evidence for this is the BIOS controls mentioned below
> >    for equivalent main memory).
> > c) Hotplug means that control must be done by OS driver (or via very fiddly
> >    pre hotplug hacks that I think we can all agree should not be necessary
> >    and aren't even an option on all platforms)
> > d) No one likes custom solutions.
> > This isn't a fancy feature with a high level of complexity which helps.  
> 
> That does help, it would help even more if the maintenance burden of CXL
> scrub precludes needing to carry the burden of other implementations.

I think we disagree on whether the burden is significant - sure
we can spin interfaces differently to make it easier for CXL and we can
just stick it on the individual endpoints for now.

Key here is ABI, I don't really care about whether we wrap it up in a subsystem
(mostly we do that to enforce compliance with the ABI design as easier than
 reviewing against a document!)

I want to see userspace ABI that is general enough to extend to other
devices and doesn't require a horrible hydra of a userspace program on top
of incompatible controls because everyone wanted to do it slightly
differently.  The exercise of including RAS2 (and earlier RASF which
we dropped) was about establishing commonality and I think that was very
useful.

I'm reluctant to say it will never be necessary to support RAS2 (because
I want to see solutions well before anyone will have built OCP's proposal,
and RAS2 works on many of today's systems with a small amount of firmware
work; many have existing PCC channels to appropriate management controllers
and, as I understand it, non-standard interfaces to control the scrubbing
engines).

So I think not considering an ABI that is designed to be general is just
storing up pain for us in the future.

I'm not sure the design we have here is the right one which is why it
was an RFC :)

> 
> [..]
> > >   
> > > > Previous discussions in the community about RASF and scrub could be find here.
> > > > https://lore.kernel.org/lkml/20230915172818.761-1-shiju.jose@huawei.com/#r
> > > > and some old ones,
> > > > https://patchwork.kernel.org/project/linux-arm-kernel/patch/CS1PR84MB0038718F49DBC0FF03919E1184390@CS1PR84MB0038.NAMPRD84.PROD.OUTLOOK.COM/
> > > >     
> > > 
> > > Do not make people hunt for old discussions, if there are useful points
> > > in that discussion that make the case for the patch set include those in
> > > the next submission, don't make people hunt for the latest state of the
> > > story.  
> > 
> > Sure, more of an essay needed along with links given we are talking
> > about the views of others.
> > 
> > Quick summary from a reread of the linked threads.
> > AMD has not implemented RASF/RAS2 yet - looking at it last year, but worried
> > about the inflexibility of the RAS2 spec today. They were looking at some spec
> > changes to improve this + other functions to be added to RAS2.
> > I agree with it being limited, but think extending with backwards
> > compatibility isn't a problem (and ACPI spec rules in theory guarantee
> > it won't break).  I'm keen on working with current version
> > so that we can ensure the ABI design for CXL encompasses it.
> > 
> > Intel folk were cc'd but didn't say anything on that thread, but Tony Luck
> > did comment in Jiaqi Yan's software scrubbing discussion linked below.
> > He observed that a hardware implementation can be complex if doing range
> > based scrubbing due to interleave etc. RAS2 and CXL both side step this
> > somewhat by making it someone else's problem. In RAS2 the firmware gets
> > to program multiple scrubbers to cover the range requested. In CXL
> > for now this leaves the problem for userspace, but we can definitely
> > consider a region interface if it makes sense.
> > 
> > I'd also like to see inputs from a wider range of systems folk + other
> > CPU companies.  How easy this is to implement is heavily dependent on
> > what entity in your system is responsible for this sort of runtime
> > service and that varies a lot.  
> 
> This answers my main question of whether RAS2 is a done deal with
> shipping platforms making it awkward for Linux to *not* support RAS2, or
> if this is the start of an industry conversation that wants some Linux
> ecosystem feedback. It sounds more like the latter.

I'll let others speak up on this as I was presenting on my current outlook
and understand others are much further down the path.

> 
> > > > https://lore.kernel.org/all/20221103155029.2451105-1-jiaqiyan@google.com/    
> > > 
> > > Yes, now that is a useful changelog, thank you for highlighting it,
> > > please follow its example.  
> > 
> > It's not a changelog as such but an RFC in text-only form.
> > However indeed lots of good info in there.
> > 
> > Jonathan  
> 
> Thanks again for taking the time Jonathan.
> 
You are welcome and thanks for all the questions / pointers.

Jonathan
Jiaqi Yan March 5, 2024, 12:52 a.m. UTC | #10
On Fri, Mar 1, 2024 at 6:42 AM Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
>
>
> >
> > > > > Regarding RASF patrol scrub no one cared about it as it's useless and
> > > > > any new implementation should be RAS2.
> > > >
> > > > The assertion that "RASF patrol scrub no one cared about it as it's
> > > > useless and any new implementation should be RAS2" needs evidence.
> > > >
> > > > For example, what platforms are going to ship with RAS2 support, what
> > > > are the implications of Linux not having RAS2 scrub support in a month,
> > > > or in a year? There are parts of the ACPI spec that have never been
> > > > implemented; what is the evidence that RAS2 is not going to suffer the
> > > > same fate as RASF?
> > >
> > > From discussions with various firmware folk we have a chicken and egg
> > > situation on RAS2. They will stick to their custom solutions unless there is
> > > plausible support in Linux for it - so right now it's a question mark
> > > on roadmaps. Trying to get rid of that question mark is why Shiju and I
> > > started looking at this in the first place. To get rid of that question
> > > mark we don't necessarily need to have it upstream, but we do need
> > > to be able to make the argument that there will be a solution ready
> > > soon after they release the BIOS image.  (Some distros will take years
> > > to catch up though).
> > >
> > > If anyone else can speak up on this point please do. Discussions and
> > > feedback communicated to Shiju and I off list aren't going to
> > > convince people :(
> > > Negatives perhaps easier to give than positives given this is seen as
> > > a potential feature for future platforms so may be confidential.
> >
> > So one of the observations from efforts like RAS API [1] is that CXL is
> > defining mechanisms that others are using for non-CXL use cases. I.e.
> > a CXL-like mailbox that supports events is a generic transport that can
> > be used for many RAS scenarios not just CXL endpoints. It supplants
> > building new ACPI interfaces for these things because the expectation is
> > that an OS just repurposes its CXL Type-3 infrastructure to also drive
> > event collection for RAS API compliant devices in the topology.

Thanks Dan for bringing up [1]. After sending out [2], the proposal of
an in-kernel "memory scrubber" implemented in **software**, I actually
paid more attention to the hardware patrol scrubber given it is (or at
least should be) programmable at runtime with RASF and RAS2, as
HORIGUCHI pointed out [3]. The idea is, if the hardware patrol scrubber can
become as flexible as software, why let the software waste the CPU
cycles and memory bandwidth? Then I attempted to
1. define the features required to scrub memory efficiently and flexibly
2. get that list of features **standardized** by engaging hw vendors
who make patrol scrubbers in their chips
3. design a Linux interface so that user space can drive **both
software and hardware scrubbers** (Turns out the software scrubber still
has an advantage over hardware and I will cover it later.)

[2] https://lore.kernel.org/all/20221103155029.2451105-1-jiaqiyan@google.com/
[3] https://lore.kernel.org/all/20221109050425.GA527418@hori.linux.bs1.fc.nec.co.jp/

The difference between my work and Shiju+Jonathan's RFC is: taking a
bet on RAS2, #1 and #2 are not problems for them.

I am under the impression that RAS2 is probably going to suffer the
same fate as RASF, but at least I hope Shiju/Jonathan's API can be compatible
with the sw scrubber that I planned to upstream (I made some suggestions [4]
for this purpose).

[4] https://lore.kernel.org/linux-mm/CACw3F539gZc0FoJLo6VvYSyZmeWZ3Pbec7AzsH+MYUJJNzQbUQ@mail.gmail.com/

**However**, ...

> >
> > [1]: https://www.opencompute.org/w/index.php?title=RAS_API_Workstream

... our hardware RAS team (better experts on fault management and hw
reliability) strongly pushed back on some of my proposed feature list and
pointed me exactly to [1].

From my understanding after talking to our RAS people, Open Compute
RAS API is the direction to move in. Meanwhile everyone should probably
start to forget about RAS2. The memory interleave issue [5] in RASF
seems to have been carried over to RAS2, making vendors reluctant to adopt it.

[5] https://lore.kernel.org/all/SJ1PR11MB6083BF93E9A88E659CED5EC4FC3F9@SJ1PR11MB6083.namprd11.prod.outlook.com/

> >
> > So when considering whether Linux should build support for ACPI RASF,
> > ACPI RAS2, and / or Open Compute RAS API it is worthwile to ask if one
> > of those can supplant the others.

I think Open Compute RAS API and RAS2/RASF are probably *incompatible*
at their core. Open Compute RAS API "will likely block any OS access
to the patrol scrubber", in the context of Open Compute RAS API's
out-of-band solution. While userspace has a need to use the patrol scrubber,
"the OS doesn't understand memory enough to drive it".

Taking STOP_PATROL_SCRUBBER as an example: while it may make sense to
users, stopping the patrol scrubber is unacceptable on platforms where the OEM
has enabled the patrol scrubber, because the patrol scrubber is a key part
of logging and is repurposed for other RAS actions. So from Open
Compute RAS API's perspective, STOP_PATROL_SCRUBBER from RAS2 must be
blocked and, tbh, must not be exposed to the OS/userspace at all.
"Requested Address Range"/"Actual Address Range" (the region to scrub) is
a similarly bad thing to expose in RAS2.

But I am still seeking common ground between Open Compute RAS API
and my feature wishlist for a flexible and efficient patrol scrubber, by
taking a step back and ...

>
> RAS API is certainly interesting but the bit of the discussion
> that matters here will equally apply to CXL RAS controls as of today
> (will ship before OCP) and Open Compute's RAS API (sometime in the future).
>
> The subsystem presented here was to address the "show us your code" that
> was inevitable feedback if we'd gone for a discussion Doc style RFC.
>
> What really matters here is whether a common ABI is necessary and what
> it looks like.
> Not even the infrastructure, just whether it's sysfs and what the controls)
> Sure there is less code if it all looks like that CXL get feature,
> but not that much less.  + I'm hoping we'll also end up sharing with
> the various embedded device solutions out there today.
>
> I notice a few familiar names in the meeting recordings. Anyone want
> to provide a summary of overlap etc and likely end result?
> I scan read the docs and caught up with some meetings at high speed,
> but that's not the same as day to day involvement in the spec development.
> Maybe the lesson to take away from this is a more general interface is
> needed incorporating scrub control (which at this stage is probably
> just a name change!)
>
> I see that patrol scrub is on the RAS actions list which is great.
>
> >
> > Speaking only for myself with my Linux kernel maintainer hat on, I am
> > much more attracted to proposals like RAS API where native drivers can
> > be deployed vs ACPI which brings ACPI static definition baggage and a
> > 3rd component to manage. RAS API is kernel driver + device-firmware
> > while I assume ACPI RAS* is kernel ACPI driver + BIOS firmware +
> > device-firmware.
>
> Not really. The only thing needed from BIOS firmware is a static table
> presented to the OS to describe where to find the hardware (RAS2 is a header
> plus one or more pointers to the PCCT table entries that tell you where the
> mailboxes (PCC channels) are, their interrupts etc.).  It's all of 48 bytes of
> static data to parse.
>
> This could have been done in the DSDT (where you will find other PCC channels,
> as many methods use them under the hood to chat to firmware + there
> are some other users where they are the only option) but my guess is the
> assumption was that RAS might be needed pre AML interpreter, so it's a static table.
>
> A PCC channel is the ACPI spec standard mailbox design (well several
> options for how to do it, but given the code is upstream and in use
> for other purposes, no new maintenance burden for us :)
> PCC channels can be shared resources handling multiple protocols.
> They are used for various other things where the OS needs
> to talk to firmware and have been upstream for a while.
>
>
> ACPI driver --------<PCC Mailbox>---> Device Firmware
> vs
> RAS API Driver-----<CXL Mailbox>----> Device Firmware
> or
> CXL Driver --------<CXL Mailbox>----> Device Firmware
>
> The new complexity (much like the CXL solution) lies in the
> control protocol sent over the mailbox (which is pretty simple!)
>
> Some of the complexity in the driver is left over from an earlier
> version that did both RASF and RAS2, so we'll flatten that layering out
> and it'll be even simpler in the next RFC and perhaps not hint at
> false complexity or maintenance burden.
>
> The only significant burden I really see from incorporating RAS2
> is the need for an interface that works for both (very similar)
> configuration control sets.  Once that is defined we need to support
> the ABI forever anyway, so maybe a sysfs attribute or two of extra ABI to
> support in the current design?
>
>
> >
> > In other words, this patch proposal enables both CXL memscrub and ACPI
> > RAS2 memscrub. It asserts that nobody cares about ACPI RASF memscrub,
> > and your clarification asserts that RAS2 is basically dead until Linux
> > adopts it. So then the question becomes why should Linux breathe air into
> > the ACPI RAS2 memscrub proposal when initiatives like RAS API exist?
>
> A fair question and one where I'm looking for inputs from others.
>
> However I think you may be assuming a lot more than is actually
> involved in the RAS2 approach - see below.
>
> >
> > The RAS API example seems to indicate that one way to get scrub support
> > for non-CXL memory controllers would be to reuse CXL memscrub
> > infrastructure. In a world where there is kernel mechanism to understand
> > CXL-like scrub mechanisms, why not nudge the industry in that direction
> > instead of continuing to build new and different ACPI mechanisms?
>
> There may be some shared elements of course (and it seems the RAS API
> stuff has several sets of proposals for interfacing approaches), but ultimately
> a RAS API element still hangs off something that isn't a CXL device, so
> still demands some common infrastructure (e.g. a class or similar) or
> we are going to find the RAS tools buried under a bunch of different individual
> drivers.
> 1) Maybe shared for system components (maybe not from some of the diagrams!)
>    But likely 1 interface per socket. Probably PCI, but maybe platform devices
>    (I'd not be surprised to see a PCC channel type added for this mailbox)
>    /sys/bus/pci/devices/pcixxx/rasstuff/etc
> 2) CXL devices say /sys/bus/cxl/devices/mem0/rasstuff/etc.
> 3) Other system components such as random PCI drivers.
>
> Like other cases of common infrastructure, I'd argue for a nice class with
> the devices parentage linking back to the underlying EP driver.
> /sys/class/ras/ras0 parent ->   /sys/bus/cxl/devices/mem0/
> /sys/class/ras/ras1 parent ->   /sys/bus/pci/device/pcixxx/ RAS API device.
> etc
>
> Same as if we had a bunch of devices that happened to have an LED on them
> and wanted common userspace controls so registered with /sys/class/led
>
> So to me RAS API looks like another user of this proposal that indeed
> shares a bunch of common code with the CXL driver stack (hopefully they'll
> move to PCI MMPT from the current definition based on the CXL 2.0 mailbox so the
> discoverability isn't CXL spec based).  (I may not have the latest version of course!)
>
> >
> > > > There are parts of the CXL specification that have
> > > > never been implemented in mass market products.
> > >
> > > Obviously can't talk about who was involved in this feature
> > > in its definition, but I have strong confidence it will get implemented
> > > for reasons I can point at on a public list.
> > > a) There will be scrubbing on devices.
> > > b) It will need control (evidence for this is the BIOS controls mentioned below
> > >    for equivalent main memory).
> > > c) Hotplug means that control must be done by OS driver (or via very fiddly
> > >    pre hotplug hacks that I think we can all agree should not be necessary
> > >    and aren't even an option on all platforms)
> > > d) No one likes custom solutions.
> > > This isn't a fancy feature with a high level of complexity which helps.
> >
> > That does help, it would help even more if the maintenance burden of CXL
> > scrub precludes needing to carry the burden of other implementations.
>
> I think we disagree on whether the burden is significant - sure
> we can spin interfaces differently to make it easier for CXL and we can
> just stick it on the individual endpoints for now.
>
> Key here is ABI, I don't really care about whether we wrap it up in a subsystem
> (mostly we do that to enforce compliance with the ABI design as easier than
>  reviewing against a document!)
>
> I want to see userspace ABI that is general enough to extend to other
> devices and doesn't require a horrible hydra of a userspace program on top
> of incompatible controls because everyone wanted to do it slightly
> differently.  The exercise of including RAS2 (and earlier RASF which
> we dropped) was about establishing commonality and I think that was very
> useful.
>
> I'm reluctant to say it will never be necessary to support RAS2 (because
> I want to see solutions well before anyone will have built OCP's proposal,
> and RAS2 works on many of today's systems with a small amount of firmware
> work; many have existing PCC channels to appropriate management controllers
> and, as I understand it, non-standard interfaces to control the scrubbing
> engines).
>
> So I think not considering an ABI that is designed to be general is just
> storing up pain for us in the future.

considering user demands. In the end hardware is implemented to
fulfill buyers' needs. I think I should just ask for features that are
both valuable to customers and compatible with Open Compute RAS API,
and drop the others (for example, stop is dropped). Given my role
at a cloud provider, I wanted to start with two key features:
* Adjust the speed of the scrubbing within a vendor-defined range.
Adding to Jonathan's reply [6] to Tony, in the cloud only the control brain
of the fleet, running in userspace or remotely, knows when performance is
important (a customer VM is present on the host) and when reliability
is important (host is idle but should be well-prepared to serve
customers without memory errors).
* Granularity. Per-memory-controller control granularity is
unrealistic. How about per NUMA node? Starting with the whole host is
also acceptable right now.

Do these 2 feature requests make sense to people here, and should they be
covered by Open Compute RAS API (if they are not already)? If both are, then
#1 and #2 are solved by Open Compute RAS API and kernel developers can
start to design the OS API.

[6] https://lore.kernel.org/linux-mm/20240220111423.00003eae@Huawei.com/T/#m8d0b0737e2e5704529cc13b55008710a928b62b8
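
To make the first request concrete, here is a rough userspace sketch of how
a fleet agent might drive such a control. The sysfs path and the value range
are entirely hypothetical, just to illustrate the intended usage, not an ABI
anyone has proposed:

#include <stdio.h>

/* Hypothetical attribute; the real path/units would come from whatever
 * common scrub ABI eventually lands. */
#define SCRUB_RATE_ATTR "/sys/class/ras/ras0/scrub/rate"

static int set_scrub_rate(unsigned int rate)
{
	FILE *f = fopen(SCRUB_RATE_ATTR, "w");

	if (!f)
		return -1;
	fprintf(f, "%u\n", rate);
	return fclose(f);
}

int main(void)
{
	/* Customer VM landed on the host: minimise scrub interference. */
	set_scrub_rate(1);

	/* Host went idle: scrub as fast as the vendor-defined range allows. */
	set_scrub_rate(100);
	return 0;
}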

>
> I'm not sure the design we have here is the right one which is why it
> was an RFC :)
>
> >
> > [..]
> > > >
> > > > > Previous discussions in the community about RASF and scrub could be find here.
> > > > > https://lore.kernel.org/lkml/20230915172818.761-1-shiju.jose@huawei.com/#r
> > > > > and some old ones,
> > > > > https://patchwork.kernel.org/project/linux-arm-kernel/patch/CS1PR84MB0038718F49DBC0FF03919E1184390@CS1PR84MB0038.NAMPRD84.PROD.OUTLOOK.COM/
> > > > >
> > > >
> > > > Do not make people hunt for old discussions, if there are useful points
> > > > in that discussion that make the case for the patch set include those in
> > > > the next submission, don't make people hunt for the latest state of the
> > > > story.
> > >
> > > Sure, more of an essay needed along with links given we are talking
> > > about the views of others.
> > >
> > > Quick summary from a reread of the linked threads.
> > > AMD has not implemented RASF/RAS2 yet - looking at it last year, but worried
> > > about the inflexibility of the RAS2 spec today. They were looking at some spec
> > > changes to improve this + other functions to be added to RAS2.
> > > I agree with it being limited, but think extending with backwards
> > > compatibility isn't a problem (and ACPI spec rules in theory guarantee
> > > it won't break).  I'm keen on working with current version
> > > so that we can ensure the ABI design for CXL encompasses it.
> > >
> > > Intel folk were cc'd but didn't say anything on that thread, but Tony Luck
> > > did comment in Jiaqi Yan's software scrubbing discussion linked below.
> > > He observed that a hardware implementation can be complex if doing range
> > > based scrubbing due to interleave etc. RAS2 and CXL both side step this
> > > somewhat by making it someone else's problem. In RAS2 the firmware gets
> > > to program multiple scrubbers to cover the range requested. In CXL
> > > for now this leaves the problem for userspace, but we can definitely
> > > consider a region interface if it makes sense.
> > >
> > > I'd also like to see inputs from a wider range of systems folk + other
> > > CPU companies.  How easy this is to implement is heavily dependent on
> > > what entity in your system is responsible for this sort of runtime
> > > service and that varies a lot.
> >
> > This answers my main question of whether RAS2 is a done deal with
> > shipping platforms making it awkward for Linux to *not* support RAS2, or
> > if this is the start of an industry conversation that wants some Linux
> > ecosystem feedback. It sounds more like the latter.
>
> I'll let others speak up on this as I was presenting on my current outlook
> and understand others are much further down the path.
>
> >
> > > > > https://lore.kernel.org/all/20221103155029.2451105-1-jiaqiyan@google.com/

Last thing about the advantage of the above in-kernel "software
scrubber" vs the hardware patrol scrubber. A better way to prevent +
detect memory errors is to do a write, followed by a read op. The write
op is very difficult, if not impossible, for a hardware patrol scrubber
to perform at **OS runtime** (during boot time, as some sort of
memory test, it is possible); there would have to be some negotiation with
userspace. The hw won't even know if a page is free to write (not used
by anything in the OS). But I think this write-read-then-check idea is
feasible in the software scrubber. That's why I want a general memory
scrub kernel API for both the software and the hardware patrol scrubber.
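
As an aside, a tiny sketch of the write-read-then-check idea (illustrative
only; a real in-kernel software scrubber would have to take the page off the
free lists first and report mismatches through the RAS machinery rather than
just returning an error):

#include <stddef.h>
#include <stdint.h>

/* Write known patterns to a page the OS knows is free, read them back
 * and compare; a mismatch indicates a latent error that a read-only
 * hardware patrol scrub pass could not have provoked. */
static int scrub_page_write_read_check(volatile uint8_t *page, size_t size)
{
	static const uint8_t patterns[] = { 0x00, 0xff, 0x55, 0xaa };
	size_t i, j;

	for (i = 0; i < sizeof(patterns); i++) {
		for (j = 0; j < size; j++)	/* write pass */
			page[j] = patterns[i];
		for (j = 0; j < size; j++)	/* read-back + check pass */
			if (page[j] != patterns[i])
				return -1;	/* latent error detected */
	}
	return 0;
}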

> > > >
> > > > Yes, now that is a useful changelog, thank you for highlighting it,
> > > > please follow its example.
> > >
> > > It's not a changelog as such but an RFC in text-only form.

Hopefully the software solution can still be attractive for upstream
now that people pay more attention to the hardware solution (and I
hope to send out a new RFC with code).

> > > However indeed lots of good info in there.
> > >
> > > Jonathan
> >
> > Thanks again for taking the time Jonathan.
> >
> You are welcome and thanks for all the questions / pointers.
>
> Jonathan
>
>

Thanks,
Jiaqi