diff mbox series

[v2] ACPI, APEI, EINJ: Remove memory range validation for CXL error types

Message ID 20230403151849.43408-1-Benjamin.Cheatham@amd.com (mailing list archive)
State Changes Requested, archived
Headers show
Series [v2] ACPI, APEI, EINJ: Remove memory range validation for CXL error types | expand

Commit Message

Ben Cheatham April 3, 2023, 3:18 p.m. UTC
From: Yazen Ghannam <yazen.ghannam@amd.com>

V2 Changes:
 - Added Link tags for links
 - removed stray unused variable

This patch is a follow up to the discussion at [1], and builds on Tony's
CXL error patch at [2].

The new CXL error types will use the Memory Address field in the
SET_ERROR_TYPE_WITH_ADDRESS structure in order to target a CXL 1.1
compliant memory-mapped Downstream port. The value of the Memory Address
will be in the port's MMIO range, and it will not represent physical
(normal or persistent) memory.

Allow error injection for CXL 1.1 systems by skipping memory range
validation for CXL error injection types.

Output trying to inject CXL.mem error without patch:

-bash: echo: write error: Invalid argument

[1]:
Link: https://lore.kernel.org/linux-acpi/20221206205234.606073-1-Benjamin.Cheatham@amd.com/
[2]:
Link: https://lore.kernel.org/linux-cxl/CAJZ5v0hNQUfWViqxbJ5B4JCGJUuHpWWSpqpCFWPNpGuagoFbsQ@mail.gmail.com/T/#t

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Ben Cheatham <benjamin.cheatham@amd.com>
---
 drivers/acpi/apei/einj.c | 12 +++++++++++-
 include/acpi/actbl1.h    |  6 ++++++
 2 files changed, 17 insertions(+), 1 deletion(-)

Comments

Dan Williams May 31, 2023, 9:31 p.m. UTC | #1
Hi Ben,

Ben Cheatham wrote:
> From: Yazen Ghannam <yazen.ghannam@amd.com>
> 
> V2 Changes:
>  - Added Link tags for links
>  - removed stray unused variable
> 
> This patch is a follow up to the discussion at [1], and builds on Tony's
> CXL error patch at [2].
> 
> The new CXL error types will use the Memory Address field in the
> SET_ERROR_TYPE_WITH_ADDRESS structure in order to target a CXL 1.1
> compliant memory-mapped Downstream port. The value of the Memory Address
> will be in the port's MMIO range, and it will not represent physical
> (normal or persistent) memory.
> 
> Allow error injection for CXL 1.1 systems by skipping memory range
> validation for CXL error injection types.

This just feels a bit too loose especially when the kernel has
the cxl_acpi driver to perform the enumeration of CXL root ports.

I know that Terry and Robert are teaching the PCI AER core how to
coordinate with RCRB information [1] (I still need to go dig in deeper
on that set). I would expect ACPI EINJ could benefit from similar
coordination and validate these addresses.

Now, is it any address in the downstream-port RCRB range that is valid,
or only the base?

Another minor comment below...

[1]: http://lore.kernel.org/r/20230523232214.55282-1-terry.bowman@amd.com

> 
> Output trying to inject CXL.mem error without patch:
> 
> -bash: echo: write error: Invalid argument
> 
> [1]:
> Link: https://lore.kernel.org/linux-acpi/20221206205234.606073-1-Benjamin.Cheatham@amd.com/
> [2]:
> Link: https://lore.kernel.org/linux-cxl/CAJZ5v0hNQUfWViqxbJ5B4JCGJUuHpWWSpqpCFWPNpGuagoFbsQ@mail.gmail.com/T/#t
> 
> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> Signed-off-by: Ben Cheatham <benjamin.cheatham@amd.com>
> ---
>  drivers/acpi/apei/einj.c | 12 +++++++++++-
>  include/acpi/actbl1.h    |  6 ++++++
>  2 files changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
> index 013eb621dc92..68a20326ed7c 100644
> --- a/drivers/acpi/apei/einj.c
> +++ b/drivers/acpi/apei/einj.c
> @@ -37,6 +37,13 @@
>  				ACPI_EINJ_MEMORY_UNCORRECTABLE | \
>  				ACPI_EINJ_MEMORY_FATAL)
>  
> +#define CXL_ERROR_MASK		(ACPI_EINJ_CXL_CACHE_CORRECTABLE	| \
> +				ACPI_EINJ_CXL_CACHE_UNCORRECTABLE	| \
> +				ACPI_EINJ_CXL_CACHE_FATAL		| \
> +				ACPI_EINJ_CXL_MEM_CORRECTABLE		| \
> +				ACPI_EINJ_CXL_MEM_UNCORRECTABLE		| \
> +				ACPI_EINJ_CXL_MEM_FATAL)
> +
>  /*
>   * ACPI version 5 provides a SET_ERROR_TYPE_WITH_ADDRESS action.
>   */
> @@ -537,8 +544,11 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
>  	if (type & ACPI5_VENDOR_BIT) {
>  		if (vendor_flags != SETWA_FLAGS_MEM)
>  			goto inject;
> -	} else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM))
> +	} else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM)) {
> +		goto inject;
> +	} else if (type & CXL_ERROR_MASK) {
>  		goto inject;
> +	}
>  
>  	/*
>  	 * Disallow crazy address masks that give BIOS leeway to pick
> diff --git a/include/acpi/actbl1.h b/include/acpi/actbl1.h
> index 81b9e794424d..c39837266414 100644
> --- a/include/acpi/actbl1.h
> +++ b/include/acpi/actbl1.h
> @@ -1044,6 +1044,12 @@ enum acpi_einj_command_status {
>  #define ACPI_EINJ_PLATFORM_CORRECTABLE      (1<<9)
>  #define ACPI_EINJ_PLATFORM_UNCORRECTABLE    (1<<10)
>  #define ACPI_EINJ_PLATFORM_FATAL            (1<<11)
> +#define ACPI_EINJ_CXL_CACHE_CORRECTABLE     BIT(12)
> +#define ACPI_EINJ_CXL_CACHE_UNCORRECTABLE   BIT(13)
> +#define ACPI_EINJ_CXL_CACHE_FATAL           BIT(14)
> +#define ACPI_EINJ_CXL_MEM_CORRECTABLE       BIT(15)
> +#define ACPI_EINJ_CXL_MEM_UNCORRECTABLE     BIT(16)
> +#define ACPI_EINJ_CXL_MEM_FATAL             BIT(17)

I expect these to come from the next ACPICA update just like the other
definitions. The change in style from (x<<N) to BIT(N) was a tip-off.
The process is to submit a pull request to the ACPICA project, for
example:

https://github.com/acpica/acpica/commit/e948142526c0
Dan Williams June 1, 2023, 3:57 a.m. UTC | #2
Dan Williams wrote:
> Hi Ben,
> 
> Ben Cheatham wrote:
> > From: Yazen Ghannam <yazen.ghannam@amd.com>
> > 
> > V2 Changes:
> >  - Added Link tags for links
> >  - removed stray unused variable
> > 
> > This patch is a follow up to the discussion at [1], and builds on Tony's
> > CXL error patch at [2].
> > 
> > The new CXL error types will use the Memory Address field in the
> > SET_ERROR_TYPE_WITH_ADDRESS structure in order to target a CXL 1.1
> > compliant memory-mapped Downstream port. The value of the Memory Address
> > will be in the port's MMIO range, and it will not represent physical
> > (normal or persistent) memory.
> > 
> > Allow error injection for CXL 1.1 systems by skipping memory range
> > validation for CXL error injection types.
> 
> This just feels a bit too loose especially when the kernel has
> the cxl_acpi driver to perform the enumeration of CXL root ports.
> 
> I know that Terry and Robert are teaching the PCI AER core how to
> coordinate with RCRB information [1] (I still need to go dig in deeper
> on that set). I would expect ACPI EINJ could benefit from similar
> coordination and validate these addresses.
> 
> Now, is it any address in the downstream-port RCRB range that is valid,
> or only the base?

Now that I say the above, I wonder if the legacy EINJ interface is even
the right interface for CXL protocol error injection. It feels like
something that should be relative to the given 'struct cxl_port'
instance where the error is to be injected. Userspace should not need to
deal in MMIO addresses, especially in the RCH case where the RCRB is
difficult for userspace to enumerate.
Ben Cheatham June 1, 2023, 2:45 p.m. UTC | #3
Thanks for the review Dan, responses inline.

On 5/31/23 4:31 PM, Dan Williams wrote:
> Hi Ben,
> 
> Ben Cheatham wrote:
>> From: Yazen Ghannam <yazen.ghannam@amd.com>
>>
>> V2 Changes:
>>  - Added Link tags for links
>>  - removed stray unused variable
>>
>> This patch is a follow up to the discussion at [1], and builds on Tony's
>> CXL error patch at [2].
>>
>> The new CXL error types will use the Memory Address field in the
>> SET_ERROR_TYPE_WITH_ADDRESS structure in order to target a CXL 1.1
>> compliant memory-mapped Downstream port. The value of the Memory Address
>> will be in the port's MMIO range, and it will not represent physical
>> (normal or persistent) memory.
>>
>> Allow error injection for CXL 1.1 systems by skipping memory range
>> validation for CXL error injection types.
> 
> This just feels a bit too loose especially when the kernel has
> the cxl_acpi driver to perform the enumeration of CXL root ports.
> 
> I know that Terry and Robert are teaching the PCI AER core how to
> coordinate with RCRB information [1] (I still need to go dig in deeper
> on that set). I would expect ACPI EINJ could benefit from similar
> coordination and validate these addresses.
>> Now, is it any address in the downstream-port RCRB range that is valid,
> or only the base?
> 
Response to above in your follow up email.

> Another minor comment below...
> 
> [1]: http://lore.kernel.org/r/20230523232214.55282-1-terry.bowman@amd.com
> 
>>
>> Output trying to inject CXL.mem error without patch:
>>
>> -bash: echo: write error: Invalid argument
>>
>> [1]:
>> Link: https://lore.kernel.org/linux-acpi/20221206205234.606073-1-Benjamin.Cheatham@amd.com/
>> [2]:
>> Link: https://lore.kernel.org/linux-cxl/CAJZ5v0hNQUfWViqxbJ5B4JCGJUuHpWWSpqpCFWPNpGuagoFbsQ@mail.gmail.com/T/#t
>>
>> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
>> Signed-off-by: Ben Cheatham <benjamin.cheatham@amd.com>
>> ---
>>  drivers/acpi/apei/einj.c | 12 +++++++++++-
>>  include/acpi/actbl1.h    |  6 ++++++
>>  2 files changed, 17 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
>> index 013eb621dc92..68a20326ed7c 100644
>> --- a/drivers/acpi/apei/einj.c
>> +++ b/drivers/acpi/apei/einj.c
>> @@ -37,6 +37,13 @@
>>  				ACPI_EINJ_MEMORY_UNCORRECTABLE | \
>>  				ACPI_EINJ_MEMORY_FATAL)
>>  
>> +#define CXL_ERROR_MASK		(ACPI_EINJ_CXL_CACHE_CORRECTABLE	| \
>> +				ACPI_EINJ_CXL_CACHE_UNCORRECTABLE	| \
>> +				ACPI_EINJ_CXL_CACHE_FATAL		| \
>> +				ACPI_EINJ_CXL_MEM_CORRECTABLE		| \
>> +				ACPI_EINJ_CXL_MEM_UNCORRECTABLE		| \
>> +				ACPI_EINJ_CXL_MEM_FATAL)
>> +
>>  /*
>>   * ACPI version 5 provides a SET_ERROR_TYPE_WITH_ADDRESS action.
>>   */
>> @@ -537,8 +544,11 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
>>  	if (type & ACPI5_VENDOR_BIT) {
>>  		if (vendor_flags != SETWA_FLAGS_MEM)
>>  			goto inject;
>> -	} else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM))
>> +	} else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM)) {
>> +		goto inject;
>> +	} else if (type & CXL_ERROR_MASK) {
>>  		goto inject;
>> +	}
>>  
>>  	/*
>>  	 * Disallow crazy address masks that give BIOS leeway to pick
>> diff --git a/include/acpi/actbl1.h b/include/acpi/actbl1.h
>> index 81b9e794424d..c39837266414 100644
>> --- a/include/acpi/actbl1.h
>> +++ b/include/acpi/actbl1.h
>> @@ -1044,6 +1044,12 @@ enum acpi_einj_command_status {
>>  #define ACPI_EINJ_PLATFORM_CORRECTABLE      (1<<9)
>>  #define ACPI_EINJ_PLATFORM_UNCORRECTABLE    (1<<10)
>>  #define ACPI_EINJ_PLATFORM_FATAL            (1<<11)
>> +#define ACPI_EINJ_CXL_CACHE_CORRECTABLE     BIT(12)
>> +#define ACPI_EINJ_CXL_CACHE_UNCORRECTABLE   BIT(13)
>> +#define ACPI_EINJ_CXL_CACHE_FATAL           BIT(14)
>> +#define ACPI_EINJ_CXL_MEM_CORRECTABLE       BIT(15)
>> +#define ACPI_EINJ_CXL_MEM_UNCORRECTABLE     BIT(16)
>> +#define ACPI_EINJ_CXL_MEM_FATAL             BIT(17)
> 
> I expect these to come from the next ACPICA update just like the other
> definitions. The change in style from (x<<N) to BIT(N) was a tip-off.
> The process is to submit a pull request to the ACPICA project, for
> example:
> 
> https://github.com/acpica/acpica/commit/e948142526c0

I wasn't aware of this, I'll go ahead and do that.
Ben Cheatham June 1, 2023, 2:45 p.m. UTC | #4
On 5/31/23 10:57 PM, Dan Williams wrote:
> Dan Williams wrote:
>> Hi Ben,
>>
>> Ben Cheatham wrote:
>>> From: Yazen Ghannam <yazen.ghannam@amd.com>
>>>
>>> V2 Changes:
>>>  - Added Link tags for links
>>>  - removed stray unused variable
>>>
>>> This patch is a follow up to the discussion at [1], and builds on Tony's
>>> CXL error patch at [2].
>>>
>>> The new CXL error types will use the Memory Address field in the
>>> SET_ERROR_TYPE_WITH_ADDRESS structure in order to target a CXL 1.1
>>> compliant memory-mapped Downstream port. The value of the Memory Address
>>> will be in the port's MMIO range, and it will not represent physical
>>> (normal or persistent) memory.
>>>
>>> Allow error injection for CXL 1.1 systems by skipping memory range
>>> validation for CXL error injection types.
>>
>> This just feels a bit too loose especially when the kernel has
>> the cxl_acpi driver to perform the enumeration of CXL root ports.
>>
>> I know that Terry and Robert are teaching the PCI AER core how to
>> coordinate with RCRB information [1] (I still need to go dig in deeper
>> on that set). I would expect ACPI EINJ could benefit from similar
>> coordination and validate these addresses.
>>
I haven't kept up with that patch set either, I'll take a look.

>> Now, is it any address in the downstream-port RCRB range that is valid,
>> or only the base?
I took another look at the ACPI 6.5 spec since it's been a while and it doesn't specify,
but when I tested this patch I think I only used the base address. I'll try out other
addresses in the range and see if they work as well.

> 
> Now that I say the above, I wonder if the legacy EINJ interface is even
> the right interface for CXL protocol error injection. It feels like
> something that should be relative to the given 'struct cxl_port'
> instance where the error is to be injected. Userspace should not need to
> deal in MMIO addresses, especially in the RCH case where the RCRB is
> difficult for userspace to enumerate.
I think that's a good idea. My first idea is to have sysfs files under the port that 
are basically a shortcut for an EINJ injection through the legacy interface. I think
it might be possible to have just the sysfs files and no legacy interface support,
but since I'll most likely be reusing the EINJ code I imagine the legacy interface 
will have to be updated anyway.
diff mbox series

Patch

diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
index 013eb621dc92..68a20326ed7c 100644
--- a/drivers/acpi/apei/einj.c
+++ b/drivers/acpi/apei/einj.c
@@ -37,6 +37,13 @@ 
 				ACPI_EINJ_MEMORY_UNCORRECTABLE | \
 				ACPI_EINJ_MEMORY_FATAL)
 
+#define CXL_ERROR_MASK		(ACPI_EINJ_CXL_CACHE_CORRECTABLE	| \
+				ACPI_EINJ_CXL_CACHE_UNCORRECTABLE	| \
+				ACPI_EINJ_CXL_CACHE_FATAL		| \
+				ACPI_EINJ_CXL_MEM_CORRECTABLE		| \
+				ACPI_EINJ_CXL_MEM_UNCORRECTABLE		| \
+				ACPI_EINJ_CXL_MEM_FATAL)
+
 /*
  * ACPI version 5 provides a SET_ERROR_TYPE_WITH_ADDRESS action.
  */
@@ -537,8 +544,11 @@  static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
 	if (type & ACPI5_VENDOR_BIT) {
 		if (vendor_flags != SETWA_FLAGS_MEM)
 			goto inject;
-	} else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM))
+	} else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM)) {
+		goto inject;
+	} else if (type & CXL_ERROR_MASK) {
 		goto inject;
+	}
 
 	/*
 	 * Disallow crazy address masks that give BIOS leeway to pick
diff --git a/include/acpi/actbl1.h b/include/acpi/actbl1.h
index 81b9e794424d..c39837266414 100644
--- a/include/acpi/actbl1.h
+++ b/include/acpi/actbl1.h
@@ -1044,6 +1044,12 @@  enum acpi_einj_command_status {
 #define ACPI_EINJ_PLATFORM_CORRECTABLE      (1<<9)
 #define ACPI_EINJ_PLATFORM_UNCORRECTABLE    (1<<10)
 #define ACPI_EINJ_PLATFORM_FATAL            (1<<11)
+#define ACPI_EINJ_CXL_CACHE_CORRECTABLE     BIT(12)
+#define ACPI_EINJ_CXL_CACHE_UNCORRECTABLE   BIT(13)
+#define ACPI_EINJ_CXL_CACHE_FATAL           BIT(14)
+#define ACPI_EINJ_CXL_MEM_CORRECTABLE       BIT(15)
+#define ACPI_EINJ_CXL_MEM_UNCORRECTABLE     BIT(16)
+#define ACPI_EINJ_CXL_MEM_FATAL             BIT(17)
 #define ACPI_EINJ_VENDOR_DEFINED            (1<<31)
 
 /*******************************************************************************