diff mbox series

cxl/acpi: Verify CHBS length for CXL2.0

Message ID 20250326074450.937819-1-lizhijian@fujitsu.com
State New
Headers show
Series cxl/acpi: Verify CHBS length for CXL2.0 | expand

Commit Message

Zhijian Li (Fujitsu) March 26, 2025, 7:44 a.m. UTC
Per CXL Spec r3.1 Table 9-21, both CXL1.1 and CXL2.0 have defined their
own length, verify it to avoid an invalid CHBS

This patch also wraps some sanity checks into a function.

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
 drivers/cxl/acpi.c | 45 +++++++++++++++++++++++++++++----------------
 1 file changed, 29 insertions(+), 16 deletions(-)

Comments

Ira Weiny March 27, 2025, 3:44 a.m. UTC | #1
Li Zhijian wrote:
> Per CXL Spec r3.1 Table 9-21, both CXL1.1 and CXL2.0 have defined their
> own length, verify it to avoid an invalid CHBS


I think this looks fine.  But did a platform have issues with this?  Does
this need to be backported?

Reviewed-by: Ira Weiny <ira.weiny@intel.com>

[snip]
Zhijian Li (Fujitsu) March 27, 2025, 7:44 a.m. UTC | #2
On 27/03/2025 11:44, Ira Weiny wrote:
> Li Zhijian wrote:
>> Per CXL Spec r3.1 Table 9-21, both CXL1.1 and CXL2.0 have defined their
>> own length, verify it to avoid an invalid CHBS
> 
> 
> I think this looks fine.  But did a platform have issues with this?  

Not really, actually, I discovered it while reviewing the code and
CXL specification.

Currently, this issue arises only when I inject an incorrect length
via QEMU environment. Our hardware does not experience this problem.


> Does this need to be backported?
I remain neutral :)

Thanks



> 
> Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> 
> [snip]
Dan Williams March 27, 2025, 1:36 p.m. UTC | #3
Zhijian Li (Fujitsu) wrote:
> 
> 
> On 27/03/2025 11:44, Ira Weiny wrote:
> > Li Zhijian wrote:
> >> Per CXL Spec r3.1 Table 9-21, both CXL1.1 and CXL2.0 have defined their
> >> own length, verify it to avoid an invalid CHBS
> > 
> > 
> > I think this looks fine.  But did a platform have issues with this?  
> 
> Not really, actually, I discovered it while reviewing the code and
> CXL specification.
> 
> Currently, this issue arises only when I inject an incorrect length
> via QEMU environment. Our hardware does not experience this problem.
> 
> 
> > Does this need to be backported?
> I remain neutral :)

What does the kernel do with this invalid CHBS from QEMU? I would be
happy to let whatever bad effect from injecting a corrupted CHBS just
happen because there are plenty of ways for QEMU to confuse the kernel
even if the table lengths are correct.

Unless it has real impact I would rather not touch the kernel for every
possible way that QEMU can make a mistake.

I.e. if it was a widespread problem that affected multiple QEMU users by
default then maybe. Just your local test gone awry? Maybe not.
Zhijian Li (Fujitsu) March 28, 2025, 4:15 a.m. UTC | #4
On 27/03/2025 21:36, Dan Williams wrote:
> Zhijian Li (Fujitsu) wrote:
>>
>>
>> On 27/03/2025 11:44, Ira Weiny wrote:
>>> Li Zhijian wrote:
>>>> Per CXL Spec r3.1 Table 9-21, both CXL1.1 and CXL2.0 have defined their
>>>> own length, verify it to avoid an invalid CHBS
>>>
>>>
>>> I think this looks fine.  But did a platform have issues with this?
>>
>> Not really, actually, I discovered it while reviewing the code and
>> CXL specification.
>>
>> Currently, this issue arises only when I inject an incorrect length
>> via QEMU environment. Our hardware does not experience this problem.
>>
>>
>>> Does this need to be backported?
>> I remain neutral :)
> 
> What does the kernel do with this invalid CHBS from QEMU? I would be
> happy to let whatever bad effect from injecting a corrupted CHBS just
> happen because there are plenty of ways for QEMU to confuse the kernel
> even if the table lengths are correct.
> 
> Unless it has real impact I would rather not touch the kernel for every
> possible way that QEMU can make a mistake.



Thank you for the feedback.

If your earlier comments were specifically about ***backporting*** this patch,
I agree there might not be an urgent need for that.

However, regarding the discussion on whether this patch should be accepted
upstream, TBH, I believe it is necessary.

1. The **CXL Specification (r3.1, Table 9-21)** explicitly defines `length`
requirements for CHBS in both CXL 1.1 and CXL 2.0 cases. Failing to
validate this field against the spec risks misinterpretation of invalid
configurations.

2. As mentioned in section **2.13.8** of the *CXL Memory Device Software Guide (Rev 1.0)*,
It's recommended to verify the CHBS length.

While the immediate impact might be limited to edge cases (e.g., incorrect QEMU configurations),
upstreaming this aligns the kernel with spec-mandated checks and improves
robustness for future use cases.

[1] https://cdrdv2-public.intel.com/643805/643805_CXL_Memory_Device_SW_Guide_Rev1_1.pdf


> 
> I.e. if it was a widespread problem that affected multiple QEMU users by
> default then maybe. Just your local test gone awry? Maybe not.
diff mbox series

Patch

diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index cb14829bb9be..4a82d2d8b547 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -11,8 +11,6 @@ 
 #include "cxlpci.h"
 #include "cxl.h"
 
-#define CXL_RCRB_SIZE	SZ_8K
-
 struct cxl_cxims_data {
 	int nr_maps;
 	u64 xormaps[] __counted_by(nr_maps);
@@ -468,8 +466,35 @@  struct cxl_chbs_context {
 	u32 cxl_version;
 	int nr_versions;
 	u32 saved_version;
+	u32 length;
 };
 
+static bool cxl_chbs_verify(struct device *dev, struct cxl_chbs_context ctx)
+{
+	if (ctx.cxl_version == UINT_MAX) {
+		dev_warn(dev, "No CHBS found for Host Bridge (UID %lld)\n",
+			 ctx.uid);
+		return false;
+	}
+
+	if (ctx.base == CXL_RESOURCE_NONE) {
+		dev_warn(dev, "CHBS invalid for Host Bridge (UID %lld)\n",
+			 ctx.uid);
+		return false;
+	}
+
+	if ((ctx.cxl_version == ACPI_CEDT_CHBS_VERSION_CXL11 &&
+	     ctx.length != ACPI_CEDT_CHBS_LENGTH_CXL11) ||
+	    (ctx.cxl_version == ACPI_CEDT_CHBS_VERSION_CXL20 &&
+	     ctx.length != ACPI_CEDT_CHBS_LENGTH_CXL20)) {
+		dev_warn(dev, "Invalid length for Host Bridge (UID %lld)\n",
+			 ctx.uid);
+		return false;
+	}
+
+	return true;
+}
+
 static int cxl_get_chbs_iter(union acpi_subtable_headers *header, void *arg,
 			     const unsigned long end)
 {
@@ -478,10 +503,6 @@  static int cxl_get_chbs_iter(union acpi_subtable_headers *header, void *arg,
 
 	chbs = (struct acpi_cedt_chbs *) header;
 
-	if (chbs->cxl_version == ACPI_CEDT_CHBS_VERSION_CXL11 &&
-	    chbs->length != CXL_RCRB_SIZE)
-		return 0;
-
 	if (!chbs->base)
 		return 0;
 
@@ -502,6 +523,7 @@  static int cxl_get_chbs_iter(union acpi_subtable_headers *header, void *arg,
 
 	ctx->cxl_version = chbs->cxl_version;
 	ctx->base = chbs->base;
+	ctx->length = chbs->length;
 
 	return 0;
 }
@@ -570,17 +592,8 @@  static int add_host_bridge_dport(struct device *match, void *arg)
 	if (rc)
 		return rc;
 
-	if (ctx.cxl_version == UINT_MAX) {
-		dev_warn(match, "No CHBS found for Host Bridge (UID %lld)\n",
-			 ctx.uid);
-		return 0;
-	}
-
-	if (ctx.base == CXL_RESOURCE_NONE) {
-		dev_warn(match, "CHBS invalid for Host Bridge (UID %lld)\n",
-			 ctx.uid);
+	if (!cxl_chbs_verify(match, ctx))
 		return 0;
-	}
 
 	pci_root = acpi_pci_find_root(hb->handle);
 	bridge = pci_root->bus->bridge;