Message ID | 20250326074450.937819-1-lizhijian@fujitsu.com |
---|---|
State | New |
Headers | show |
Series | cxl/acpi: Verify CHBS length for CXL2.0 | expand |
Li Zhijian wrote: > Per CXL Spec r3.1 Table 9-21, both CXL1.1 and CXL2.0 have defined their > own length, verify it to avoid an invalid CHBS I think this looks fine. But did a platform have issues with this? Does this need to be backported? Reviewed-by: Ira Weiny <ira.weiny@intel.com> [snip]
On 27/03/2025 11:44, Ira Weiny wrote: > Li Zhijian wrote: >> Per CXL Spec r3.1 Table 9-21, both CXL1.1 and CXL2.0 have defined their >> own length, verify it to avoid an invalid CHBS > > > I think this looks fine. But did a platform have issues with this? Not really, actually, I discovered it while reviewing the code and CXL specification. Currently, this issue arises only when I inject an incorrect length via QEMU environment. Our hardware does not experience this problem. > Does this need to be backported? I remain neutral :) Thanks > > Reviewed-by: Ira Weiny <ira.weiny@intel.com> > > [snip]
Zhijian Li (Fujitsu) wrote: > > > On 27/03/2025 11:44, Ira Weiny wrote: > > Li Zhijian wrote: > >> Per CXL Spec r3.1 Table 9-21, both CXL1.1 and CXL2.0 have defined their > >> own length, verify it to avoid an invalid CHBS > > > > > > I think this looks fine. But did a platform have issues with this? > > Not really, actually, I discovered it while reviewing the code and > CXL specification. > > Currently, this issue arises only when I inject an incorrect length > via QEMU environment. Our hardware does not experience this problem. > > > > Does this need to be backported? > I remain neutral :) What does the kernel do with this invalid CHBS from QEMU? I would be happy to let whatever bad effect from injecting a corrupted CHBS just happen because there are plenty of ways for QEMU to confuse the kernel even if the table lengths are correct. Unless it has real impact I would rather not touch the kernel for every possible way that QEMU can make a mistake. I.e. if it was a widespread problem that affected multiple QEMU users by default then maybe. Just your local test gone awry? Maybe not.
On 27/03/2025 21:36, Dan Williams wrote: > Zhijian Li (Fujitsu) wrote: >> >> >> On 27/03/2025 11:44, Ira Weiny wrote: >>> Li Zhijian wrote: >>>> Per CXL Spec r3.1 Table 9-21, both CXL1.1 and CXL2.0 have defined their >>>> own length, verify it to avoid an invalid CHBS >>> >>> >>> I think this looks fine. But did a platform have issues with this? >> >> Not really, actually, I discovered it while reviewing the code and >> CXL specification. >> >> Currently, this issue arises only when I inject an incorrect length >> via QEMU environment. Our hardware does not experience this problem. >> >> >>> Does this need to be backported? >> I remain neutral :) > > What does the kernel do with this invalid CHBS from QEMU? I would be > happy to let whatever bad effect from injecting a corrupted CHBS just > happen because there are plenty of ways for QEMU to confuse the kernel > even if the table lengths are correct. > > Unless it has real impact I would rather not touch the kernel for every > possible way that QEMU can make a mistake. Thank you for the feedback. If your earlier comments were specifically about ***backporting*** this patch, I agree there might not be an urgent need for that. However, regarding the discussion on whether this patch should be accepted upstream, TBH, I believe it is necessary. 1. The **CXL Specification (r3.1, Table 9-21)** explicitly defines `length` requirements for CHBS in both CXL 1.1 and CXL 2.0 cases. Failing to validate this field against the spec risks misinterpretation of invalid configurations. 2. As mentioned in section **2.13.8** of the *CXL Memory Device Software Guide (Rev 1.0)*, It's recommended to verify the CHBS length. While the immediate impact might be limited to edge cases (e.g., incorrect QEMU configurations), upstreaming this aligns the kernel with spec-mandated checks and improves robustness for future use cases. [1] https://cdrdv2-public.intel.com/643805/643805_CXL_Memory_Device_SW_Guide_Rev1_1.pdf > > I.e. if it was a widespread problem that affected multiple QEMU users by > default then maybe. Just your local test gone awry? Maybe not.
diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c index cb14829bb9be..4a82d2d8b547 100644 --- a/drivers/cxl/acpi.c +++ b/drivers/cxl/acpi.c @@ -11,8 +11,6 @@ #include "cxlpci.h" #include "cxl.h" -#define CXL_RCRB_SIZE SZ_8K - struct cxl_cxims_data { int nr_maps; u64 xormaps[] __counted_by(nr_maps); @@ -468,8 +466,35 @@ struct cxl_chbs_context { u32 cxl_version; int nr_versions; u32 saved_version; + u32 length; }; +static bool cxl_chbs_verify(struct device *dev, struct cxl_chbs_context ctx) +{ + if (ctx.cxl_version == UINT_MAX) { + dev_warn(dev, "No CHBS found for Host Bridge (UID %lld)\n", + ctx.uid); + return false; + } + + if (ctx.base == CXL_RESOURCE_NONE) { + dev_warn(dev, "CHBS invalid for Host Bridge (UID %lld)\n", + ctx.uid); + return false; + } + + if ((ctx.cxl_version == ACPI_CEDT_CHBS_VERSION_CXL11 && + ctx.length != ACPI_CEDT_CHBS_LENGTH_CXL11) || + (ctx.cxl_version == ACPI_CEDT_CHBS_VERSION_CXL20 && + ctx.length != ACPI_CEDT_CHBS_LENGTH_CXL20)) { + dev_warn(dev, "Invalid length for Host Bridge (UID %lld)\n", + ctx.uid); + return false; + } + + return true; +} + static int cxl_get_chbs_iter(union acpi_subtable_headers *header, void *arg, const unsigned long end) { @@ -478,10 +503,6 @@ static int cxl_get_chbs_iter(union acpi_subtable_headers *header, void *arg, chbs = (struct acpi_cedt_chbs *) header; - if (chbs->cxl_version == ACPI_CEDT_CHBS_VERSION_CXL11 && - chbs->length != CXL_RCRB_SIZE) - return 0; - if (!chbs->base) return 0; @@ -502,6 +523,7 @@ static int cxl_get_chbs_iter(union acpi_subtable_headers *header, void *arg, ctx->cxl_version = chbs->cxl_version; ctx->base = chbs->base; + ctx->length = chbs->length; return 0; } @@ -570,17 +592,8 @@ static int add_host_bridge_dport(struct device *match, void *arg) if (rc) return rc; - if (ctx.cxl_version == UINT_MAX) { - dev_warn(match, "No CHBS found for Host Bridge (UID %lld)\n", - ctx.uid); - return 0; - } - - if (ctx.base == CXL_RESOURCE_NONE) { - dev_warn(match, "CHBS invalid for Host Bridge (UID %lld)\n", - ctx.uid); + if (!cxl_chbs_verify(match, ctx)) return 0; - } pci_root = acpi_pci_find_root(hb->handle); bridge = pci_root->bus->bridge;
Per CXL Spec r3.1 Table 9-21, both CXL1.1 and CXL2.0 have defined their own length, verify it to avoid an invalid CHBS This patch also wraps some sanity checks into a function. Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> --- drivers/cxl/acpi.c | 45 +++++++++++++++++++++++++++++---------------- 1 file changed, 29 insertions(+), 16 deletions(-)