Message ID | 20250114203432.31861-3-fabio.m.de.francesco@linux.intel.com |
---|---|
State | Superseded |
Headers | show |
Series | cxl/core: Enable Region creation on x86 with Low Mem Hole | expand |
On Tue, Jan 14, 2025 at 09:32:54PM +0100, Fabio M. De Francesco wrote: > +/* > + * Match CXL Root and Endpoint Decoders by comparing SPA and HPA ranges. > + * > + * On x86, CFMWS ranges never intersect memory holes while endpoint decoders > + * HPA range sizes are always guaranteed aligned to NIW * 256MB; therefore, > + * the given endpoint decoder HPA range size is always expected aligned and > + * also larger than that of the matching root decoder. If there are LMH's, > + * the root decoder range end is always less than SZ_4G. > + */ Is there any reason to limit this memory-hole handling to only low memory holes? I have observed systems where the following memory hole situation occurs: (example, not exact sizes) CFMW1: [ 0xc0000000 - 0xdfffffff ] 512MB range Reserved [ 0xe0000000 - 0xffffffff ] 512MB range CFMW2: [ 0x100000000 - 0x15fffffff ] 1.5GB range 2 CXL Memory Devices w/ 1GB capacity each (again, an example). Note that 1 device has its capacity split across the hole, but if the devices are interleaved then both devices have their capacity split across the hole. It seems with some mild modification, this patch set could be re-used to handle this memory hole scenario as well (w/ addr translation - Robert's patch set) Is there a reason not to handle more than just LMH's in this set? (I may try to hack this up on my test system and report back.) ~Gregory
On Wednesday, January 15, 2025 3:23:36 AM GMT+1 Gregory Price wrote: > On Tue, Jan 14, 2025 at 09:32:54PM +0100, Fabio M. De Francesco wrote: > > +/* > > + * Match CXL Root and Endpoint Decoders by comparing SPA and HPA ranges. > > + * > > + * On x86, CFMWS ranges never intersect memory holes while endpoint decoders > > + * HPA range sizes are always guaranteed aligned to NIW * 256MB; therefore, > > + * the given endpoint decoder HPA range size is always expected aligned and > > + * also larger than that of the matching root decoder. If there are LMH's, > > + * the root decoder range end is always less than SZ_4G. > > + */ > > Is there any reason to limit this memory-hole handling to only low > memory holes? > No I don't see any special reasons to limit this to only low memory holes. It's just that I didn't know about others. > > I have observed systems where the following memory > hole situation occurs: > > (example, not exact sizes) > CFMW1: [ 0xc0000000 - 0xdfffffff ] 512MB range > Reserved [ 0xe0000000 - 0xffffffff ] 512MB range > CFMW2: [ 0x100000000 - 0x15fffffff ] 1.5GB range > > 2 CXL Memory Devices w/ 1GB capacity each (again, an example). > > Note that 1 device has its capacity split across the hole, but > if the devices are interleaved then both devices have their capacity > split across the hole. > > It seems with some mild modification, this patch set could be > re-used to handle this memory hole scenario as well > (w/ addr translation - Robert's patch set) > > Is there a reason not to handle more than just LMH's in this set? > > (I may try to hack this up on my test system and report back.) > I'd appreciate it if you want to report back. Thank you, Fabio > > ~Gregory >
Gregory Price wrote: > On Tue, Jan 14, 2025 at 09:32:54PM +0100, Fabio M. De Francesco wrote: > > +/* > > + * Match CXL Root and Endpoint Decoders by comparing SPA and HPA ranges. > > + * > > + * On x86, CFMWS ranges never intersect memory holes while endpoint decoders > > + * HPA range sizes are always guaranteed aligned to NIW * 256MB; therefore, > > + * the given endpoint decoder HPA range size is always expected aligned and > > + * also larger than that of the matching root decoder. If there are LMH's, > > + * the root decoder range end is always less than SZ_4G. > > + */ > > Is there any reason to limit this memory-hole handling to only low > memory holes? I have observed systems where the following memory > hole situation occurs: > > (example, not exact sizes) > CFMW1: [ 0xc0000000 - 0xdfffffff ] 512MB range > Reserved [ 0xe0000000 - 0xffffffff ] 512MB range > CFMW2: [ 0x100000000 - 0x15fffffff ] 1.5GB range > > 2 CXL Memory Devices w/ 1GB capacity each (again, an example). > > Note that 1 device has its capacity split across the hole, but > if the devices are interleaved then both devices have their capacity > split across the hole. > > It seems with some mild modification, this patch set could be > re-used to handle this memory hole scenario as well > (w/ addr translation - Robert's patch set) > > Is there a reason not to handle more than just LMH's in this set? This discussion was referenced recently on an IM and I wanted to share my response to it: The rules for when to apply this memory hole quirk are explicit and suitable to add to the CXL specification. I want the same standard for any other quirk and ideally some proof-of-work to get that quirk recognized by the specification. Otherwise, I worry that generalizing for all the possible ways that platform BIOS tries to be clever means we end up with something that has no rules. The spec is there to allow software to delineate valid configurations vs mistakes, and this slow drip of "Linux does not understand this platform configuration" is a spec gap.
On Tue, Apr 01, 2025 at 01:32:33PM -0700, Dan Williams wrote: > Gregory Price wrote: > > Is there a reason not to handle more than just LMH's in this set? > > This discussion was referenced recently on an IM and I wanted to share > my response to it: > > The rules for when to apply this memory hole quirk are explicit and > suitable to add to the CXL specification. I want the same standard for > any other quirk and ideally some proof-of-work to get that quirk > recognized by the specification. Otherwise, I worry that generalizing > for all the possible ways that platform BIOS tries to be clever means we > end up with something that has no rules. > > The spec is there to allow software to delineate valid configurations vs > mistakes, and this slow drip of "Linux does not understand this platform > configuration" is a spec gap. Note: I've since come around to understand the whole ecosystem a bit better since i wrote this response. I don't know that it's needed. Some of the explanation of this patch series is a bit confusing. It justifies itself by saying CFMWS don't intersect memory holes and that endpoint decoders have to be 256MB aligned. /* * Match CXL Root and Endpoint Decoders by comparing SPA and HPA ranges. * * On x86, CFMWS ranges never intersect memory holes while endpoint decoders * HPA range sizes are always guaranteed aligned to NIW * 256MB; therefore, * the given endpoint decoder HPA range size is always expected aligned and * also larger than that of the matching root decoder. If there are LMH's, * the root decoder range end is always less than SZ_4G. */ But per the spec, CFMWS is also aligned to be aligned to 256MB. Shouldn't the platform work around memory holes to generate multiple CFMWS for the entire capacity, and then use multiple endpoint decoders (1 per CFMWS) to map the capacity accordingly? (Also, I still don't understand the oracle value of <4GB address range. It seems like if this is some quirk of SPA vs HPA alignment, then it can hold for *all* ocurrances, not just stuff below 4GB) ~Gregory
Gregory Price wrote: > On Tue, Apr 01, 2025 at 01:32:33PM -0700, Dan Williams wrote: > > Gregory Price wrote: > > > Is there a reason not to handle more than just LMH's in this set? > > > > This discussion was referenced recently on an IM and I wanted to share > > my response to it: > > > > The rules for when to apply this memory hole quirk are explicit and > > suitable to add to the CXL specification. I want the same standard for > > any other quirk and ideally some proof-of-work to get that quirk > > recognized by the specification. Otherwise, I worry that generalizing > > for all the possible ways that platform BIOS tries to be clever means we > > end up with something that has no rules. > > > > The spec is there to allow software to delineate valid configurations vs > > mistakes, and this slow drip of "Linux does not understand this platform > > configuration" is a spec gap. > > Note: I've since come around to understand the whole ecosystem a bit > better since i wrote this response. Yes, I should have acknowledged shifts in understanding since this thread went quiet. Fabio was about to spin this set again to add more "generalization" and I wanted to clarify my current thinking that generalization is the opposite of what should happen here. > I don't know that it's needed. Referring to spec changes? I think they are, see below > Some of the explanation of this patch series is a bit confusing. It > justifies itself by saying CFMWS don't intersect memory holes and that > endpoint decoders have to be 256MB aligned. > > /* > * Match CXL Root and Endpoint Decoders by comparing SPA and HPA ranges. > * > * On x86, CFMWS ranges never intersect memory holes while endpoint decoders > * HPA range sizes are always guaranteed aligned to NIW * 256MB; therefore, > * the given endpoint decoder HPA range size is always expected aligned and > * also larger than that of the matching root decoder. If there are LMH's, > * the root decoder range end is always less than SZ_4G. > */ > > But per the spec, CFMWS is also aligned to be aligned to 256MB. Right, something has to give, i.e. "spec meet reality". Hardware endpoint decoders must be aligned, that is a shipping expectation, and endpoints are not in a position to know or care about host platform constraints. In constrast, the CFMWS definition runs into a practical problem meeting the same expectation given competing host phyiscal memory map constraints. The platforms with this condition want to support CXL mapped starting at zero *and* the typical/historical PCI MMIO space in low memory (for those PCI devices that do not support 64-bit addressing). If the CFMWS blindly followed the 256MB*NIW constraint the CXL window would overlap the MMIO space. So the choices are: 1/ Give up on mapping CXL starting at zero when 256MB * NIW does not fit 2/ Give up on maintaining historical availabilty of and compatibility with 32-bit only PCI devices (PCI configuration regression) 3/ Trim CFMWS to match the reality that the platform will always route memory cycles in that PCI MMIO range to PCI MMIO and never to CXL. 4/ Define some new protocol for when CFMWS is explicitly countermanded by other platform resource descriptors, and not a BIOS bug. The platform in question chose option 3. > Shouldn't the platform work around memory holes to generate multiple > CFMWS for the entire capacity, and then use multiple endpoint decoders > (1 per CFMWS) to map the capacity accordingly? Per above, the maths do not work out to be able to support that relative to a CXL region with problematic NIW. > (Also, I still don't understand the oracle value of <4GB address range. > It seems like if this is some quirk of SPA vs HPA alignment, then it > can hold for *all* ocurrances, not just stuff below 4GB) The goal is to get platform vendors to define the rules so that an OS has a reasonable expectation to know what is a valid vs invalid configuration. A hole above 4GB has no reason to exist, there is no resource conflict like PCI MMIO that explains why typical spec expectation can not be met. So I want the subsystem to have an explicit set of platform quirks ideally backed up by updated spec language. That allows us to validate that the Linux implementation is correct by some objective source of truth, encourage platform vendors to update that source of truth when they create new corner cases, or even better, be more mindful to not create new corner cases.
On Tue, Apr 01, 2025 at 04:34:44PM -0700, Dan Williams wrote: > The platforms with this condition want to support CXL mapped starting at > zero *and* the typical/historical PCI MMIO space in low memory (for > those PCI devices that do not support 64-bit addressing). If the CFMWS > blindly followed the 256MB*NIW constraint the CXL window would overlap > the MMIO space. So the choices are: > If I'm understanding everything correctly, then I think this is intended to work only when EFI_MEMORY_SP is *not* set for these particular CXL devices - so it comes up as memory in early boot and we're just trying to wire up all the bits to let the driver manage it accordingly? If that's the case, then ignore me, i'm just bikeshedding at this point. I can't imagine a system where the memory at 0x0-4GB is intended to come up as EFI_MEMORY_SP, so I hope no one ever tries this :D > > (Also, I still don't understand the oracle value of <4GB address range. > > It seems like if this is some quirk of SPA vs HPA alignment, then it > > can hold for *all* ocurrances, not just stuff below 4GB) > > The goal is to get platform vendors to define the rules so that an OS > has a reasonable expectation to know what is a valid vs invalid > configuration. A hole above 4GB has no reason to exist, there is no > resource conflict like PCI MMIO that explains why typical spec > expectation can not be met. > > So I want the subsystem to have an explicit set of platform quirks > ideally backed up by updated spec language. That allows us to validate > that the Linux implementation is correct by some objective source of > truth, encourage platform vendors to update that source of truth when > they create new corner cases, or even better, be more mindful to not > create new corner cases. I follow, seems reasonable. ~Gregory
diff --git a/drivers/cxl/core/lmh.c b/drivers/cxl/core/lmh.c new file mode 100644 index 0000000000000..232ebea0a8364 --- /dev/null +++ b/drivers/cxl/core/lmh.c @@ -0,0 +1,55 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#include <linux/range.h> +#include "lmh.h" + +/* Start of CFMWS range that end before x86 Low Memory Holes */ +#define LMH_CFMWS_RANGE_START 0x0ULL + +/* + * Match CXL Root and Endpoint Decoders by comparing SPA and HPA ranges. + * + * On x86, CFMWS ranges never intersect memory holes while endpoint decoders + * HPA range sizes are always guaranteed aligned to NIW * 256MB; therefore, + * the given endpoint decoder HPA range size is always expected aligned and + * also larger than that of the matching root decoder. If there are LMH's, + * the root decoder range end is always less than SZ_4G. + */ +bool arch_match_spa(struct cxl_root_decoder *cxlrd, + struct cxl_endpoint_decoder *cxled) +{ + struct range *r1, *r2; + int niw; + + r1 = &cxlrd->cxlsd.cxld.hpa_range; + r2 = &cxled->cxld.hpa_range; + niw = cxled->cxld.interleave_ways; + + if (r1->start == LMH_CFMWS_RANGE_START && r1->start == r2->start && + r1->end < (LMH_CFMWS_RANGE_START + SZ_4G) && r1->end < r2->end && + IS_ALIGNED(range_len(r2), niw * SZ_256M)) + return true; + + return false; +} + +/* Similar to arch_match_spa(), it matches regions and decoders */ +bool arch_match_region(struct cxl_region_params *p, struct cxl_decoder *cxld) +{ + struct range *r = &cxld->hpa_range; + struct resource *res = p->res; + int niw = cxld->interleave_ways; + + if (res->start == LMH_CFMWS_RANGE_START && res->start == r->start && + res->end < (LMH_CFMWS_RANGE_START + SZ_4G) && res->end < r->end && + IS_ALIGNED(range_len(r), niw * SZ_256M)) + return true; + + return false; +} + +void arch_adjust_region_resource(struct resource *res, + struct cxl_root_decoder *cxlrd) +{ + res->end = cxlrd->res->end; +} diff --git a/drivers/cxl/core/lmh.h b/drivers/cxl/core/lmh.h new file mode 100644 index 0000000000000..ec8907145afe8 --- /dev/null +++ b/drivers/cxl/core/lmh.h @@ -0,0 +1,28 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ + +#include "cxl.h" + +#ifdef CONFIG_CXL_ARCH_LOW_MEMORY_HOLE +bool arch_match_spa(struct cxl_root_decoder *cxlrd, + struct cxl_endpoint_decoder *cxled); +bool arch_match_region(struct cxl_region_params *p, struct cxl_decoder *cxld); +void arch_adjust_region_resource(struct resource *res, + struct cxl_root_decoder *cxlrd); +#else +static bool arch_match_spa(struct cxl_root_decoder *cxlrd, + struct cxl_endpoint_decoder *cxled) +{ + return false; +} + +static bool arch_match_region(struct cxl_region_params *p, + struct cxl_decoder *cxld) +{ + return false; +} + +static void arch_adjust_region_resource(struct resource *res, + struct cxl_root_decoder *cxlrd) +{ +} +#endif /* CXL_ARCH_LOW_MEMORY_HOLE */
In x86 with Low memory Hole, the BIOS may publishes CFMWS that describe SPA ranges which are subsets of the corresponding CXL Endpoint Decoders HPA's because the CFMWS never intersects LMH's while EP Decoders HPA's ranges are always guaranteed to align to the NIW * 256M rule. In order to construct Regions and attach Decoders, the driver needs to match Root Decoders and Regions with Endpoint Decoders, but it fails and the entire process returns errors because it doesn't expect to deal with SPA range lengths smaller than corresponding HPA's. Introduce functions that indirectly detect x86 LMH's by comparing SPA's with corresponding HPA's. They will be used in the process of Regions creation and Endpoint attachments to prevent driver failures in a few steps of the above-mentioned process. The helpers return true when HPA/SPA misalignments are detected under specific conditions: both the SPA and HPA ranges must start at LMH_CFMWS_RANGE_START (that in x86 with LMH's is 0x0), SPA range sizes be less than HPA's, SPA's range's size be less than 4G, HPA's size be aligned to the NIW * 256M rule. Also introduce a function to adjust the range end of the Regions to be created on x86 with LMH's. Cc: Alison Schofield <alison.schofield@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com> --- drivers/cxl/core/lmh.c | 55 ++++++++++++++++++++++++++++++++++++++++++ drivers/cxl/core/lmh.h | 28 +++++++++++++++++++++ 2 files changed, 83 insertions(+) create mode 100644 drivers/cxl/core/lmh.c create mode 100644 drivers/cxl/core/lmh.h