Message ID | 165237933127.3832067.12500546479146655886.stgit@dwillia2-desk3.amr.corp.intel.com |
---|---|
State | Superseded |
Series | cxl: Fix "mem_enable" handling |
> Previously, the cxl_mem driver was relying on platform-firmware to set
> "mem_enable". That is an invalid assumption as there is no requirement
> that platform-firmware sets the bit before the driver sees a device,
> especially in hot-plug scenarios. Additionally, ACPI-platforms that
> support CXL 2.0 devices also support the ACPI CEDT (CXL Early Discovery
> Table). That table outlines the platform permissible address ranges for
> CXL operation. So, there is a need for the driver to set "mem_enable",
> and there is information available to determine the validity of the CXL
> DVSEC Ranges. Note that the DVSEC Ranges can not be shut off completely.
> They always decode at least 256MB if "mem_enable" is set and the HDM
> Decoder capability is disabled.

Regarding "can not be shut off completely", CXL 2.0 Section 8.1.3.8.4
has this statement:
---
A CXL.mem capable device that does not implement CXL HDM Decoder
Capability registers directs host accesses to an address A within its
local HDM memory if the following two equations are satisfied -
Memory_Base[63:28] <= (A >> 28) < Memory_Base[63:28]+Memory_Size[63:28]
Memory_Active AND Mem_Enable=1
---

Per the above, if a device advertises Memory_Size = 0, then the first
equation (second part) can never be true. So, a device advertising
Memory_Size = 0 will effectively shut off this range, even if
Mem_Enable = 1 and the HDM Decoder capability is disabled.

Regards,
Ariel
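The two conditions quoted from the spec can be modeled in a few lines of C. This is only a minimal sketch for illustration, not kernel code: `struct dvsec_range_model` and `dvsec_range_decodes()` are hypothetical names, and the model assumes base and size are held as byte values whose low 28 bits are ignored, mirroring the `[63:28]` fields in the quoted text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one CXL DVSEC range (illustrative names only). */
struct dvsec_range_model {
	uint64_t memory_base;	/* byte address; only bits [63:28] are used */
	uint64_t memory_size;	/* byte count; only bits [63:28] are used */
	bool memory_active;
	bool mem_enable;
};

/*
 * Returns true when both quoted conditions hold for address 'a':
 *   Memory_Base[63:28] <= (a >> 28) < Memory_Base[63:28] + Memory_Size[63:28]
 *   Memory_Active AND Mem_Enable = 1
 */
static bool dvsec_range_decodes(const struct dvsec_range_model *r, uint64_t a)
{
	uint64_t base = r->memory_base >> 28;
	uint64_t size = r->memory_size >> 28;
	uint64_t idx = a >> 28;

	if (!(r->memory_active && r->mem_enable))
		return false;

	/* With size == 0 this reduces to base <= idx < base: never true. */
	return base <= idx && idx < base + size;
}
```

Under this reading, a range with Memory_Size = 0 matches no address even when Mem_Enable = 1, while a range of one 256MB unit matches exactly the addresses in that unit.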
On Mon, May 16, 2022 at 11:41 AM <Ariel.Sibley@microchip.com> wrote:
>
> > Previously, the cxl_mem driver was relying on platform-firmware to set
> > "mem_enable". That is an invalid assumption as there is no requirement
> > that platform-firmware sets the bit before the driver sees a device,
> > especially in hot-plug scenarios. Additionally, ACPI-platforms that
> > support CXL 2.0 devices also support the ACPI CEDT (CXL Early Discovery
> > Table). That table outlines the platform permissible address ranges for
> > CXL operation. So, there is a need for the driver to set "mem_enable",
> > and there is information available to determine the validity of the CXL
> > DVSEC Ranges. Note that the DVSEC Ranges can not be shut off completely.
> > They always decode at least 256MB if "mem_enable" is set and the HDM
> > Decoder capability is disabled.
>
> Regarding "can not be shut off completely", CXL 2.0 Section 8.1.3.8.4
> has this statement:
> ---
> A CXL.mem capable device that does not implement CXL HDM Decoder
> Capability registers directs host accesses to an address A within its
> local HDM memory if the following two equations are satisfied -
> Memory_Base[63:28] <= (A >> 28) < Memory_Base[63:28]+Memory_Size[63:28]
> Memory_Active AND Mem_Enable=1
> ---
>
> Per the above, if a device advertises Memory_Size = 0, then the first
> equation (second part) can never be true. So, a device advertising
> Memory_Size = 0 will effectively shut off this range, even if
> Mem_Enable = 1 and the HDM Decoder capability is disabled.

Hmm, so now I am confused as to why the spec goes on to say:

"If the address A is not backed by real memory (e.g. a device with
less than 256 MB of memory), a device that does not implement CXL HDM
Decoder Capability registers must handle those accesses gracefully
i.e. return all 1’s on reads and drop writes."

This is what led me to conclude that even though Memory_Size is 0, the
device is still actively decoding [0, 256MB).
> > > Previously, the cxl_mem driver was relying on platform-firmware to set
> > > "mem_enable". That is an invalid assumption as there is no requirement
> > > that platform-firmware sets the bit before the driver sees a device,
> > > especially in hot-plug scenarios. Additionally, ACPI-platforms that
> > > support CXL 2.0 devices also support the ACPI CEDT (CXL Early Discovery
> > > Table). That table outlines the platform permissible address ranges for
> > > CXL operation. So, there is a need for the driver to set "mem_enable",
> > > and there is information available to determine the validity of the CXL
> > > DVSEC Ranges. Note that the DVSEC Ranges can not be shut off completely.
> > > They always decode at least 256MB if "mem_enable" is set and the HDM
> > > Decoder capability is disabled.
> >
> > Regarding "can not be shut off completely", CXL 2.0 Section 8.1.3.8.4
> > has this statement:
> > ---
> > A CXL.mem capable device that does not implement CXL HDM Decoder
> > Capability registers directs host accesses to an address A within its
> > local HDM memory if the following two equations are satisfied -
> > Memory_Base[63:28] <= (A >> 28) < Memory_Base[63:28]+Memory_Size[63:28]
> > Memory_Active AND Mem_Enable=1
> > ---
> >
> > Per the above, if a device advertises Memory_Size = 0, then the first
> > equation (second part) can never be true. So, a device advertising
> > Memory_Size = 0 will effectively shut off this range, even if
> > Mem_Enable = 1 and the HDM Decoder capability is disabled.
>
> Hmm, so now I am confused as to why the spec goes on to say:
>
> "If the address A is not backed by real memory (e.g. a device with
> less than 256 MB of memory), a device that does not implement CXL HDM
> Decoder Capability registers must handle those accesses gracefully
> i.e. return all 1’s on reads and drop writes."
>
> This is what led me to conclude that even though Memory_Size is 0, the
> device is still actively decoding [0, 256MB).

My interpretation of that statement is that such a device would need to
advertise a capacity of 256M, and behave per the text for the portion of
the 256M range not backed by physical memory. Such a device would be
problematic for the host to handle, as I'm not sure how the host would be
able to determine the portion of the range that is usable. As such, I
don't think building such a device would be practical.

There are several other locations in the CXL 2.0 spec that indicate that
devices must manage capacity in units of 256M. E.g. Table 175. Identify
Memory Device Output Payload.

Regards,
Ariel
On Mon, May 16, 2022 at 12:32 PM <Ariel.Sibley@microchip.com> wrote:
>
> > > > Previously, the cxl_mem driver was relying on platform-firmware to set
> > > > "mem_enable". That is an invalid assumption as there is no requirement
> > > > that platform-firmware sets the bit before the driver sees a device,
> > > > especially in hot-plug scenarios. Additionally, ACPI-platforms that
> > > > support CXL 2.0 devices also support the ACPI CEDT (CXL Early Discovery
> > > > Table). That table outlines the platform permissible address ranges for
> > > > CXL operation. So, there is a need for the driver to set "mem_enable",
> > > > and there is information available to determine the validity of the CXL
> > > > DVSEC Ranges. Note that the DVSEC Ranges can not be shut off completely.
> > > > They always decode at least 256MB if "mem_enable" is set and the HDM
> > > > Decoder capability is disabled.
> > >
> > > Regarding "can not be shut off completely", CXL 2.0 Section 8.1.3.8.4
> > > has this statement:
> > > ---
> > > A CXL.mem capable device that does not implement CXL HDM Decoder
> > > Capability registers directs host accesses to an address A within its
> > > local HDM memory if the following two equations are satisfied -
> > > Memory_Base[63:28] <= (A >> 28) < Memory_Base[63:28]+Memory_Size[63:28]
> > > Memory_Active AND Mem_Enable=1
> > > ---
> > >
> > > Per the above, if a device advertises Memory_Size = 0, then the first
> > > equation (second part) can never be true. So, a device advertising
> > > Memory_Size = 0 will effectively shut off this range, even if
> > > Mem_Enable = 1 and the HDM Decoder capability is disabled.
> >
> > Hmm, so now I am confused as to why the spec goes on to say:
> >
> > "If the address A is not backed by real memory (e.g. a device with
> > less than 256 MB of memory), a device that does not implement CXL HDM
> > Decoder Capability registers must handle those accesses gracefully
> > i.e. return all 1’s on reads and drop writes."
> >
> > This is what led me to conclude that even though Memory_Size is 0, the
> > device is still actively decoding [0, 256MB).
>
> My interpretation of that statement is that such a device would need to
> advertise a capacity of 256M, and behave per the text for the portion of
> the 256M range not backed by physical memory. Such a device would be
> problematic for the host to handle, as I'm not sure how the host would be
> able to determine the portion of the range that is usable. As such, I
> don't think building such a device would be practical.

Right, I don't expect it to happen, but it's easy enough for the
kernel to exclude the possibility. It just so happens that one of the
early revs of the QEMU patch set introduced this case, so it may not be
feasible for hardware, but it's already happened for emulated devices.

> There are several other locations in the CXL 2.0 spec that indicate that
> devices must manage capacity in units of 256M. E.g. Table 175. Identify
> Memory Device Output Payload.

Yes, for this patch it is just trying to find a reasonable heuristic
for when "Mem_enable = 1" and "HDM Decoder Capability Enable = 0" to
determine that the kernel can go ahead and set "HDM Decoder Capability
Enable = 1".
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index a697c48fc830..cf53370d3d20 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -175,30 +175,164 @@ static int wait_for_valid(struct cxl_dev_state *cxlds)
 	return -ETIMEDOUT;
 }
 
+static int cxl_set_mem_enable(struct cxl_dev_state *cxlds, u16 val)
+{
+	struct pci_dev *pdev = to_pci_dev(cxlds->dev);
+	int d = cxlds->cxl_dvsec;
+	u16 ctrl;
+	int rc;
+
+	rc = pci_read_config_word(pdev, d + CXL_DVSEC_CTRL_OFFSET, &ctrl);
+	if (rc < 0)
+		return rc;
+
+	if ((ctrl & CXL_DVSEC_MEM_ENABLE) == val)
+		return 1;
+	ctrl &= ~CXL_DVSEC_MEM_ENABLE;
+	ctrl |= val;
+
+	rc = pci_write_config_word(pdev, d + CXL_DVSEC_CTRL_OFFSET, ctrl);
+	if (rc < 0)
+		return rc;
+
+	return 0;
+}
+
+static void clear_mem_enable(void *cxlds)
+{
+	cxl_set_mem_enable(cxlds, 0);
+}
+
+static int devm_cxl_enable_mem(struct device *host, struct cxl_dev_state *cxlds)
+{
+	int rc;
+
+	rc = cxl_set_mem_enable(cxlds, CXL_DVSEC_MEM_ENABLE);
+	if (rc < 0)
+		return rc;
+	if (rc > 0)
+		return 0;
+	return devm_add_action_or_reset(host, clear_mem_enable, cxlds);
+}
+
+static bool range_contains(struct range *r1, struct range *r2)
+{
+	return r1->start <= r2->start && r1->end >= r2->end;
+}
+
+/* require dvsec ranges to be covered by a locked platform window */
+static int dvsec_range_allowed(struct device *dev, void *arg)
+{
+	struct range *dev_range = arg;
+	struct cxl_decoder *cxld;
+	struct range root_range;
+
+	if (!is_root_decoder(dev))
+		return 0;
+
+	cxld = to_cxl_decoder(dev);
+
+	if (!(cxld->flags & CXL_DECODER_F_LOCK))
+		return 0;
+	if (!(cxld->flags & CXL_DECODER_F_RAM))
+		return 0;
+
+	root_range = (struct range) {
+		.start = cxld->platform_res.start,
+		.end = cxld->platform_res.end,
+	};
+
+	return range_contains(&root_range, dev_range);
+}
+
+static void disable_hdm(void *_cxlhdm)
+{
+	u32 global_ctrl;
+	struct cxl_hdm *cxlhdm = _cxlhdm;
+	void __iomem *hdm = cxlhdm->regs.hdm_decoder;
+
+	global_ctrl = readl(hdm + CXL_HDM_DECODER_CTRL_OFFSET);
+	writel(global_ctrl & ~CXL_HDM_DECODER_ENABLE,
+	       hdm + CXL_HDM_DECODER_CTRL_OFFSET);
+}
+
+static int devm_cxl_enable_hdm(struct device *host, struct cxl_hdm *cxlhdm)
+{
+	void __iomem *hdm = cxlhdm->regs.hdm_decoder;
+	u32 global_ctrl;
+
+	global_ctrl = readl(hdm + CXL_HDM_DECODER_CTRL_OFFSET);
+	writel(global_ctrl | CXL_HDM_DECODER_ENABLE,
+	       hdm + CXL_HDM_DECODER_CTRL_OFFSET);
+
+	return devm_add_action_or_reset(host, disable_hdm, cxlhdm);
+}
+
 static bool __cxl_hdm_decode_init(struct cxl_dev_state *cxlds,
 				  struct cxl_hdm *cxlhdm,
 				  struct cxl_endpoint_dvsec_info *info)
 {
 	void __iomem *hdm = cxlhdm->regs.hdm_decoder;
-	bool global_enable;
+	struct cxl_port *port = cxlhdm->port;
+	struct device *dev = cxlds->dev;
+	struct cxl_port *root;
 	u32 global_ctrl;
+	int i, rc;
 
 	global_ctrl = readl(hdm + CXL_HDM_DECODER_CTRL_OFFSET);
-	global_enable = global_ctrl & CXL_HDM_DECODER_ENABLE;
 
-	if (!global_enable && info->mem_enabled)
+	/*
+	 * If the HDM Decoder Capability is already enabled then assume
+	 * that some other agent like platform firmware set it up.
+	 */
+	if (global_ctrl & CXL_HDM_DECODER_ENABLE) {
+		rc = devm_cxl_enable_mem(&port->dev, cxlds);
+		if (rc)
+			return false;
+		return true;
+	}
+
+	root = to_cxl_port(port->dev.parent);
+	while (!is_cxl_root(root) && is_cxl_port(root->dev.parent))
+		root = to_cxl_port(root->dev.parent);
+	if (!is_cxl_root(root)) {
+		dev_err(dev, "Failed to acquire root port for HDM enable\n");
 		return false;
+	}
+
+	for (i = 0; i < info->ranges; i++) {
+		struct device *cxld_dev;
+
+		if (!info->mem_enabled)
+			break;
+
+		cxld_dev = device_find_child(&root->dev, &info->dvsec_range[i],
+					     dvsec_range_allowed);
+		if (!cxld_dev) {
+			dev_dbg(dev, "Range%d disallowed by platform\n", i);
+			cxl_set_mem_enable(cxlds, 0);
+			info->mem_enabled = 0;
+			break;
+		}
+		put_device(cxld_dev);
+		break;
+	}
+	put_device(&root->dev);
 
 	/*
-	 * Permanently (for this boot at least) opt the device into HDM
-	 * operation. Individual HDM decoders still need to be enabled after
-	 * this point.
+	 * At least one DVSEC range is enabled and allowed, skip HDM
+	 * Decoder Capability Enable
 	 */
-	if (!global_enable) {
-		dev_dbg(cxlds->dev, "Enabling HDM decode\n");
-		writel(global_ctrl | CXL_HDM_DECODER_ENABLE,
-		       hdm + CXL_HDM_DECODER_CTRL_OFFSET);
-	}
+	if (info->mem_enabled)
+		return false;
+
+	rc = devm_cxl_enable_hdm(&port->dev, cxlhdm);
+	if (rc)
+		return false;
+
+	rc = devm_cxl_enable_mem(&port->dev, cxlds);
+	if (rc)
+		return false;
 
 	return true;
 }
@@ -253,9 +387,14 @@ int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm)
 		return rc;
 	}
 
+	/*
+	 * The current DVSEC values are moot if the memory capability is
+	 * disabled, and they will remain moot after the HDM Decoder
+	 * capability is enabled.
+	 */
 	info.mem_enabled = FIELD_GET(CXL_DVSEC_MEM_ENABLE, ctrl);
 	if (!info.mem_enabled)
-		return 0;
+		return __cxl_hdm_decode_init(cxlds, cxlhdm, &info);
 
 	for (i = 0; i < hdm_count; i++) {
 		u64 base, size;
CXL memory expanders that support the CXL 2.0 memory device class code
include an "HDM Decoder Capability" mechanism to supplant the "CXL DVSEC
Range" mechanism originally defined in CXL 1.1. Both mechanisms depend
on a "mem_enable" bit being set in configuration space before either
mechanism activates. When the HDM Decoder Capability is enabled the CXL
DVSEC Range settings are ignored.

Previously, the cxl_mem driver was relying on platform-firmware to set
"mem_enable". That is an invalid assumption as there is no requirement
that platform-firmware sets the bit before the driver sees a device,
especially in hot-plug scenarios. Additionally, ACPI-platforms that
support CXL 2.0 devices also support the ACPI CEDT (CXL Early Discovery
Table). That table outlines the platform permissible address ranges for
CXL operation. So, there is a need for the driver to set "mem_enable",
and there is information available to determine the validity of the CXL
DVSEC Ranges. Note that the DVSEC Ranges can not be shut off completely.
They always decode at least 256MB if "mem_enable" is set and the HDM
Decoder capability is disabled.

Arrange for the driver to optionally enable the HDM Decoder Capability
if "mem_enable" was not set by platform firmware, or the CXL DVSEC Range
configuration was invalid. Be careful to only disable memory decode if
the kernel was the one to enable it. In other words, if CXL is backing
all of kernel memory at boot the device needs to maintain "mem_enable"
and "HDM Decoder enable" all the way up to handoff back to platform
firmware (e.g. ACPI S5 state entry may require CXL memory to stay
active).

Fixes: 560f78559006 ("cxl/pci: Retrieve CXL DVSEC memory info")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/core/pci.c | 163 ++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 151 insertions(+), 12 deletions(-)
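The policy described in the changelog can be summarized as a small decision function. The following is a simplified, standalone sketch and not the driver code: `struct range_model`, `struct window_model`, `enum decode_mode`, and `choose_decode_mode()` are hypothetical names, ranges are assumed to use an inclusive end address, and only a single DVSEC range is checked against a single platform window.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical address range; 'end' is assumed inclusive in this sketch. */
struct range_model {
	uint64_t start;
	uint64_t end;
};

/* Hypothetical platform window, loosely modeled on a root decoder. */
struct window_model {
	struct range_model res;
	bool locked;	/* window is locked by the platform */
	bool ram;	/* window is intended for volatile memory */
};

enum decode_mode {
	KEEP_EXISTING_HDM,	/* HDM decode already enabled by another agent */
	KEEP_DVSEC_RANGE,	/* DVSEC range is live and allowed, leave it */
	ENABLE_HDM,		/* kernel enables HDM decode and mem_enable */
};

static bool range_contains_model(const struct range_model *outer,
				 const struct range_model *inner)
{
	return outer->start <= inner->start && outer->end >= inner->end;
}

/* A DVSEC range is allowed only inside a locked, RAM-capable window. */
static bool window_allows(const struct window_model *w,
			  const struct range_model *dev_range)
{
	return w->locked && w->ram && range_contains_model(&w->res, dev_range);
}

static enum decode_mode choose_decode_mode(bool hdm_enabled, bool mem_enabled,
					    const struct range_model *dev_range,
					    const struct window_model *w)
{
	if (hdm_enabled)
		return KEEP_EXISTING_HDM;
	if (mem_enabled && window_allows(w, dev_range))
		return KEEP_DVSEC_RANGE;
	/* mem_enable was clear, or the range was disallowed by the platform */
	return ENABLE_HDM;
}
```

The sketch leaves out the teardown side of the patch, where the devm actions undo only the enables that the kernel itself performed.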