
[14/14] cxl/port: Enable HDM Capability after validating DVSEC Ranges

Message ID 165237933127.3832067.12500546479146655886.stgit@dwillia2-desk3.amr.corp.intel.com
State Superseded
Series cxl: Fix "mem_enable" handling

Commit Message

Dan Williams May 12, 2022, 6:15 p.m. UTC
CXL memory expanders that support the CXL 2.0 memory device class code
include an "HDM Decoder Capability" mechanism to supplant the "CXL DVSEC
Range" mechanism originally defined in CXL 1.1. Both mechanisms depend
on a "mem_enable" bit being set in configuration space before either
mechanism activates. When the HDM Decoder Capability is enabled, the CXL
DVSEC Range settings are ignored.
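The gating rule described above fits in a small truth table. The following is a minimal user-space sketch of it; enum cxl_decode_mode and cxl_decode_mode() are hypothetical names used only for illustration and are not kernel APIs.

```c
#include <stdbool.h>

/* Which mechanism decodes host accesses (hypothetical model of the text). */
enum cxl_decode_mode {
	CXL_DECODE_NONE,        /* mem_enable clear: neither mechanism is active */
	CXL_DECODE_DVSEC_RANGE, /* mem_enable set, HDM Decoder Capability disabled */
	CXL_DECODE_HDM,         /* mem_enable set, HDM Decoder Capability enabled */
};

static enum cxl_decode_mode cxl_decode_mode(bool mem_enable, bool hdm_enable)
{
	/* Both mechanisms depend on mem_enable being set first. */
	if (!mem_enable)
		return CXL_DECODE_NONE;
	/* Once HDM decode is enabled, the DVSEC Range settings are ignored. */
	return hdm_enable ? CXL_DECODE_HDM : CXL_DECODE_DVSEC_RANGE;
}
```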

Previously, the cxl_mem driver relied on platform firmware to set
"mem_enable". That is an invalid assumption: nothing requires platform
firmware to set the bit before the driver sees a device, especially in
hot-plug scenarios. Additionally, ACPI platforms that support CXL 2.0
devices also support the ACPI CEDT (CXL Early Discovery Table), which
describes the platform-permissible address ranges for CXL operation.
So the driver needs to set "mem_enable", and the CEDT provides the
information needed to judge whether the CXL DVSEC Ranges are valid. Note
that the DVSEC Ranges cannot be shut off completely: they always decode
at least 256MB if "mem_enable" is set and the HDM Decoder Capability is
disabled.
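Judging DVSEC Range validity against the CEDT amounts to a range-containment test: a range is acceptable only if some platform window covers it entirely. The sketch below models that in user space, mirroring the patch's range_contains() and dvsec_range_allowed(), which additionally require the root decoder to carry CXL_DECODER_F_LOCK and CXL_DECODER_F_RAM. struct addr_range, struct platform_window and dvsec_range_valid() are hypothetical stand-ins, not kernel types.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive [start, end], in the style of the kernel's struct range. */
struct addr_range {
	uint64_t start;
	uint64_t end;
};

/* Hypothetical stand-in for a CEDT-described platform window. */
struct platform_window {
	struct addr_range range;
	bool locked; /* models CXL_DECODER_F_LOCK */
	bool ram;    /* models CXL_DECODER_F_RAM */
};

static bool range_contains(const struct addr_range *outer,
			   const struct addr_range *inner)
{
	return outer->start <= inner->start && outer->end >= inner->end;
}

/* True if some locked, RAM-capable window fully covers @dvsec. */
static bool dvsec_range_valid(const struct platform_window *win, size_t nr,
			      const struct addr_range *dvsec)
{
	for (size_t i = 0; i < nr; i++) {
		if (!win[i].locked || !win[i].ram)
			continue;
		if (range_contains(&win[i].range, dvsec))
			return true;
	}
	return false;
}
```

A range that straddles two windows is rejected even if each half is covered, which matches the patch's per-window containment check.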

Arrange for the driver to enable the HDM Decoder Capability if
"mem_enable" was not set by platform firmware or the CXL DVSEC Range
configuration is invalid. Be careful to disable memory decode only if
the kernel was the one that enabled it. In other words, if CXL is
backing all of kernel memory at boot, the device needs to keep
"mem_enable" and "HDM Decoder Enable" set all the way up to handoff back
to platform firmware (e.g. ACPI S5 state entry may require CXL memory to
stay active).
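The "only undo what the kernel enabled" rule can be sketched with a simulated control register. In the patch, cxl_set_mem_enable() returns 1 when the bit is already in the requested state, and devm_cxl_enable_mem() registers the clear_mem_enable() cleanup only when the bit was actually changed. This sketch replaces the devm action with a plain flag; struct fake_dev, the MEM_ENABLE bit value and the helper names are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define MEM_ENABLE 0x4u /* assumed bit position, for illustration only */

/* Simulated device: a control register instead of real config space. */
struct fake_dev {
	uint16_t ctrl;
	bool kernel_enabled; /* set only if this code flipped mem_enable on */
};

/* Returns 1 if already in the requested state, 0 if the bit was changed. */
static int set_mem_enable(struct fake_dev *d, uint16_t val)
{
	if ((d->ctrl & MEM_ENABLE) == val)
		return 1;
	d->ctrl = (uint16_t)((d->ctrl & ~MEM_ENABLE) | val);
	return 0;
}

static void enable_mem(struct fake_dev *d)
{
	/* Remember to undo the change only if this call made it. */
	d->kernel_enabled = (set_mem_enable(d, MEM_ENABLE) == 0);
}

static void teardown(struct fake_dev *d)
{
	/* Leave firmware-established decode alone (e.g. memory in use at boot). */
	if (d->kernel_enabled)
		set_mem_enable(d, 0);
}
```

A device that firmware already enabled keeps "mem_enable" set after teardown, while a hot-plugged device enabled by the kernel has it cleared again.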

Fixes: 560f78559006 ("cxl/pci: Retrieve CXL DVSEC memory info")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/core/pci.c |  163 ++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 151 insertions(+), 12 deletions(-)

Comments

Ariel.Sibley@microchip.com May 16, 2022, 6:41 p.m. UTC | #1
> Previously, the cxl_mem driver was relying on platform-firmware to set
> "mem_enable". That is an invalid assumption as there is no requirement
> that platform-firmware sets the bit before the driver sees a device,
> especially in hot-plug scenarios. Additionally, ACPI-platforms that
> support CXL 2.0 devices also support the ACPI CEDT (CXL Early Discovery
> Table). That table outlines the platform permissible address ranges for
> CXL operation. So, there is a need for the driver to set "mem_enable",
> and there is information available to determine the validity of the CXL
> DVSEC Ranges. Note that the DVSEC Ranges can not be shut off completely.
> They always decode at least 256MB if "mem_enable" is set and the HDM
> Decoder capability is disabled.

Regarding "can not be shut off completely", CXL 2.0 Section 8.1.3.8.4
has this statement:
---
A CXL.mem capable device that does not implement CXL HDM Decoder
Capability registers directs host accesses to an address A within its
local HDM memory if the following two equations are satisfied -
Memory_Base[63:28] <= (A >> 28) < Memory_Base[63:28]+Memory_Size[63:28]
Memory_Active AND Mem_Enable=1
---

Per the above, if a device advertises Memory_Size = 0, then the first
equation (second part) can never be true. So, a device advertising
Memory_Size = 0 will effectively shut off this range, even if
Mem_Enable = 1 and the HDM Decoder capability is disabled.

Regards,
Ariel
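Ariel's reading of the quoted condition can be checked mechanically. The helper below is a hypothetical sketch, taken from neither the spec nor the kernel. It evaluates both quoted equations, with base and size compared in 256MB units (bits [63:28]). With Memory_Size = 0 the test base <= a < base + 0 can never hold, so no address decodes.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Host-address match for a device without HDM Decoder Capability
 * registers, per the quoted text. Addresses are compared in 256MB units.
 */
static bool dvsec_range_decodes(uint64_t addr, uint64_t mem_base,
				uint64_t mem_size, bool mem_active,
				bool mem_enable)
{
	uint64_t a = addr >> 28;
	uint64_t base = mem_base >> 28;
	uint64_t size = mem_size >> 28; /* 0 here means an empty range */

	if (!(mem_active && mem_enable))
		return false;
	return base <= a && a < base + size;
}
```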
Dan Williams May 16, 2022, 6:52 p.m. UTC | #2
On Mon, May 16, 2022 at 11:41 AM <Ariel.Sibley@microchip.com> wrote:
>
> Regarding "can not be shut off completely", CXL 2.0 Section 8.1.3.8.4
> has this statement:
> ---
> A CXL.mem capable device that does not implement CXL HDM Decoder
> Capability registers directs host accesses to an address A within its
> local HDM memory if the following two equations are satisfied -
> Memory_Base[63:28] <= (A >> 28) < Memory_Base[63:28]+Memory_Size[63:28]
> Memory_Active AND Mem_Enable=1
> ---
>
> Per the above, if a device advertises Memory_Size = 0, then the first
> equation (second part) can never be true. So, a device advertising
> Memory_Size = 0 will effectively shut off this range, even if
> Mem_Enable = 1 and the HDM Decoder capability is disabled.

Hmm, so now I am confused as to why the spec goes on to say:

"If the address A is not backed by real memory (e.g. a device with
less than 256 MB of memory), a device that does not implement CXL HDM
Decoder Capability registers must handle those accesses gracefully
i.e. return all 1’s on reads and drop writes."

This is what led me to conclude that even though Memory_Size is 0, the
device is still actively decoding [0, 256MB).
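The quoted passage describes behavior rather than a register layout. As an illustration only, a toy device model might implement it as follows. struct toy_dev, toy_read() and toy_write() are hypothetical, and the model covers only the "all 1's on reads, drop writes" rule for the unbacked part of a decoded window.

```c
#include <stdint.h>

/* Hypothetical model: only the first 'backed' bytes have real storage. */
struct toy_dev {
	uint64_t backed;
	uint8_t *mem; /* points at 'backed' bytes of storage */
};

/* Reads of unbacked offsets return all 1's. */
static uint8_t toy_read(const struct toy_dev *d, uint64_t off)
{
	if (off < d->backed)
		return d->mem[off];
	return 0xff;
}

/* Writes to unbacked offsets are silently dropped. */
static void toy_write(struct toy_dev *d, uint64_t off, uint8_t v)
{
	if (off < d->backed)
		d->mem[off] = v;
}
```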
Ariel.Sibley@microchip.com May 16, 2022, 7:31 p.m. UTC | #3
> > Regarding "can not be shut off completely", CXL 2.0 Section 8.1.3.8.4
> > has this statement:
> > ---
> > A CXL.mem capable device that does not implement CXL HDM Decoder
> > Capability registers directs host accesses to an address A within its
> > local HDM memory if the following two equations are satisfied -
> > Memory_Base[63:28] <= (A >> 28) < Memory_Base[63:28]+Memory_Size[63:28]
> > Memory_Active AND Mem_Enable=1
> > ---
> >
> > Per the above, if a device advertises Memory_Size = 0, then the first
> > equation (second part) can never be true. So, a device advertising
> > Memory_Size = 0 will effectively shut off this range, even if
> > Mem_Enable = 1 and the HDM Decoder capability is disabled.
> 
> Hmm, so now I am confused as to why the spec goes on to say:
> 
> "If the address A is not backed by real memory (e.g. a device with
> less than 256 MB of
> memory), a device that does not implement CXL HDM Decoder Capability registers
> must handle those accesses gracefully i.e. return all 1’s on reads and
> drop writes."
> 
> This is what led me to conclude that even though Memory_Size is 0, the
> device is still actively decoding [0, 256MB).

My interpretation of that statement is that such a device would need to
advertise a capacity of 256MB and behave as the text describes for the
portion of the 256MB range not backed by physical memory. Such a device
would be problematic for the host, since I'm not sure how the host could
determine which portion of the range is usable. As such, I don't think
building such a device would be practical.

There are several other places in the CXL 2.0 spec indicating that
devices must manage capacity in units of 256MB, e.g. Table 175,
"Identify Memory Device Output Payload".

Regards,
Ariel
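Because capacity is managed in 256MB units, a host-side sanity check on a reported size is simple. The helpers below are a hypothetical sketch, not the kernel's implementation; CAPACITY_UNIT stands in for the 256MB granule.

```c
#include <stdbool.h>
#include <stdint.h>

#define CAPACITY_UNIT (256ULL << 20) /* 256MB granule (assumed name) */

/* Convert a count of 256MB units, as reported by a device, to bytes. */
static uint64_t capacity_units_to_bytes(uint64_t units)
{
	return units * CAPACITY_UNIT;
}

/* A well-formed capacity is a non-zero whole number of 256MB units. */
static bool capacity_well_formed(uint64_t bytes)
{
	return bytes != 0 && bytes % CAPACITY_UNIT == 0;
}
```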
Dan Williams May 16, 2022, 8:07 p.m. UTC | #4
On Mon, May 16, 2022 at 12:32 PM <Ariel.Sibley@microchip.com> wrote:
>
> > > Regarding "can not be shut off completely", CXL 2.0 Section 8.1.3.8.4
> > > has this statement:
> > > ---
> > > A CXL.mem capable device that does not implement CXL HDM Decoder
> > > Capability registers directs host accesses to an address A within its
> > > local HDM memory if the following two equations are satisfied -
> > > Memory_Base[63:28] <= (A >> 28) < Memory_Base[63:28]+Memory_Size[63:28]
> > > Memory_Active AND Mem_Enable=1
> > > ---
> > >
> > > Per the above, if a device advertises Memory_Size = 0, then the first
> > > equation (second part) can never be true. So, a device advertising
> > > Memory_Size = 0 will effectively shut off this range, even if
> > > Mem_Enable = 1 and the HDM Decoder capability is disabled.
> >
> > Hmm, so now I am confused as to why the spec goes on to say:
> >
> > "If the address A is not backed by real memory (e.g. a device with
> > less than 256 MB of
> > memory), a device that does not implement CXL HDM Decoder Capability registers
> > must handle those accesses gracefully i.e. return all 1’s on reads and
> > drop writes."
> >
> > This is what led me to conclude that even though Memory_Size is 0, the
> > device is still actively decoding [0, 256MB).
>
> My interpretation of that statement is that such a device would need to
> advertise a capacity of 256M, and behave per the text for the portion of
> the 256M range not backed by physical memory. Such a device would be
> problematic for the host to handle, as I'm not sure how the host would be
> able to determine the portion of the range that is usable. As such, I
> don't think building such a device would be practical.

Right, I don't expect it to happen, but it's easy enough for the
kernel to exclude the possibility.

It just so happens that one of the early revs of the QEMU patch set
introduced this case, so it may not be feasible for hardware, but it's
already happened for emulated devices.

> There are several other locations in the CXL 2.0 spec that indicate that
> devices must manage capacity in units of 256M. E.g. Table 175. Identify
> Memory Device Output Payload.

Yes, this patch just tries to find a reasonable heuristic for deciding,
when "Mem_enable = 1" and "HDM Decoder Capability Enable = 0", whether
the kernel can go ahead and set "HDM Decoder Capability Enable = 1".
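As implemented in __cxl_hdm_decode_init() in the patch, the heuristic reduces to a three-way decision. This sketch restates it as a pure function. enum hdm_action and choose_hdm_action() are hypothetical names, and range_allowed stands for the result of the CEDT window lookup.

```c
#include <stdbool.h>

/* Hypothetical outcome names summarizing the policy in this patch. */
enum hdm_action {
	KEEP_HDM,         /* HDM already enabled; the patch also ensures mem_enable */
	KEEP_DVSEC_RANGE, /* mem_enable set and range covered by a platform window */
	ENABLE_HDM,       /* kernel sets HDM Decoder Capability Enable, then mem_enable */
};

static enum hdm_action choose_hdm_action(bool hdm_enabled, bool mem_enabled,
					 bool range_allowed)
{
	/* Assume another agent, such as platform firmware, set up HDM decode. */
	if (hdm_enabled)
		return KEEP_HDM;
	/* An enabled DVSEC range that the platform allows is left in place. */
	if (mem_enabled && range_allowed)
		return KEEP_DVSEC_RANGE;
	/* mem_enable clear, or DVSEC range outside the platform windows. */
	return ENABLE_HDM;
}
```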

Patch

diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index a697c48fc830..cf53370d3d20 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -175,30 +175,164 @@  static int wait_for_valid(struct cxl_dev_state *cxlds)
 	return -ETIMEDOUT;
 }
 
+static int cxl_set_mem_enable(struct cxl_dev_state *cxlds, u16 val)
+{
+	struct pci_dev *pdev = to_pci_dev(cxlds->dev);
+	int d = cxlds->cxl_dvsec;
+	u16 ctrl;
+	int rc;
+
+	rc = pci_read_config_word(pdev, d + CXL_DVSEC_CTRL_OFFSET, &ctrl);
+	if (rc < 0)
+		return rc;
+
+	if ((ctrl & CXL_DVSEC_MEM_ENABLE) == val)
+		return 1;
+	ctrl &= ~CXL_DVSEC_MEM_ENABLE;
+	ctrl |= val;
+
+	rc = pci_write_config_word(pdev, d + CXL_DVSEC_CTRL_OFFSET, ctrl);
+	if (rc < 0)
+		return rc;
+
+	return 0;
+}
+
+static void clear_mem_enable(void *cxlds)
+{
+	cxl_set_mem_enable(cxlds, 0);
+}
+
+static int devm_cxl_enable_mem(struct device *host, struct cxl_dev_state *cxlds)
+{
+	int rc;
+
+	rc = cxl_set_mem_enable(cxlds, CXL_DVSEC_MEM_ENABLE);
+	if (rc < 0)
+		return rc;
+	if (rc > 0)
+		return 0;
+	return devm_add_action_or_reset(host, clear_mem_enable, cxlds);
+}
+
+static bool range_contains(struct range *r1, struct range *r2)
+{
+	return r1->start <= r2->start && r1->end >= r2->end;
+}
+
+/* require dvsec ranges to be covered by a locked platform window */
+static int dvsec_range_allowed(struct device *dev, void *arg)
+{
+	struct range *dev_range = arg;
+	struct cxl_decoder *cxld;
+	struct range root_range;
+
+	if (!is_root_decoder(dev))
+		return 0;
+
+	cxld = to_cxl_decoder(dev);
+
+	if (!(cxld->flags & CXL_DECODER_F_LOCK))
+		return 0;
+	if (!(cxld->flags & CXL_DECODER_F_RAM))
+		return 0;
+
+	root_range = (struct range) {
+		.start = cxld->platform_res.start,
+		.end = cxld->platform_res.end,
+	};
+
+	return range_contains(&root_range, dev_range);
+}
+
+static void disable_hdm(void *_cxlhdm)
+{
+	u32 global_ctrl;
+	struct cxl_hdm *cxlhdm = _cxlhdm;
+	void __iomem *hdm = cxlhdm->regs.hdm_decoder;
+
+	global_ctrl = readl(hdm + CXL_HDM_DECODER_CTRL_OFFSET);
+	writel(global_ctrl & ~CXL_HDM_DECODER_ENABLE,
+	       hdm + CXL_HDM_DECODER_CTRL_OFFSET);
+}
+
+static int devm_cxl_enable_hdm(struct device *host, struct cxl_hdm *cxlhdm)
+{
+	void __iomem *hdm = cxlhdm->regs.hdm_decoder;
+	u32 global_ctrl;
+
+	global_ctrl = readl(hdm + CXL_HDM_DECODER_CTRL_OFFSET);
+	writel(global_ctrl | CXL_HDM_DECODER_ENABLE,
+	       hdm + CXL_HDM_DECODER_CTRL_OFFSET);
+
+	return devm_add_action_or_reset(host, disable_hdm, cxlhdm);
+}
+
 static bool __cxl_hdm_decode_init(struct cxl_dev_state *cxlds,
 				  struct cxl_hdm *cxlhdm,
 				  struct cxl_endpoint_dvsec_info *info)
 {
 	void __iomem *hdm = cxlhdm->regs.hdm_decoder;
-	bool global_enable;
+	struct cxl_port *port = cxlhdm->port;
+	struct device *dev = cxlds->dev;
+	struct cxl_port *root;
 	u32 global_ctrl;
+	int i, rc;
 
 	global_ctrl = readl(hdm + CXL_HDM_DECODER_CTRL_OFFSET);
-	global_enable = global_ctrl & CXL_HDM_DECODER_ENABLE;
 
-	if (!global_enable && info->mem_enabled)
+	/*
+	 * If the HDM Decoder Capability is already enabled then assume
+	 * that some other agent like platform firmware set it up.
+	 */
+	if (global_ctrl & CXL_HDM_DECODER_ENABLE) {
+		rc = devm_cxl_enable_mem(&port->dev, cxlds);
+		if (rc)
+			return false;
+		return true;
+	}
+
+	root = to_cxl_port(port->dev.parent);
+	while (!is_cxl_root(root) && is_cxl_port(root->dev.parent))
+		root = to_cxl_port(root->dev.parent);
+	if (!is_cxl_root(root)) {
+		dev_err(dev, "Failed to acquire root port for HDM enable\n");
 		return false;
+	}
+
+	for (i = 0; i < info->ranges; i++) {
+		struct device *cxld_dev;
+
+		if (!info->mem_enabled)
+			break;
+
+		cxld_dev = device_find_child(&root->dev, &info->dvsec_range[i],
+					     dvsec_range_allowed);
+		if (!cxld_dev) {
+			dev_dbg(dev, "Range%d disallowed by platform\n", i);
+			cxl_set_mem_enable(cxlds, 0);
+			info->mem_enabled = 0;
+			break;
+		}
+		put_device(cxld_dev);
+		break;
+	}
+	put_device(&root->dev);
 
 	/*
-	 * Permanently (for this boot at least) opt the device into HDM
-	 * operation. Individual HDM decoders still need to be enabled after
-	 * this point.
+	 * At least one DVSEC range is enabled and allowed, skip HDM
+	 * Decoder Capability Enable
 	 */
-	if (!global_enable) {
-		dev_dbg(cxlds->dev, "Enabling HDM decode\n");
-		writel(global_ctrl | CXL_HDM_DECODER_ENABLE,
-		       hdm + CXL_HDM_DECODER_CTRL_OFFSET);
-	}
+	if (info->mem_enabled)
+		return false;
+
+	rc = devm_cxl_enable_hdm(&port->dev, cxlhdm);
+	if (rc)
+		return false;
+
+	rc = devm_cxl_enable_mem(&port->dev, cxlds);
+	if (rc)
+		return false;
 
 	return true;
 }
@@ -253,9 +387,14 @@  int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm)
 		return rc;
 	}
 
+	/*
+	 * The current DVSEC values are moot if the memory capability is
+	 * disabled, and they will remain moot after the HDM Decoder
+	 * capability is enabled.
+	 */
 	info.mem_enabled = FIELD_GET(CXL_DVSEC_MEM_ENABLE, ctrl);
 	if (!info.mem_enabled)
-		return 0;
+		return __cxl_hdm_decode_init(cxlds, cxlhdm, &info);
 
 	for (i = 0; i < hdm_count; i++) {
 		u64 base, size;