diff mbox series

[v2,2/2] cxl: Move cxl_await_media_ready() to before capacity info retrieval

Message ID 168443110845.2957452.15022248373553807511.stgit@djiang5-mobl3
State New, archived
Headers show
Series ] cxl: Move operations after memory is ready | expand

Commit Message

Dave Jiang May 18, 2023, 5:31 p.m. UTC
Move cxl_await_media_ready() to cxl_pci probe before driver starts issuing
IDENTIFY and retrieving memory device information to ensure that the
device is ready to provide the information.

Suggested-by: Dan Williams <dan.j.williams@intel.com>
Fixes: b39cb1052a5c ("cxl/mem: Register CXL memX devices")
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
 drivers/cxl/pci.c  |    6 ++++++
 drivers/cxl/port.c |    6 ------
 2 files changed, 6 insertions(+), 6 deletions(-)

Comments

Ira Weiny May 18, 2023, 7:05 p.m. UTC | #1
Dave Jiang wrote:
> Move cxl_await_media_ready() to cxl_pci probe before driver starts issuing
> IDENTIFY and retrieving memory device information to ensure that the
> device is ready to provide the information.
> 
> Suggested-by: Dan Williams <dan.j.williams@intel.com>
> Fixes: b39cb1052a5c ("cxl/mem: Register CXL memX devices")
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Dan Williams May 18, 2023, 8:38 p.m. UTC | #2
Dave Jiang wrote:
> Move cxl_await_media_ready() to cxl_pci probe before driver starts issuing
> IDENTIFY and retrieving memory device information to ensure that the
> device is ready to provide the information.
> 
> Suggested-by: Dan Williams <dan.j.williams@intel.com>
> Fixes: b39cb1052a5c ("cxl/mem: Register CXL memX devices")
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> ---
>  drivers/cxl/pci.c  |    6 ++++++
>  drivers/cxl/port.c |    6 ------
>  2 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index ea38bd49b0cf..fc59ca79b2a0 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -708,6 +708,12 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  	if (rc)
>  		dev_dbg(&pdev->dev, "Failed to map RAS capability.\n");
>  
> +	rc = cxl_await_media_ready(cxlds);
> +	if (rc) {
> +		dev_err(&pdev->dev, "Media not active (%d)\n", rc);
> +		return rc;
> +	}

So, now that I see this new failure mode here it raises another concern.
I think there is value in still trying to bring up the mailbox interface
even if media-ready never completes. For example, what if you need to
update-firmware to remediate the device?

The mailbox interface does not need the memdev to attach to the cxl_mem
driver to enable command submission, but the question becomes what do
you do once you have fixed up whatever condition caused the media to
fail?

I think the simplest answer is to just require tooling to reload cxl_pci
for that device to re-attempt init. A more complicated answer would be
to teach cxl_mem how to revalidate capacity after-the-fact, but at this
point I think the implementation is stuck with cxl_pci owning capacity
initialization.

So I think this looks like caching the the media-ready timeout condition
and teaching all subsequent code that cares about capacity info to fail
and continue. Like don't issue identify, make sure the sysfs capacity
values return 0, and don't let cxl_mem proceed with topology
enumeration.
diff mbox series

Patch

diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index ea38bd49b0cf..fc59ca79b2a0 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -708,6 +708,12 @@  static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	if (rc)
 		dev_dbg(&pdev->dev, "Failed to map RAS capability.\n");
 
+	rc = cxl_await_media_ready(cxlds);
+	if (rc) {
+		dev_err(&pdev->dev, "Media not active (%d)\n", rc);
+		return rc;
+	}
+
 	rc = cxl_pci_setup_mailbox(cxlds);
 	if (rc)
 		return rc;
diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
index a49f5eb149f1..bfb948e00c42 100644
--- a/drivers/cxl/port.c
+++ b/drivers/cxl/port.c
@@ -108,12 +108,6 @@  static int cxl_endpoint_port_probe(struct cxl_port *port)
 	if (rc)
 		return rc;
 
-	rc = cxl_await_media_ready(cxlds);
-	if (rc) {
-		dev_err(&port->dev, "Media not active (%d)\n", rc);
-		return rc;
-	}
-
 	rc = devm_cxl_enumerate_decoders(cxlhdm, &info);
 	if (rc)
 		return rc;