Message ID | 168443110845.2957452.15022248373553807511.stgit@djiang5-mobl3 |
---|---|
State | New, archived |
Headers | show |
Series | ] cxl: Move operations after memory is ready | expand |
Dave Jiang wrote: > Move cxl_await_media_ready() to cxl_pci probe before driver starts issuing > IDENTIFY and retrieving memory device information to ensure that the > device is ready to provide the information. > > Suggested-by: Dan Williams <dan.j.williams@intel.com> > Fixes: b39cb1052a5c ("cxl/mem: Register CXL memX devices") > Signed-off-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Dave Jiang wrote: > Move cxl_await_media_ready() to cxl_pci probe before driver starts issuing > IDENTIFY and retrieving memory device information to ensure that the > device is ready to provide the information. > > Suggested-by: Dan Williams <dan.j.williams@intel.com> > Fixes: b39cb1052a5c ("cxl/mem: Register CXL memX devices") > Signed-off-by: Dave Jiang <dave.jiang@intel.com> > --- > drivers/cxl/pci.c | 6 ++++++ > drivers/cxl/port.c | 6 ------ > 2 files changed, 6 insertions(+), 6 deletions(-) > > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c > index ea38bd49b0cf..fc59ca79b2a0 100644 > --- a/drivers/cxl/pci.c > +++ b/drivers/cxl/pci.c > @@ -708,6 +708,12 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) > if (rc) > dev_dbg(&pdev->dev, "Failed to map RAS capability.\n"); > > + rc = cxl_await_media_ready(cxlds); > + if (rc) { > + dev_err(&pdev->dev, "Media not active (%d)\n", rc); > + return rc; > + } So, now that I see this new failure mode here it raises another concern. I think there is value in still trying to bring up the mailbox interface even if media-ready never completes. For example, what if you need to update-firmware to remediate the device? The mailbox interface does not need the memdev to attach to the cxl_mem driver to enable command submission, but the question becomes what do you do once you have fixed up whatever condition caused the media to fail? I think the simplest answer is to just require tooling to reload cxl_pci for that device to re-attempt init. A more complicated answer would be to teach cxl_mem how to revalidate capacity after-the-fact, but at this point I think the implementation is stuck with cxl_pci owning capacity initialization. So I think this looks like caching the the media-ready timeout condition and teaching all subsequent code that cares about capacity info to fail and continue. Like don't issue identify, make sure the sysfs capacity values return 0, and don't let cxl_mem proceed with topology enumeration.
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c index ea38bd49b0cf..fc59ca79b2a0 100644 --- a/drivers/cxl/pci.c +++ b/drivers/cxl/pci.c @@ -708,6 +708,12 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) if (rc) dev_dbg(&pdev->dev, "Failed to map RAS capability.\n"); + rc = cxl_await_media_ready(cxlds); + if (rc) { + dev_err(&pdev->dev, "Media not active (%d)\n", rc); + return rc; + } + rc = cxl_pci_setup_mailbox(cxlds); if (rc) return rc; diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c index a49f5eb149f1..bfb948e00c42 100644 --- a/drivers/cxl/port.c +++ b/drivers/cxl/port.c @@ -108,12 +108,6 @@ static int cxl_endpoint_port_probe(struct cxl_port *port) if (rc) return rc; - rc = cxl_await_media_ready(cxlds); - if (rc) { - dev_err(&port->dev, "Media not active (%d)\n", rc); - return rc; - } - rc = devm_cxl_enumerate_decoders(cxlhdm, &info); if (rc) return rc;
Move cxl_await_media_ready() to cxl_pci probe before driver starts issuing IDENTIFY and retrieving memory device information to ensure that the device is ready to provide the information. Suggested-by: Dan Williams <dan.j.williams@intel.com> Fixes: b39cb1052a5c ("cxl/mem: Register CXL memX devices") Signed-off-by: Dave Jiang <dave.jiang@intel.com> --- drivers/cxl/pci.c | 6 ++++++ drivers/cxl/port.c | 6 ------ 2 files changed, 6 insertions(+), 6 deletions(-)