Message ID | 20240108-dont-fail-irq-v1-1-4407228debd2@intel.com |
---|---|
State | New, archived |
Headers | show |
Series | [RFC] cxl/pci: Skip irq features if irq's are not supported | expand |
On Mon, Jan 08, 2024 at 11:51:13PM -0800, Ira Weiny wrote: > CXL 3.1 Section 3.1.1 states: > > "A Function on a CXL device must not generate INTx messages if > that Function participates in CXL.cache protocol or CXL.mem > protocols." > > The generic CXL memory driver only supports devices which use the > CXL.mem protocol. The current driver attempts to allocate MSI/MSI-X > vectors in anticipation of their need for mailbox interrupts or event > processing. However, the above requirement does not require a device to > support interrupts at all. A device may not use mailbox interrupts and > may be configured for firmware first event processing. > > Rather than fail device probe if interrupts are not supported; flag such > that irqs are not supported and do not enable features which require > interrupts. dev_warn() in those cases which require interrupts but they > were not supported. > > It is possible for a device to have host based event processing through > polling but this patch does not support the addition of such polling. > Leave that to the future if such a device comes along. > > Signed-off-by: Ira Weiny <ira.weiny@intel.com> > --- > Compile tested only. > > This is an RFC based on errors seen by Dave Larson and reported on > discord. Dan requested that the driver not fail if irqs are not > required. > --- > drivers/cxl/cxlmem.h | 2 ++ > drivers/cxl/pci.c | 25 +++++++++++++++++++------ > 2 files changed, 21 insertions(+), 6 deletions(-) > > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h > index a2fcbca253f3..422bc9657e5c 100644 > --- a/drivers/cxl/cxlmem.h > +++ b/drivers/cxl/cxlmem.h > @@ -410,6 +410,7 @@ enum cxl_devtype { > * @ram_res: Active Volatile memory capacity configuration > * @serial: PCIe Device Serial Number > * @type: Generic Memory Class device or Vendor Specific Memory device > + * @irq_supported: Flag if irqs are supported by the device > */ > struct cxl_dev_state { > struct device *dev; > @@ -424,6 +425,7 @@ struct cxl_dev_state { > struct resource ram_res; > u64 serial; > enum cxl_devtype type; > + bool irq_supported; > }; > > /** > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c > index 0155fb66b580..bb90ac011290 100644 > --- a/drivers/cxl/pci.c > +++ b/drivers/cxl/pci.c > @@ -443,6 +443,12 @@ static int cxl_pci_setup_mailbox(struct cxl_memdev_state *mds) > if (!(cap & CXLDEV_MBOX_CAP_BG_CMD_IRQ)) > return 0; > > + if (!cxlds->irq_supported) { > + dev_err(cxlds->dev, "Mailbox interrupts enabled but device indicates no interrupt vectors supported.\n"); > + dev_err(cxlds->dev, "Skip mailbox iterrupt configuration.\n"); > + return 0; > + } > + Commit msg says dev_warn() yet here it is dev_err() Can you fit in one msg, something like: "Device does not support mailbox interrupts\n" Perhaps skip the hard stops. No other dev_*() in this file adds them. Documentation/process/coding-style.rst Spellcheck > msgnum = FIELD_GET(CXLDEV_MBOX_CAP_IRQ_MSGNUM_MASK, cap); > irq = pci_irq_vector(to_pci_dev(cxlds->dev), msgnum); > if (irq < 0) > @@ -587,7 +593,8 @@ static int cxl_mem_alloc_event_buf(struct cxl_memdev_state *mds) > return devm_add_action_or_reset(mds->cxlds.dev, free_event_buf, buf); > } > > -static int cxl_alloc_irq_vectors(struct pci_dev *pdev) > +static void cxl_alloc_irq_vectors(struct pci_dev *pdev, > + struct cxl_dev_state *cxlds) > { > int nvecs; > > @@ -604,9 +611,10 @@ static int cxl_alloc_irq_vectors(struct pci_dev *pdev) > PCI_IRQ_MSIX | PCI_IRQ_MSI); > if (nvecs < 1) { > dev_dbg(&pdev->dev, "Failed to alloc irq vectors: %d\n", nvecs); > - return -ENXIO; > + return; > } > - return 0; > + > + cxlds->irq_supported = true; > } > > static irqreturn_t cxl_event_thread(int irq, void *id) > @@ -754,6 +762,13 @@ static int cxl_event_config(struct pci_host_bridge *host_bridge, > if (!host_bridge->native_cxl_error) > return 0; > > + /* Polling not supported */ I understand this comment while reading it in the context of this patch. Lacking that context, maybe it deserves a bit more like you wrote in the commit log. Be clear that it's the driver that is not supporting polling, and when if or when the driver does add polling support they'll be an alternative method for processing events. IIUC ;) > + if (!mds->cxlds.irq_supported) { > + dev_err(mds->cxlds.dev, "Host events enabled but device indicates no interrupt vectors supported.\n"); > + dev_err(mds->cxlds.dev, "Event polling is not supported, skip event processing.\n"); > + return 0; > + } Similar to above > + > rc = cxl_mem_alloc_event_buf(mds); > if (rc) > return rc; > @@ -845,9 +860,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) > else > dev_warn(&pdev->dev, "Media not active (%d)\n", rc); > > - rc = cxl_alloc_irq_vectors(pdev); > - if (rc) > - return rc; > + cxl_alloc_irq_vectors(pdev, cxlds); > > rc = cxl_pci_setup_mailbox(mds); > if (rc) > > --- > base-commit: 0dd3ee31125508cd67f7e7172247f05b7fd1753a > change-id: 20240108-dont-fail-irq-a96310368f0f > > Best regards, > -- > Ira Weiny <ira.weiny@intel.com> >
Ira Weiny wrote: > CXL 3.1 Section 3.1.1 states: > > "A Function on a CXL device must not generate INTx messages if > that Function participates in CXL.cache protocol or CXL.mem > protocols." > > The generic CXL memory driver only supports devices which use the > CXL.mem protocol. The current driver attempts to allocate MSI/MSI-X > vectors in anticipation of their need for mailbox interrupts or event > processing. However, the above requirement does not require a device to > support interrupts at all. A device may not use mailbox interrupts and > may be configured for firmware first event processing. > > Rather than fail device probe if interrupts are not supported; flag such > that irqs are not supported and do not enable features which require > interrupts. dev_warn() in those cases which require interrupts but they > were not supported. > > It is possible for a device to have host based event processing through > polling but this patch does not support the addition of such polling. > Leave that to the future if such a device comes along. > > Signed-off-by: Ira Weiny <ira.weiny@intel.com> > --- > Compile tested only. > > This is an RFC based on errors seen by Dave Larson and reported on > discord. Dan requested that the driver not fail if irqs are not > required. > --- > drivers/cxl/cxlmem.h | 2 ++ > drivers/cxl/pci.c | 25 +++++++++++++++++++------ > 2 files changed, 21 insertions(+), 6 deletions(-) > > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h > index a2fcbca253f3..422bc9657e5c 100644 > --- a/drivers/cxl/cxlmem.h > +++ b/drivers/cxl/cxlmem.h > @@ -410,6 +410,7 @@ enum cxl_devtype { > * @ram_res: Active Volatile memory capacity configuration > * @serial: PCIe Device Serial Number > * @type: Generic Memory Class device or Vendor Specific Memory device > + * @irq_supported: Flag if irqs are supported by the device > */ > struct cxl_dev_state { > struct device *dev; > @@ -424,6 +425,7 @@ struct cxl_dev_state { > struct resource ram_res; > u64 serial; > enum cxl_devtype type; > + bool irq_supported; I would rather not carry this init-time-only relevant flag in perpetuity in the state structure. Let cxl_pci_probe() see the result from cxl_alloc_irq_vectors() and then optionally skip calling setup for features the demand interrupt support. > }; > > /** > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c > index 0155fb66b580..bb90ac011290 100644 > --- a/drivers/cxl/pci.c > +++ b/drivers/cxl/pci.c > @@ -443,6 +443,12 @@ static int cxl_pci_setup_mailbox(struct cxl_memdev_state *mds) > if (!(cap & CXLDEV_MBOX_CAP_BG_CMD_IRQ)) > return 0; > > + if (!cxlds->irq_supported) { > + dev_err(cxlds->dev, "Mailbox interrupts enabled but device indicates no interrupt vectors supported.\n"); > + dev_err(cxlds->dev, "Skip mailbox iterrupt configuration.\n"); > + return 0; > + } I see no need to do a emit a log message here as the code is happy to support a mailbox in polled mode. I.e. this is not an error that the user should call their device-vendor about because end user will see no loss of functionality. The code right after this is already fully tolerant of IRQ setup errors: irq = pci_irq_vector(to_pci_dev(cxlds->dev), msgnum); if (irq < 0) return 0; if (cxl_request_irq(cxlds, irq, cxl_pci_mbox_irq)) return 0; > + > msgnum = FIELD_GET(CXLDEV_MBOX_CAP_IRQ_MSGNUM_MASK, cap); > irq = pci_irq_vector(to_pci_dev(cxlds->dev), msgnum); > if (irq < 0) > @@ -587,7 +593,8 @@ static int cxl_mem_alloc_event_buf(struct cxl_memdev_state *mds) > return devm_add_action_or_reset(mds->cxlds.dev, free_event_buf, buf); > } > > -static int cxl_alloc_irq_vectors(struct pci_dev *pdev) > +static void cxl_alloc_irq_vectors(struct pci_dev *pdev, > + struct cxl_dev_state *cxlds) > { > int nvecs; > > @@ -604,9 +611,10 @@ static int cxl_alloc_irq_vectors(struct pci_dev *pdev) > PCI_IRQ_MSIX | PCI_IRQ_MSI); > if (nvecs < 1) { > dev_dbg(&pdev->dev, "Failed to alloc irq vectors: %d\n", nvecs); > - return -ENXIO; > + return; > } > - return 0; > + > + cxlds->irq_supported = true; > } > > static irqreturn_t cxl_event_thread(int irq, void *id) > @@ -754,6 +762,13 @@ static int cxl_event_config(struct pci_host_bridge *host_bridge, > if (!host_bridge->native_cxl_error) > return 0; > > + /* Polling not supported */ > + if (!mds->cxlds.irq_supported) { > + dev_err(mds->cxlds.dev, "Host events enabled but device indicates no interrupt vectors supported.\n"); > + dev_err(mds->cxlds.dev, "Event polling is not supported, skip event processing.\n"); > + return 0; > + } This one can be a dev_info(), since there is no polling fallback and it is unlikely that a device supports events without supporting interrupts. ...or maybe unify all these notifications in the result from cxl_alloc_irq_vectors(): rc = cxl_alloc_irq_vectors(); if (rc) { dev_dbg(dev, "No interrupt support, interrupt-dependent features disabled.\n"); interrupts_supported = false; } Where dev_dbg() instead of dev_info() because the people that are missing features will report this debug log and upstream can say... "yup, there's your problem". Where users with cards that are known to not support interrupts do not otherwise spam the logs with info they know already. I also note that cxl_request_irq() will do the right thing, so likely don't even need that interrupts_supported flag.
Alison Schofield wrote: > On Mon, Jan 08, 2024 at 11:51:13PM -0800, Ira Weiny wrote: > > CXL 3.1 Section 3.1.1 states: > > [snip] > > /** > > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c > > index 0155fb66b580..bb90ac011290 100644 > > --- a/drivers/cxl/pci.c > > +++ b/drivers/cxl/pci.c > > @@ -443,6 +443,12 @@ static int cxl_pci_setup_mailbox(struct cxl_memdev_state *mds) > > if (!(cap & CXLDEV_MBOX_CAP_BG_CMD_IRQ)) > > return 0; > > > > + if (!cxlds->irq_supported) { > > + dev_err(cxlds->dev, "Mailbox interrupts enabled but device indicates no interrupt vectors supported.\n"); > > + dev_err(cxlds->dev, "Skip mailbox iterrupt configuration.\n"); > > + return 0; > > + } > > + > > Commit msg says dev_warn() yet here it is dev_err() > > Can you fit in one msg, something like: > "Device does not support mailbox interrupts\n" > > Perhaps skip the hard stops. No other dev_*() in this file adds them. Dan had comments on cleaning up the error messages. I'll do those. > Documentation/process/coding-style.rst > > Spellcheck Thanks. [snip] > > > > static irqreturn_t cxl_event_thread(int irq, void *id) > > @@ -754,6 +762,13 @@ static int cxl_event_config(struct pci_host_bridge *host_bridge, > > if (!host_bridge->native_cxl_error) > > return 0; > > > > + /* Polling not supported */ > > I understand this comment while reading it in the context of this patch. > Lacking that context, maybe it deserves a bit more like you wrote in > the commit log. Be clear that it's the driver that is not supporting > polling, and when if or when the driver does add polling support they'll > be an alternative method for processing events. IIUC ;) Yea I wanted to make it clear that polling is an option for the driver but it was not supported because I don't anticipate the need. But should a device come along which requires polling (ie no irq support) it would be nice to leave breadcrumbs... but... > > > > + if (!mds->cxlds.irq_supported) { > > + dev_err(mds->cxlds.dev, "Host events enabled but device indicates no interrupt vectors supported.\n"); > > + dev_err(mds->cxlds.dev, "Event polling is not supported, skip event processing.\n"); ... here I mention polling not supported which is redundant to the comment. Comment should be deleted. > > + return 0; > > + } > > Similar to above Yep will clean up based on Dan's feedback. Thanks for the review! Ira [snip]
Dan Williams wrote: > Ira Weiny wrote: [snip] > > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h > > index a2fcbca253f3..422bc9657e5c 100644 > > --- a/drivers/cxl/cxlmem.h > > +++ b/drivers/cxl/cxlmem.h > > @@ -410,6 +410,7 @@ enum cxl_devtype { > > * @ram_res: Active Volatile memory capacity configuration > > * @serial: PCIe Device Serial Number > > * @type: Generic Memory Class device or Vendor Specific Memory device > > + * @irq_supported: Flag if irqs are supported by the device > > */ > > struct cxl_dev_state { > > struct device *dev; > > @@ -424,6 +425,7 @@ struct cxl_dev_state { > > struct resource ram_res; > > u64 serial; > > enum cxl_devtype type; > > + bool irq_supported; > > I would rather not carry this init-time-only relevant flag in perpetuity > in the state structure. Fair enough. > Let cxl_pci_probe() see the result from > cxl_alloc_irq_vectors() and then optionally skip calling setup for > features the demand interrupt support. yea better the bool is a local variable to cxl_pci_probe(). > > > }; > > > > /** > > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c > > index 0155fb66b580..bb90ac011290 100644 > > --- a/drivers/cxl/pci.c > > +++ b/drivers/cxl/pci.c > > @@ -443,6 +443,12 @@ static int cxl_pci_setup_mailbox(struct cxl_memdev_state *mds) > > if (!(cap & CXLDEV_MBOX_CAP_BG_CMD_IRQ)) > > return 0; > > > > + if (!cxlds->irq_supported) { > > + dev_err(cxlds->dev, "Mailbox interrupts enabled but device indicates no interrupt vectors supported.\n"); > > + dev_err(cxlds->dev, "Skip mailbox iterrupt configuration.\n"); > > + return 0; > > + } > > I see no need to do a emit a log message here as the code is happy to > support a mailbox in polled mode. True. However this indicates an error with the device IMO. The device did not support MSI/MSI-X but yet indicates irq support for mailboxes. That is not a well behaved device even it it will work. We are not failing the probe here but I think the error gives users good insight. We could just make it dev_dbg() though. > I.e. this is not an error that the > user should call their device-vendor about because end user will see no > loss of functionality. But it is not exactly a nice device IMO. > > The code right after this is already fully tolerant of IRQ setup errors: Agreed which is why only the error was printed and the irq setup calls skipped for good measure. If you feel strongly about it I can just drop the hunk but I still think it is worth some message for those devices behaving this way. > > irq = pci_irq_vector(to_pci_dev(cxlds->dev), msgnum); > if (irq < 0) > return 0; > > if (cxl_request_irq(cxlds, irq, cxl_pci_mbox_irq)) > return 0; > > [snip] > > > > static irqreturn_t cxl_event_thread(int irq, void *id) > > @@ -754,6 +762,13 @@ static int cxl_event_config(struct pci_host_bridge *host_bridge, > > if (!host_bridge->native_cxl_error) > > return 0; > > > > + /* Polling not supported */ > > + if (!mds->cxlds.irq_supported) { > > + dev_err(mds->cxlds.dev, "Host events enabled but device indicates no interrupt vectors supported.\n"); > > + dev_err(mds->cxlds.dev, "Event polling is not supported, skip event processing.\n"); > > + return 0; > > + } > > This one can be a dev_info(), since there is no polling fallback and it > is unlikely that a device supports events without supporting interrupts. Sounds good. > > ...or maybe unify all these notifications in the result from > cxl_alloc_irq_vectors(): > > rc = cxl_alloc_irq_vectors(); > if (rc) { > dev_dbg(dev, "No interrupt support, interrupt-dependent features disabled.\n"); > interrupts_supported = false; > } > > Where dev_dbg() instead of dev_info() because the people that are > missing features will report this debug log and upstream can say... > "yup, there's your problem". Where users with cards that are known to > not support interrupts do not otherwise spam the logs with info they > know already. > > I also note that cxl_request_irq() will do the right thing, so likely > don't even need that interrupts_supported flag. Perhaps, but devices which don't support interrupts by design (and don't attempt to have any irq features) should be silent IMO. Why spam the log with that information even if only during a debug session. For example if a user has 2 devices, 1 broken from vendor X and 1 which just does not do irqs from vendor Y, the above would be printed for both devices when they are trying to debug the broken device. Then they have to rely on both vendors to report back. In the case of reporting an actual error they can call vendor X and leave vendor Y alone. I know it is more code and you wanted the smallest possible change but I think this is worth some code. I'll rework this a bit and send a V1 for real review. Ira
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h index a2fcbca253f3..422bc9657e5c 100644 --- a/drivers/cxl/cxlmem.h +++ b/drivers/cxl/cxlmem.h @@ -410,6 +410,7 @@ enum cxl_devtype { * @ram_res: Active Volatile memory capacity configuration * @serial: PCIe Device Serial Number * @type: Generic Memory Class device or Vendor Specific Memory device + * @irq_supported: Flag if irqs are supported by the device */ struct cxl_dev_state { struct device *dev; @@ -424,6 +425,7 @@ struct cxl_dev_state { struct resource ram_res; u64 serial; enum cxl_devtype type; + bool irq_supported; }; /** diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c index 0155fb66b580..bb90ac011290 100644 --- a/drivers/cxl/pci.c +++ b/drivers/cxl/pci.c @@ -443,6 +443,12 @@ static int cxl_pci_setup_mailbox(struct cxl_memdev_state *mds) if (!(cap & CXLDEV_MBOX_CAP_BG_CMD_IRQ)) return 0; + if (!cxlds->irq_supported) { + dev_err(cxlds->dev, "Mailbox interrupts enabled but device indicates no interrupt vectors supported.\n"); + dev_err(cxlds->dev, "Skip mailbox iterrupt configuration.\n"); + return 0; + } + msgnum = FIELD_GET(CXLDEV_MBOX_CAP_IRQ_MSGNUM_MASK, cap); irq = pci_irq_vector(to_pci_dev(cxlds->dev), msgnum); if (irq < 0) @@ -587,7 +593,8 @@ static int cxl_mem_alloc_event_buf(struct cxl_memdev_state *mds) return devm_add_action_or_reset(mds->cxlds.dev, free_event_buf, buf); } -static int cxl_alloc_irq_vectors(struct pci_dev *pdev) +static void cxl_alloc_irq_vectors(struct pci_dev *pdev, + struct cxl_dev_state *cxlds) { int nvecs; @@ -604,9 +611,10 @@ static int cxl_alloc_irq_vectors(struct pci_dev *pdev) PCI_IRQ_MSIX | PCI_IRQ_MSI); if (nvecs < 1) { dev_dbg(&pdev->dev, "Failed to alloc irq vectors: %d\n", nvecs); - return -ENXIO; + return; } - return 0; + + cxlds->irq_supported = true; } static irqreturn_t cxl_event_thread(int irq, void *id) @@ -754,6 +762,13 @@ static int cxl_event_config(struct pci_host_bridge *host_bridge, if (!host_bridge->native_cxl_error) return 0; + /* Polling not supported */ + if (!mds->cxlds.irq_supported) { + dev_err(mds->cxlds.dev, "Host events enabled but device indicates no interrupt vectors supported.\n"); + dev_err(mds->cxlds.dev, "Event polling is not supported, skip event processing.\n"); + return 0; + } + rc = cxl_mem_alloc_event_buf(mds); if (rc) return rc; @@ -845,9 +860,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) else dev_warn(&pdev->dev, "Media not active (%d)\n", rc); - rc = cxl_alloc_irq_vectors(pdev); - if (rc) - return rc; + cxl_alloc_irq_vectors(pdev, cxlds); rc = cxl_pci_setup_mailbox(mds); if (rc)
CXL 3.1 Section 3.1.1 states: "A Function on a CXL device must not generate INTx messages if that Function participates in CXL.cache protocol or CXL.mem protocols." The generic CXL memory driver only supports devices which use the CXL.mem protocol. The current driver attempts to allocate MSI/MSI-X vectors in anticipation of their need for mailbox interrupts or event processing. However, the above requirement does not require a device to support interrupts at all. A device may not use mailbox interrupts and may be configured for firmware first event processing. Rather than fail device probe if interrupts are not supported; flag such that irqs are not supported and do not enable features which require interrupts. dev_warn() in those cases which require interrupts but they were not supported. It is possible for a device to have host based event processing through polling but this patch does not support the addition of such polling. Leave that to the future if such a device comes along. Signed-off-by: Ira Weiny <ira.weiny@intel.com> --- Compile tested only. This is an RFC based on errors seen by Dave Larson and reported on discord. Dan requested that the driver not fail if irqs are not required. --- drivers/cxl/cxlmem.h | 2 ++ drivers/cxl/pci.c | 25 +++++++++++++++++++------ 2 files changed, 21 insertions(+), 6 deletions(-) --- base-commit: 0dd3ee31125508cd67f7e7172247f05b7fd1753a change-id: 20240108-dont-fail-irq-a96310368f0f Best regards,