Message ID | 20211120000250.1663391-6-ben.widawsky@intel.com |
---|---|
State | Superseded |
Headers | show |
Series | Add drivers for CXL ports and mem devices | expand |
On Fri, 19 Nov 2021 16:02:32 -0800 Ben Widawsky <ben.widawsky@intel.com> wrote: > The expectation is that the mailbox interface ready bit is the first > step in access through the mailbox interface. Reword this? Perhaps "The expectation is that the mailbox interface ready bit will be set at the start of any access through the mailbox interface." > Therefore, waiting for the > doorbell busy bit to be clear would imply that the mailbox interface is > ready. The original driver implementation used the doorbell timeout for > the Mailbox Interface Ready bit to piggyback off of, since the latter > doesn't have a defined timeout (introduced in 8adaf747c9f0 ("cxl/mem: > Find device capabilities"), a timeout has since been defined with an ECN > to the 2.0 spec). With the current driver waiting for mailbox interface > ready as a part of probe() it's no longer necessary to use the > piggyback. > > With the piggybacking no longer necessary it doesn't make sense to check > doorbell status when acquiring the mailbox. It will be checked during > the normal mailbox exchange protocol. > > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Trivial comment inline - with that fixed either by calling it out, or by pulling it out of this patch. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > --- > This patch did not exist in RFCv2 > --- > drivers/cxl/pci.c | 25 ++++++------------------- > 1 file changed, 6 insertions(+), 19 deletions(-) > > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c > index 2cef9fec8599..869b4fc18e27 100644 > --- a/drivers/cxl/pci.c > +++ b/drivers/cxl/pci.c > @@ -221,27 +221,14 @@ static int cxl_pci_mbox_get(struct cxl_dev_state *cxlds) > > /* > * XXX: There is some amount of ambiguity in the 2.0 version of the spec > - * around the mailbox interface ready (8.2.8.5.1.1). The purpose of the > + * around the mailbox interface ready (8.2.8.5.1.1). The purpose of the Whilst it's trivial, I'd prefer white space cleanup in separate patches. I guess this one is obvious enough to just call out in the patch description though. > * bit is to allow firmware running on the device to notify the driver > - * that it's ready to receive commands. It is unclear if the bit needs > - * to be read for each transaction mailbox, ie. the firmware can switch > - * it on and off as needed. Second, there is no defined timeout for > - * mailbox ready, like there is for the doorbell interface. > - * > - * Assumptions: > - * 1. The firmware might toggle the Mailbox Interface Ready bit, check > - * it for every command. > - * > - * 2. If the doorbell is clear, the firmware should have first set the > - * Mailbox Interface Ready bit. Therefore, waiting for the doorbell > - * to be ready is sufficient. > + * that it's ready to receive commands. The spec does not clearly define > + * under what conditions the bit may get set or cleared. As of the 2.0 > + * base specification there was no defined timeout for mailbox ready, > + * like there is for the doorbell interface. This was fixed with an ECN, > + * but it's possible early devices implemented this before the ECN. > */ > - rc = cxl_pci_mbox_wait_for_doorbell(cxlds); > - if (rc) { > - dev_warn(dev, "Mailbox interface not ready\n"); > - goto out; > - } > - > md_status = readq(cxlds->regs.memdev + CXLMDEV_STATUS_OFFSET); > if (!(md_status & CXLMDEV_MBOX_IF_READY && CXLMDEV_READY(md_status))) { > dev_err(dev, "mbox: reported doorbell ready, but not mbox ready\n");
On 21-11-22 15:11:31, Jonathan Cameron wrote: > On Fri, 19 Nov 2021 16:02:32 -0800 > Ben Widawsky <ben.widawsky@intel.com> wrote: > > > The expectation is that the mailbox interface ready bit is the first > > step in access through the mailbox interface. > > Reword this? Perhaps > "The expectation is that the mailbox interface ready bit will be set > at the start of any access through the mailbox interface." > > > Therefore, waiting for the > > doorbell busy bit to be clear would imply that the mailbox interface is > > ready. The original driver implementation used the doorbell timeout for > > the Mailbox Interface Ready bit to piggyback off of, since the latter > > doesn't have a defined timeout (introduced in 8adaf747c9f0 ("cxl/mem: > > Find device capabilities"), a timeout has since been defined with an ECN > > to the 2.0 spec). With the current driver waiting for mailbox interface > > ready as a part of probe() it's no longer necessary to use the > > piggyback. > > > > With the piggybacking no longer necessary it doesn't make sense to check > > doorbell status when acquiring the mailbox. It will be checked during > > the normal mailbox exchange protocol. > > > > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> > Trivial comment inline - with that fixed either by calling it out, or by > pulling it out of this patch. > > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > > > --- > > This patch did not exist in RFCv2 > > --- > > drivers/cxl/pci.c | 25 ++++++------------------- > > 1 file changed, 6 insertions(+), 19 deletions(-) > > > > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c > > index 2cef9fec8599..869b4fc18e27 100644 > > --- a/drivers/cxl/pci.c > > +++ b/drivers/cxl/pci.c > > @@ -221,27 +221,14 @@ static int cxl_pci_mbox_get(struct cxl_dev_state *cxlds) > > > > /* > > * XXX: There is some amount of ambiguity in the 2.0 version of the spec > > - * around the mailbox interface ready (8.2.8.5.1.1). The purpose of the > > + * around the mailbox interface ready (8.2.8.5.1.1). The purpose of the > > Whilst it's trivial, I'd prefer white space cleanup in separate patches. > I guess this one is obvious enough to just call out in the patch description > though. > Okay. I'll keep this in mind for the future, and just fixup the commit messages with your suggestion and this, now. Thanks. Ben > > * bit is to allow firmware running on the device to notify the driver > > - * that it's ready to receive commands. It is unclear if the bit needs > > - * to be read for each transaction mailbox, ie. the firmware can switch > > - * it on and off as needed. Second, there is no defined timeout for > > - * mailbox ready, like there is for the doorbell interface. > > - * > > - * Assumptions: > > - * 1. The firmware might toggle the Mailbox Interface Ready bit, check > > - * it for every command. > > - * > > - * 2. If the doorbell is clear, the firmware should have first set the > > - * Mailbox Interface Ready bit. Therefore, waiting for the doorbell > > - * to be ready is sufficient. > > + * that it's ready to receive commands. The spec does not clearly define > > + * under what conditions the bit may get set or cleared. As of the 2.0 > > + * base specification there was no defined timeout for mailbox ready, > > + * like there is for the doorbell interface. This was fixed with an ECN, > > + * but it's possible early devices implemented this before the ECN. > > */ > > - rc = cxl_pci_mbox_wait_for_doorbell(cxlds); > > - if (rc) { > > - dev_warn(dev, "Mailbox interface not ready\n"); > > - goto out; > > - } > > - > > md_status = readq(cxlds->regs.memdev + CXLMDEV_STATUS_OFFSET); > > if (!(md_status & CXLMDEV_MBOX_IF_READY && CXLMDEV_READY(md_status))) { > > dev_err(dev, "mbox: reported doorbell ready, but not mbox ready\n"); >
On Fri, Nov 19, 2021 at 4:03 PM Ben Widawsky <ben.widawsky@intel.com> wrote: > > The expectation is that the mailbox interface ready bit is the first > step in access through the mailbox interface. Therefore, waiting for the > doorbell busy bit to be clear would imply that the mailbox interface is > ready. The original driver implementation used the doorbell timeout for > the Mailbox Interface Ready bit to piggyback off of, since the latter > doesn't have a defined timeout (introduced in 8adaf747c9f0 ("cxl/mem: > Find device capabilities"), a timeout has since been defined with an ECN > to the 2.0 spec). With the current driver waiting for mailbox interface > ready as a part of probe() it's no longer necessary to use the > piggyback. > > With the piggybacking no longer necessary it doesn't make sense to check > doorbell status when acquiring the mailbox. It will be checked during > the normal mailbox exchange protocol. > > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> > --- > This patch did not exist in RFCv2 > --- > drivers/cxl/pci.c | 25 ++++++------------------- > 1 file changed, 6 insertions(+), 19 deletions(-) > > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c > index 2cef9fec8599..869b4fc18e27 100644 > --- a/drivers/cxl/pci.c > +++ b/drivers/cxl/pci.c > @@ -221,27 +221,14 @@ static int cxl_pci_mbox_get(struct cxl_dev_state *cxlds) > > /* > * XXX: There is some amount of ambiguity in the 2.0 version of the spec > - * around the mailbox interface ready (8.2.8.5.1.1). The purpose of the > + * around the mailbox interface ready (8.2.8.5.1.1). The purpose of the > * bit is to allow firmware running on the device to notify the driver > - * that it's ready to receive commands. It is unclear if the bit needs > - * to be read for each transaction mailbox, ie. the firmware can switch > - * it on and off as needed. Second, there is no defined timeout for > - * mailbox ready, like there is for the doorbell interface. > - * > - * Assumptions: > - * 1. The firmware might toggle the Mailbox Interface Ready bit, check > - * it for every command. > - * > - * 2. If the doorbell is clear, the firmware should have first set the > - * Mailbox Interface Ready bit. Therefore, waiting for the doorbell > - * to be ready is sufficient. > + * that it's ready to receive commands. The spec does not clearly define > + * under what conditions the bit may get set or cleared. As of the 2.0 > + * base specification there was no defined timeout for mailbox ready, > + * like there is for the doorbell interface. This was fixed with an ECN, > + * but it's possible early devices implemented this before the ECN. Can we just drop comment block altogether? Outside of cxl_pci_setup_mailbox() the only time the mailbox status should be checked is after a doorbell timeout after submitting a command. > */ > - rc = cxl_pci_mbox_wait_for_doorbell(cxlds); > - if (rc) { > - dev_warn(dev, "Mailbox interface not ready\n"); > - goto out; > - } > - > md_status = readq(cxlds->regs.memdev + CXLMDEV_STATUS_OFFSET); > if (!(md_status & CXLMDEV_MBOX_IF_READY && CXLMDEV_READY(md_status))) { > dev_err(dev, "mbox: reported doorbell ready, but not mbox ready\n"); This error message is obsolete since nothing is pre-checking the mailbox anymore, and per above I see no problem waiting to check the status until after the mailbox has failed to respond after a timeout.
On 21-11-24 13:55:03, Dan Williams wrote: > On Fri, Nov 19, 2021 at 4:03 PM Ben Widawsky <ben.widawsky@intel.com> wrote: > > > > The expectation is that the mailbox interface ready bit is the first > > step in access through the mailbox interface. Therefore, waiting for the > > doorbell busy bit to be clear would imply that the mailbox interface is > > ready. The original driver implementation used the doorbell timeout for > > the Mailbox Interface Ready bit to piggyback off of, since the latter > > doesn't have a defined timeout (introduced in 8adaf747c9f0 ("cxl/mem: > > Find device capabilities"), a timeout has since been defined with an ECN > > to the 2.0 spec). With the current driver waiting for mailbox interface > > ready as a part of probe() it's no longer necessary to use the > > piggyback. > > > > With the piggybacking no longer necessary it doesn't make sense to check > > doorbell status when acquiring the mailbox. It will be checked during > > the normal mailbox exchange protocol. > > > > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> > > --- > > This patch did not exist in RFCv2 > > --- > > drivers/cxl/pci.c | 25 ++++++------------------- > > 1 file changed, 6 insertions(+), 19 deletions(-) > > > > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c > > index 2cef9fec8599..869b4fc18e27 100644 > > --- a/drivers/cxl/pci.c > > +++ b/drivers/cxl/pci.c > > @@ -221,27 +221,14 @@ static int cxl_pci_mbox_get(struct cxl_dev_state *cxlds) > > > > /* > > * XXX: There is some amount of ambiguity in the 2.0 version of the spec > > - * around the mailbox interface ready (8.2.8.5.1.1). The purpose of the > > + * around the mailbox interface ready (8.2.8.5.1.1). The purpose of the > > * bit is to allow firmware running on the device to notify the driver > > - * that it's ready to receive commands. It is unclear if the bit needs > > - * to be read for each transaction mailbox, ie. the firmware can switch > > - * it on and off as needed. Second, there is no defined timeout for > > - * mailbox ready, like there is for the doorbell interface. > > - * > > - * Assumptions: > > - * 1. The firmware might toggle the Mailbox Interface Ready bit, check > > - * it for every command. > > - * > > - * 2. If the doorbell is clear, the firmware should have first set the > > - * Mailbox Interface Ready bit. Therefore, waiting for the doorbell > > - * to be ready is sufficient. > > + * that it's ready to receive commands. The spec does not clearly define > > + * under what conditions the bit may get set or cleared. As of the 2.0 > > + * base specification there was no defined timeout for mailbox ready, > > + * like there is for the doorbell interface. This was fixed with an ECN, > > + * but it's possible early devices implemented this before the ECN. > > Can we just drop comment block altogether? Outside of > cxl_pci_setup_mailbox() the only time the mailbox status should be > checked is after a doorbell timeout after submitting a command. > Yes, I think it's fine to drop it. > > */ > > - rc = cxl_pci_mbox_wait_for_doorbell(cxlds); > > - if (rc) { > > - dev_warn(dev, "Mailbox interface not ready\n"); > > - goto out; > > - } > > - > > md_status = readq(cxlds->regs.memdev + CXLMDEV_STATUS_OFFSET); > > if (!(md_status & CXLMDEV_MBOX_IF_READY && CXLMDEV_READY(md_status))) { > > dev_err(dev, "mbox: reported doorbell ready, but not mbox ready\n"); > > This error message is obsolete since nothing is pre-checking the > mailbox anymore, and per above I see no problem waiting to check the > status until after the mailbox has failed to respond after a timeout. The message is wrong, but I think the logic is still valuable. How about: "mbox: reported interface ready, but mbox not ready"
On Mon, Nov 29, 2021 at 10:33 AM Ben Widawsky <ben.widawsky@intel.com> wrote: > > On 21-11-24 13:55:03, Dan Williams wrote: > > On Fri, Nov 19, 2021 at 4:03 PM Ben Widawsky <ben.widawsky@intel.com> wrote: > > > > > > The expectation is that the mailbox interface ready bit is the first > > > step in access through the mailbox interface. Therefore, waiting for the > > > doorbell busy bit to be clear would imply that the mailbox interface is > > > ready. The original driver implementation used the doorbell timeout for > > > the Mailbox Interface Ready bit to piggyback off of, since the latter > > > doesn't have a defined timeout (introduced in 8adaf747c9f0 ("cxl/mem: > > > Find device capabilities"), a timeout has since been defined with an ECN > > > to the 2.0 spec). With the current driver waiting for mailbox interface > > > ready as a part of probe() it's no longer necessary to use the > > > piggyback. > > > > > > With the piggybacking no longer necessary it doesn't make sense to check > > > doorbell status when acquiring the mailbox. It will be checked during > > > the normal mailbox exchange protocol. > > > > > > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> > > > --- > > > This patch did not exist in RFCv2 > > > --- > > > drivers/cxl/pci.c | 25 ++++++------------------- > > > 1 file changed, 6 insertions(+), 19 deletions(-) > > > > > > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c > > > index 2cef9fec8599..869b4fc18e27 100644 > > > --- a/drivers/cxl/pci.c > > > +++ b/drivers/cxl/pci.c > > > @@ -221,27 +221,14 @@ static int cxl_pci_mbox_get(struct cxl_dev_state *cxlds) > > > > > > /* > > > * XXX: There is some amount of ambiguity in the 2.0 version of the spec > > > - * around the mailbox interface ready (8.2.8.5.1.1). The purpose of the > > > + * around the mailbox interface ready (8.2.8.5.1.1). The purpose of the > > > * bit is to allow firmware running on the device to notify the driver > > > - * that it's ready to receive commands. It is unclear if the bit needs > > > - * to be read for each transaction mailbox, ie. the firmware can switch > > > - * it on and off as needed. Second, there is no defined timeout for > > > - * mailbox ready, like there is for the doorbell interface. > > > - * > > > - * Assumptions: > > > - * 1. The firmware might toggle the Mailbox Interface Ready bit, check > > > - * it for every command. > > > - * > > > - * 2. If the doorbell is clear, the firmware should have first set the > > > - * Mailbox Interface Ready bit. Therefore, waiting for the doorbell > > > - * to be ready is sufficient. > > > + * that it's ready to receive commands. The spec does not clearly define > > > + * under what conditions the bit may get set or cleared. As of the 2.0 > > > + * base specification there was no defined timeout for mailbox ready, > > > + * like there is for the doorbell interface. This was fixed with an ECN, > > > + * but it's possible early devices implemented this before the ECN. > > > > Can we just drop comment block altogether? Outside of > > cxl_pci_setup_mailbox() the only time the mailbox status should be > > checked is after a doorbell timeout after submitting a command. > > > > Yes, I think it's fine to drop it. > > > > */ > > > - rc = cxl_pci_mbox_wait_for_doorbell(cxlds); > > > - if (rc) { > > > - dev_warn(dev, "Mailbox interface not ready\n"); > > > - goto out; > > > - } > > > - > > > md_status = readq(cxlds->regs.memdev + CXLMDEV_STATUS_OFFSET); > > > if (!(md_status & CXLMDEV_MBOX_IF_READY && CXLMDEV_READY(md_status))) { > > > dev_err(dev, "mbox: reported doorbell ready, but not mbox ready\n"); > > > > This error message is obsolete since nothing is pre-checking the > > mailbox anymore, and per above I see no problem waiting to check the > > status until after the mailbox has failed to respond after a timeout. > > The message is wrong, but I think the logic is still valuable. How about: > "mbox: reported interface ready, but mbox not ready" You mean check this every time even though the spec says the driver only needs to check it once per-reset?
On 21-11-29 11:02:41, Dan Williams wrote: > On Mon, Nov 29, 2021 at 10:33 AM Ben Widawsky <ben.widawsky@intel.com> wrote: > > > > On 21-11-24 13:55:03, Dan Williams wrote: > > > On Fri, Nov 19, 2021 at 4:03 PM Ben Widawsky <ben.widawsky@intel.com> wrote: > > > > > > > > The expectation is that the mailbox interface ready bit is the first > > > > step in access through the mailbox interface. Therefore, waiting for the > > > > doorbell busy bit to be clear would imply that the mailbox interface is > > > > ready. The original driver implementation used the doorbell timeout for > > > > the Mailbox Interface Ready bit to piggyback off of, since the latter > > > > doesn't have a defined timeout (introduced in 8adaf747c9f0 ("cxl/mem: > > > > Find device capabilities"), a timeout has since been defined with an ECN > > > > to the 2.0 spec). With the current driver waiting for mailbox interface > > > > ready as a part of probe() it's no longer necessary to use the > > > > piggyback. > > > > > > > > With the piggybacking no longer necessary it doesn't make sense to check > > > > doorbell status when acquiring the mailbox. It will be checked during > > > > the normal mailbox exchange protocol. > > > > > > > > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> > > > > --- > > > > This patch did not exist in RFCv2 > > > > --- > > > > drivers/cxl/pci.c | 25 ++++++------------------- > > > > 1 file changed, 6 insertions(+), 19 deletions(-) > > > > > > > > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c > > > > index 2cef9fec8599..869b4fc18e27 100644 > > > > --- a/drivers/cxl/pci.c > > > > +++ b/drivers/cxl/pci.c > > > > @@ -221,27 +221,14 @@ static int cxl_pci_mbox_get(struct cxl_dev_state *cxlds) > > > > > > > > /* > > > > * XXX: There is some amount of ambiguity in the 2.0 version of the spec > > > > - * around the mailbox interface ready (8.2.8.5.1.1). The purpose of the > > > > + * around the mailbox interface ready (8.2.8.5.1.1). The purpose of the > > > > * bit is to allow firmware running on the device to notify the driver > > > > - * that it's ready to receive commands. It is unclear if the bit needs > > > > - * to be read for each transaction mailbox, ie. the firmware can switch > > > > - * it on and off as needed. Second, there is no defined timeout for > > > > - * mailbox ready, like there is for the doorbell interface. > > > > - * > > > > - * Assumptions: > > > > - * 1. The firmware might toggle the Mailbox Interface Ready bit, check > > > > - * it for every command. > > > > - * > > > > - * 2. If the doorbell is clear, the firmware should have first set the > > > > - * Mailbox Interface Ready bit. Therefore, waiting for the doorbell > > > > - * to be ready is sufficient. > > > > + * that it's ready to receive commands. The spec does not clearly define > > > > + * under what conditions the bit may get set or cleared. As of the 2.0 > > > > + * base specification there was no defined timeout for mailbox ready, > > > > + * like there is for the doorbell interface. This was fixed with an ECN, > > > > + * but it's possible early devices implemented this before the ECN. > > > > > > Can we just drop comment block altogether? Outside of > > > cxl_pci_setup_mailbox() the only time the mailbox status should be > > > checked is after a doorbell timeout after submitting a command. > > > > > > > Yes, I think it's fine to drop it. > > > > > > */ > > > > - rc = cxl_pci_mbox_wait_for_doorbell(cxlds); > > > > - if (rc) { > > > > - dev_warn(dev, "Mailbox interface not ready\n"); > > > > - goto out; > > > > - } > > > > - > > > > md_status = readq(cxlds->regs.memdev + CXLMDEV_STATUS_OFFSET); > > > > if (!(md_status & CXLMDEV_MBOX_IF_READY && CXLMDEV_READY(md_status))) { > > > > dev_err(dev, "mbox: reported doorbell ready, but not mbox ready\n"); > > > > > > This error message is obsolete since nothing is pre-checking the > > > mailbox anymore, and per above I see no problem waiting to check the > > > status until after the mailbox has failed to respond after a timeout. > > > > The message is wrong, but I think the logic is still valuable. How about: > > "mbox: reported interface ready, but mbox not ready" > > You mean check this every time even though the spec says the driver > only needs to check it once per-reset? Unfortunately it does not say that. "... it shall remain set until the next reset or the device encounters an error that prevents any mailbox communication." Once we have real error checking in place, this could go away, though I see no harm in leaving it.
On Mon, Nov 29, 2021 at 11:11 AM Ben Widawsky <ben.widawsky@intel.com> wrote: > > On 21-11-29 11:02:41, Dan Williams wrote: > > On Mon, Nov 29, 2021 at 10:33 AM Ben Widawsky <ben.widawsky@intel.com> wrote: > > > > > > On 21-11-24 13:55:03, Dan Williams wrote: > > > > On Fri, Nov 19, 2021 at 4:03 PM Ben Widawsky <ben.widawsky@intel.com> wrote: > > > > > > > > > > The expectation is that the mailbox interface ready bit is the first > > > > > step in access through the mailbox interface. Therefore, waiting for the > > > > > doorbell busy bit to be clear would imply that the mailbox interface is > > > > > ready. The original driver implementation used the doorbell timeout for > > > > > the Mailbox Interface Ready bit to piggyback off of, since the latter > > > > > doesn't have a defined timeout (introduced in 8adaf747c9f0 ("cxl/mem: > > > > > Find device capabilities"), a timeout has since been defined with an ECN > > > > > to the 2.0 spec). With the current driver waiting for mailbox interface > > > > > ready as a part of probe() it's no longer necessary to use the > > > > > piggyback. > > > > > > > > > > With the piggybacking no longer necessary it doesn't make sense to check > > > > > doorbell status when acquiring the mailbox. It will be checked during > > > > > the normal mailbox exchange protocol. > > > > > > > > > > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> > > > > > --- > > > > > This patch did not exist in RFCv2 > > > > > --- > > > > > drivers/cxl/pci.c | 25 ++++++------------------- > > > > > 1 file changed, 6 insertions(+), 19 deletions(-) > > > > > > > > > > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c > > > > > index 2cef9fec8599..869b4fc18e27 100644 > > > > > --- a/drivers/cxl/pci.c > > > > > +++ b/drivers/cxl/pci.c > > > > > @@ -221,27 +221,14 @@ static int cxl_pci_mbox_get(struct cxl_dev_state *cxlds) > > > > > > > > > > /* > > > > > * XXX: There is some amount of ambiguity in the 2.0 version of the spec > > > > > - * around the mailbox interface ready (8.2.8.5.1.1). The purpose of the > > > > > + * around the mailbox interface ready (8.2.8.5.1.1). The purpose of the > > > > > * bit is to allow firmware running on the device to notify the driver > > > > > - * that it's ready to receive commands. It is unclear if the bit needs > > > > > - * to be read for each transaction mailbox, ie. the firmware can switch > > > > > - * it on and off as needed. Second, there is no defined timeout for > > > > > - * mailbox ready, like there is for the doorbell interface. > > > > > - * > > > > > - * Assumptions: > > > > > - * 1. The firmware might toggle the Mailbox Interface Ready bit, check > > > > > - * it for every command. > > > > > - * > > > > > - * 2. If the doorbell is clear, the firmware should have first set the > > > > > - * Mailbox Interface Ready bit. Therefore, waiting for the doorbell > > > > > - * to be ready is sufficient. > > > > > + * that it's ready to receive commands. The spec does not clearly define > > > > > + * under what conditions the bit may get set or cleared. As of the 2.0 > > > > > + * base specification there was no defined timeout for mailbox ready, > > > > > + * like there is for the doorbell interface. This was fixed with an ECN, > > > > > + * but it's possible early devices implemented this before the ECN. > > > > > > > > Can we just drop comment block altogether? Outside of > > > > cxl_pci_setup_mailbox() the only time the mailbox status should be > > > > checked is after a doorbell timeout after submitting a command. > > > > > > > > > > Yes, I think it's fine to drop it. > > > > > > > > */ > > > > > - rc = cxl_pci_mbox_wait_for_doorbell(cxlds); > > > > > - if (rc) { > > > > > - dev_warn(dev, "Mailbox interface not ready\n"); > > > > > - goto out; > > > > > - } > > > > > - > > > > > md_status = readq(cxlds->regs.memdev + CXLMDEV_STATUS_OFFSET); > > > > > if (!(md_status & CXLMDEV_MBOX_IF_READY && CXLMDEV_READY(md_status))) { > > > > > dev_err(dev, "mbox: reported doorbell ready, but not mbox ready\n"); > > > > > > > > This error message is obsolete since nothing is pre-checking the > > > > mailbox anymore, and per above I see no problem waiting to check the > > > > status until after the mailbox has failed to respond after a timeout. > > > > > > The message is wrong, but I think the logic is still valuable. How about: > > > "mbox: reported interface ready, but mbox not ready" > > > > You mean check this every time even though the spec says the driver > > only needs to check it once per-reset? > > Unfortunately it does not say that. "... it shall remain set until the next > reset or the device encounters an error that prevents any mailbox > communication." > > Once we have real error checking in place, this could go away, though I see no > harm in leaving it. Right, there's no harm in the check, it just seems overly paranoid to me if it was already checked once. Until a doorbell timeout happens it's an extra MMIO cycle that can saved for a "what happened?" check after a timeout.
On 21-11-29 11:18:36, Dan Williams wrote: [snip] > > > > > > - rc = cxl_pci_mbox_wait_for_doorbell(cxlds); > > > > > > - if (rc) { > > > > > > - dev_warn(dev, "Mailbox interface not ready\n"); > > > > > > - goto out; > > > > > > - } > > > > > > - > > > > > > md_status = readq(cxlds->regs.memdev + CXLMDEV_STATUS_OFFSET); > > > > > > if (!(md_status & CXLMDEV_MBOX_IF_READY && CXLMDEV_READY(md_status))) { > > > > > > dev_err(dev, "mbox: reported doorbell ready, but not mbox ready\n"); > > > > > > > > > > This error message is obsolete since nothing is pre-checking the > > > > > mailbox anymore, and per above I see no problem waiting to check the > > > > > status until after the mailbox has failed to respond after a timeout. > > > > > > > > The message is wrong, but I think the logic is still valuable. How about: > > > > "mbox: reported interface ready, but mbox not ready" > > > > > > You mean check this every time even though the spec says the driver > > > only needs to check it once per-reset? > > > > Unfortunately it does not say that. "... it shall remain set until the next > > reset or the device encounters an error that prevents any mailbox > > communication." > > > > Once we have real error checking in place, this could go away, though I see no > > harm in leaving it. > > Right, there's no harm in the check, it just seems overly paranoid to > me if it was already checked once. Until a doorbell timeout happens > it's an extra MMIO cycle that can saved for a "what happened?" check > after a timeout. Well I suspect we're just rearranging the deck chairs on the Titanic now, but... I see doorbell timeouts as disconnected from whether or not the mailbox interface is ready. If they were the same, we wouldn't need both bits and we could just wait extra long for the doorbell when probing. In other words, I expect if the interface goes unready, doorbell timeout will occur, but I don't think we should assume if doorbell timeout occurs, the interface is no longer ready. I don't purport to know why a doorbell timeout might occur while the interface remains available (likely a firmware bug, I presume). It does seem interesting to check if the interface is no longer ready on timeout though.
On Mon, Nov 29, 2021 at 11:32 AM Ben Widawsky <ben.widawsky@intel.com> wrote: [..] > > > > Right, there's no harm in the check, it just seems overly paranoid to > > me if it was already checked once. Until a doorbell timeout happens > > it's an extra MMIO cycle that can saved for a "what happened?" check > > after a timeout. > > Well I suspect we're just rearranging the deck chairs on the Titanic now, but... Not so much, just trying to get this driver in line with other error handling designs. > I see doorbell timeouts as disconnected from whether or not the mailbox > interface is ready. If they were the same, we wouldn't need both bits and we > could just wait extra long for the doorbell when probing. > > In other words, I expect if the interface goes unready, doorbell timeout will > occur, but I don't think we should assume if doorbell timeout occurs, the > interface is no longer ready. I don't purport to know why a doorbell timeout > might occur while the interface remains available (likely a firmware bug, I > presume). > > It does seem interesting to check if the interface is no longer ready on timeout > though. So I'm just modeling this off of NVME error handling where there is a Controller Fatal Status bit that could be checked every transaction, but instead the driver waits until a command timeout to collect if the device went fatal / not-ready.
On 21-11-29 11:37:34, Dan Williams wrote: > On Mon, Nov 29, 2021 at 11:32 AM Ben Widawsky <ben.widawsky@intel.com> wrote: > [..] > > > > > > Right, there's no harm in the check, it just seems overly paranoid to > > > me if it was already checked once. Until a doorbell timeout happens > > > it's an extra MMIO cycle that can saved for a "what happened?" check > > > after a timeout. > > > > Well I suspect we're just rearranging the deck chairs on the Titanic now, but... > > Not so much, just trying to get this driver in line with other error > handling designs. Okay. I shall remove it then. > > > I see doorbell timeouts as disconnected from whether or not the mailbox > > interface is ready. If they were the same, we wouldn't need both bits and we > > could just wait extra long for the doorbell when probing. > > > > In other words, I expect if the interface goes unready, doorbell timeout will > > occur, but I don't think we should assume if doorbell timeout occurs, the > > interface is no longer ready. I don't purport to know why a doorbell timeout > > might occur while the interface remains available (likely a firmware bug, I > > presume). > > > > It does seem interesting to check if the interface is no longer ready on timeout > > though. > > So I'm just modeling this off of NVME error handling where there is a > Controller Fatal Status bit that could be checked every transaction, > but instead the driver waits until a command timeout to collect if the > device went fatal / not-ready. No error interrupts?
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c index 2cef9fec8599..869b4fc18e27 100644 --- a/drivers/cxl/pci.c +++ b/drivers/cxl/pci.c @@ -221,27 +221,14 @@ static int cxl_pci_mbox_get(struct cxl_dev_state *cxlds) /* * XXX: There is some amount of ambiguity in the 2.0 version of the spec - * around the mailbox interface ready (8.2.8.5.1.1). The purpose of the + * around the mailbox interface ready (8.2.8.5.1.1). The purpose of the * bit is to allow firmware running on the device to notify the driver - * that it's ready to receive commands. It is unclear if the bit needs - * to be read for each transaction mailbox, ie. the firmware can switch - * it on and off as needed. Second, there is no defined timeout for - * mailbox ready, like there is for the doorbell interface. - * - * Assumptions: - * 1. The firmware might toggle the Mailbox Interface Ready bit, check - * it for every command. - * - * 2. If the doorbell is clear, the firmware should have first set the - * Mailbox Interface Ready bit. Therefore, waiting for the doorbell - * to be ready is sufficient. + * that it's ready to receive commands. The spec does not clearly define + * under what conditions the bit may get set or cleared. As of the 2.0 + * base specification there was no defined timeout for mailbox ready, + * like there is for the doorbell interface. This was fixed with an ECN, + * but it's possible early devices implemented this before the ECN. */ - rc = cxl_pci_mbox_wait_for_doorbell(cxlds); - if (rc) { - dev_warn(dev, "Mailbox interface not ready\n"); - goto out; - } - md_status = readq(cxlds->regs.memdev + CXLMDEV_STATUS_OFFSET); if (!(md_status & CXLMDEV_MBOX_IF_READY && CXLMDEV_READY(md_status))) { dev_err(dev, "mbox: reported doorbell ready, but not mbox ready\n");
The expectation is that the mailbox interface ready bit is the first step in access through the mailbox interface. Therefore, waiting for the doorbell busy bit to be clear would imply that the mailbox interface is ready. The original driver implementation used the doorbell timeout for the Mailbox Interface Ready bit to piggyback off of, since the latter doesn't have a defined timeout (introduced in 8adaf747c9f0 ("cxl/mem: Find device capabilities"), a timeout has since been defined with an ECN to the 2.0 spec). With the current driver waiting for mailbox interface ready as a part of probe() it's no longer necessary to use the piggyback. With the piggybacking no longer necessary it doesn't make sense to check doorbell status when acquiring the mailbox. It will be checked during the normal mailbox exchange protocol. Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> --- This patch did not exist in RFCv2 --- drivers/cxl/pci.c | 25 ++++++------------------- 1 file changed, 6 insertions(+), 19 deletions(-)