diff mbox series

[05/23] cxl/pci: Don't poll doorbell for mailbox access

Message ID 20211120000250.1663391-6-ben.widawsky@intel.com
State Superseded
Headers show
Series Add drivers for CXL ports and mem devices | expand

Commit Message

Ben Widawsky Nov. 20, 2021, 12:02 a.m. UTC
The expectation is that the mailbox interface ready bit is the first
step in access through the mailbox interface. Therefore, waiting for the
doorbell busy bit to be clear would imply that the mailbox interface is
ready. The original driver implementation used the doorbell timeout for
the Mailbox Interface Ready bit to piggyback off of, since the latter
doesn't have a defined timeout (introduced in 8adaf747c9f0 ("cxl/mem:
Find device capabilities"), a timeout has since been defined with an ECN
to the 2.0 spec). With the current driver waiting for mailbox interface
ready as a part of probe() it's no longer necessary to use the
piggyback.

With the piggybacking no longer necessary it doesn't make sense to check
doorbell status when acquiring the mailbox. It will be checked during
the normal mailbox exchange protocol.

Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
---
This patch did not exist in RFCv2
---
 drivers/cxl/pci.c | 25 ++++++-------------------
 1 file changed, 6 insertions(+), 19 deletions(-)

Comments

Jonathan Cameron Nov. 22, 2021, 3:11 p.m. UTC | #1
On Fri, 19 Nov 2021 16:02:32 -0800
Ben Widawsky <ben.widawsky@intel.com> wrote:

> The expectation is that the mailbox interface ready bit is the first
> step in access through the mailbox interface.

Reword this? Perhaps
"The expectation is that the mailbox interface ready bit will be set
 at the start of any access through the mailbox interface."

> Therefore, waiting for the
> doorbell busy bit to be clear would imply that the mailbox interface is
> ready. The original driver implementation used the doorbell timeout for
> the Mailbox Interface Ready bit to piggyback off of, since the latter
> doesn't have a defined timeout (introduced in 8adaf747c9f0 ("cxl/mem:
> Find device capabilities"), a timeout has since been defined with an ECN
> to the 2.0 spec). With the current driver waiting for mailbox interface
> ready as a part of probe() it's no longer necessary to use the
> piggyback.
> 
> With the piggybacking no longer necessary it doesn't make sense to check
> doorbell status when acquiring the mailbox. It will be checked during
> the normal mailbox exchange protocol.
> 
> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Trivial comment inline - with that fixed either by calling it out, or by
pulling it out of this patch.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> ---
> This patch did not exist in RFCv2
> ---
>  drivers/cxl/pci.c | 25 ++++++-------------------
>  1 file changed, 6 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 2cef9fec8599..869b4fc18e27 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -221,27 +221,14 @@ static int cxl_pci_mbox_get(struct cxl_dev_state *cxlds)
>  
>  	/*
>  	 * XXX: There is some amount of ambiguity in the 2.0 version of the spec
> -	 * around the mailbox interface ready (8.2.8.5.1.1).  The purpose of the
> +	 * around the mailbox interface ready (8.2.8.5.1.1). The purpose of the

Whilst it's trivial, I'd prefer white space cleanup in separate patches.
I guess this one is obvious enough to just call out in the patch description
though.

>  	 * bit is to allow firmware running on the device to notify the driver
> -	 * that it's ready to receive commands. It is unclear if the bit needs
> -	 * to be read for each transaction mailbox, ie. the firmware can switch
> -	 * it on and off as needed. Second, there is no defined timeout for
> -	 * mailbox ready, like there is for the doorbell interface.
> -	 *
> -	 * Assumptions:
> -	 * 1. The firmware might toggle the Mailbox Interface Ready bit, check
> -	 *    it for every command.
> -	 *
> -	 * 2. If the doorbell is clear, the firmware should have first set the
> -	 *    Mailbox Interface Ready bit. Therefore, waiting for the doorbell
> -	 *    to be ready is sufficient.
> +	 * that it's ready to receive commands. The spec does not clearly define
> +	 * under what conditions the bit may get set or cleared. As of the 2.0
> +	 * base specification there was no defined timeout for mailbox ready,
> +	 * like there is for the doorbell interface. This was fixed with an ECN,
> +	 * but it's possible early devices implemented this before the ECN.
>  	 */
> -	rc = cxl_pci_mbox_wait_for_doorbell(cxlds);
> -	if (rc) {
> -		dev_warn(dev, "Mailbox interface not ready\n");
> -		goto out;
> -	}
> -
>  	md_status = readq(cxlds->regs.memdev + CXLMDEV_STATUS_OFFSET);
>  	if (!(md_status & CXLMDEV_MBOX_IF_READY && CXLMDEV_READY(md_status))) {
>  		dev_err(dev, "mbox: reported doorbell ready, but not mbox ready\n");
Ben Widawsky Nov. 22, 2021, 5:24 p.m. UTC | #2
On 21-11-22 15:11:31, Jonathan Cameron wrote:
> On Fri, 19 Nov 2021 16:02:32 -0800
> Ben Widawsky <ben.widawsky@intel.com> wrote:
> 
> > The expectation is that the mailbox interface ready bit is the first
> > step in access through the mailbox interface.
> 
> Reword this? Perhaps
> "The expectation is that the mailbox interface ready bit will be set
>  at the start of any access through the mailbox interface."
> 
> > Therefore, waiting for the
> > doorbell busy bit to be clear would imply that the mailbox interface is
> > ready. The original driver implementation used the doorbell timeout for
> > the Mailbox Interface Ready bit to piggyback off of, since the latter
> > doesn't have a defined timeout (introduced in 8adaf747c9f0 ("cxl/mem:
> > Find device capabilities"), a timeout has since been defined with an ECN
> > to the 2.0 spec). With the current driver waiting for mailbox interface
> > ready as a part of probe() it's no longer necessary to use the
> > piggyback.
> > 
> > With the piggybacking no longer necessary it doesn't make sense to check
> > doorbell status when acquiring the mailbox. It will be checked during
> > the normal mailbox exchange protocol.
> > 
> > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> Trivial comment inline - with that fixed either by calling it out, or by
> pulling it out of this patch.
> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> 
> > ---
> > This patch did not exist in RFCv2
> > ---
> >  drivers/cxl/pci.c | 25 ++++++-------------------
> >  1 file changed, 6 insertions(+), 19 deletions(-)
> > 
> > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > index 2cef9fec8599..869b4fc18e27 100644
> > --- a/drivers/cxl/pci.c
> > +++ b/drivers/cxl/pci.c
> > @@ -221,27 +221,14 @@ static int cxl_pci_mbox_get(struct cxl_dev_state *cxlds)
> >  
> >  	/*
> >  	 * XXX: There is some amount of ambiguity in the 2.0 version of the spec
> > -	 * around the mailbox interface ready (8.2.8.5.1.1).  The purpose of the
> > +	 * around the mailbox interface ready (8.2.8.5.1.1). The purpose of the
> 
> Whilst it's trivial, I'd prefer white space cleanup in separate patches.
> I guess this one is obvious enough to just call out in the patch description
> though.
> 

Okay. I'll keep this in mind for the future, and just fixup the commit messages
with your suggestion and this, now.

Thanks.
Ben

> >  	 * bit is to allow firmware running on the device to notify the driver
> > -	 * that it's ready to receive commands. It is unclear if the bit needs
> > -	 * to be read for each transaction mailbox, ie. the firmware can switch
> > -	 * it on and off as needed. Second, there is no defined timeout for
> > -	 * mailbox ready, like there is for the doorbell interface.
> > -	 *
> > -	 * Assumptions:
> > -	 * 1. The firmware might toggle the Mailbox Interface Ready bit, check
> > -	 *    it for every command.
> > -	 *
> > -	 * 2. If the doorbell is clear, the firmware should have first set the
> > -	 *    Mailbox Interface Ready bit. Therefore, waiting for the doorbell
> > -	 *    to be ready is sufficient.
> > +	 * that it's ready to receive commands. The spec does not clearly define
> > +	 * under what conditions the bit may get set or cleared. As of the 2.0
> > +	 * base specification there was no defined timeout for mailbox ready,
> > +	 * like there is for the doorbell interface. This was fixed with an ECN,
> > +	 * but it's possible early devices implemented this before the ECN.
> >  	 */
> > -	rc = cxl_pci_mbox_wait_for_doorbell(cxlds);
> > -	if (rc) {
> > -		dev_warn(dev, "Mailbox interface not ready\n");
> > -		goto out;
> > -	}
> > -
> >  	md_status = readq(cxlds->regs.memdev + CXLMDEV_STATUS_OFFSET);
> >  	if (!(md_status & CXLMDEV_MBOX_IF_READY && CXLMDEV_READY(md_status))) {
> >  		dev_err(dev, "mbox: reported doorbell ready, but not mbox ready\n");
>
Dan Williams Nov. 24, 2021, 9:55 p.m. UTC | #3
On Fri, Nov 19, 2021 at 4:03 PM Ben Widawsky <ben.widawsky@intel.com> wrote:
>
> The expectation is that the mailbox interface ready bit is the first
> step in access through the mailbox interface. Therefore, waiting for the
> doorbell busy bit to be clear would imply that the mailbox interface is
> ready. The original driver implementation used the doorbell timeout for
> the Mailbox Interface Ready bit to piggyback off of, since the latter
> doesn't have a defined timeout (introduced in 8adaf747c9f0 ("cxl/mem:
> Find device capabilities"), a timeout has since been defined with an ECN
> to the 2.0 spec). With the current driver waiting for mailbox interface
> ready as a part of probe() it's no longer necessary to use the
> piggyback.
>
> With the piggybacking no longer necessary it doesn't make sense to check
> doorbell status when acquiring the mailbox. It will be checked during
> the normal mailbox exchange protocol.
>
> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> ---
> This patch did not exist in RFCv2
> ---
>  drivers/cxl/pci.c | 25 ++++++-------------------
>  1 file changed, 6 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 2cef9fec8599..869b4fc18e27 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -221,27 +221,14 @@ static int cxl_pci_mbox_get(struct cxl_dev_state *cxlds)
>
>         /*
>          * XXX: There is some amount of ambiguity in the 2.0 version of the spec
> -        * around the mailbox interface ready (8.2.8.5.1.1).  The purpose of the
> +        * around the mailbox interface ready (8.2.8.5.1.1). The purpose of the
>          * bit is to allow firmware running on the device to notify the driver
> -        * that it's ready to receive commands. It is unclear if the bit needs
> -        * to be read for each transaction mailbox, ie. the firmware can switch
> -        * it on and off as needed. Second, there is no defined timeout for
> -        * mailbox ready, like there is for the doorbell interface.
> -        *
> -        * Assumptions:
> -        * 1. The firmware might toggle the Mailbox Interface Ready bit, check
> -        *    it for every command.
> -        *
> -        * 2. If the doorbell is clear, the firmware should have first set the
> -        *    Mailbox Interface Ready bit. Therefore, waiting for the doorbell
> -        *    to be ready is sufficient.
> +        * that it's ready to receive commands. The spec does not clearly define
> +        * under what conditions the bit may get set or cleared. As of the 2.0
> +        * base specification there was no defined timeout for mailbox ready,
> +        * like there is for the doorbell interface. This was fixed with an ECN,
> +        * but it's possible early devices implemented this before the ECN.

Can we just drop comment block altogether? Outside of
cxl_pci_setup_mailbox() the only time the mailbox status should be
checked is after a doorbell timeout after submitting a command.

>          */
> -       rc = cxl_pci_mbox_wait_for_doorbell(cxlds);
> -       if (rc) {
> -               dev_warn(dev, "Mailbox interface not ready\n");
> -               goto out;
> -       }
> -
>         md_status = readq(cxlds->regs.memdev + CXLMDEV_STATUS_OFFSET);
>         if (!(md_status & CXLMDEV_MBOX_IF_READY && CXLMDEV_READY(md_status))) {
>                 dev_err(dev, "mbox: reported doorbell ready, but not mbox ready\n");

This error message is obsolete since nothing is pre-checking the
mailbox anymore, and per above I see no problem waiting to check the
status until after the mailbox has failed to respond after a timeout.
Ben Widawsky Nov. 29, 2021, 6:33 p.m. UTC | #4
On 21-11-24 13:55:03, Dan Williams wrote:
> On Fri, Nov 19, 2021 at 4:03 PM Ben Widawsky <ben.widawsky@intel.com> wrote:
> >
> > The expectation is that the mailbox interface ready bit is the first
> > step in access through the mailbox interface. Therefore, waiting for the
> > doorbell busy bit to be clear would imply that the mailbox interface is
> > ready. The original driver implementation used the doorbell timeout for
> > the Mailbox Interface Ready bit to piggyback off of, since the latter
> > doesn't have a defined timeout (introduced in 8adaf747c9f0 ("cxl/mem:
> > Find device capabilities"), a timeout has since been defined with an ECN
> > to the 2.0 spec). With the current driver waiting for mailbox interface
> > ready as a part of probe() it's no longer necessary to use the
> > piggyback.
> >
> > With the piggybacking no longer necessary it doesn't make sense to check
> > doorbell status when acquiring the mailbox. It will be checked during
> > the normal mailbox exchange protocol.
> >
> > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> > ---
> > This patch did not exist in RFCv2
> > ---
> >  drivers/cxl/pci.c | 25 ++++++-------------------
> >  1 file changed, 6 insertions(+), 19 deletions(-)
> >
> > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > index 2cef9fec8599..869b4fc18e27 100644
> > --- a/drivers/cxl/pci.c
> > +++ b/drivers/cxl/pci.c
> > @@ -221,27 +221,14 @@ static int cxl_pci_mbox_get(struct cxl_dev_state *cxlds)
> >
> >         /*
> >          * XXX: There is some amount of ambiguity in the 2.0 version of the spec
> > -        * around the mailbox interface ready (8.2.8.5.1.1).  The purpose of the
> > +        * around the mailbox interface ready (8.2.8.5.1.1). The purpose of the
> >          * bit is to allow firmware running on the device to notify the driver
> > -        * that it's ready to receive commands. It is unclear if the bit needs
> > -        * to be read for each transaction mailbox, ie. the firmware can switch
> > -        * it on and off as needed. Second, there is no defined timeout for
> > -        * mailbox ready, like there is for the doorbell interface.
> > -        *
> > -        * Assumptions:
> > -        * 1. The firmware might toggle the Mailbox Interface Ready bit, check
> > -        *    it for every command.
> > -        *
> > -        * 2. If the doorbell is clear, the firmware should have first set the
> > -        *    Mailbox Interface Ready bit. Therefore, waiting for the doorbell
> > -        *    to be ready is sufficient.
> > +        * that it's ready to receive commands. The spec does not clearly define
> > +        * under what conditions the bit may get set or cleared. As of the 2.0
> > +        * base specification there was no defined timeout for mailbox ready,
> > +        * like there is for the doorbell interface. This was fixed with an ECN,
> > +        * but it's possible early devices implemented this before the ECN.
> 
> Can we just drop comment block altogether? Outside of
> cxl_pci_setup_mailbox() the only time the mailbox status should be
> checked is after a doorbell timeout after submitting a command.
> 

Yes, I think it's fine to drop it.

> >          */
> > -       rc = cxl_pci_mbox_wait_for_doorbell(cxlds);
> > -       if (rc) {
> > -               dev_warn(dev, "Mailbox interface not ready\n");
> > -               goto out;
> > -       }
> > -
> >         md_status = readq(cxlds->regs.memdev + CXLMDEV_STATUS_OFFSET);
> >         if (!(md_status & CXLMDEV_MBOX_IF_READY && CXLMDEV_READY(md_status))) {
> >                 dev_err(dev, "mbox: reported doorbell ready, but not mbox ready\n");
> 
> This error message is obsolete since nothing is pre-checking the
> mailbox anymore, and per above I see no problem waiting to check the
> status until after the mailbox has failed to respond after a timeout.

The message is wrong, but I think the logic is still valuable. How about:
"mbox: reported interface ready, but mbox not ready"
Dan Williams Nov. 29, 2021, 7:02 p.m. UTC | #5
On Mon, Nov 29, 2021 at 10:33 AM Ben Widawsky <ben.widawsky@intel.com> wrote:
>
> On 21-11-24 13:55:03, Dan Williams wrote:
> > On Fri, Nov 19, 2021 at 4:03 PM Ben Widawsky <ben.widawsky@intel.com> wrote:
> > >
> > > The expectation is that the mailbox interface ready bit is the first
> > > step in access through the mailbox interface. Therefore, waiting for the
> > > doorbell busy bit to be clear would imply that the mailbox interface is
> > > ready. The original driver implementation used the doorbell timeout for
> > > the Mailbox Interface Ready bit to piggyback off of, since the latter
> > > doesn't have a defined timeout (introduced in 8adaf747c9f0 ("cxl/mem:
> > > Find device capabilities"), a timeout has since been defined with an ECN
> > > to the 2.0 spec). With the current driver waiting for mailbox interface
> > > ready as a part of probe() it's no longer necessary to use the
> > > piggyback.
> > >
> > > With the piggybacking no longer necessary it doesn't make sense to check
> > > doorbell status when acquiring the mailbox. It will be checked during
> > > the normal mailbox exchange protocol.
> > >
> > > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> > > ---
> > > This patch did not exist in RFCv2
> > > ---
> > >  drivers/cxl/pci.c | 25 ++++++-------------------
> > >  1 file changed, 6 insertions(+), 19 deletions(-)
> > >
> > > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > > index 2cef9fec8599..869b4fc18e27 100644
> > > --- a/drivers/cxl/pci.c
> > > +++ b/drivers/cxl/pci.c
> > > @@ -221,27 +221,14 @@ static int cxl_pci_mbox_get(struct cxl_dev_state *cxlds)
> > >
> > >         /*
> > >          * XXX: There is some amount of ambiguity in the 2.0 version of the spec
> > > -        * around the mailbox interface ready (8.2.8.5.1.1).  The purpose of the
> > > +        * around the mailbox interface ready (8.2.8.5.1.1). The purpose of the
> > >          * bit is to allow firmware running on the device to notify the driver
> > > -        * that it's ready to receive commands. It is unclear if the bit needs
> > > -        * to be read for each transaction mailbox, ie. the firmware can switch
> > > -        * it on and off as needed. Second, there is no defined timeout for
> > > -        * mailbox ready, like there is for the doorbell interface.
> > > -        *
> > > -        * Assumptions:
> > > -        * 1. The firmware might toggle the Mailbox Interface Ready bit, check
> > > -        *    it for every command.
> > > -        *
> > > -        * 2. If the doorbell is clear, the firmware should have first set the
> > > -        *    Mailbox Interface Ready bit. Therefore, waiting for the doorbell
> > > -        *    to be ready is sufficient.
> > > +        * that it's ready to receive commands. The spec does not clearly define
> > > +        * under what conditions the bit may get set or cleared. As of the 2.0
> > > +        * base specification there was no defined timeout for mailbox ready,
> > > +        * like there is for the doorbell interface. This was fixed with an ECN,
> > > +        * but it's possible early devices implemented this before the ECN.
> >
> > Can we just drop comment block altogether? Outside of
> > cxl_pci_setup_mailbox() the only time the mailbox status should be
> > checked is after a doorbell timeout after submitting a command.
> >
>
> Yes, I think it's fine to drop it.
>
> > >          */
> > > -       rc = cxl_pci_mbox_wait_for_doorbell(cxlds);
> > > -       if (rc) {
> > > -               dev_warn(dev, "Mailbox interface not ready\n");
> > > -               goto out;
> > > -       }
> > > -
> > >         md_status = readq(cxlds->regs.memdev + CXLMDEV_STATUS_OFFSET);
> > >         if (!(md_status & CXLMDEV_MBOX_IF_READY && CXLMDEV_READY(md_status))) {
> > >                 dev_err(dev, "mbox: reported doorbell ready, but not mbox ready\n");
> >
> > This error message is obsolete since nothing is pre-checking the
> > mailbox anymore, and per above I see no problem waiting to check the
> > status until after the mailbox has failed to respond after a timeout.
>
> The message is wrong, but I think the logic is still valuable. How about:
> "mbox: reported interface ready, but mbox not ready"

You mean check this every time even though the spec says the driver
only needs to check it once per-reset?
Ben Widawsky Nov. 29, 2021, 7:11 p.m. UTC | #6
On 21-11-29 11:02:41, Dan Williams wrote:
> On Mon, Nov 29, 2021 at 10:33 AM Ben Widawsky <ben.widawsky@intel.com> wrote:
> >
> > On 21-11-24 13:55:03, Dan Williams wrote:
> > > On Fri, Nov 19, 2021 at 4:03 PM Ben Widawsky <ben.widawsky@intel.com> wrote:
> > > >
> > > > The expectation is that the mailbox interface ready bit is the first
> > > > step in access through the mailbox interface. Therefore, waiting for the
> > > > doorbell busy bit to be clear would imply that the mailbox interface is
> > > > ready. The original driver implementation used the doorbell timeout for
> > > > the Mailbox Interface Ready bit to piggyback off of, since the latter
> > > > doesn't have a defined timeout (introduced in 8adaf747c9f0 ("cxl/mem:
> > > > Find device capabilities"), a timeout has since been defined with an ECN
> > > > to the 2.0 spec). With the current driver waiting for mailbox interface
> > > > ready as a part of probe() it's no longer necessary to use the
> > > > piggyback.
> > > >
> > > > With the piggybacking no longer necessary it doesn't make sense to check
> > > > doorbell status when acquiring the mailbox. It will be checked during
> > > > the normal mailbox exchange protocol.
> > > >
> > > > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> > > > ---
> > > > This patch did not exist in RFCv2
> > > > ---
> > > >  drivers/cxl/pci.c | 25 ++++++-------------------
> > > >  1 file changed, 6 insertions(+), 19 deletions(-)
> > > >
> > > > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > > > index 2cef9fec8599..869b4fc18e27 100644
> > > > --- a/drivers/cxl/pci.c
> > > > +++ b/drivers/cxl/pci.c
> > > > @@ -221,27 +221,14 @@ static int cxl_pci_mbox_get(struct cxl_dev_state *cxlds)
> > > >
> > > >         /*
> > > >          * XXX: There is some amount of ambiguity in the 2.0 version of the spec
> > > > -        * around the mailbox interface ready (8.2.8.5.1.1).  The purpose of the
> > > > +        * around the mailbox interface ready (8.2.8.5.1.1). The purpose of the
> > > >          * bit is to allow firmware running on the device to notify the driver
> > > > -        * that it's ready to receive commands. It is unclear if the bit needs
> > > > -        * to be read for each transaction mailbox, ie. the firmware can switch
> > > > -        * it on and off as needed. Second, there is no defined timeout for
> > > > -        * mailbox ready, like there is for the doorbell interface.
> > > > -        *
> > > > -        * Assumptions:
> > > > -        * 1. The firmware might toggle the Mailbox Interface Ready bit, check
> > > > -        *    it for every command.
> > > > -        *
> > > > -        * 2. If the doorbell is clear, the firmware should have first set the
> > > > -        *    Mailbox Interface Ready bit. Therefore, waiting for the doorbell
> > > > -        *    to be ready is sufficient.
> > > > +        * that it's ready to receive commands. The spec does not clearly define
> > > > +        * under what conditions the bit may get set or cleared. As of the 2.0
> > > > +        * base specification there was no defined timeout for mailbox ready,
> > > > +        * like there is for the doorbell interface. This was fixed with an ECN,
> > > > +        * but it's possible early devices implemented this before the ECN.
> > >
> > > Can we just drop comment block altogether? Outside of
> > > cxl_pci_setup_mailbox() the only time the mailbox status should be
> > > checked is after a doorbell timeout after submitting a command.
> > >
> >
> > Yes, I think it's fine to drop it.
> >
> > > >          */
> > > > -       rc = cxl_pci_mbox_wait_for_doorbell(cxlds);
> > > > -       if (rc) {
> > > > -               dev_warn(dev, "Mailbox interface not ready\n");
> > > > -               goto out;
> > > > -       }
> > > > -
> > > >         md_status = readq(cxlds->regs.memdev + CXLMDEV_STATUS_OFFSET);
> > > >         if (!(md_status & CXLMDEV_MBOX_IF_READY && CXLMDEV_READY(md_status))) {
> > > >                 dev_err(dev, "mbox: reported doorbell ready, but not mbox ready\n");
> > >
> > > This error message is obsolete since nothing is pre-checking the
> > > mailbox anymore, and per above I see no problem waiting to check the
> > > status until after the mailbox has failed to respond after a timeout.
> >
> > The message is wrong, but I think the logic is still valuable. How about:
> > "mbox: reported interface ready, but mbox not ready"
> 
> You mean check this every time even though the spec says the driver
> only needs to check it once per-reset?

Unfortunately it does not say that. "... it shall remain set until the next
reset or the device encounters an error that prevents any mailbox
communication."

Once we have real error checking in place, this could go away, though I see no
harm in leaving it.
Dan Williams Nov. 29, 2021, 7:18 p.m. UTC | #7
On Mon, Nov 29, 2021 at 11:11 AM Ben Widawsky <ben.widawsky@intel.com> wrote:
>
> On 21-11-29 11:02:41, Dan Williams wrote:
> > On Mon, Nov 29, 2021 at 10:33 AM Ben Widawsky <ben.widawsky@intel.com> wrote:
> > >
> > > On 21-11-24 13:55:03, Dan Williams wrote:
> > > > On Fri, Nov 19, 2021 at 4:03 PM Ben Widawsky <ben.widawsky@intel.com> wrote:
> > > > >
> > > > > The expectation is that the mailbox interface ready bit is the first
> > > > > step in access through the mailbox interface. Therefore, waiting for the
> > > > > doorbell busy bit to be clear would imply that the mailbox interface is
> > > > > ready. The original driver implementation used the doorbell timeout for
> > > > > the Mailbox Interface Ready bit to piggyback off of, since the latter
> > > > > doesn't have a defined timeout (introduced in 8adaf747c9f0 ("cxl/mem:
> > > > > Find device capabilities"), a timeout has since been defined with an ECN
> > > > > to the 2.0 spec). With the current driver waiting for mailbox interface
> > > > > ready as a part of probe() it's no longer necessary to use the
> > > > > piggyback.
> > > > >
> > > > > With the piggybacking no longer necessary it doesn't make sense to check
> > > > > doorbell status when acquiring the mailbox. It will be checked during
> > > > > the normal mailbox exchange protocol.
> > > > >
> > > > > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> > > > > ---
> > > > > This patch did not exist in RFCv2
> > > > > ---
> > > > >  drivers/cxl/pci.c | 25 ++++++-------------------
> > > > >  1 file changed, 6 insertions(+), 19 deletions(-)
> > > > >
> > > > > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > > > > index 2cef9fec8599..869b4fc18e27 100644
> > > > > --- a/drivers/cxl/pci.c
> > > > > +++ b/drivers/cxl/pci.c
> > > > > @@ -221,27 +221,14 @@ static int cxl_pci_mbox_get(struct cxl_dev_state *cxlds)
> > > > >
> > > > >         /*
> > > > >          * XXX: There is some amount of ambiguity in the 2.0 version of the spec
> > > > > -        * around the mailbox interface ready (8.2.8.5.1.1).  The purpose of the
> > > > > +        * around the mailbox interface ready (8.2.8.5.1.1). The purpose of the
> > > > >          * bit is to allow firmware running on the device to notify the driver
> > > > > -        * that it's ready to receive commands. It is unclear if the bit needs
> > > > > -        * to be read for each transaction mailbox, ie. the firmware can switch
> > > > > -        * it on and off as needed. Second, there is no defined timeout for
> > > > > -        * mailbox ready, like there is for the doorbell interface.
> > > > > -        *
> > > > > -        * Assumptions:
> > > > > -        * 1. The firmware might toggle the Mailbox Interface Ready bit, check
> > > > > -        *    it for every command.
> > > > > -        *
> > > > > -        * 2. If the doorbell is clear, the firmware should have first set the
> > > > > -        *    Mailbox Interface Ready bit. Therefore, waiting for the doorbell
> > > > > -        *    to be ready is sufficient.
> > > > > +        * that it's ready to receive commands. The spec does not clearly define
> > > > > +        * under what conditions the bit may get set or cleared. As of the 2.0
> > > > > +        * base specification there was no defined timeout for mailbox ready,
> > > > > +        * like there is for the doorbell interface. This was fixed with an ECN,
> > > > > +        * but it's possible early devices implemented this before the ECN.
> > > >
> > > > Can we just drop comment block altogether? Outside of
> > > > cxl_pci_setup_mailbox() the only time the mailbox status should be
> > > > checked is after a doorbell timeout after submitting a command.
> > > >
> > >
> > > Yes, I think it's fine to drop it.
> > >
> > > > >          */
> > > > > -       rc = cxl_pci_mbox_wait_for_doorbell(cxlds);
> > > > > -       if (rc) {
> > > > > -               dev_warn(dev, "Mailbox interface not ready\n");
> > > > > -               goto out;
> > > > > -       }
> > > > > -
> > > > >         md_status = readq(cxlds->regs.memdev + CXLMDEV_STATUS_OFFSET);
> > > > >         if (!(md_status & CXLMDEV_MBOX_IF_READY && CXLMDEV_READY(md_status))) {
> > > > >                 dev_err(dev, "mbox: reported doorbell ready, but not mbox ready\n");
> > > >
> > > > This error message is obsolete since nothing is pre-checking the
> > > > mailbox anymore, and per above I see no problem waiting to check the
> > > > status until after the mailbox has failed to respond after a timeout.
> > >
> > > The message is wrong, but I think the logic is still valuable. How about:
> > > "mbox: reported interface ready, but mbox not ready"
> >
> > You mean check this every time even though the spec says the driver
> > only needs to check it once per-reset?
>
> Unfortunately it does not say that. "... it shall remain set until the next
> reset or the device encounters an error that prevents any mailbox
> communication."
>
> Once we have real error checking in place, this could go away, though I see no
> harm in leaving it.

Right, there's no harm in the check, it just seems overly paranoid to
me if it was already checked once. Until a doorbell timeout happens
it's an extra MMIO cycle that can saved for a "what happened?" check
after a timeout.
Ben Widawsky Nov. 29, 2021, 7:31 p.m. UTC | #8
On 21-11-29 11:18:36, Dan Williams wrote:

[snip]

> > > > > > -       rc = cxl_pci_mbox_wait_for_doorbell(cxlds);
> > > > > > -       if (rc) {
> > > > > > -               dev_warn(dev, "Mailbox interface not ready\n");
> > > > > > -               goto out;
> > > > > > -       }
> > > > > > -
> > > > > >         md_status = readq(cxlds->regs.memdev + CXLMDEV_STATUS_OFFSET);
> > > > > >         if (!(md_status & CXLMDEV_MBOX_IF_READY && CXLMDEV_READY(md_status))) {
> > > > > >                 dev_err(dev, "mbox: reported doorbell ready, but not mbox ready\n");
> > > > >
> > > > > This error message is obsolete since nothing is pre-checking the
> > > > > mailbox anymore, and per above I see no problem waiting to check the
> > > > > status until after the mailbox has failed to respond after a timeout.
> > > >
> > > > The message is wrong, but I think the logic is still valuable. How about:
> > > > "mbox: reported interface ready, but mbox not ready"
> > >
> > > You mean check this every time even though the spec says the driver
> > > only needs to check it once per-reset?
> >
> > Unfortunately it does not say that. "... it shall remain set until the next
> > reset or the device encounters an error that prevents any mailbox
> > communication."
> >
> > Once we have real error checking in place, this could go away, though I see no
> > harm in leaving it.
> 
> Right, there's no harm in the check, it just seems overly paranoid to
> me if it was already checked once. Until a doorbell timeout happens
> it's an extra MMIO cycle that can saved for a "what happened?" check
> after a timeout.

Well I suspect we're just rearranging the deck chairs on the Titanic now, but...

I see doorbell timeouts as disconnected from whether or not the mailbox
interface is ready. If they were the same, we wouldn't need both bits and we
could just wait extra long for the doorbell when probing.

In other words, I expect if the interface goes unready, doorbell timeout will
occur, but I don't think we should assume if doorbell timeout occurs, the
interface is no longer ready. I don't purport to know why a doorbell timeout
might occur while the interface remains available (likely a firmware bug, I
presume).

It does seem interesting to check if the interface is no longer ready on timeout
though.
Dan Williams Nov. 29, 2021, 7:37 p.m. UTC | #9
On Mon, Nov 29, 2021 at 11:32 AM Ben Widawsky <ben.widawsky@intel.com> wrote:
[..]
> >
> > Right, there's no harm in the check, it just seems overly paranoid to
> > me if it was already checked once. Until a doorbell timeout happens
> > it's an extra MMIO cycle that can saved for a "what happened?" check
> > after a timeout.
>
> Well I suspect we're just rearranging the deck chairs on the Titanic now, but...

Not so much, just trying to get this driver in line with other error
handling designs.

> I see doorbell timeouts as disconnected from whether or not the mailbox
> interface is ready. If they were the same, we wouldn't need both bits and we
> could just wait extra long for the doorbell when probing.
>
> In other words, I expect if the interface goes unready, doorbell timeout will
> occur, but I don't think we should assume if doorbell timeout occurs, the
> interface is no longer ready. I don't purport to know why a doorbell timeout
> might occur while the interface remains available (likely a firmware bug, I
> presume).
>
> It does seem interesting to check if the interface is no longer ready on timeout
> though.

So I'm just modeling this off of NVME error handling where there is a
Controller Fatal Status bit that could be checked every transaction,
but instead the driver waits until a command timeout to collect if the
device went fatal / not-ready.
Ben Widawsky Nov. 29, 2021, 7:50 p.m. UTC | #10
On 21-11-29 11:37:34, Dan Williams wrote:
> On Mon, Nov 29, 2021 at 11:32 AM Ben Widawsky <ben.widawsky@intel.com> wrote:
> [..]
> > >
> > > Right, there's no harm in the check, it just seems overly paranoid to
> > > me if it was already checked once. Until a doorbell timeout happens
> > > it's an extra MMIO cycle that can saved for a "what happened?" check
> > > after a timeout.
> >
> > Well I suspect we're just rearranging the deck chairs on the Titanic now, but...
> 
> Not so much, just trying to get this driver in line with other error
> handling designs.

Okay. I shall remove it then.

> 
> > I see doorbell timeouts as disconnected from whether or not the mailbox
> > interface is ready. If they were the same, we wouldn't need both bits and we
> > could just wait extra long for the doorbell when probing.
> >
> > In other words, I expect if the interface goes unready, doorbell timeout will
> > occur, but I don't think we should assume if doorbell timeout occurs, the
> > interface is no longer ready. I don't purport to know why a doorbell timeout
> > might occur while the interface remains available (likely a firmware bug, I
> > presume).
> >
> > It does seem interesting to check if the interface is no longer ready on timeout
> > though.
> 
> So I'm just modeling this off of NVME error handling where there is a
> Controller Fatal Status bit that could be checked every transaction,
> but instead the driver waits until a command timeout to collect if the
> device went fatal / not-ready.

No error interrupts?
diff mbox series

Patch

diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 2cef9fec8599..869b4fc18e27 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -221,27 +221,14 @@  static int cxl_pci_mbox_get(struct cxl_dev_state *cxlds)
 
 	/*
 	 * XXX: There is some amount of ambiguity in the 2.0 version of the spec
-	 * around the mailbox interface ready (8.2.8.5.1.1).  The purpose of the
+	 * around the mailbox interface ready (8.2.8.5.1.1). The purpose of the
 	 * bit is to allow firmware running on the device to notify the driver
-	 * that it's ready to receive commands. It is unclear if the bit needs
-	 * to be read for each transaction mailbox, ie. the firmware can switch
-	 * it on and off as needed. Second, there is no defined timeout for
-	 * mailbox ready, like there is for the doorbell interface.
-	 *
-	 * Assumptions:
-	 * 1. The firmware might toggle the Mailbox Interface Ready bit, check
-	 *    it for every command.
-	 *
-	 * 2. If the doorbell is clear, the firmware should have first set the
-	 *    Mailbox Interface Ready bit. Therefore, waiting for the doorbell
-	 *    to be ready is sufficient.
+	 * that it's ready to receive commands. The spec does not clearly define
+	 * under what conditions the bit may get set or cleared. As of the 2.0
+	 * base specification there was no defined timeout for mailbox ready,
+	 * like there is for the doorbell interface. This was fixed with an ECN,
+	 * but it's possible early devices implemented this before the ECN.
 	 */
-	rc = cxl_pci_mbox_wait_for_doorbell(cxlds);
-	if (rc) {
-		dev_warn(dev, "Mailbox interface not ready\n");
-		goto out;
-	}
-
 	md_status = readq(cxlds->regs.memdev + CXLMDEV_STATUS_OFFSET);
 	if (!(md_status & CXLMDEV_MBOX_IF_READY && CXLMDEV_READY(md_status))) {
 		dev_err(dev, "mbox: reported doorbell ready, but not mbox ready\n");