Message ID | 20230405-pl180-busydetect-fix-v3-10-cd3d5925ae64@linaro.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Fix busydetect on Ux500 PL180/MMCI | expand |
On Sun, 11 Jun 2023 at 21:41, Linus Walleij <linus.walleij@linaro.org> wrote: > > Add a timeout for busydetect IRQs using a delayed work. > It might happen (and does happen) on Ux500 that the first > busy detect IRQ appears and not the second one. This will > make the host hang indefinitely waiting for the second > IRQ to appear. > > Fire a delayed work after 10ms and re-engage the command > IRQ so the transaction finishes: we are certainly done > at this point, or we will catch an error in the status > register. A fixed value of 10ms doesn't work. We have lots of commands that are associated with way longer timeouts than this. Typically, the cmd->busy_timeout contains the current value of the timeout that should be used for the commands that have the flags MMC_RSP_BUSY set for it. The stm variant already uses cmd->busy_timeout, but also assigns a default timeout, just to make sure if the core has failed to set cmd->busy_timeout (it shouldn't but who knows). A few more more comments to code, see below. > > This makes the eMMC work again on Skomer. > > Signed-off-by: Linus Walleij <linus.walleij@linaro.org> > --- > ChangeLog v2->v3: > - Rebased. > ChangeLog v1->v2: > - No changes > --- > drivers/mmc/host/mmci.c | 23 +++++++++++++++++++++++ > drivers/mmc/host/mmci.h | 1 + > 2 files changed, 24 insertions(+) > > diff --git a/drivers/mmc/host/mmci.c b/drivers/mmc/host/mmci.c > index 05b8fad26c10..7e40b8f2dbf3 100644 > --- a/drivers/mmc/host/mmci.c > +++ b/drivers/mmc/host/mmci.c > @@ -37,6 +37,7 @@ > #include <linux/pinctrl/consumer.h> > #include <linux/reset.h> > #include <linux/gpio/consumer.h> > +#include <linux/workqueue.h> > > #include <asm/div64.h> > #include <asm/io.h> > @@ -741,6 +742,8 @@ static bool ux500_busy_complete(struct mmci_host *host, u32 status, u32 err_msk) > host->busy_status |= status & (MCI_CMDSENT | MCI_CMDRESPEND); > writel(host->variant->busy_detect_mask, base + MMCICLEAR); > host->busy_state = MMCI_BUSY_WAITING_FOR_END_IRQ; > + schedule_delayed_work(&host->busy_timeout_work, > + msecs_to_jiffies(10)); > } else { > dev_dbg(mmc_dev(host->mmc), > "lost busy status when waiting for busy start IRQ\n"); > @@ -752,6 +755,7 @@ static bool ux500_busy_complete(struct mmci_host *host, u32 status, u32 err_msk) > if (status & host->variant->busy_detect_flag) { > host->busy_status |= status & (MCI_CMDSENT | MCI_CMDRESPEND); > writel(host->variant->busy_detect_mask, base + MMCICLEAR); > + cancel_delayed_work_sync(&host->busy_timeout_work); > ux500_busy_clear_mask_done(host); > } else { > dev_dbg(mmc_dev(host->mmc), > @@ -1487,6 +1491,22 @@ mmci_cmd_irq(struct mmci_host *host, struct mmc_command *cmd, > } > } > > +/* > + * This busy timeout worker is used to "kick" the command IRQ if a > + * busy detect IRQ fails to appear in reasonable time. Only used on > + * variants with busy detection IRQ delivery. > + */ > +static void busy_timeout_work(struct work_struct *work) > +{ > + struct mmci_host *host = > + container_of(work, struct mmci_host, busy_timeout_work.work); > + u32 status; > + > + dev_dbg(mmc_dev(host->mmc), "timeout waiting for busy IRQ\n"); > + status = readl(host->base + MMCISTATUS); > + mmci_cmd_irq(host, host->cmd, status); There's no locking here. I assume that's because we call cancel_delayed_work_sync() from an atomic context and we don't want that to hang. However, can't the call to mmci_cmd_irq() race with a proper irq being managed in parallel? > +} > + > static int mmci_get_rx_fifocnt(struct mmci_host *host, u32 status, int remain) > { > return remain - (readl(host->base + MMCIFIFOCNT) << 2); > @@ -2300,6 +2320,9 @@ static int mmci_probe(struct amba_device *dev, > goto clk_disable; > } > > + if (host->variant->busy_detect && host->ops->busy_complete) > + INIT_DELAYED_WORK(&host->busy_timeout_work, busy_timeout_work); > + > writel(MCI_IRQENABLE | variant->start_err, host->base + MMCIMASK0); > > amba_set_drvdata(dev, mmc); > diff --git a/drivers/mmc/host/mmci.h b/drivers/mmc/host/mmci.h > index 12a7bbd3ce26..95d3d0a6049b 100644 > --- a/drivers/mmc/host/mmci.h > +++ b/drivers/mmc/host/mmci.h > @@ -451,6 +451,7 @@ struct mmci_host { > void *dma_priv; > > s32 next_cookie; > + struct delayed_work busy_timeout_work; > }; > > #define dma_inprogress(host) ((host)->dma_in_progress) > Kind regards Uffe
On Tue, Jun 13, 2023 at 2:08 PM Ulf Hansson <ulf.hansson@linaro.org> wrote: > Typically, the cmd->busy_timeout contains the current value of the > timeout that should be used for the commands that have the flags > MMC_RSP_BUSY set for it. > > The stm variant already uses cmd->busy_timeout, but also assigns a > default timeout, just to make sure if the core has failed to set > cmd->busy_timeout (it shouldn't but who knows). I generalized the STM32 solution with the core-specified timeout and default and used that. If we know the core will always provide correct timeouts we should probably delete hacks like this though (but that would be a separate patch). > > +static void busy_timeout_work(struct work_struct *work) > > +{ > > + struct mmci_host *host = > > + container_of(work, struct mmci_host, busy_timeout_work.work); > > + u32 status; > > + > > + dev_dbg(mmc_dev(host->mmc), "timeout waiting for busy IRQ\n"); > > + status = readl(host->base + MMCISTATUS); > > + mmci_cmd_irq(host, host->cmd, status); > > There's no locking here. I assume that's because we call > cancel_delayed_work_sync() from an atomic context and we don't want > that to hang. > > However, can't the call to mmci_cmd_irq() race with a proper irq being > managed in parallel? It will not be a problem AFAIC: the MMCI is using level IRQs, meaning it will handle all flags that are set for command or data IRQs before exiting the IRQ handler, the order of the IRQ handling if several fire at the same time is actually dependent on the order the IRQ flags are checked in the code. When the interrupt handler exits, all IRQs should be handled and the respective IRQ lines and thus the IRQ from the MMCI should be de-asserted. In this case, our problem is that an interrupt is not fired at all, but if the actual IRQ (that should have been fired) or a totally different IRQ (such as an error) is fired at the same time doesn't matter: the pending IRQs will be handled, and if the timer then fires an additional IRQ the IRQ handler will check if there are any IRQs to handle and then conclude there isn't and then just return. At least the way I read it. I made a v4, sending it out so you can check! Yours, Linus Walleij
On Tue, 13 Jun 2023 at 22:32, Linus Walleij <linus.walleij@linaro.org> wrote: > > On Tue, Jun 13, 2023 at 2:08 PM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > > Typically, the cmd->busy_timeout contains the current value of the > > timeout that should be used for the commands that have the flags > > MMC_RSP_BUSY set for it. > > > > The stm variant already uses cmd->busy_timeout, but also assigns a > > default timeout, just to make sure if the core has failed to set > > cmd->busy_timeout (it shouldn't but who knows). > > I generalized the STM32 solution with the core-specified timeout > and default and used that. > > If we know the core will always provide correct timeouts we should > probably delete hacks like this though (but that would be a separate > patch). > > > > +static void busy_timeout_work(struct work_struct *work) > > > +{ > > > + struct mmci_host *host = > > > + container_of(work, struct mmci_host, busy_timeout_work.work); > > > + u32 status; > > > + > > > + dev_dbg(mmc_dev(host->mmc), "timeout waiting for busy IRQ\n"); > > > + status = readl(host->base + MMCISTATUS); > > > + mmci_cmd_irq(host, host->cmd, status); > > > > There's no locking here. I assume that's because we call > > cancel_delayed_work_sync() from an atomic context and we don't want > > that to hang. > > > > However, can't the call to mmci_cmd_irq() race with a proper irq being > > managed in parallel? > > It will not be a problem AFAIC: the MMCI is using level IRQs, meaning it > will handle all flags that are set for command or data IRQs before exiting > the IRQ handler, the order of the IRQ handling if several fire at the same > time is actually dependent on the order the IRQ flags are checked in the > code. > > When the interrupt handler exits, all IRQs should be handled and the > respective IRQ lines and thus the IRQ from the MMCI should be > de-asserted. Right, I think I follow. > > In this case, our problem is that an interrupt is not fired at all, but if > the actual IRQ (that should have been fired) or a totally different IRQ > (such as an error) is fired at the same time doesn't matter: the pending > IRQs will be handled, and if the timer then fires an additional IRQ > the IRQ handler will check if there are any IRQs to handle and then > conclude there isn't and then just return. To clarify, I am not worried about the IRQ handling as such. However, we use the spin_lock to protect some members in the struct mmci_host. In this case, the mmci_cmd_irq() is using "host->cmd" to understand whether there is an active command to manage. When the command has been completed, we set host->cmd to NULL. [...] Kind regards Uffe
On Wed, Jun 14, 2023 at 12:05 PM Ulf Hansson <ulf.hansson@linaro.org> wrote: > However, we use the spin_lock to protect some members in the struct > mmci_host. In this case, the mmci_cmd_irq() is using "host->cmd" to > understand whether there is an active command to manage. When the > command has been completed, we set host->cmd to NULL. Hm right... I'm leaning toward some catch-all like: if (!host->cmd) state = MMCI_BUSY_DONE; as if there is no command going on then surely nothing is busy on the host controller. Yours, Linus Walleij
On Wed, 14 Jun 2023 at 13:17, Linus Walleij <linus.walleij@linaro.org> wrote: > > On Wed, Jun 14, 2023 at 12:05 PM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > > However, we use the spin_lock to protect some members in the struct > > mmci_host. In this case, the mmci_cmd_irq() is using "host->cmd" to > > understand whether there is an active command to manage. When the > > command has been completed, we set host->cmd to NULL. > > Hm right... > > I'm leaning toward some catch-all like: > > if (!host->cmd) > state = MMCI_BUSY_DONE; > > as if there is no command going on then surely nothing is busy on the > host controller. Right, so at what point do you want to add this check? Kind regards Uffe
On Wed, Jun 14, 2023 at 2:22 PM Ulf Hansson <ulf.hansson@linaro.org> wrote: > On Wed, 14 Jun 2023 at 13:17, Linus Walleij <linus.walleij@linaro.org> wrote: > > > > On Wed, Jun 14, 2023 at 12:05 PM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > > > > However, we use the spin_lock to protect some members in the struct > > > mmci_host. In this case, the mmci_cmd_irq() is using "host->cmd" to > > > understand whether there is an active command to manage. When the > > > command has been completed, we set host->cmd to NULL. > > > > Hm right... > > > > I'm leaning toward some catch-all like: > > > > if (!host->cmd) > > state = MMCI_BUSY_DONE; > > > > as if there is no command going on then surely nothing is busy on the > > host controller. > > Right, so at what point do you want to add this check? I have put it before the calls to the busy_complete() callback, in the IRQ, where we are already in atomic context. If we are not processing a command, we should not do any busy detection for sure. Yours, Linus Walleij
On Wed, Jun 14, 2023 at 9:22 PM Linus Walleij <linus.walleij@linaro.org> wrote: > On Wed, Jun 14, 2023 at 2:22 PM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > On Wed, 14 Jun 2023 at 13:17, Linus Walleij <linus.walleij@linaro.org> wrote: > > > > > > On Wed, Jun 14, 2023 at 12:05 PM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > > > > > > However, we use the spin_lock to protect some members in the struct > > > > mmci_host. In this case, the mmci_cmd_irq() is using "host->cmd" to > > > > understand whether there is an active command to manage. When the > > > > command has been completed, we set host->cmd to NULL. > > > > > > Hm right... > > > > > > I'm leaning toward some catch-all like: > > > > > > if (!host->cmd) > > > state = MMCI_BUSY_DONE; > > > > > > as if there is no command going on then surely nothing is busy on the > > > host controller. > > > > Right, so at what point do you want to add this check? > > I have put it before the calls to the busy_complete() callback, in the > IRQ, where we are already in atomic context. If we are not processing > a command, we should not do any busy detection for sure. No that's wrong. The mmci_cmd_irq() is actually passed the command as parameter, so I just augment the busy_complete() prototype to pass this along down, check out the patch (I talk to much). Yours, Linus Walleij
diff --git a/drivers/mmc/host/mmci.c b/drivers/mmc/host/mmci.c index 05b8fad26c10..7e40b8f2dbf3 100644 --- a/drivers/mmc/host/mmci.c +++ b/drivers/mmc/host/mmci.c @@ -37,6 +37,7 @@ #include <linux/pinctrl/consumer.h> #include <linux/reset.h> #include <linux/gpio/consumer.h> +#include <linux/workqueue.h> #include <asm/div64.h> #include <asm/io.h> @@ -741,6 +742,8 @@ static bool ux500_busy_complete(struct mmci_host *host, u32 status, u32 err_msk) host->busy_status |= status & (MCI_CMDSENT | MCI_CMDRESPEND); writel(host->variant->busy_detect_mask, base + MMCICLEAR); host->busy_state = MMCI_BUSY_WAITING_FOR_END_IRQ; + schedule_delayed_work(&host->busy_timeout_work, + msecs_to_jiffies(10)); } else { dev_dbg(mmc_dev(host->mmc), "lost busy status when waiting for busy start IRQ\n"); @@ -752,6 +755,7 @@ static bool ux500_busy_complete(struct mmci_host *host, u32 status, u32 err_msk) if (status & host->variant->busy_detect_flag) { host->busy_status |= status & (MCI_CMDSENT | MCI_CMDRESPEND); writel(host->variant->busy_detect_mask, base + MMCICLEAR); + cancel_delayed_work_sync(&host->busy_timeout_work); ux500_busy_clear_mask_done(host); } else { dev_dbg(mmc_dev(host->mmc), @@ -1487,6 +1491,22 @@ mmci_cmd_irq(struct mmci_host *host, struct mmc_command *cmd, } } +/* + * This busy timeout worker is used to "kick" the command IRQ if a + * busy detect IRQ fails to appear in reasonable time. Only used on + * variants with busy detection IRQ delivery. + */ +static void busy_timeout_work(struct work_struct *work) +{ + struct mmci_host *host = + container_of(work, struct mmci_host, busy_timeout_work.work); + u32 status; + + dev_dbg(mmc_dev(host->mmc), "timeout waiting for busy IRQ\n"); + status = readl(host->base + MMCISTATUS); + mmci_cmd_irq(host, host->cmd, status); +} + static int mmci_get_rx_fifocnt(struct mmci_host *host, u32 status, int remain) { return remain - (readl(host->base + MMCIFIFOCNT) << 2); @@ -2300,6 +2320,9 @@ static int mmci_probe(struct amba_device *dev, goto clk_disable; } + if (host->variant->busy_detect && host->ops->busy_complete) + INIT_DELAYED_WORK(&host->busy_timeout_work, busy_timeout_work); + writel(MCI_IRQENABLE | variant->start_err, host->base + MMCIMASK0); amba_set_drvdata(dev, mmc); diff --git a/drivers/mmc/host/mmci.h b/drivers/mmc/host/mmci.h index 12a7bbd3ce26..95d3d0a6049b 100644 --- a/drivers/mmc/host/mmci.h +++ b/drivers/mmc/host/mmci.h @@ -451,6 +451,7 @@ struct mmci_host { void *dma_priv; s32 next_cookie; + struct delayed_work busy_timeout_work; }; #define dma_inprogress(host) ((host)->dma_in_progress)
Add a timeout for busydetect IRQs using a delayed work. It might happen (and does happen) on Ux500 that the first busy detect IRQ appears and not the second one. This will make the host hang indefinitely waiting for the second IRQ to appear. Fire a delayed work after 10ms and re-engage the command IRQ so the transaction finishes: we are certainly done at this point, or we will catch an error in the status register. This makes the eMMC work again on Skomer. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> --- ChangeLog v2->v3: - Rebased. ChangeLog v1->v2: - No changes --- drivers/mmc/host/mmci.c | 23 +++++++++++++++++++++++ drivers/mmc/host/mmci.h | 1 + 2 files changed, 24 insertions(+)