Message ID | 1508746897-62291-3-git-send-email-xiaolei.li@mediatek.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hi Xiaolei, On Mon, 23 Oct 2017 16:21:37 +0800 Xiaolei Li <xiaolei.li@mediatek.com> wrote: > For MT2701 NAND Controller, there may generate infinite ECC decode IRQ > during long time burn test on some platforms. Once this issue occurred, > the ECC decode IRQ status cannot be cleared in the IRQ handler function, > and threads cannot be scheduled. > > ECC HW generates decode IRQ each sector, so there will have more than one > decode IRQ if read one page of large page NAND. > > Currently, ECC IRQ handle flow is that we will check whether it is decode > IRQ at first by reading the register ECC_DECIRQ_STA. This is a read-clear > type register. If this IRQ is decode IRQ, then the ECC IRQ signal will be > cleared at the same time. > Secondly, we will check whether all sectors are decoded by reading the > register ECC_DECDONE. This is because the current IRQ may be not dealed > in time, and the next sectors have been decoded before reading the > register ECC_DECIRQ_STA. Then, the next sectors's decode IRQs will not > be generated. > Thirdly, if all sectors are decoded by comparing with ecc->sectors, then we > will complete ecc->done, set ecc->sectors as 0, and disable ECC IRQ by > programming the register ECC_IRQ_REG(op) as 0. Otherwise, wait for the > next ECC IRQ. > > But, there is a timing issue between step one and two. When we read the > reigster ECC_DECIRQ_STA, all sectors are decoded except the last sector, > and the ECC IRQ signal is cleared. But the last sector is decoded before > reading ECC_DECDONE, so the ECC IRQ signal is enabled again by ECC HW, and > it means we will receive one extra ECC IRQ later. In step three, we will > find that all sectors were decoded, then disable ECC IRQ and return. > When deal with the extra ECC IRQ, the ECC IRQ status cannot be cleared > anymore. That is because the register ECC_DECIRQ_STA can only be cleared > when the register ECC_IRQ_REG(op) is enabled. But actually we have > disabled ECC IRQ in the previous ECC IRQ handle. So, there will > keep receiving ECC decode IRQ. > > Now, we read the register ECC_DECIRQ_STA once again before completing the > ecc done event. This ensures that there will be no extra ECC decode IRQ. Why don't you remove the writel(0, ecc->regs + ECC_IRQ_REG(op)); line in the irq handler and add readw(ecc->regs + ECC_DECIRQ_STA); just before disabling the irq in mtk_ecc_disable(). It really looks weird to have the IRQ handler disable the IRQ on its own. It's usually the caller who decides when the IRQ (or set of IRQs) should be enabled/disabled. Moreover, I'm not sure you're guaranteed that no new interrupts will come in between the ECC_DECIRQ_STA read and the ECC_IRQ_REG() write you're doing in your irq handler, while, doing that after the engine has been disabled (in mtk_ecc_disable()) should guarantee that nothing can happen after you have read the status reg. > > MT2712 ECC HW is designed to generate only one decode IRQ each page, so > there is no this problem on MT2712 platforms. > > Signed-off-by: Xiaolei Li <xiaolei.li@mediatek.com> You miss a Fixes tag, and you might want to add a Cc-stable tag to backport the fix. Regards, Boris > --- > drivers/mtd/nand/mtk_ecc.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/drivers/mtd/nand/mtk_ecc.c b/drivers/mtd/nand/mtk_ecc.c > index 82aa6f2..0e04323 100644 > --- a/drivers/mtd/nand/mtk_ecc.c > +++ b/drivers/mtd/nand/mtk_ecc.c > @@ -115,6 +115,11 @@ static irqreturn_t mtk_ecc_irq(int irq, void *id) > op = ECC_DECODE; > dec = readw(ecc->regs + ECC_DECDONE); > if (dec & ecc->sectors) { > + /** > + * Clear decode IRQ status once again to ensure that > + * there will be no extra IRQ. > + */ > + dec = readw(ecc->regs + ECC_DECIRQ_STA); > ecc->sectors = 0; > complete(&ecc->done); > } else {
Hi Boris, On Fri, 2017-10-27 at 14:44 +0200, Boris Brezillon wrote: > Hi Xiaolei, > > On Mon, 23 Oct 2017 16:21:37 +0800 > Xiaolei Li <xiaolei.li@mediatek.com> wrote: > > > For MT2701 NAND Controller, there may generate infinite ECC decode IRQ > > during long time burn test on some platforms. Once this issue occurred, > > the ECC decode IRQ status cannot be cleared in the IRQ handler function, > > and threads cannot be scheduled. > > > > ECC HW generates decode IRQ each sector, so there will have more than one > > decode IRQ if read one page of large page NAND. > > > > Currently, ECC IRQ handle flow is that we will check whether it is decode > > IRQ at first by reading the register ECC_DECIRQ_STA. This is a read-clear > > type register. If this IRQ is decode IRQ, then the ECC IRQ signal will be > > cleared at the same time. > > Secondly, we will check whether all sectors are decoded by reading the > > register ECC_DECDONE. This is because the current IRQ may be not dealed > > in time, and the next sectors have been decoded before reading the > > register ECC_DECIRQ_STA. Then, the next sectors's decode IRQs will not > > be generated. > > Thirdly, if all sectors are decoded by comparing with ecc->sectors, then we > > will complete ecc->done, set ecc->sectors as 0, and disable ECC IRQ by > > programming the register ECC_IRQ_REG(op) as 0. Otherwise, wait for the > > next ECC IRQ. > > > > But, there is a timing issue between step one and two. When we read the > > reigster ECC_DECIRQ_STA, all sectors are decoded except the last sector, > > and the ECC IRQ signal is cleared. But the last sector is decoded before > > reading ECC_DECDONE, so the ECC IRQ signal is enabled again by ECC HW, and > > it means we will receive one extra ECC IRQ later. In step three, we will > > find that all sectors were decoded, then disable ECC IRQ and return. > > When deal with the extra ECC IRQ, the ECC IRQ status cannot be cleared > > anymore. That is because the register ECC_DECIRQ_STA can only be cleared > > when the register ECC_IRQ_REG(op) is enabled. But actually we have > > disabled ECC IRQ in the previous ECC IRQ handle. So, there will > > keep receiving ECC decode IRQ. > > > > Now, we read the register ECC_DECIRQ_STA once again before completing the > > ecc done event. This ensures that there will be no extra ECC decode IRQ. > > Why don't you remove the > > writel(0, ecc->regs + ECC_IRQ_REG(op)); > > line in the irq handler and add > > readw(ecc->regs + ECC_DECIRQ_STA); > > just before disabling the irq in mtk_ecc_disable(). > > It really looks weird to have the IRQ handler disable the IRQ on its > own. It's usually the caller who decides when the IRQ (or set of IRQs) > should be enabled/disabled. It is reasonable. I will remove writel(0, ecc->regs + ECC_IRQ_REG(op)); from irq handler, and keep it in mtk_ecc_disable(). But I thinks it is better to add readw(ecc->regs + ECC_DECIRQ_STA); before completing ecc done event in the irq handler than in mtk_ecc_disable(). Please let me show some backgrounds at first: 1. ECC irq signal is low level vaild. 2. Reading ECC_DECIRQ_STA can pull ecc irq signal high, but it is just valid if ECC_IRQ_REG(op) is enabled. At the beginning of irq handler, I read ECC_DECIRQ_STA and this pulls ecc irq signal high. But ecc irq signal may be pulled down again between ECC_DECIRQ_STA read and ECC_DECDONE read, if one sector is decoded done between them. And this means that one IRQ will come in later. But if all sectors are decoded done now, we will complete ecc done event, and there is really no need to deal with this extra IRQ. So, read ECC_DECIRQ_STA before completing ecc done event, this can guarantee that ecc irq signal is always high and no extra IRQ later. We can keep mtk_ecc_disable() the same. Because all irqs have been handled. I am not sure whether this explanation is clear? > Moreover, I'm not sure you're guaranteed > that no new interrupts will come in between the ECC_DECIRQ_STA read > and the ECC_IRQ_REG() write you're doing in your irq handler, while, > doing that after the engine has been disabled (in mtk_ecc_disable()) > should guarantee that nothing can happen after you have read the status > reg. > > > > > MT2712 ECC HW is designed to generate only one decode IRQ each page, so > > there is no this problem on MT2712 platforms. > > > > Signed-off-by: Xiaolei Li <xiaolei.li@mediatek.com> > > You miss a Fixes tag, and you might want to add a Cc-stable tag to > backport the fix. > OK. Thanks. > Regards, > > Boris > > > --- > > drivers/mtd/nand/mtk_ecc.c | 5 +++++ > > 1 file changed, 5 insertions(+) > > > > diff --git a/drivers/mtd/nand/mtk_ecc.c b/drivers/mtd/nand/mtk_ecc.c > > index 82aa6f2..0e04323 100644 > > --- a/drivers/mtd/nand/mtk_ecc.c > > +++ b/drivers/mtd/nand/mtk_ecc.c > > @@ -115,6 +115,11 @@ static irqreturn_t mtk_ecc_irq(int irq, void *id) > > op = ECC_DECODE; > > dec = readw(ecc->regs + ECC_DECDONE); > > if (dec & ecc->sectors) { > > + /** > > + * Clear decode IRQ status once again to ensure that > > + * there will be no extra IRQ. > > + */ > > + dec = readw(ecc->regs + ECC_DECIRQ_STA); > > ecc->sectors = 0; > > complete(&ecc->done); > > } else { >
On Sat, 28 Oct 2017 12:05:57 +0800 xiaolei li <xiaolei.li@mediatek.com> wrote: > Hi Boris, > > On Fri, 2017-10-27 at 14:44 +0200, Boris Brezillon wrote: > > Hi Xiaolei, > > > > On Mon, 23 Oct 2017 16:21:37 +0800 > > Xiaolei Li <xiaolei.li@mediatek.com> wrote: > > > > > For MT2701 NAND Controller, there may generate infinite ECC decode IRQ > > > during long time burn test on some platforms. Once this issue occurred, > > > the ECC decode IRQ status cannot be cleared in the IRQ handler function, > > > and threads cannot be scheduled. > > > > > > ECC HW generates decode IRQ each sector, so there will have more than one > > > decode IRQ if read one page of large page NAND. > > > > > > Currently, ECC IRQ handle flow is that we will check whether it is decode > > > IRQ at first by reading the register ECC_DECIRQ_STA. This is a read-clear > > > type register. If this IRQ is decode IRQ, then the ECC IRQ signal will be > > > cleared at the same time. > > > Secondly, we will check whether all sectors are decoded by reading the > > > register ECC_DECDONE. This is because the current IRQ may be not dealed > > > in time, and the next sectors have been decoded before reading the > > > register ECC_DECIRQ_STA. Then, the next sectors's decode IRQs will not > > > be generated. > > > Thirdly, if all sectors are decoded by comparing with ecc->sectors, then we > > > will complete ecc->done, set ecc->sectors as 0, and disable ECC IRQ by > > > programming the register ECC_IRQ_REG(op) as 0. Otherwise, wait for the > > > next ECC IRQ. > > > > > > But, there is a timing issue between step one and two. When we read the > > > reigster ECC_DECIRQ_STA, all sectors are decoded except the last sector, > > > and the ECC IRQ signal is cleared. But the last sector is decoded before > > > reading ECC_DECDONE, so the ECC IRQ signal is enabled again by ECC HW, and > > > it means we will receive one extra ECC IRQ later. In step three, we will > > > find that all sectors were decoded, then disable ECC IRQ and return. > > > When deal with the extra ECC IRQ, the ECC IRQ status cannot be cleared > > > anymore. That is because the register ECC_DECIRQ_STA can only be cleared > > > when the register ECC_IRQ_REG(op) is enabled. But actually we have > > > disabled ECC IRQ in the previous ECC IRQ handle. So, there will > > > keep receiving ECC decode IRQ. > > > > > > Now, we read the register ECC_DECIRQ_STA once again before completing the > > > ecc done event. This ensures that there will be no extra ECC decode IRQ. > > > > Why don't you remove the > > > > writel(0, ecc->regs + ECC_IRQ_REG(op)); > > > > line in the irq handler and add > > > > readw(ecc->regs + ECC_DECIRQ_STA); > > > > just before disabling the irq in mtk_ecc_disable(). > > > > It really looks weird to have the IRQ handler disable the IRQ on its > > own. It's usually the caller who decides when the IRQ (or set of IRQs) > > should be enabled/disabled. > > It is reasonable. I will remove > writel(0, ecc->regs + ECC_IRQ_REG(op)); > from irq handler, and keep it in mtk_ecc_disable(). > > But I thinks it is better to add > readw(ecc->regs + ECC_DECIRQ_STA); > before completing ecc done event in the irq handler than in > mtk_ecc_disable(). > > Please let me show some backgrounds at first: > 1. ECC irq signal is low level vaild. > 2. Reading ECC_DECIRQ_STA can pull ecc irq signal high, but it is just > valid if ECC_IRQ_REG(op) is enabled. > > At the beginning of irq handler, I read ECC_DECIRQ_STA and this pulls > ecc irq signal high. But ecc irq signal may be pulled down again between > ECC_DECIRQ_STA read and ECC_DECDONE read, if one sector is decoded done > between them. And this means that one IRQ will come in later. But if all > sectors are decoded done now, we will complete ecc done event, and there > is really no need to deal with this extra IRQ. > So, read ECC_DECIRQ_STA before completing ecc done event, this can > guarantee that ecc irq signal is always high and no extra IRQ later. > > We can keep mtk_ecc_disable() the same. Because all irqs have been > handled. > > I am not sure whether this explanation is clear? I'd say that it's safer to read ECC_DECIRQ_STA both in the irq handler (where you want to add it) and in mtk_ecc_disable(), just in case you had a timeout on the wait_for_completion_timeout() call (not sure the one in mtk_ecc_disable() is necessary though). > > > Moreover, I'm not sure you're guaranteed > > that no new interrupts will come in between the ECC_DECIRQ_STA read > > and the ECC_IRQ_REG() write you're doing in your irq handler, while, > > doing that after the engine has been disabled (in mtk_ecc_disable()) > > should guarantee that nothing can happen after you have read the status > > reg. > > > > > > > > MT2712 ECC HW is designed to generate only one decode IRQ each page, so > > > there is no this problem on MT2712 platforms. > > > > > > Signed-off-by: Xiaolei Li <xiaolei.li@mediatek.com> > > > > You miss a Fixes tag, and you might want to add a Cc-stable tag to > > backport the fix. > > > OK. Thanks. > > > Regards, > > > > Boris > > > > > --- > > > drivers/mtd/nand/mtk_ecc.c | 5 +++++ > > > 1 file changed, 5 insertions(+) > > > > > > diff --git a/drivers/mtd/nand/mtk_ecc.c b/drivers/mtd/nand/mtk_ecc.c > > > index 82aa6f2..0e04323 100644 > > > --- a/drivers/mtd/nand/mtk_ecc.c > > > +++ b/drivers/mtd/nand/mtk_ecc.c > > > @@ -115,6 +115,11 @@ static irqreturn_t mtk_ecc_irq(int irq, void *id) > > > op = ECC_DECODE; > > > dec = readw(ecc->regs + ECC_DECDONE); > > > if (dec & ecc->sectors) { > > > + /** > > > + * Clear decode IRQ status once again to ensure that > > > + * there will be no extra IRQ. > > > + */ > > > + dec = readw(ecc->regs + ECC_DECIRQ_STA); > > > ecc->sectors = 0; > > > complete(&ecc->done); > > > } else { > > > >
diff --git a/drivers/mtd/nand/mtk_ecc.c b/drivers/mtd/nand/mtk_ecc.c index 82aa6f2..0e04323 100644 --- a/drivers/mtd/nand/mtk_ecc.c +++ b/drivers/mtd/nand/mtk_ecc.c @@ -115,6 +115,11 @@ static irqreturn_t mtk_ecc_irq(int irq, void *id) op = ECC_DECODE; dec = readw(ecc->regs + ECC_DECDONE); if (dec & ecc->sectors) { + /** + * Clear decode IRQ status once again to ensure that + * there will be no extra IRQ. + */ + dec = readw(ecc->regs + ECC_DECIRQ_STA); ecc->sectors = 0; complete(&ecc->done); } else {
For MT2701 NAND Controller, there may generate infinite ECC decode IRQ during long time burn test on some platforms. Once this issue occurred, the ECC decode IRQ status cannot be cleared in the IRQ handler function, and threads cannot be scheduled. ECC HW generates decode IRQ each sector, so there will have more than one decode IRQ if read one page of large page NAND. Currently, ECC IRQ handle flow is that we will check whether it is decode IRQ at first by reading the register ECC_DECIRQ_STA. This is a read-clear type register. If this IRQ is decode IRQ, then the ECC IRQ signal will be cleared at the same time. Secondly, we will check whether all sectors are decoded by reading the register ECC_DECDONE. This is because the current IRQ may be not dealed in time, and the next sectors have been decoded before reading the register ECC_DECIRQ_STA. Then, the next sectors's decode IRQs will not be generated. Thirdly, if all sectors are decoded by comparing with ecc->sectors, then we will complete ecc->done, set ecc->sectors as 0, and disable ECC IRQ by programming the register ECC_IRQ_REG(op) as 0. Otherwise, wait for the next ECC IRQ. But, there is a timing issue between step one and two. When we read the reigster ECC_DECIRQ_STA, all sectors are decoded except the last sector, and the ECC IRQ signal is cleared. But the last sector is decoded before reading ECC_DECDONE, so the ECC IRQ signal is enabled again by ECC HW, and it means we will receive one extra ECC IRQ later. In step three, we will find that all sectors were decoded, then disable ECC IRQ and return. When deal with the extra ECC IRQ, the ECC IRQ status cannot be cleared anymore. That is because the register ECC_DECIRQ_STA can only be cleared when the register ECC_IRQ_REG(op) is enabled. But actually we have disabled ECC IRQ in the previous ECC IRQ handle. So, there will keep receiving ECC decode IRQ. Now, we read the register ECC_DECIRQ_STA once again before completing the ecc done event. This ensures that there will be no extra ECC decode IRQ. MT2712 ECC HW is designed to generate only one decode IRQ each page, so there is no this problem on MT2712 platforms. Signed-off-by: Xiaolei Li <xiaolei.li@mediatek.com> --- drivers/mtd/nand/mtk_ecc.c | 5 +++++ 1 file changed, 5 insertions(+)