Message ID | alpine.DEB.2.02.1401072116370.27579@ayla.of.borg (mailing list archive) |
---|---|
State | RFC |
Headers | show |
Hi Geert, On Tuesday 07 January 2014 21:27:18 Geert Uytterhoeven wrote: > I was regularly getting I/O errors when using the Renesas RSPI/QSPI > driver on r8a7791: > > m25p80 spi0.0: error -110 reading SR > > Until I applied the following patch, which re-reads RSPI_SPSR on a time-out, > and continues if the condition has become true: > > diff --git a/drivers/spi/spi-rspi.c b/drivers/spi/spi-rspi.c > index 4b31d89e8568..e63e30c500da 100644 > --- a/drivers/spi/spi-rspi.c > +++ b/drivers/spi/spi-rspi.c > @@ -442,8 +442,13 @@ static int rspi_wait_for_interrupt(struct rspi_data > *rspi, u8 wait_mask, rspi->spsr = rspi_read8(rspi, RSPI_SPSR); > rspi_enable_irq(rspi, enable_bit); > ret = wait_event_timeout(rspi->wait, rspi->spsr & wait_mask, HZ); > - if (ret == 0 && !(rspi->spsr & wait_mask)) > - return -ETIMEDOUT; > + if (ret == 0 && !(rspi->spsr & wait_mask)) { > + u8 spsr = rspi_read8(rspi, RSPI_SPSR); > + printk("*** rspi->spsr = 0x%02x, real spsr = 0x%02x, wait_mask = 0x%02x > ***\n", > + rspi->spsr, spsr, wait_mask); > + if (!(spsr & wait_mask)) > + return -ETIMEDOUT; > + } > > return 0; > } > > Now it prints from time to time: > > *** rspi->spsr = 0x20, real spsr = 0xa0, wait_mask = 0x80 *** > > which shows that rspi->spsr (as set from the interrupt handler) didn't > have bit 7 set, while RSPI_SPSR does have bit 7 set. > > So this looks like a race condition in the interrupt handling. What happens if you print rspi->spsr in the interrupt handler ? Does it have bit 7 set ? > I didn't notice any data corruption after the patch.
On Tue, Jan 07, 2014 at 09:27:18PM +0100, Geert Uytterhoeven wrote: > I was regularly getting I/O errors when using the Renesas RSPI/QSPI > driver on r8a7791: > > m25p80 spi0.0: error -110 reading SR > > Until I applied the following patch, which re-reads RSPI_SPSR on a time-out, > and continues if the condition has become true: > > diff --git a/drivers/spi/spi-rspi.c b/drivers/spi/spi-rspi.c > index 4b31d89e8568..e63e30c500da 100644 > --- a/drivers/spi/spi-rspi.c > +++ b/drivers/spi/spi-rspi.c > @@ -442,8 +442,13 @@ static int rspi_wait_for_interrupt(struct rspi_data *rspi, u8 wait_mask, > rspi->spsr = rspi_read8(rspi, RSPI_SPSR); > rspi_enable_irq(rspi, enable_bit); > ret = wait_event_timeout(rspi->wait, rspi->spsr & wait_mask, HZ); > - if (ret == 0 && !(rspi->spsr & wait_mask)) > - return -ETIMEDOUT; > + if (ret == 0 && !(rspi->spsr & wait_mask)) { > + u8 spsr = rspi_read8(rspi, RSPI_SPSR); > + printk("*** rspi->spsr = 0x%02x, real spsr = 0x%02x, wait_mask = 0x%02x ***\n", > + rspi->spsr, spsr, wait_mask); > + if (!(spsr & wait_mask)) > + return -ETIMEDOUT; I'm entirely unsure if this is the right approach, but if it is should rspi->spsr = spsr go somewhere around here? > + } > > return 0; > } > > Now it prints from time to time: > > *** rspi->spsr = 0x20, real spsr = 0xa0, wait_mask = 0x80 *** > > which shows that rspi->spsr (as set from the interrupt handler) didn't > have bit 7 set, while RSPI_SPSR does have bit 7 set. > > So this looks like a race condition in the interrupt handling. > > I didn't notice any data corruption after the patch. > > Gr{oetje,eeting}s, > > Geert > > -- > Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org > > In personal conversations with technical people, I call myself a hacker. But > when I'm talking to journalists I just say "programmer" or something like that. > -- Linus Torvalds > -- > To unsubscribe from this list: send the line "unsubscribe linux-sh" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Laurent, On Wed, Jan 8, 2014 at 1:27 AM, Laurent Pinchart <laurent.pinchart@ideasonboard.com> wrote: > On Tuesday 07 January 2014 21:27:18 Geert Uytterhoeven wrote: >> I was regularly getting I/O errors when using the Renesas RSPI/QSPI >> driver on r8a7791: >> >> m25p80 spi0.0: error -110 reading SR >> >> Until I applied the following patch, which re-reads RSPI_SPSR on a time-out, >> and continues if the condition has become true: >> >> diff --git a/drivers/spi/spi-rspi.c b/drivers/spi/spi-rspi.c >> index 4b31d89e8568..e63e30c500da 100644 >> --- a/drivers/spi/spi-rspi.c >> +++ b/drivers/spi/spi-rspi.c >> @@ -442,8 +442,13 @@ static int rspi_wait_for_interrupt(struct rspi_data >> *rspi, u8 wait_mask, rspi->spsr = rspi_read8(rspi, RSPI_SPSR); >> rspi_enable_irq(rspi, enable_bit); >> ret = wait_event_timeout(rspi->wait, rspi->spsr & wait_mask, HZ); >> - if (ret == 0 && !(rspi->spsr & wait_mask)) >> - return -ETIMEDOUT; >> + if (ret == 0 && !(rspi->spsr & wait_mask)) { >> + u8 spsr = rspi_read8(rspi, RSPI_SPSR); >> + printk("*** rspi->spsr = 0x%02x, real spsr = 0x%02x, wait_mask = > 0x%02x >> ***\n", >> + rspi->spsr, spsr, wait_mask); >> + if (!(spsr & wait_mask)) >> + return -ETIMEDOUT; >> + } >> >> return 0; >> } >> >> Now it prints from time to time: >> >> *** rspi->spsr = 0x20, real spsr = 0xa0, wait_mask = 0x80 *** >> >> which shows that rspi->spsr (as set from the interrupt handler) didn't >> have bit 7 set, while RSPI_SPSR does have bit 7 set. >> >> So this looks like a race condition in the interrupt handling. > > What happens if you print rspi->spsr in the interrupt handler ? Does it have > bit 7 set ? I haven't tried that yet. The driver is extremely interrupt-heavy (O(n), with n the number of bytes transfered), so adding a printk() as-is won't be a good idea. Will think a bit more about a better approach... Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Geert, On Wednesday 08 January 2014 09:28:59 Geert Uytterhoeven wrote: > On Wed, Jan 8, 2014 at 1:27 AM, Laurent Pinchart wrote: > > On Tuesday 07 January 2014 21:27:18 Geert Uytterhoeven wrote: > >> I was regularly getting I/O errors when using the Renesas RSPI/QSPI > >> > >> driver on r8a7791: > >> m25p80 spi0.0: error -110 reading SR > >> > >> Until I applied the following patch, which re-reads RSPI_SPSR on a > >> time-out, and continues if the condition has become true: > >> > >> diff --git a/drivers/spi/spi-rspi.c b/drivers/spi/spi-rspi.c > >> index 4b31d89e8568..e63e30c500da 100644 > >> --- a/drivers/spi/spi-rspi.c > >> +++ b/drivers/spi/spi-rspi.c > >> @@ -442,8 +442,13 @@ static int rspi_wait_for_interrupt(struct rspi_data > >> *rspi, u8 wait_mask, rspi->spsr = rspi_read8(rspi, RSPI_SPSR); > >> > >> rspi_enable_irq(rspi, enable_bit); > >> ret = wait_event_timeout(rspi->wait, rspi->spsr & wait_mask, HZ); > >> > >> - if (ret == 0 && !(rspi->spsr & wait_mask)) > >> - return -ETIMEDOUT; > >> + if (ret == 0 && !(rspi->spsr & wait_mask)) { > >> + u8 spsr = rspi_read8(rspi, RSPI_SPSR); > >> + printk("*** rspi->spsr = 0x%02x, real spsr = 0x%02x, > >> wait_mask => > > 0x%02x > > > >> ***\n", > >> + rspi->spsr, spsr, wait_mask); > >> + if (!(spsr & wait_mask)) > >> + return -ETIMEDOUT; > >> + } > >> > >> return 0; > >> > >> } > >> > >> Now it prints from time to time: > >> *** rspi->spsr = 0x20, real spsr = 0xa0, wait_mask = 0x80 *** > >> > >> which shows that rspi->spsr (as set from the interrupt handler) didn't > >> have bit 7 set, while RSPI_SPSR does have bit 7 set. > >> > >> So this looks like a race condition in the interrupt handling. > > > > What happens if you print rspi->spsr in the interrupt handler ? Does it > > have bit 7 set ? > > I haven't tried that yet. The driver is extremely interrupt-heavy (O(n), > with n the number of bytes transfered), so adding a printk() as-is won't be > a good idea. > > Will think a bit more about a better approach... Just limit it to the first 10 interrupts then :-) What I'd like to know is whether the interrupt is trigerred before bit 7 gets set by the hardware, or if the rspi_wait_for_interrupt() function gets a stale value of rspi->spsr.
On Wed, Jan 8, 2014 at 9:41 AM, Laurent Pinchart <laurent.pinchart@ideasonboard.com> wrote: > On Wednesday 08 January 2014 09:28:59 Geert Uytterhoeven wrote: >> On Wed, Jan 8, 2014 at 1:27 AM, Laurent Pinchart wrote: >> > On Tuesday 07 January 2014 21:27:18 Geert Uytterhoeven wrote: >> >> I was regularly getting I/O errors when using the Renesas RSPI/QSPI >> >> >> >> driver on r8a7791: >> >> m25p80 spi0.0: error -110 reading SR >> >> >> >> Until I applied the following patch, which re-reads RSPI_SPSR on a >> >> time-out, and continues if the condition has become true: >> >> >> >> diff --git a/drivers/spi/spi-rspi.c b/drivers/spi/spi-rspi.c >> >> index 4b31d89e8568..e63e30c500da 100644 >> >> --- a/drivers/spi/spi-rspi.c >> >> +++ b/drivers/spi/spi-rspi.c >> >> @@ -442,8 +442,13 @@ static int rspi_wait_for_interrupt(struct rspi_data >> >> *rspi, u8 wait_mask, rspi->spsr = rspi_read8(rspi, RSPI_SPSR); >> >> >> >> rspi_enable_irq(rspi, enable_bit); >> >> ret = wait_event_timeout(rspi->wait, rspi->spsr & wait_mask, HZ); >> >> >> >> - if (ret == 0 && !(rspi->spsr & wait_mask)) >> >> - return -ETIMEDOUT; >> >> + if (ret == 0 && !(rspi->spsr & wait_mask)) { >> >> + u8 spsr = rspi_read8(rspi, RSPI_SPSR); >> >> + printk("*** rspi->spsr = 0x%02x, real spsr = 0x%02x, >> >> wait_mask => >> > 0x%02x >> > >> >> ***\n", >> >> + rspi->spsr, spsr, wait_mask); >> >> + if (!(spsr & wait_mask)) >> >> + return -ETIMEDOUT; >> >> + } >> >> >> >> return 0; >> >> >> >> } >> >> >> >> Now it prints from time to time: >> >> *** rspi->spsr = 0x20, real spsr = 0xa0, wait_mask = 0x80 *** >> >> >> >> which shows that rspi->spsr (as set from the interrupt handler) didn't >> >> have bit 7 set, while RSPI_SPSR does have bit 7 set. >> >> >> >> So this looks like a race condition in the interrupt handling. >> > >> > What happens if you print rspi->spsr in the interrupt handler ? Does it >> > have bit 7 set ? >> >> I haven't tried that yet. The driver is extremely interrupt-heavy (O(n), >> with n the number of bytes transfered), so adding a printk() as-is won't be >> a good idea. >> >> Will think a bit more about a better approach... > > Just limit it to the first 10 interrupts then :-) That assumes the issue happens during the first 10 interrupts, which is usually not the case ;-) > What I'd like to know is whether the interrupt is trigerred before bit 7 gets > set by the hardware, or if the rspi_wait_for_interrupt() function gets a stale > value of rspi->spsr. It seems to get a stale value. I ended up doing the printk() anyway, and when reading using small buffer sizes, the issues is more likely to happen. If I add a second copy of rspi->spsr, and make that copy volatile OR add a few calls to smp_mb() around the places where it's written and read, the copy has the right value inside rspi_wait_for_interrupt(), BUT only if I also print the spsr value inside rspi_irq(). Note that the original rspi->spsr still has the stale value. I'd expect wake_up() and wait_event_timeout() to have the right memory barriers (cfr. Documentation/memory-barriers.txt), so I don't have to add my own. And why does it need the extra printk()? What extra synchronization does that give? Note that the interrupt handler always runs on CPU core 0, while rspi_wait_for_interrupt() can run on core 0 or 1. Is there a cache-coherency issue between the two CPU cores? Am I missing something? Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jan 8, 2014 at 10:13 PM, Geert Uytterhoeven <geert@linux-m68k.org> wrote: > On Wed, Jan 8, 2014 at 9:41 AM, Laurent Pinchart > <laurent.pinchart@ideasonboard.com> wrote: >> On Wednesday 08 January 2014 09:28:59 Geert Uytterhoeven wrote: >>> On Wed, Jan 8, 2014 at 1:27 AM, Laurent Pinchart wrote: >>> > On Tuesday 07 January 2014 21:27:18 Geert Uytterhoeven wrote: >>> >> I was regularly getting I/O errors when using the Renesas RSPI/QSPI >>> >> >>> >> driver on r8a7791: >>> >> m25p80 spi0.0: error -110 reading SR >>> >> >>> >> Until I applied the following patch, which re-reads RSPI_SPSR on a >>> >> time-out, and continues if the condition has become true: >>> >> >>> >> diff --git a/drivers/spi/spi-rspi.c b/drivers/spi/spi-rspi.c >>> >> index 4b31d89e8568..e63e30c500da 100644 >>> >> --- a/drivers/spi/spi-rspi.c >>> >> +++ b/drivers/spi/spi-rspi.c >>> >> @@ -442,8 +442,13 @@ static int rspi_wait_for_interrupt(struct rspi_data >>> >> *rspi, u8 wait_mask, rspi->spsr = rspi_read8(rspi, RSPI_SPSR); >>> >> >>> >> rspi_enable_irq(rspi, enable_bit); >>> >> ret = wait_event_timeout(rspi->wait, rspi->spsr & wait_mask, HZ); >>> >> >>> >> - if (ret == 0 && !(rspi->spsr & wait_mask)) >>> >> - return -ETIMEDOUT; >>> >> + if (ret == 0 && !(rspi->spsr & wait_mask)) { >>> >> + u8 spsr = rspi_read8(rspi, RSPI_SPSR); >>> >> + printk("*** rspi->spsr = 0x%02x, real spsr = 0x%02x, >>> >> wait_mask => >>> > 0x%02x >>> > >>> >> ***\n", >>> >> + rspi->spsr, spsr, wait_mask); >>> >> + if (!(spsr & wait_mask)) >>> >> + return -ETIMEDOUT; >>> >> + } >>> >> >>> >> return 0; >>> >> >>> >> } >>> >> >>> >> Now it prints from time to time: >>> >> *** rspi->spsr = 0x20, real spsr = 0xa0, wait_mask = 0x80 *** >>> >> >>> >> which shows that rspi->spsr (as set from the interrupt handler) didn't >>> >> have bit 7 set, while RSPI_SPSR does have bit 7 set. >>> >> >>> >> So this looks like a race condition in the interrupt handling. >>> > >>> > What happens if you print rspi->spsr in the interrupt handler ? Does it >>> > have bit 7 set ? >>> >>> I haven't tried that yet. The driver is extremely interrupt-heavy (O(n), >>> with n the number of bytes transfered), so adding a printk() as-is won't be >>> a good idea. >>> >>> Will think a bit more about a better approach... >> >> Just limit it to the first 10 interrupts then :-) > > That assumes the issue happens during the first 10 interrupts, which is > usually not the case ;-) > >> What I'd like to know is whether the interrupt is trigerred before bit 7 gets >> set by the hardware, or if the rspi_wait_for_interrupt() function gets a stale >> value of rspi->spsr. > > It seems to get a stale value. I ended up doing the printk() anyway, and > when reading using small buffer sizes, the issues is more likely to happen. > > If I add a second copy of rspi->spsr, and make that copy volatile OR add > a few calls to smp_mb() around the places where it's written and read, > the copy has the right value inside rspi_wait_for_interrupt(), BUT only if > I also print the spsr value inside rspi_irq(). Note that the original rspi->spsr > still has the stale value. > > I'd expect wake_up() and wait_event_timeout() to have the right memory > barriers (cfr. Documentation/memory-barriers.txt), so I don't have to add > my own. And why does it need the extra printk()? What extra > synchronization does that give? > > Note that the interrupt handler always runs on CPU core 0, > while rspi_wait_for_interrupt() can run on core 0 or 1. > Is there a cache-coherency issue between the two CPU cores? > > Am I missing something? What happens if you run on a uniprocessor system or in case of R-Car disable SMP and run with a single CPU core? Cheers, / magnus -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Magnus, On Thu, Jan 9, 2014 at 4:26 AM, Magnus Damm <magnus.damm@gmail.com> wrote: >>> What I'd like to know is whether the interrupt is trigerred before bit 7 gets >>> set by the hardware, or if the rspi_wait_for_interrupt() function gets a stale >>> value of rspi->spsr. >> >> It seems to get a stale value. I ended up doing the printk() anyway, and >> when reading using small buffer sizes, the issues is more likely to happen. >> >> If I add a second copy of rspi->spsr, and make that copy volatile OR add >> a few calls to smp_mb() around the places where it's written and read, >> the copy has the right value inside rspi_wait_for_interrupt(), BUT only if >> I also print the spsr value inside rspi_irq(). Note that the original rspi->spsr >> still has the stale value. >> >> I'd expect wake_up() and wait_event_timeout() to have the right memory >> barriers (cfr. Documentation/memory-barriers.txt), so I don't have to add >> my own. And why does it need the extra printk()? What extra >> synchronization does that give? >> >> Note that the interrupt handler always runs on CPU core 0, >> while rspi_wait_for_interrupt() can run on core 0 or 1. >> Is there a cache-coherency issue between the two CPU cores? >> >> Am I missing something? > > What happens if you run on a uniprocessor system or in case of R-Car > disable SMP and run with a single CPU core? As expected, I can't get it to fail in UP-mode. In SMP-mode, it fails in ca. 1% of the transfers. It seems to happen when switching between the write and the read part of a transfer, so the problem is less likely to happen when using large buffer sizes, as less transfers are involved for reading the same amount of data. If it fails while reading the JEDEC ID from the SPI FLASH during boot up, you end up without an MTD device. In other cases, the error is recoverable by reading again (from userspace). Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Geert, On Wednesday 08 January 2014 14:13:27 Geert Uytterhoeven wrote: > On Wed, Jan 8, 2014 at 9:41 AM, Laurent Pinchart wrote: > > On Wednesday 08 January 2014 09:28:59 Geert Uytterhoeven wrote: > >> On Wed, Jan 8, 2014 at 1:27 AM, Laurent Pinchart wrote: > >> > On Tuesday 07 January 2014 21:27:18 Geert Uytterhoeven wrote: > >> >> I was regularly getting I/O errors when using the Renesas RSPI/QSPI > >> >> > >> >> driver on r8a7791: > >> >> m25p80 spi0.0: error -110 reading SR > >> >> > >> >> Until I applied the following patch, which re-reads RSPI_SPSR on a > >> >> time-out, and continues if the condition has become true: > >> >> > >> >> diff --git a/drivers/spi/spi-rspi.c b/drivers/spi/spi-rspi.c > >> >> index 4b31d89e8568..e63e30c500da 100644 > >> >> --- a/drivers/spi/spi-rspi.c > >> >> +++ b/drivers/spi/spi-rspi.c > >> >> @@ -442,8 +442,13 @@ static int rspi_wait_for_interrupt(struct > >> >> rspi_data > >> >> *rspi, u8 wait_mask, rspi->spsr = rspi_read8(rspi, RSPI_SPSR); > >> >> > >> >> rspi_enable_irq(rspi, enable_bit); > >> >> ret = wait_event_timeout(rspi->wait, rspi->spsr & wait_mask, > >> >> HZ); > >> >> > >> >> - if (ret == 0 && !(rspi->spsr & wait_mask)) > >> >> - return -ETIMEDOUT; > >> >> + if (ret == 0 && !(rspi->spsr & wait_mask)) { > >> >> + u8 spsr = rspi_read8(rspi, RSPI_SPSR); > >> >> + printk("*** rspi->spsr = 0x%02x, real spsr = 0x%02x, > >> >> wait_mask => 0x%02x ***\n", > >> >> + rspi->spsr, spsr, wait_mask); > >> >> + if (!(spsr & wait_mask)) > >> >> + return -ETIMEDOUT; > >> >> + } > >> >> > >> >> return 0; > >> >> } > >> >> > >> >> Now it prints from time to time: > >> >> *** rspi->spsr = 0x20, real spsr = 0xa0, wait_mask = 0x80 *** > >> >> > >> >> which shows that rspi->spsr (as set from the interrupt handler) didn't > >> >> have bit 7 set, while RSPI_SPSR does have bit 7 set. > >> >> > >> >> So this looks like a race condition in the interrupt handling. > >> > > >> > What happens if you print rspi->spsr in the interrupt handler ? Does it > >> > have bit 7 set ? > >> > >> I haven't tried that yet. The driver is extremely interrupt-heavy (O(n), > >> with n the number of bytes transfered), so adding a printk() as-is won't > >> be a good idea. > >> > >> Will think a bit more about a better approach... > > > > Just limit it to the first 10 interrupts then :-) > > That assumes the issue happens during the first 10 interrupts, which is > usually not the case ;-) > > > What I'd like to know is whether the interrupt is trigerred before bit 7 > > gets set by the hardware, or if the rspi_wait_for_interrupt() function > > gets a stale value of rspi->spsr. > > It seems to get a stale value. I ended up doing the printk() anyway, and > when reading using small buffer sizes, the issues is more likely to happen. > > If I add a second copy of rspi->spsr, and make that copy volatile OR add > a few calls to smp_mb() around the places where it's written and read, > the copy has the right value inside rspi_wait_for_interrupt(), BUT only if > I also print the spsr value inside rspi_irq(). Note that the original > rspi->spsr still has the stale value. Do I understand it correctly that the SPSR value read from the register in rspi_irq() is always correct on the first read, and that only the rspi->spsr value stored in memory and read in rspi_wait_for_interrupt() is wrong ? > I'd expect wake_up() and wait_event_timeout() to have the right memory > barriers (cfr. Documentation/memory-barriers.txt), so I don't have to add > my own. And why does it need the extra printk()? What extra synchronization > does that give? Have you tried adding explicit memory barriers without any extra printk ? > Note that the interrupt handler always runs on CPU core 0, while > rspi_wait_for_interrupt() can run on core 0 or 1. Is there a cache-coherency > issue between the two CPU cores? I'd be very surprised if that was the case, as there should be lots of other breakages, but let's not rule that out too fast. Magnus should be able to help you more than I can regarding that. > Am I missing something?
Hi Geert, On Thursday 09 January 2014 19:13:04 Laurent Pinchart wrote: > On Wednesday 08 January 2014 14:13:27 Geert Uytterhoeven wrote: > > On Wed, Jan 8, 2014 at 9:41 AM, Laurent Pinchart wrote: > > > On Wednesday 08 January 2014 09:28:59 Geert Uytterhoeven wrote: > > >> On Wed, Jan 8, 2014 at 1:27 AM, Laurent Pinchart wrote: > > >> > On Tuesday 07 January 2014 21:27:18 Geert Uytterhoeven wrote: > > >> >> I was regularly getting I/O errors when using the Renesas RSPI/QSPI > > >> >> > > >> >> driver on r8a7791: > > >> >> m25p80 spi0.0: error -110 reading SR > > >> >> > > >> >> Until I applied the following patch, which re-reads RSPI_SPSR on a > > >> >> time-out, and continues if the condition has become true: > > >> >> > > >> >> diff --git a/drivers/spi/spi-rspi.c b/drivers/spi/spi-rspi.c > > >> >> index 4b31d89e8568..e63e30c500da 100644 > > >> >> --- a/drivers/spi/spi-rspi.c > > >> >> +++ b/drivers/spi/spi-rspi.c > > >> >> @@ -442,8 +442,13 @@ static int rspi_wait_for_interrupt(struct > > >> >> rspi_data > > >> >> *rspi, u8 wait_mask, rspi->spsr = rspi_read8(rspi, RSPI_SPSR); > > >> >> > > >> >> rspi_enable_irq(rspi, enable_bit); > > >> >> ret = wait_event_timeout(rspi->wait, rspi->spsr & wait_mask, > > >> >> HZ); > > >> >> > > >> >> - if (ret == 0 && !(rspi->spsr & wait_mask)) > > >> >> - return -ETIMEDOUT; > > >> >> + if (ret == 0 && !(rspi->spsr & wait_mask)) { > > >> >> + u8 spsr = rspi_read8(rspi, RSPI_SPSR); > > >> >> + printk("*** rspi->spsr = 0x%02x, real spsr = 0x%02x, > > >> >> wait_mask => 0x%02x ***\n", > > >> >> + rspi->spsr, spsr, wait_mask); > > >> >> + if (!(spsr & wait_mask)) > > >> >> + return -ETIMEDOUT; > > >> >> + } > > >> >> > > >> >> return 0; > > >> >> > > >> >> } > > >> >> > > >> >> Now it prints from time to time: > > >> >> *** rspi->spsr = 0x20, real spsr = 0xa0, wait_mask = 0x80 *** > > >> >> > > >> >> which shows that rspi->spsr (as set from the interrupt handler) > > >> >> didn't > > >> >> have bit 7 set, while RSPI_SPSR does have bit 7 set. > > >> >> > > >> >> So this looks like a race condition in the interrupt handling. > > >> > > > >> > What happens if you print rspi->spsr in the interrupt handler ? Does > > >> > it > > >> > have bit 7 set ? > > >> > > >> I haven't tried that yet. The driver is extremely interrupt-heavy > > >> (O(n), > > >> with n the number of bytes transfered), so adding a printk() as-is > > >> won't > > >> be a good idea. > > >> > > >> Will think a bit more about a better approach... > > > > > > Just limit it to the first 10 interrupts then :-) > > > > That assumes the issue happens during the first 10 interrupts, which is > > usually not the case ;-) > > > > > What I'd like to know is whether the interrupt is trigerred before bit 7 > > > gets set by the hardware, or if the rspi_wait_for_interrupt() function > > > gets a stale value of rspi->spsr. > > > > It seems to get a stale value. I ended up doing the printk() anyway, and > > when reading using small buffer sizes, the issues is more likely to > > happen. > > > > If I add a second copy of rspi->spsr, and make that copy volatile OR add > > a few calls to smp_mb() around the places where it's written and read, > > the copy has the right value inside rspi_wait_for_interrupt(), BUT only if > > I also print the spsr value inside rspi_irq(). Note that the original > > rspi->spsr still has the stale value. > > Do I understand it correctly that the SPSR value read from the register in > rspi_irq() is always correct on the first read, and that only the rspi->spsr > value stored in memory and read in rspi_wait_for_interrupt() is wrong ? > > I'd expect wake_up() and wait_event_timeout() to have the right memory > > barriers (cfr. Documentation/memory-barriers.txt), so I don't have to add > > my own. And why does it need the extra printk()? What extra > > synchronization > > does that give? > > Have you tried adding explicit memory barriers without any extra printk ? > > > Note that the interrupt handler always runs on CPU core 0, while > > rspi_wait_for_interrupt() can run on core 0 or 1. Is there a > > cache-coherency issue between the two CPU cores? > > I'd be very surprised if that was the case, as there should be lots of other > breakages, but let's not rule that out too fast. Magnus should be able to > help you more than I can regarding that. Could you verify that the SMP bit is set in ACTLR (http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0388e/CIHHDDHJ.html) ? > > Am I missing something?
Hi Laurent, On Thu, Jan 9, 2014 at 7:14 PM, Laurent Pinchart <laurent.pinchart@ideasonboard.com> wrote: >> > It seems to get a stale value. I ended up doing the printk() anyway, and >> > when reading using small buffer sizes, the issues is more likely to >> > happen. >> > >> > If I add a second copy of rspi->spsr, and make that copy volatile OR add >> > a few calls to smp_mb() around the places where it's written and read, >> > the copy has the right value inside rspi_wait_for_interrupt(), BUT only if >> > I also print the spsr value inside rspi_irq(). Note that the original >> > rspi->spsr still has the stale value. >> >> Do I understand it correctly that the SPSR value read from the register in >> rspi_irq() is always correct on the first read, and that only the rspi->spsr >> value stored in memory and read in rspi_wait_for_interrupt() is wrong ? >> > I'd expect wake_up() and wait_event_timeout() to have the right memory >> > barriers (cfr. Documentation/memory-barriers.txt), so I don't have to add >> > my own. And why does it need the extra printk()? What extra >> > synchronization >> > does that give? >> >> Have you tried adding explicit memory barriers without any extra printk ? Yes, cfr. "add a few calls to smp_mb()" above. >> > Note that the interrupt handler always runs on CPU core 0, while >> > rspi_wait_for_interrupt() can run on core 0 or 1. Is there a >> > cache-coherency issue between the two CPU cores? >> >> I'd be very surprised if that was the case, as there should be lots of other >> breakages, but let's not rule that out too fast. Magnus should be able to >> help you more than I can regarding that. > > Could you verify that the SMP bit is set in ACTLR > (http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0388e/CIHHDDHJ.html) ACTLR = 0x000000c0 on CPU 0 ACTLR = 0x00000040 on CPU 1 0xc0 = EXCL | SMP 0x40 = SMP The different EXCL value seems strange to me, especially considering the comment about the matching cache controller configuration: [7] EXCL Exclusive cache bit. The exclusive cache configuration does not permit data to reside in L1 and L2 at the same time. The exclusive cache configuration provides support for only caching data on an eviction from L1 when the inner cache attributes are Write-Back, Cacheable and allocated in L1. Ensure that your cache controller is also configured for exclusive caching. 0 = disabled. This is the reset value. 1 = enabled. CCing the list with the more ARM-knowledgeable people... Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jan 10, 2014 at 10:20 AM, Geert Uytterhoeven <geert@linux-m68k.org> wrote: > On Thu, Jan 9, 2014 at 7:14 PM, Laurent Pinchart > <laurent.pinchart@ideasonboard.com> wrote: >>> > It seems to get a stale value. I ended up doing the printk() anyway, and >>> > when reading using small buffer sizes, the issues is more likely to >>> > happen. >>> > >>> > If I add a second copy of rspi->spsr, and make that copy volatile OR add >>> > a few calls to smp_mb() around the places where it's written and read, >>> > the copy has the right value inside rspi_wait_for_interrupt(), BUT only if >>> > I also print the spsr value inside rspi_irq(). Note that the original >>> > rspi->spsr still has the stale value. >>> >>> Do I understand it correctly that the SPSR value read from the register in >>> rspi_irq() is always correct on the first read, and that only the rspi->spsr >>> value stored in memory and read in rspi_wait_for_interrupt() is wrong ? >>> > I'd expect wake_up() and wait_event_timeout() to have the right memory >>> > barriers (cfr. Documentation/memory-barriers.txt), so I don't have to add >>> > my own. And why does it need the extra printk()? What extra >>> > synchronization >>> > does that give? >>> >>> Have you tried adding explicit memory barriers without any extra printk ? > > Yes, cfr. "add a few calls to smp_mb()" above. > >>> > Note that the interrupt handler always runs on CPU core 0, while >>> > rspi_wait_for_interrupt() can run on core 0 or 1. Is there a >>> > cache-coherency issue between the two CPU cores? >>> >>> I'd be very surprised if that was the case, as there should be lots of other >>> breakages, but let's not rule that out too fast. Magnus should be able to >>> help you more than I can regarding that. >> >> Could you verify that the SMP bit is set in ACTLR >> (http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0388e/CIHHDDHJ.html) > > ACTLR = 0x000000c0 on CPU 0 > ACTLR = 0x00000040 on CPU 1 > > 0xc0 = EXCL | SMP > 0x40 = SMP > > The different EXCL value seems strange to me, especially considering the comment > about the matching cache controller configuration: I tried enabling the EXCL bit for ACTLR on CPU 1, but it doesn't make a difference. One other thing: As I was playing with m25p80, I had #define DEBUG at the top of drivers/mtd/devices/m25p80.c. This doesn't cause much kernel output, just one message before every read: spi0.0: m25p80_read from 0x00108600, len 512 However, without the #define DEBUG, it's _much_ more difficult to trigger the issue. Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/spi/spi-rspi.c b/drivers/spi/spi-rspi.c index 4b31d89e8568..e63e30c500da 100644 --- a/drivers/spi/spi-rspi.c +++ b/drivers/spi/spi-rspi.c @@ -442,8 +442,13 @@ static int rspi_wait_for_interrupt(struct rspi_data *rspi, u8 wait_mask, rspi->spsr = rspi_read8(rspi, RSPI_SPSR); rspi_enable_irq(rspi, enable_bit); ret = wait_event_timeout(rspi->wait, rspi->spsr & wait_mask, HZ); - if (ret == 0 && !(rspi->spsr & wait_mask)) - return -ETIMEDOUT; + if (ret == 0 && !(rspi->spsr & wait_mask)) { + u8 spsr = rspi_read8(rspi, RSPI_SPSR); + printk("*** rspi->spsr = 0x%02x, real spsr = 0x%02x, wait_mask = 0x%02x ***\n", + rspi->spsr, spsr, wait_mask); + if (!(spsr & wait_mask)) + return -ETIMEDOUT; + } return 0; }