Message ID | 1360941242-18153-14-git-send-email-g.liakhovetski@gmx.de (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
On Friday 15 February 2013, Guennadi Liakhovetski wrote: > Without barriers SDIO operations fail with runtime PM enabled. I don't understand how the changeset comment relates to the patch. > diff --git a/drivers/mmc/host/tmio_mmc.h b/drivers/mmc/host/tmio_mmc.h > index d857f5c..a10ebd0 100644 > --- a/drivers/mmc/host/tmio_mmc.h > +++ b/drivers/mmc/host/tmio_mmc.h > @@ -159,19 +159,20 @@ int tmio_mmc_host_runtime_resume(struct device *dev); > > static inline u16 sd_ctrl_read16(struct tmio_mmc_host *host, int addr) > { > - return readw(host->ctl + (addr << host->bus_shift)); > + return ioread16(host->ctl + (addr << host->bus_shift)); > } > As far as I know, all architectures are required to have the same barrier semantics on readw and ioread16. The only difference between the two is that ioread16 must be able top operate on an __iomem token returned by ioport_map() or pci_iomap, which readw does not have to, but does on ARM. > static inline void sd_ctrl_read16_rep(struct tmio_mmc_host *host, int addr, > u16 *buf, int count) > { > - readsw(host->ctl + (addr << host->bus_shift), buf, count); > + wmb(); > + ioread16_rep(host->ctl + (addr << host->bus_shift), buf, count); > } Same thing here: both readsw and ioread16_rep are supposed to do the same thing, and I would assume that they should also include that barrier. For some reason I don't understand, they do not have the barrier on ARM at the moment, but I cannot say whether that is intentional or not. Maybe Russell can comment on this. Also, should the barrier not be /after/ the MMIO read, rather than before it? Typically the barrier should ensure that any read from memory after an MMIO read reflects the memory contents after any DMA is complete that the MMIO read has already claimed to be done. > static inline void sd_ctrl_write16_rep(struct tmio_mmc_host *host, int addr, > u16 *buf, int count) > { > - writesw(host->ctl + (addr << host->bus_shift), buf, count); > + iowrite16_rep(host->ctl + (addr << host->bus_shift), buf, count); > + wmb(); > } Similarly here: why do you have the wmb after the iowrite rather than before it? Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Arnd On Mon, 18 Feb 2013, Arnd Bergmann wrote: > On Friday 15 February 2013, Guennadi Liakhovetski wrote: > > Without barriers SDIO operations fail with runtime PM enabled. > > I don't understand how the changeset comment relates to the patch. > > > diff --git a/drivers/mmc/host/tmio_mmc.h b/drivers/mmc/host/tmio_mmc.h > > index d857f5c..a10ebd0 100644 > > --- a/drivers/mmc/host/tmio_mmc.h > > +++ b/drivers/mmc/host/tmio_mmc.h > > @@ -159,19 +159,20 @@ int tmio_mmc_host_runtime_resume(struct device *dev); > > > > static inline u16 sd_ctrl_read16(struct tmio_mmc_host *host, int addr) > > { > > - return readw(host->ctl + (addr << host->bus_shift)); > > + return ioread16(host->ctl + (addr << host->bus_shift)); > > } > > > > As far as I know, all architectures are required to have the same barrier > semantics on readw and ioread16. The only difference between the two is > that ioread16 must be able top operate on an __iomem token returned by > ioport_map() or pci_iomap, which readw does not have to, but does on ARM. Indeed, the real difference are the barriers in repeated IO operations. > > static inline void sd_ctrl_read16_rep(struct tmio_mmc_host *host, int addr, > > u16 *buf, int count) > > { > > - readsw(host->ctl + (addr << host->bus_shift), buf, count); > > + wmb(); > > + ioread16_rep(host->ctl + (addr << host->bus_shift), buf, count); > > } > > Same thing here: both readsw and ioread16_rep are supposed to do the same > thing, and I would assume that they should also include that barrier. > For some reason I don't understand, they do not have the barrier on > ARM at the moment, but I cannot say whether that is intentional or not. > > Maybe Russell can comment on this. > > Also, should the barrier not be /after/ the MMIO read, rather than before it? > Typically the barrier should ensure that any read from memory after an > MMIO read reflects the memory contents after any DMA is complete that > the MMIO read has already claimed to be done. Errors, that I've been observing were happening with no DMA, in pure PIO mode. Unfortunately, I don't have a good explanation, why the barriers _have_ to be there, where I put them. At some point during my testing, I had printk()s in the code and SDIO worked. Then the classical - remove printk()s - stops working. Delays didn't halp, but barriers did. The motivation to put a write barrier before a (repeated) read was to wait for completion of any write operations before starting a read. And indeed, normal write operations, like writew() / iowrite16() have a write barrier _before_ the write. So, isn't it possible, that the last write hasn't completed yet, while we begin with reading? But reads / writes should, probably, anyway be serialised on the bus... It's also possible, that these errors are related to runtime power-management, which would involve IO to other SoC peripherals. But they all should also contain barriers, so, this doesn't explain it immediately either. Thanks Guennadi > > static inline void sd_ctrl_write16_rep(struct tmio_mmc_host *host, int addr, > > u16 *buf, int count) > > { > > - writesw(host->ctl + (addr << host->bus_shift), buf, count); > > + iowrite16_rep(host->ctl + (addr << host->bus_shift), buf, count); > > + wmb(); > > } > > Similarly here: why do you have the wmb after the iowrite rather than before it? > > Arnd > --- Guennadi Liakhovetski, Ph.D. Freelance Open-Source Software Developer http://www.open-technology.de/ -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Monday 18 February 2013, Guennadi Liakhovetski wrote: > > Also, should the barrier not be after the MMIO read, rather than before it? > > Typically the barrier should ensure that any read from memory after an > > MMIO read reflects the memory contents after any DMA is complete that > > the MMIO read has already claimed to be done. > > Errors, that I've been observing were happening with no DMA, in pure PIO > mode. > > Unfortunately, I don't have a good explanation, why the barriers have to > be there, where I put them. At some point during my testing, I had > printk()s in the code and SDIO worked. Then the classical - remove > printk()s - stops working. Delays didn't halp, but barriers did. The > motivation to put a write barrier before a (repeated) read was to wait for > completion of any write operations before starting a read. And indeed, > normal write operations, like writew() / iowrite16() have a write barrier > before the write. So, isn't it possible, that the last write hasn't > completed yet, while we begin with reading? But reads / writes should, > probably, anyway be serialised on the bus... What kind of bus is this? All buses I have looked at do serialize reads and writes to the same address at the minimum, and all sane buses serialize them when they happen to the same device, but it's harder to do when you need to serialize e.g. a read with a previous write to another device or another bus connected to the same device. Let me try to say what I understand from reading the code: These accessors are only used on one function, and there is no DMA involved here. The function does (simplified): void tmio_mmc_pio_irq(struct page *, void __iomem *io, size_t count, bool read) { void *virt; /* called from interrupt handler, but let's disable interrupts some more ;) */ local_irq_disable(); /* if highmem, map the page */ virt = kmap_atomic(page); if (read) { wmb(); readsw(io, virt, count); } else { writesw(io, virt, count); rmb(); } kunmap_atomic(virt); } I don't think there is any other I/O involved, so the barriers are probably needed to synchronize with whoever is accessing the page on the other side of the kmap/kunmap. Did the bug happen on highmem? The barriers here should probably be outside of the I/O accessors and in the tmio_mmc_pio_irq() function, but I still can't explain why they are needed here. > It's also possible, that these errors are related to runtime > power-management, which would involve IO to other SoC peripherals. But > they all should also contain barriers, so, this doesn't explain it > immediately either. Maybe the runtime-pm code uses __raw_writel or writel_relaxed where it should be using writel? Or maybe there some effect with disabling the caches and not flushing them properly in advance here? In any case, I would expect a much more specific changeset text for a patch like this. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 18 Feb 2013, Arnd Bergmann wrote: > On Monday 18 February 2013, Guennadi Liakhovetski wrote: > > > Also, should the barrier not be after the MMIO read, rather than before it? > > > Typically the barrier should ensure that any read from memory after an > > > MMIO read reflects the memory contents after any DMA is complete that > > > the MMIO read has already claimed to be done. > > > > Errors, that I've been observing were happening with no DMA, in pure PIO > > mode. > > > > Unfortunately, I don't have a good explanation, why the barriers have to > > be there, where I put them. At some point during my testing, I had > > printk()s in the code and SDIO worked. Then the classical - remove > > printk()s - stops working. Delays didn't halp, but barriers did. The > > motivation to put a write barrier before a (repeated) read was to wait for > > completion of any write operations before starting a read. And indeed, > > normal write operations, like writew() / iowrite16() have a write barrier > > before the write. So, isn't it possible, that the last write hasn't > > completed yet, while we begin with reading? But reads / writes should, > > probably, anyway be serialised on the bus... > > What kind of bus is this? Sorry, I'm not sure how best to describe it, and I don't have sufficient information myself. In any case on a block-diagram of sh73a0 SDHI devices aren't directly connected to a common super-highway bus, instead they are on a bus-splitter. One more thing I forgot to mention - this error has been observed on SMP. > All buses I have looked at do serialize reads > and writes to the same address at the minimum, and all sane buses > serialize them when they happen to the same device, but it's harder > to do when you need to serialize e.g. a read with a previous write > to another device or another bus connected to the same device. > > Let me try to say what I understand from reading the code: These > accessors are only used on one function, and there is no DMA > involved here. The function does (simplified): > > void tmio_mmc_pio_irq(struct page *, void __iomem *io, size_t count, bool read) > { > void *virt; > > /* called from interrupt handler, but let's disable interrupts some more ;) */ > local_irq_disable(); > > /* if highmem, map the page */ > virt = kmap_atomic(page); > if (read) { > wmb(); > readsw(io, virt, count); > } else { > writesw(io, virt, count); > rmb(); > } > > kunmap_atomic(virt); > } > > I don't think there is any other I/O involved, so the barriers are probably > needed to synchronize with whoever is accessing the page on the other > side of the kmap/kunmap. Did the bug happen on highmem? No, don't think highmem was involved. > The barriers here should probably be outside of the I/O accessors > and in the tmio_mmc_pio_irq() function, but I still can't explain > why they are needed here. > > > It's also possible, that these errors are related to runtime > > power-management, which would involve IO to other SoC peripherals. But > > they all should also contain barriers, so, this doesn't explain it > > immediately either. > > Maybe the runtime-pm code uses __raw_writel or writel_relaxed where it > should be using writel? I checked clock code under drivers/sh/clk, I don't see any relevant __raw_* operations there. One I do see, however, GIC code does use *_relaxed() accessors, but under raw spinlocks... Do those provide sufficient barriers? Thanks Guennadi > Or maybe there some effect with disabling the caches and not flushing > them properly in advance here? > > In any case, I would expect a much more specific changeset text for a > patch like this. > > Arnd > --- Guennadi Liakhovetski, Ph.D. Freelance Open-Source Software Developer http://www.open-technology.de/ -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Monday 18 February 2013, Guennadi Liakhovetski wrote: > On Mon, 18 Feb 2013, Arnd Bergmann wrote: > > Sorry, I'm not sure how best to describe it, and I don't have sufficient > information myself. In any case on a block-diagram of sh73a0 SDHI devices > aren't directly connected to a common super-highway bus, instead they are > on a bus-splitter. One more thing I forgot to mention - this error has > been observed on SMP. Ah, I guess that explains it. If you have one CPU filling the buffer and another CPU reading it, you need an smp_wmb/wmp_rmb pair. That case can easily happen with tmio interrupt handler in PIO mode. > I checked clock code under drivers/sh/clk, I don't see any relevant > __raw_* operations there. One I do see, however, GIC code does use > *_relaxed() accessors, but under raw spinlocks... Do those provide > sufficient barriers? A spinlock by definition contains an SMP barrier, which seems to be what is missing here. I don't know if that is enough here, because the spinlock only needs barriers to protect against other users behind the same spinlock. In architecture independent code, any smp_wmb() on the writer side must be paired with an smp_rmb on the reader. You probably have the sufficient barriers on the side that initiates the I/O because that uses a dmb() in the MMIO instructions, but that is only by accident and not portable. Also, in the uniprocessor case, you have more barriers than you need, which limits the performance. The best solution for both performance and readability would be to use readw_relaxed() and writew_relaxed() and manually add the barriers you actually need, i.e. two smp_rmb()/smp_wmb() in the PIO case, and a wmb() before you start a real DMA to the device and an rmb() after a DMA from the device is complete. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 18 Feb 2013, Arnd Bergmann wrote: > On Monday 18 February 2013, Guennadi Liakhovetski wrote: > > On Mon, 18 Feb 2013, Arnd Bergmann wrote: > > > > Sorry, I'm not sure how best to describe it, and I don't have sufficient > > information myself. In any case on a block-diagram of sh73a0 SDHI devices > > aren't directly connected to a common super-highway bus, instead they are > > on a bus-splitter. One more thing I forgot to mention - this error has > > been observed on SMP. > > Ah, I guess that explains it. If you have one CPU filling the buffer > and another CPU reading it, you need an smp_wmb/wmp_rmb pair. > > That case can easily happen with tmio interrupt handler in PIO > mode. Ok, I also managed to reproduce it on UP, and it definitely is related to MMC_CAP_POWER_OFF_CARD, but I'll have to do some more work on it. I asked Chris to hold back on this patch for now. Thanks Guennadi > > I checked clock code under drivers/sh/clk, I don't see any relevant > > __raw_* operations there. One I do see, however, GIC code does use > > *_relaxed() accessors, but under raw spinlocks... Do those provide > > sufficient barriers? > > A spinlock by definition contains an SMP barrier, which seems to be > what is missing here. I don't know if that is enough here, because > the spinlock only needs barriers to protect against other users > behind the same spinlock. > > In architecture independent code, any smp_wmb() on the writer side > must be paired with an smp_rmb on the reader. You probably have the > sufficient barriers on the side that initiates the I/O because > that uses a dmb() in the MMIO instructions, but that is only > by accident and not portable. > > Also, in the uniprocessor case, you have more barriers than you > need, which limits the performance. The best solution for > both performance and readability would be to use readw_relaxed() > and writew_relaxed() and manually add the barriers you actually > need, i.e. two smp_rmb()/smp_wmb() in the PIO case, and a > wmb() before you start a real DMA to the device and an rmb() > after a DMA from the device is complete. > > Arnd > --- Guennadi Liakhovetski, Ph.D. Freelance Open-Source Software Developer http://www.open-technology.de/ -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/mmc/host/tmio_mmc.h b/drivers/mmc/host/tmio_mmc.h index d857f5c..a10ebd0 100644 --- a/drivers/mmc/host/tmio_mmc.h +++ b/drivers/mmc/host/tmio_mmc.h @@ -159,19 +159,20 @@ int tmio_mmc_host_runtime_resume(struct device *dev); static inline u16 sd_ctrl_read16(struct tmio_mmc_host *host, int addr) { - return readw(host->ctl + (addr << host->bus_shift)); + return ioread16(host->ctl + (addr << host->bus_shift)); } static inline void sd_ctrl_read16_rep(struct tmio_mmc_host *host, int addr, u16 *buf, int count) { - readsw(host->ctl + (addr << host->bus_shift), buf, count); + wmb(); + ioread16_rep(host->ctl + (addr << host->bus_shift), buf, count); } static inline u32 sd_ctrl_read32(struct tmio_mmc_host *host, int addr) { - return readw(host->ctl + (addr << host->bus_shift)) | - readw(host->ctl + ((addr + 2) << host->bus_shift)) << 16; + return ioread16(host->ctl + (addr << host->bus_shift)) | + ioread16(host->ctl + ((addr + 2) << host->bus_shift)) << 16; } static inline void sd_ctrl_write16(struct tmio_mmc_host *host, int addr, u16 val) @@ -181,19 +182,20 @@ static inline void sd_ctrl_write16(struct tmio_mmc_host *host, int addr, u16 val */ if (host->pdata->write16_hook && host->pdata->write16_hook(host, addr)) return; - writew(val, host->ctl + (addr << host->bus_shift)); + iowrite16(val, host->ctl + (addr << host->bus_shift)); } static inline void sd_ctrl_write16_rep(struct tmio_mmc_host *host, int addr, u16 *buf, int count) { - writesw(host->ctl + (addr << host->bus_shift), buf, count); + iowrite16_rep(host->ctl + (addr << host->bus_shift), buf, count); + wmb(); } static inline void sd_ctrl_write32(struct tmio_mmc_host *host, int addr, u32 val) { - writew(val, host->ctl + (addr << host->bus_shift)); - writew(val >> 16, host->ctl + ((addr + 2) << host->bus_shift)); + iowrite16(val, host->ctl + (addr << host->bus_shift)); + iowrite16(val >> 16, host->ctl + ((addr + 2) << host->bus_shift)); }