diff mbox

[v4,13/13] mmc: tmio: add barriers to IO operations

Message ID 1360941242-18153-14-git-send-email-g.liakhovetski@gmx.de (mailing list archive)
State Superseded
Headers show

Commit Message

Guennadi Liakhovetski Feb. 15, 2013, 3:14 p.m. UTC
Without barriers SDIO operations fail with runtime PM enabled.

Reviewed-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
---
 drivers/mmc/host/tmio_mmc.h |   18 ++++++++++--------
 1 files changed, 10 insertions(+), 8 deletions(-)

Comments

Arnd Bergmann Feb. 18, 2013, 3:05 p.m. UTC | #1
On Friday 15 February 2013, Guennadi Liakhovetski wrote:
> Without barriers SDIO operations fail with runtime PM enabled.

I don't understand how the changeset comment relates to the patch.

> diff --git a/drivers/mmc/host/tmio_mmc.h b/drivers/mmc/host/tmio_mmc.h
> index d857f5c..a10ebd0 100644
> --- a/drivers/mmc/host/tmio_mmc.h
> +++ b/drivers/mmc/host/tmio_mmc.h
> @@ -159,19 +159,20 @@ int tmio_mmc_host_runtime_resume(struct device *dev);
>  
>  static inline u16 sd_ctrl_read16(struct tmio_mmc_host *host, int addr)
>  {
> -	return readw(host->ctl + (addr << host->bus_shift));
> +	return ioread16(host->ctl + (addr << host->bus_shift));
>  }
>  

As far as I know, all architectures are required to have the same barrier
semantics on readw and ioread16. The only difference between the two is
that ioread16 must be able top operate on an __iomem token returned by
ioport_map() or pci_iomap, which readw does not have to, but does on ARM.

>  static inline void sd_ctrl_read16_rep(struct tmio_mmc_host *host, int addr,
>  		u16 *buf, int count)
>  {
> -	readsw(host->ctl + (addr << host->bus_shift), buf, count);
> +	wmb();
> +	ioread16_rep(host->ctl + (addr << host->bus_shift), buf, count);
>  }

Same thing here: both readsw and ioread16_rep are supposed to do the same
thing, and I would assume that they should also include that barrier.
For some reason I don't understand, they do not have the barrier on
ARM at the moment, but I cannot say whether that is intentional or not.

Maybe Russell can comment on this.

Also, should the barrier not be /after/ the MMIO read, rather than before it?
Typically the barrier should ensure that any read from memory after an
MMIO read reflects the memory contents after any DMA is complete that
the MMIO read has already claimed to be done.

>  static inline void sd_ctrl_write16_rep(struct tmio_mmc_host *host, int addr,
>  		u16 *buf, int count)
>  {
> -	writesw(host->ctl + (addr << host->bus_shift), buf, count);
> +	iowrite16_rep(host->ctl + (addr << host->bus_shift), buf, count);
> +	wmb();
>  }

Similarly here: why do you have the wmb after the iowrite rather than before it?

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Guennadi Liakhovetski Feb. 18, 2013, 3:56 p.m. UTC | #2
Hi Arnd

On Mon, 18 Feb 2013, Arnd Bergmann wrote:

> On Friday 15 February 2013, Guennadi Liakhovetski wrote:
> > Without barriers SDIO operations fail with runtime PM enabled.
> 
> I don't understand how the changeset comment relates to the patch.
> 
> > diff --git a/drivers/mmc/host/tmio_mmc.h b/drivers/mmc/host/tmio_mmc.h
> > index d857f5c..a10ebd0 100644
> > --- a/drivers/mmc/host/tmio_mmc.h
> > +++ b/drivers/mmc/host/tmio_mmc.h
> > @@ -159,19 +159,20 @@ int tmio_mmc_host_runtime_resume(struct device *dev);
> >  
> >  static inline u16 sd_ctrl_read16(struct tmio_mmc_host *host, int addr)
> >  {
> > -	return readw(host->ctl + (addr << host->bus_shift));
> > +	return ioread16(host->ctl + (addr << host->bus_shift));
> >  }
> >  
> 
> As far as I know, all architectures are required to have the same barrier
> semantics on readw and ioread16. The only difference between the two is
> that ioread16 must be able top operate on an __iomem token returned by
> ioport_map() or pci_iomap, which readw does not have to, but does on ARM.

Indeed, the real difference are the barriers in repeated IO operations.

> >  static inline void sd_ctrl_read16_rep(struct tmio_mmc_host *host, int addr,
> >  		u16 *buf, int count)
> >  {
> > -	readsw(host->ctl + (addr << host->bus_shift), buf, count);
> > +	wmb();
> > +	ioread16_rep(host->ctl + (addr << host->bus_shift), buf, count);
> >  }
> 
> Same thing here: both readsw and ioread16_rep are supposed to do the same
> thing, and I would assume that they should also include that barrier.
> For some reason I don't understand, they do not have the barrier on
> ARM at the moment, but I cannot say whether that is intentional or not.
> 
> Maybe Russell can comment on this.
> 
> Also, should the barrier not be /after/ the MMIO read, rather than before it?
> Typically the barrier should ensure that any read from memory after an
> MMIO read reflects the memory contents after any DMA is complete that
> the MMIO read has already claimed to be done.

Errors, that I've been observing were happening with no DMA, in pure PIO 
mode.

Unfortunately, I don't have a good explanation, why the barriers _have_ to 
be there, where I put them. At some point during my testing, I had 
printk()s in the code and SDIO worked. Then the classical - remove 
printk()s - stops working. Delays didn't halp, but barriers did. The 
motivation to put a write barrier before a (repeated) read was to wait for 
completion of any write operations before starting a read. And indeed, 
normal write operations, like writew() / iowrite16() have a write barrier 
_before_ the write. So, isn't it possible, that the last write hasn't 
completed yet, while we begin with reading? But reads / writes should, 
probably, anyway be serialised on the bus...

It's also possible, that these errors are related to runtime 
power-management, which would involve IO to other SoC peripherals. But 
they all should also contain barriers, so, this doesn't explain it 
immediately either.

Thanks
Guennadi

> >  static inline void sd_ctrl_write16_rep(struct tmio_mmc_host *host, int addr,
> >  		u16 *buf, int count)
> >  {
> > -	writesw(host->ctl + (addr << host->bus_shift), buf, count);
> > +	iowrite16_rep(host->ctl + (addr << host->bus_shift), buf, count);
> > +	wmb();
> >  }
> 
> Similarly here: why do you have the wmb after the iowrite rather than before it?
> 
> 	Arnd
> 

---
Guennadi Liakhovetski, Ph.D.
Freelance Open-Source Software Developer
http://www.open-technology.de/
--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arnd Bergmann Feb. 18, 2013, 4:34 p.m. UTC | #3
On Monday 18 February 2013, Guennadi Liakhovetski wrote:
> > Also, should the barrier not be after the MMIO read, rather than before it?
> > Typically the barrier should ensure that any read from memory after an
> > MMIO read reflects the memory contents after any DMA is complete that
> > the MMIO read has already claimed to be done.
> 
> Errors, that I've been observing were happening with no DMA, in pure PIO 
> mode.
> 
> Unfortunately, I don't have a good explanation, why the barriers have to 
> be there, where I put them. At some point during my testing, I had 
> printk()s in the code and SDIO worked. Then the classical - remove 
> printk()s - stops working. Delays didn't halp, but barriers did. The 
> motivation to put a write barrier before a (repeated) read was to wait for 
> completion of any write operations before starting a read. And indeed, 
> normal write operations, like writew() / iowrite16() have a write barrier 
> before the write. So, isn't it possible, that the last write hasn't 
> completed yet, while we begin with reading? But reads / writes should, 
> probably, anyway be serialised on the bus...

What kind of bus is this? All buses I have looked at do serialize reads
and writes to the same address at the minimum, and all sane buses
serialize them when they happen to the same device, but it's harder
to do when you need to serialize e.g. a read with a previous write
to another device or another bus connected to the same device.

Let me try to say what I understand from reading the code: These
accessors are only used on one function, and there is no DMA
involved here. The function does (simplified):

void tmio_mmc_pio_irq(struct page *, void __iomem *io, size_t count, bool read)
{
	void *virt;

	/* called from interrupt handler, but let's disable interrupts some more ;) */
	local_irq_disable();

	/* if highmem, map the page */
	virt = kmap_atomic(page);
	if (read) {
		wmb();
		readsw(io, virt, count);
	} else {
		writesw(io, virt, count);
		rmb();
	}

	kunmap_atomic(virt);
}

I don't think there is any other I/O involved, so the barriers are probably
needed to synchronize with whoever is accessing the page on the other
side of the kmap/kunmap. Did the bug happen on highmem?

The barriers here should probably be outside of the I/O accessors
and in the tmio_mmc_pio_irq() function, but I still can't explain
why they are needed here.

> It's also possible, that these errors are related to runtime 
> power-management, which would involve IO to other SoC peripherals. But 
> they all should also contain barriers, so, this doesn't explain it 
> immediately either.

Maybe the runtime-pm code uses __raw_writel or writel_relaxed where it
should be using writel?

Or maybe there some effect with disabling the caches and not flushing
them properly in advance here?

In any case, I would expect a much more specific changeset text for a
patch like this.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Guennadi Liakhovetski Feb. 18, 2013, 5:20 p.m. UTC | #4
On Mon, 18 Feb 2013, Arnd Bergmann wrote:

> On Monday 18 February 2013, Guennadi Liakhovetski wrote:
> > > Also, should the barrier not be after the MMIO read, rather than before it?
> > > Typically the barrier should ensure that any read from memory after an
> > > MMIO read reflects the memory contents after any DMA is complete that
> > > the MMIO read has already claimed to be done.
> > 
> > Errors, that I've been observing were happening with no DMA, in pure PIO 
> > mode.
> > 
> > Unfortunately, I don't have a good explanation, why the barriers have to 
> > be there, where I put them. At some point during my testing, I had 
> > printk()s in the code and SDIO worked. Then the classical - remove 
> > printk()s - stops working. Delays didn't halp, but barriers did. The 
> > motivation to put a write barrier before a (repeated) read was to wait for 
> > completion of any write operations before starting a read. And indeed, 
> > normal write operations, like writew() / iowrite16() have a write barrier 
> > before the write. So, isn't it possible, that the last write hasn't 
> > completed yet, while we begin with reading? But reads / writes should, 
> > probably, anyway be serialised on the bus...
> 
> What kind of bus is this?

Sorry, I'm not sure how best to describe it, and I don't have sufficient 
information myself. In any case on a block-diagram of sh73a0 SDHI devices 
aren't directly connected to a common super-highway bus, instead they are 
on a bus-splitter. One more thing I forgot to mention - this error has 
been observed on SMP.

> All buses I have looked at do serialize reads
> and writes to the same address at the minimum, and all sane buses
> serialize them when they happen to the same device, but it's harder
> to do when you need to serialize e.g. a read with a previous write
> to another device or another bus connected to the same device.
> 
> Let me try to say what I understand from reading the code: These
> accessors are only used on one function, and there is no DMA
> involved here. The function does (simplified):
> 
> void tmio_mmc_pio_irq(struct page *, void __iomem *io, size_t count, bool read)
> {
> 	void *virt;
> 
> 	/* called from interrupt handler, but let's disable interrupts some more ;) */
> 	local_irq_disable();
> 
> 	/* if highmem, map the page */
> 	virt = kmap_atomic(page);
> 	if (read) {
> 		wmb();
> 		readsw(io, virt, count);
> 	} else {
> 		writesw(io, virt, count);
> 		rmb();
> 	}
> 
> 	kunmap_atomic(virt);
> }
> 
> I don't think there is any other I/O involved, so the barriers are probably
> needed to synchronize with whoever is accessing the page on the other
> side of the kmap/kunmap. Did the bug happen on highmem?

No, don't think highmem was involved.

> The barriers here should probably be outside of the I/O accessors
> and in the tmio_mmc_pio_irq() function, but I still can't explain
> why they are needed here.
> 
> > It's also possible, that these errors are related to runtime 
> > power-management, which would involve IO to other SoC peripherals. But 
> > they all should also contain barriers, so, this doesn't explain it 
> > immediately either.
> 
> Maybe the runtime-pm code uses __raw_writel or writel_relaxed where it
> should be using writel?

I checked clock code under drivers/sh/clk, I don't see any relevant 
__raw_* operations there. One I do see, however, GIC code does use 
*_relaxed() accessors, but under raw spinlocks... Do those provide 
sufficient barriers?

Thanks
Guennadi

> Or maybe there some effect with disabling the caches and not flushing
> them properly in advance here?
> 
> In any case, I would expect a much more specific changeset text for a
> patch like this.
> 
> 	Arnd
> 

---
Guennadi Liakhovetski, Ph.D.
Freelance Open-Source Software Developer
http://www.open-technology.de/
--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arnd Bergmann Feb. 18, 2013, 10:11 p.m. UTC | #5
On Monday 18 February 2013, Guennadi Liakhovetski wrote:
> On Mon, 18 Feb 2013, Arnd Bergmann wrote:
> 
> Sorry, I'm not sure how best to describe it, and I don't have sufficient 
> information myself. In any case on a block-diagram of sh73a0 SDHI devices 
> aren't directly connected to a common super-highway bus, instead they are 
> on a bus-splitter. One more thing I forgot to mention - this error has 
> been observed on SMP.

Ah, I guess that explains it. If you have one CPU filling the buffer
and another CPU reading it, you need an smp_wmb/wmp_rmb pair.

That case can easily happen with tmio interrupt handler in PIO
mode.

> I checked clock code under drivers/sh/clk, I don't see any relevant 
> __raw_* operations there. One I do see, however, GIC code does use 
> *_relaxed() accessors, but under raw spinlocks... Do those provide 
> sufficient barriers?

A spinlock by definition contains an SMP barrier, which seems to be
what is missing here. I don't know if that is enough here, because
the spinlock only needs barriers to protect against other users
behind the same spinlock.

In architecture independent code, any smp_wmb() on the writer side
must be paired with an smp_rmb on the reader. You probably have the
sufficient barriers on the side that initiates the I/O because
that uses a dmb() in the MMIO instructions, but that is only
by accident and not portable.

Also, in the uniprocessor case, you have more barriers than you
need, which limits the performance. The best solution for
both performance and readability would be to use readw_relaxed()
and writew_relaxed() and manually add the barriers you actually
need, i.e. two smp_rmb()/smp_wmb() in the PIO case, and a
wmb() before you start a real DMA to the device and an rmb()
after a DMA from the device is complete.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Guennadi Liakhovetski Feb. 19, 2013, 9:59 p.m. UTC | #6
On Mon, 18 Feb 2013, Arnd Bergmann wrote:

> On Monday 18 February 2013, Guennadi Liakhovetski wrote:
> > On Mon, 18 Feb 2013, Arnd Bergmann wrote:
> > 
> > Sorry, I'm not sure how best to describe it, and I don't have sufficient 
> > information myself. In any case on a block-diagram of sh73a0 SDHI devices 
> > aren't directly connected to a common super-highway bus, instead they are 
> > on a bus-splitter. One more thing I forgot to mention - this error has 
> > been observed on SMP.
> 
> Ah, I guess that explains it. If you have one CPU filling the buffer
> and another CPU reading it, you need an smp_wmb/wmp_rmb pair.
> 
> That case can easily happen with tmio interrupt handler in PIO
> mode.

Ok, I also managed to reproduce it on UP, and it definitely is related to 
MMC_CAP_POWER_OFF_CARD, but I'll have to do some more work on it. I asked 
Chris to hold back on this patch for now.

Thanks
Guennadi

> > I checked clock code under drivers/sh/clk, I don't see any relevant 
> > __raw_* operations there. One I do see, however, GIC code does use 
> > *_relaxed() accessors, but under raw spinlocks... Do those provide 
> > sufficient barriers?
> 
> A spinlock by definition contains an SMP barrier, which seems to be
> what is missing here. I don't know if that is enough here, because
> the spinlock only needs barriers to protect against other users
> behind the same spinlock.
> 
> In architecture independent code, any smp_wmb() on the writer side
> must be paired with an smp_rmb on the reader. You probably have the
> sufficient barriers on the side that initiates the I/O because
> that uses a dmb() in the MMIO instructions, but that is only
> by accident and not portable.
> 
> Also, in the uniprocessor case, you have more barriers than you
> need, which limits the performance. The best solution for
> both performance and readability would be to use readw_relaxed()
> and writew_relaxed() and manually add the barriers you actually
> need, i.e. two smp_rmb()/smp_wmb() in the PIO case, and a
> wmb() before you start a real DMA to the device and an rmb()
> after a DMA from the device is complete.
> 
> 	Arnd
> 

---
Guennadi Liakhovetski, Ph.D.
Freelance Open-Source Software Developer
http://www.open-technology.de/
--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/mmc/host/tmio_mmc.h b/drivers/mmc/host/tmio_mmc.h
index d857f5c..a10ebd0 100644
--- a/drivers/mmc/host/tmio_mmc.h
+++ b/drivers/mmc/host/tmio_mmc.h
@@ -159,19 +159,20 @@  int tmio_mmc_host_runtime_resume(struct device *dev);
 
 static inline u16 sd_ctrl_read16(struct tmio_mmc_host *host, int addr)
 {
-	return readw(host->ctl + (addr << host->bus_shift));
+	return ioread16(host->ctl + (addr << host->bus_shift));
 }
 
 static inline void sd_ctrl_read16_rep(struct tmio_mmc_host *host, int addr,
 		u16 *buf, int count)
 {
-	readsw(host->ctl + (addr << host->bus_shift), buf, count);
+	wmb();
+	ioread16_rep(host->ctl + (addr << host->bus_shift), buf, count);
 }
 
 static inline u32 sd_ctrl_read32(struct tmio_mmc_host *host, int addr)
 {
-	return readw(host->ctl + (addr << host->bus_shift)) |
-	       readw(host->ctl + ((addr + 2) << host->bus_shift)) << 16;
+	return ioread16(host->ctl + (addr << host->bus_shift)) |
+	       ioread16(host->ctl + ((addr + 2) << host->bus_shift)) << 16;
 }
 
 static inline void sd_ctrl_write16(struct tmio_mmc_host *host, int addr, u16 val)
@@ -181,19 +182,20 @@  static inline void sd_ctrl_write16(struct tmio_mmc_host *host, int addr, u16 val
 	 */
 	if (host->pdata->write16_hook && host->pdata->write16_hook(host, addr))
 		return;
-	writew(val, host->ctl + (addr << host->bus_shift));
+	iowrite16(val, host->ctl + (addr << host->bus_shift));
 }
 
 static inline void sd_ctrl_write16_rep(struct tmio_mmc_host *host, int addr,
 		u16 *buf, int count)
 {
-	writesw(host->ctl + (addr << host->bus_shift), buf, count);
+	iowrite16_rep(host->ctl + (addr << host->bus_shift), buf, count);
+	wmb();
 }
 
 static inline void sd_ctrl_write32(struct tmio_mmc_host *host, int addr, u32 val)
 {
-	writew(val, host->ctl + (addr << host->bus_shift));
-	writew(val >> 16, host->ctl + ((addr + 2) << host->bus_shift));
+	iowrite16(val, host->ctl + (addr << host->bus_shift));
+	iowrite16(val >> 16, host->ctl + ((addr + 2) << host->bus_shift));
 }