
[1/3] ARM: Introduce atomic MMIO clear/set

Message ID 1376138582-7550-2-git-send-email-ezequiel.garcia@free-electrons.com (mailing list archive)
State New, archived

Commit Message

Ezequiel Garcia Aug. 10, 2013, 12:43 p.m. UTC
Some SoCs have MMIO regions that are shared across orthogonal
subsystems. This commit implements a possible solution for
thread-safe access to such regions through a spinlock-protected API
with clear-set semantics.

Concurrent access is protected with a single spinlock for the
entire MMIO address space. While this protects shared registers,
it also serializes access to unrelated/unshared registers.

Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
---
 arch/arm/include/asm/io.h |  5 +++++
 arch/arm/kernel/io.c      | 24 ++++++++++++++++++++++++
 2 files changed, 29 insertions(+)

Comments

Alexander Shiyan Aug. 10, 2013, 12:49 p.m. UTC | #1
> Some SoC have MMIO regions that are shared across orthogonal
> subsystems. This commit implements a possible solution for the
> thread-safe access of such regions through a spinlock-protected API
> with clear-set semantics.
> 
> Concurrent access is protected with a single spinlock for the
> entire MMIO address space. While this protects shared-registers,
> it also serializes access to unrelated/unshared registers.
> 
> Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
> ---
>  arch/arm/include/asm/io.h |  5 +++++
>  arch/arm/kernel/io.c      | 24 ++++++++++++++++++++++++
>  2 files changed, 29 insertions(+)
> 
> diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h
> index d070741..c84658d 100644
> --- a/arch/arm/include/asm/io.h
> +++ b/arch/arm/include/asm/io.h
> @@ -36,6 +36,11 @@
>  #define isa_bus_to_virt phys_to_virt
>  
>  /*
> + * Atomic MMIO-wide IO clear/set
> + */
> +extern void atomic_io_clear_set(void __iomem *reg, u32 clear, u32 set);
> +
> +/*
>   * Generic IO read/write.  These perform native-endian accesses.  Note
>   * that some architectures will want to re-define __raw_{read,write}w.
>   */
> diff --git a/arch/arm/kernel/io.c b/arch/arm/kernel/io.c
> index dcd5b4d..3ab8201 100644
> --- a/arch/arm/kernel/io.c
> +++ b/arch/arm/kernel/io.c
> @@ -1,6 +1,30 @@
>  #include <linux/export.h>
>  #include <linux/types.h>
>  #include <linux/io.h>
> +#include <linux/spinlock.h>
> +
> +static DEFINE_SPINLOCK(__io_lock);
> +
> +/*
> + * Some platforms have MMIO regions that are shared across orthogonal
> + * subsystems. This API implements thread-safe access to such regions
> + * through a spinlock-protected API with clear-set semantics.
> + *
> + * Concurrent access is protected with a single spinlock for the entire MMIO
> + * address space. While this protects shared-registers, it also serializes
> + * access to unrelated/unshared registers.
> + *
> + * Using this API on frequently accessed registers in performance-critical
> + * paths is not recommended, as the spinlock used by this API would become
> + * highly contended.
> + */
> +void atomic_io_clear_set(void __iomem *reg, u32 clear, u32 set)
> +{
> +	spin_lock(&__io_lock);
> +	writel((readl(reg) & ~clear) | set, reg);
> +	spin_unlock(&__io_lock);
> +}
> +EXPORT_SYMBOL(atomic_io_clear_set);

So, one lock is used for all possible registers?
It seems regmap-mmio could be used for such access.

---
Ezequiel Garcia Aug. 10, 2013, 2:02 p.m. UTC | #2
On Sat, Aug 10, 2013 at 04:49:28PM +0400, Alexander Shiyan wrote:
> > Some SoC have MMIO regions that are shared across orthogonal
> > subsystems. This commit implements a possible solution for the
> > thread-safe access of such regions through a spinlock-protected API
> > with clear-set semantics.
> > 
> > Concurrent access is protected with a single spinlock for the
> > entire MMIO address space. While this protects shared-registers,
> > it also serializes access to unrelated/unshared registers.
> > 
> > Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
> > [...]
> 
> So, one lock is used to all possible registers?
> Seems a regmap-mmio can be used for such access.
> 

Thanks for the hint! Quite frankly, I wasn't familiar with regmap-mmio.

However, after reading some code, I fail to see how that helps in this case.

Note that we need to access the *same* MMIO address from completely
different (and unrelated) drivers, such as watchdog and clocksource.

So I wonder: who would "own" the regmap descriptor, and how would the
other one become aware of that descriptor?

In addition, given that we can use orion_wdt (originally meant for
mach-kirkwood) to support the mvebu SoC watchdog, we need to sort this
out in a completely multiplatform-capable way.

Ideas?
Ezequiel Garcia Aug. 10, 2013, 2:09 p.m. UTC | #3
On Sat, Aug 10, 2013 at 11:02:38AM -0300, Ezequiel Garcia wrote:
> On Sat, Aug 10, 2013 at 04:49:28PM +0400, Alexander Shiyan wrote:
> > > Some SoC have MMIO regions that are shared across orthogonal
> > > subsystems. This commit implements a possible solution for the
> > > thread-safe access of such regions through a spinlock-protected API
> > > with clear-set semantics.
> > > 
> > > Concurrent access is protected with a single spinlock for the
> > > entire MMIO address space. While this protects shared-registers,
> > > it also serializes access to unrelated/unshared registers.
> > > 
> > > [...]
> > 
> > So, one lock is used to all possible registers?
> > Seems a regmap-mmio can be used for such access.
> > 
> 
> Thanks for the hint! Quite frankly, I wasn't familiar with regmap-mmio.
> 
> However, after reading some code, I fail to see how that helps in this case.
> 
> Note that we need to access the *same* MMIO address from completely
> different (and unrelated) drivers, such as watchdog and clocksource.
> 
> So I wonder who would "own" the regmap descriptor, and how does the other
> one gets aware of that descriptor?
> 
> In addition given we can use orion_wdt (originally meant for mach-kirkwood)
> to support mvebu SoC watchdog, we need to sort this out in a completely
> multiplatform capable way.
> 
> Ideas?

Answering myself...

How about using drivers/mfd/syscon.c to create the regmap owner for the shared
register (TIMER_CTRL in this case, but others might appear)?

Or adding a new mfd implementation if syscon does not fit?

Does this sound like overkill?
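If syscon did fit, the consumer side might look roughly like the sketch below. This is kernel-style pseudocode, not a buildable unit: the phandle property name and the TIMER_CTRL_OFFSET constant are invented for illustration, while syscon_regmap_lookup_by_phandle() and regmap_update_bits() are the real APIs involved:

```c
/* Hypothetical consumer of a syscon-provided regmap covering the
 * shared timer control register. */
struct regmap *map;

/* "marvell,timer-ctrl" is an assumed DT property name. */
map = syscon_regmap_lookup_by_phandle(np, "marvell,timer-ctrl");
if (IS_ERR(map))
	return PTR_ERR(map);

/* regmap serializes this read-modify-write with its own per-map lock. */
regmap_update_bits(map, TIMER_CTRL_OFFSET, clear_mask, set_bits);
```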
Alexander Shiyan Aug. 10, 2013, 3:43 p.m. UTC | #4
> On Sat, Aug 10, 2013 at 11:02:38AM -0300, Ezequiel Garcia wrote:
> > On Sat, Aug 10, 2013 at 04:49:28PM +0400, Alexander Shiyan wrote:
> > > > Some SoC have MMIO regions that are shared across orthogonal
> > > > subsystems. This commit implements a possible solution for the
> > > > thread-safe access of such regions through a spinlock-protected API
> > > > with clear-set semantics.
> > > > 
> > > > Concurrent access is protected with a single spinlock for the
> > > > entire MMIO address space. While this protects shared-registers,
> > > > it also serializes access to unrelated/unshared registers.
[...]
> > > > +void atomic_io_clear_set(void __iomem *reg, u32 clear, u32 set)
> > > > +{
> > > > +	spin_lock(&__io_lock);
> > > > +	writel((readl(reg) & ~clear) | set, reg);
> > > > +	spin_unlock(&__io_lock);
> > > > +}
> > > > +EXPORT_SYMBOL(atomic_io_clear_set);
> > > 
> > > So, one lock is used to all possible registers?
> > > Seems a regmap-mmio can be used for such access.
> > > 
> > 
> > Thanks for the hint! Quite frankly, I wasn't familiar with regmap-mmio.
> > 
> > However, after reading some code, I fail to see how that helps in this case.
> > 
> > Note that we need to access the *same* MMIO address from completely
> > different (and unrelated) drivers, such as watchdog and clocksource.
> > 
> > So I wonder who would "own" the regmap descriptor, and how does the other
> > one gets aware of that descriptor?
> > 
> > In addition given we can use orion_wdt (originally meant for mach-kirkwood)
> > to support mvebu SoC watchdog, we need to sort this out in a completely
> > multiplatform capable way.
> > 
> > Ideas?
> 
> Answering myself...
> 
> How about using drivers/mfd/syscon.c to create the regmap owner for the shared
> register (TIMER_CTRL in this case, but others might appear) ?
> 
> Or adding a new mfd implementation if syscon does not fit ?
> 
> Does this sound like an overkill ?

Yes, syscon is designed especially for such cases.

---
Ezequiel Garcia Aug. 10, 2013, 3:55 p.m. UTC | #5
On Sat, Aug 10, 2013 at 07:43:08PM +0400, Alexander Shiyan wrote:
> > On Sat, Aug 10, 2013 at 11:02:38AM -0300, Ezequiel Garcia wrote:
> > > On Sat, Aug 10, 2013 at 04:49:28PM +0400, Alexander Shiyan wrote:
> > > > > [...]
> > > > 
> > > > So, one lock is used to all possible registers?
> > > > Seems a regmap-mmio can be used for such access.
> > > > 
> > > 
> > > Thanks for the hint! Quite frankly, I wasn't familiar with regmap-mmio.
> > > 
> > > However, after reading some code, I fail to see how that helps in this case.
> > > 
> > > Note that we need to access the *same* MMIO address from completely
> > > different (and unrelated) drivers, such as watchdog and clocksource.
> > > 
> > > So I wonder who would "own" the regmap descriptor, and how does the other
> > > one gets aware of that descriptor?
> > > 
> > > In addition given we can use orion_wdt (originally meant for mach-kirkwood)
> > > to support mvebu SoC watchdog, we need to sort this out in a completely
> > > multiplatform capable way.
> > > 
> > > Ideas?
> > 
> > Answering myself...
> > 
> > How about using drivers/mfd/syscon.c to create the regmap owner for the shared
> > register (TIMER_CTRL in this case, but others might appear) ?
> > 
> > Or adding a new mfd implementation if syscon does not fit ?
> > 
> > Does this sound like an overkill ?
> 
> Yes, syscon is designed especially for such cases.
> 

Indeed, syscon looks like a nice match for this use case
(although it still looks like overkill to me).

I've been trying to implement a working solution based on syscon, but I'm
unable to overcome one issue.

The problem is that we need the register/regmap to initialize the clocksource
driver for this machine (i.e. the timer). Of course, this happens at a
*very* early point, way before the syscon driver is available... :-(

Maybe someone has an idea?
Ezequiel Garcia Aug. 12, 2013, 3:46 p.m. UTC | #6
On Sat, Aug 10, 2013 at 12:55:53PM -0300, Ezequiel Garcia wrote:
> On Sat, Aug 10, 2013 at 07:43:08PM +0400, Alexander Shiyan wrote:
> > > On Sat, Aug 10, 2013 at 11:02:38AM -0300, Ezequiel Garcia wrote:
> > > > On Sat, Aug 10, 2013 at 04:49:28PM +0400, Alexander Shiyan wrote:
> > > > > > [...]
> > > > > 
> > > > > So, one lock is used to all possible registers?
> > > > > Seems a regmap-mmio can be used for such access.
> > > > > 
> > > > 
> > > > Thanks for the hint! Quite frankly, I wasn't familiar with regmap-mmio.
> > > > 
> > > > However, after reading some code, I fail to see how that helps in this case.
> > > > 
> > > > Note that we need to access the *same* MMIO address from completely
> > > > different (and unrelated) drivers, such as watchdog and clocksource.
> > > > 
> > > > So I wonder who would "own" the regmap descriptor, and how does the other
> > > > one gets aware of that descriptor?
> > > > 
> > > > In addition given we can use orion_wdt (originally meant for mach-kirkwood)
> > > > to support mvebu SoC watchdog, we need to sort this out in a completely
> > > > multiplatform capable way.
> > > > 
> > > > Ideas?
> > > 
> > > Answering myself...
> > > 
> > > How about using drivers/mfd/syscon.c to create the regmap owner for the shared
> > > register (TIMER_CTRL in this case, but others might appear) ?
> > > 
> > > Or adding a new mfd implementation if syscon does not fit ?
> > > 
> > > Does this sound like an overkill ?
> > 
> > Yes, syscon is designed especially for such cases.
> > 
> 
> Indeed, syscon looks like a nice match for this use case.
> (although it still looks like an overkill to me).
> 
> I've been trying to implement a working solution based in syscon but I'm
> unable to overcome an issue.
> 
> The problem is that we need the register/regmap to initialize the clocksource
> driver for this machine (aka the timer). Of course, this happens at a
> *very* early point, way before the syscon driver is available... :-(
> 
> Maybe someone has an idea?

Sebastian, Russell: I can't find the previous mail where you proposed
this solution to address the shared register issue between Kirkwood's
watchdog and clocksource.

Is this what you had in mind?

Do you think trying to use a regmap could be better (given we can
sort out the problem explained above)?
Sebastian Hesselbarth Aug. 12, 2013, 4:44 p.m. UTC | #7
On 08/12/13 17:46, Ezequiel Garcia wrote:
>> Indeed, syscon looks like a nice match for this use case.
>> (although it still looks like an overkill to me).
>>
>> I've been trying to implement a working solution based in syscon but I'm
>> unable to overcome an issue.
>>
>> The problem is that we need the register/regmap to initialize the clocksource
>> driver for this machine (aka the timer). Of course, this happens at a
>> *very* early point, way before the syscon driver is available... :-(
>>
>> Maybe someone has an idea?
>
> Sebastian, Russell: I can't find the previous mail where you proposed
> this solution to address the shared register issue between Kirkwood's
> watchdog and clocksource.

Russell first mentioned an atomic modify function here:
http://archive.arm.linux.org.uk/lurker/message/20130618.113606.d7d4fe4b.en.html

Linus Walleij later suggested mfd/syscon as a container
for protected register accesses.

> Is this what you had in mind?

The pro of a generic atomic clear/set is that we can use it
very early, on all platforms, and from totally unrelated
drivers. As you already mentioned, using syscon from timers will
get us into trouble, because it has not been registered yet.

> Do you think trying to use a regmap could be better (given we can
> sort out the problem explained above)?

Given the small number of registers we need to protect and especially
for using it in timers, I'd prefer your proposal. Otherwise, I guess,
we would have to mimic mfd/syscon for time-orion and time-armada-370-xp
and make wdt-orion depend on it. I doubt we can make any use of
mfd/syscon for the timer use case.

Sebastian
Ezequiel Garcia Aug. 12, 2013, 5:09 p.m. UTC | #8
Sebastian,

On Mon, Aug 12, 2013 at 06:44:10PM +0200, Sebastian Hesselbarth wrote:
> On 08/12/13 17:46, Ezequiel Garcia wrote:
> >> Indeed, syscon looks like a nice match for this use case.
> >> (although it still looks like an overkill to me).
> >>
> >> I've been trying to implement a working solution based in syscon but I'm
> >> unable to overcome an issue.
> >>
> >> The problem is that we need the register/regmap to initialize the clocksource
> >> driver for this machine (aka the timer). Of course, this happens at a
> >> *very* early point, way before the syscon driver is available... :-(
> >>
> >> Maybe someone has an idea?
> >
> > Sebastian, Russell: I can't find the previous mail where you proposed
> > this solution to address the shared register issue between Kirkwood's
> > watchdog and clocksource.
> 
> Russell first mentioned an atomic modify function here:
> http://archive.arm.linux.org.uk/lurker/message/20130618.113606.d7d4fe4b.en.html
> 

Thanks a lot for finding this thread. I see we all just went through the
same line of reasoning.

> 
> The pro of a generic atomic clear/set is that we can use it
> very early, on all platforms, and from totally unrelated
> drivers. As you already mentioned, using syscon from timers will
> get us into into trouble, because it has not been registered.
> 

Yes, indeed.

> > Do you think trying to use a regmap could be better (given we can
> > sort out the problem explained above)?
> 
> Given the small number of registers we need to protect and especially
> for using it in timers, I'd prefer your proposal. Otherwise, I guess,
> we would have to mimic mfd/syscon for time-orion and time-armada-370-xp
> and make wdt-orion depend on it. I doubt we can make any use of
> mfd/syscon for the timer use case.
> 

Then I think we all agree here. Just to confirm:

  * The proposed API is almost exactly the one proposed by Russell
    in the mail you just mentioned:
    http://archive.arm.linux.org.uk/lurker/message/20130618.113606.d7d4fe4b.en.html

  * Linus Walleij suggested mfd/syscon, but Russell, Mark and Linus
    himself seem to agree it's more heavy-weight than necessary.
    http://archive.arm.linux.org.uk/lurker/message/20130618.151116.712407e0.en.html
    http://archive.arm.linux.org.uk/lurker/message/20130618.183359.a6184b7f.en.html
    http://archive.arm.linux.org.uk/lurker/message/20130618.152300.bffa038f.en.html

The only open question is: given there's nothing arch-dependent in this
mechanism, should we keep this in arch/arm/kernel? And if not, where
should we move this?
Will Deacon Aug. 12, 2013, 6:29 p.m. UTC | #9
On Sat, Aug 10, 2013 at 01:43:00PM +0100, Ezequiel Garcia wrote:
> Some SoC have MMIO regions that are shared across orthogonal
> subsystems. This commit implements a possible solution for the
> thread-safe access of such regions through a spinlock-protected API
> with clear-set semantics.
> 
> Concurrent access is protected with a single spinlock for the
> entire MMIO address space. While this protects shared-registers,
> it also serializes access to unrelated/unshared registers.
> 
> Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>

[...]

> +void atomic_io_clear_set(void __iomem *reg, u32 clear, u32 set)
> +{
> +	spin_lock(&__io_lock);
> +	writel((readl(reg) & ~clear) | set, reg);
> +	spin_unlock(&__io_lock);
> +}

I appreciate that you've lifted this code from a previous driver, but this
doesn't really make any sense to me. The spin_unlock operation is
essentially a store to normal, cacheable memory, whilst the writel is an
__iowmb followed by a store to device memory.

This means that you don't have ordering guarantees between the two accesses
outside of the CPU, potentially giving you:

	spin_lock(&__io_lock);
	spin_unlock(&__io_lock);
	writel((readl(reg) & ~clear) | set, reg);

which is probably not what you want.

I suggest adding an iowmb after the writel if you really need this ordering
to be enforced (but this may have a significant performance impact,
depending on your SoC).

Will
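Concretely, Will's suggestion would amount to something like the sketch below (kernel-style pseudocode, not a buildable unit; whether an iowmb there is sufficient, and what it costs, depends on the SoC, as he notes):

```c
void atomic_io_clear_set(void __iomem *reg, u32 clear, u32 set)
{
	spin_lock(&__io_lock);
	writel((readl(reg) & ~clear) | set, reg);
	__iowmb();	/* order the device store before the unlock's normal store */
	spin_unlock(&__io_lock);
}
```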
Ezequiel Garcia Aug. 19, 2013, 4:59 p.m. UTC | #10
On Mon, Aug 12, 2013 at 07:29:42PM +0100, Will Deacon wrote:
> On Sat, Aug 10, 2013 at 01:43:00PM +0100, Ezequiel Garcia wrote:
> > Some SoC have MMIO regions that are shared across orthogonal
> > subsystems. This commit implements a possible solution for the
> > thread-safe access of such regions through a spinlock-protected API
> > with clear-set semantics.
> > 
> > Concurrent access is protected with a single spinlock for the
> > entire MMIO address space. While this protects shared-registers,
> > it also serializes access to unrelated/unshared registers.
> > 
> > Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
> 
> [...]
> 
> > +void atomic_io_clear_set(void __iomem *reg, u32 clear, u32 set)
> > +{
> > +	spin_lock(&__io_lock);
> > +	writel((readl(reg) & ~clear) | set, reg);
> > +	spin_unlock(&__io_lock);
> > +}
> 
> I appreciate that you've lifted this code from a previous driver, but this
> doesn't really make any sense to me. The spin_unlock operation is
> essentially a store to normal, cacheable memory, whilst the writel is an
> __iowmb followed by a store to device memory.
> 
> This means that you don't have ordering guarantees between the two accesses
> outside of the CPU, potentially giving you:
> 
> 	spin_lock(&__io_lock);
> 	spin_unlock(&__io_lock);
> 	writel((readl(reg) & ~clear) | set, reg);
> 
> which is probably not what you want.
> 
> I suggest adding an iowmb after the writel if you really need this ordering
> to be enforced (but this may have a significant performance impact,
> depending on your SoC).
> 

I don't want to argue with you, given I have zero knowledge about this
ordering issue. However, let me ask you a question.

In arch/arm/include/asm/spinlock.h I'm seeing this comment:

""ARMv6 ticket-based spin-locking.
A memory barrier is required after we get a lock, and before we
release it, because V6 CPUs are assumed to have weakly ordered
memory.""

and also:

static inline void arch_spin_unlock(arch_spinlock_t *lock)
{
	smp_mb();
	lock->tickets.owner++;
	dsb_sev();
}

So, knowing that this atomic API should work on every ARMv{N}, and not being
very sure what the call to dsb_sev() does, would you care to explain how the
above is *not* enough to guarantee a memory barrier before the spin unlock?

Thanks!
Matt Sealey Aug. 20, 2013, 2:32 p.m. UTC | #11
On Mon, Aug 19, 2013 at 11:59 AM, Ezequiel Garcia
<ezequiel.garcia@free-electrons.com> wrote:
> On Mon, Aug 12, 2013 at 07:29:42PM +0100, Will Deacon wrote:
>
>> This means that you don't have ordering guarantees between the two accesses
>> outside of the CPU, potentially giving you:
>>
>>       spin_lock(&__io_lock);
>>       spin_unlock(&__io_lock);
>>       writel((readl(reg) & ~clear) | set, reg);
>>
>> which is probably not what you want.
>>
>> I suggest adding an iowmb after the writel if you really need this ordering
>> to be enforced (but this may have a significant performance impact,
>> depending on your SoC).
>
> I don't want to argue with you, given I have zero knowledge about this
> ordering issue. However let me ask you a question.
>
> In arch/arm/include/asm/spinlock.h I'm seeing this comment:
>
> ""ARMv6 ticket-based spin-locking.
> A memory barrier is required after we get a lock, and before we
> release it, because V6 CPUs are assumed to have weakly ordered
> memory.""
>
> and also:
>
> static inline void arch_spin_unlock(arch_spinlock_t *lock)
> {
>         smp_mb();
>         lock->tickets.owner++;
>         dsb_sev();
> }
>
> So, knowing this atomic API should work for every ARMv{N}, and not being very
> sure what the call to dsb_sev() does. Would you care to explain how the above
> is *not* enough to guarantee a memory barrier before the spin unlocking?

arch_spin_[un]lock as an API is not guaranteed to use a barrier before
or after doing anything, even if this particular implementation does.

dsb_sev() is an SMP helper which does a synchronization barrier and
then sends events to other CPUs which informs them of the unlock. If
the other CPUs were in WFE state waiting on that spinlock, they can
now thunder in and attempt to lock it themselves. It's not really
relevant to the discussion as arch_spin_unlock is not guaranteed to
perform a barrier before returning.

On some other architecture there may be ISA additions that make
locking barrier-less, or a specific ARM SoC implementation may
provide a bank of "hardware spinlocks".

So, in this sense, you shouldn't rely on implementation-specific
behaviors of a function. If you need to be sure C follows B follows A,
insert a barrier yourself. Don't expect A to barrier for you just
because you saw some source code that does it today, as tomorrow it
may be different. It's not an optimization, just a potential source of
new bugs if the implementation changes underneath you later.

--
Matt
Ezequiel Garcia Aug. 20, 2013, 2:52 p.m. UTC | #12
On Tue, Aug 20, 2013 at 09:32:13AM -0500, Matt Sealey wrote:
> On Mon, Aug 19, 2013 at 11:59 AM, Ezequiel Garcia
> <ezequiel.garcia@free-electrons.com> wrote:
> > On Mon, Aug 12, 2013 at 07:29:42PM +0100, Will Deacon wrote:
> > [...]
> 
> arch_spin_[un]lock as an API is not guaranteed to use a barrier before
> or after doing anything, even if this particular implementation does.
> 
> dsb_sev() is an SMP helper which does a synchronization barrier and
> then sends events to other CPUs which informs them of the unlock. If
> the other CPUs were in WFE state waiting on that spinlock, they can
> now thunder in and attempt to lock it themselves. It's not really
> relevant to the discussion as arch_spin_unlock is not guaranteed to
> perform a barrier before returning.
> 
> On some other architecture there may be ISA additions which make
> locking barrier-less, or on a specific implementation of an ARM
> architecture SoC whereby there may be a bank of "hardware spinlocks"
> available.
> 
> So, in this sense, you shouldn't rely on implementation-specific
> behaviors of a function. If you need to be sure C follows B follows A,
> insert a barrier yourself. Don't expect A to barrier for you just
> because you saw some source code that does it today, as tomorrow it
> may be different. It's not an optimization, just a potential source of
> new bugs if the implementation changes underneath you later.
> 

Of course. I agree completely.

Thanks a lot,
Will Deacon Aug. 20, 2013, 3:04 p.m. UTC | #13
On Tue, Aug 20, 2013 at 03:52:00PM +0100, Ezequiel Garcia wrote:
> On Tue, Aug 20, 2013 at 09:32:13AM -0500, Matt Sealey wrote:
> > On Mon, Aug 19, 2013 at 11:59 AM, Ezequiel Garcia
> > <ezequiel.garcia@free-electrons.com> wrote:
> > > On Mon, Aug 12, 2013 at 07:29:42PM +0100, Will Deacon wrote:
> > >> I suggest adding an iowmb after the writel if you really need this ordering
> > >> to be enforced (but this may have a significant performance impact,
> > >> depending on your SoC).
> > >
> > > I don't want to argue with you, given I have zero knowledge about this
> > > ordering issue. However let me ask you a question.
> > >
> > > In arch/arm/include/asm/spinlock.h I'm seeing this comment:
> > >
> > > ""ARMv6 ticket-based spin-locking.
> > > A memory barrier is required after we get a lock, and before we
> > > release it, because V6 CPUs are assumed to have weakly ordered
> > > memory.""
> > >
> > > and also:
> > >
> > > static inline void arch_spin_unlock(arch_spinlock_t *lock)
> > > {
> > >         smp_mb();
> > >         lock->tickets.owner++;
> > >         dsb_sev();
> > > }
> > >
> > > So, knowing this atomic API should work for every ARMv{N}, and not being very
> > > sure what the call to dsb_sev() does. Would you care to explain how the above
> > > is *not* enough to guarantee a memory barrier before the spin unlocking?
> > 
> > arch_spin_[un]lock as an API is not guaranteed to use a barrier before
> > or after doing anything, even if this particular implementation does.

[...]

> Of course. I agree completely.

Well, even if the barrier were guaranteed by the API, it's still not
sufficient to ensure ordering between two different memory types. For
example, on Cortex-A9 with PL310, you would also need to perform an
outer_sync() operation before the unlock.

Will

Patch

diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h
index d070741..c84658d 100644
--- a/arch/arm/include/asm/io.h
+++ b/arch/arm/include/asm/io.h
@@ -36,6 +36,11 @@ 
 #define isa_bus_to_virt phys_to_virt
 
 /*
+ * Atomic MMIO-wide IO clear/set
+ */
+extern void atomic_io_clear_set(void __iomem *reg, u32 clear, u32 set);
+
+/*
  * Generic IO read/write.  These perform native-endian accesses.  Note
  * that some architectures will want to re-define __raw_{read,write}w.
  */
diff --git a/arch/arm/kernel/io.c b/arch/arm/kernel/io.c
index dcd5b4d..3ab8201 100644
--- a/arch/arm/kernel/io.c
+++ b/arch/arm/kernel/io.c
@@ -1,6 +1,30 @@ 
 #include <linux/export.h>
 #include <linux/types.h>
 #include <linux/io.h>
+#include <linux/spinlock.h>
+
+static DEFINE_SPINLOCK(__io_lock);
+
+/*
+ * Some platforms have MMIO regions that are shared across orthogonal
+ * subsystems. This API implements thread-safe access to such regions
+ * through a spinlock-protected API with clear-set semantics.
+ *
+ * Concurrent access is protected with a single spinlock for the entire MMIO
+ * address space. While this protects shared-registers, it also serializes
+ * access to unrelated/unshared registers.
+ *
+ * Using this API on frequently accessed registers in performance-critical
+ * paths is not recommended, as the spinlock used by this API would become
+ * highly contended.
+ */
+void atomic_io_clear_set(void __iomem *reg, u32 clear, u32 set)
+{
+	spin_lock(&__io_lock);
+	writel((readl(reg) & ~clear) | set, reg);
+	spin_unlock(&__io_lock);
+}
+EXPORT_SYMBOL(atomic_io_clear_set);
 
 /*
  * Copy data from IO memory space to "real" memory space.