[RFC,2/2] arm: implementation for HARDENED_ATOMIC

Message ID 1476802761-24340-3-git-send-email-colin@cvidal.org (mailing list archive)
State New, archived

Commit Message

Colin Vidal Oct. 18, 2016, 2:59 p.m. UTC
This adds arm-specific code in order to support the HARDENED_ATOMIC
feature. When an overflow is detected in atomic_t, atomic64_t or
atomic_long_t, an exception is raised and hardened_atomic_overflow is
called.

Signed-off-by: Colin Vidal <colin@cvidal.org>
---
 arch/arm/Kconfig              |   1 +
 arch/arm/include/asm/atomic.h | 434 +++++++++++++++++++++++++++++-------------
 arch/arm/mm/fault.c           |  15 ++
 3 files changed, 320 insertions(+), 130 deletions(-)

Comments

Kees Cook Oct. 18, 2016, 9:29 p.m. UTC | #1
On Tue, Oct 18, 2016 at 7:59 AM, Colin Vidal <colin@cvidal.org> wrote:
> This adds arm-specific code in order to support HARDENED_ATOMIC
> feature. When overflow is detected in atomic_t, atomic64_t or
> atomic_long_t, an exception is raised and call
> hardened_atomic_overflow.

Can you include some notes that this was originally in PaX/grsecurity,
and detail what is different from their implementation?

>
> Signed-off-by: Colin Vidal <colin@cvidal.org>
> ---
>  arch/arm/Kconfig              |   1 +
>  arch/arm/include/asm/atomic.h | 434 +++++++++++++++++++++++++++++-------------
>  arch/arm/mm/fault.c           |  15 ++
>  3 files changed, 320 insertions(+), 130 deletions(-)
>
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index b5d529f..fcf4a64 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -36,6 +36,7 @@ config ARM
>         select HAVE_ARCH_AUDITSYSCALL if (AEABI && !OABI_COMPAT)
>         select HAVE_ARCH_BITREVERSE if (CPU_32v7M || CPU_32v7) && !CPU_32v6
>         select HAVE_ARCH_HARDENED_USERCOPY
> +       select HAVE_ARCH_HARDENED_ATOMIC
>         select HAVE_ARCH_JUMP_LABEL if !XIP_KERNEL && !CPU_ENDIAN_BE32 && MMU
>         select HAVE_ARCH_KGDB if !CPU_ENDIAN_BE32 && MMU
>         select HAVE_ARCH_MMAP_RND_BITS if MMU
> diff --git a/arch/arm/include/asm/atomic.h b/arch/arm/include/asm/atomic.h
> index 66d0e21..fdaee17 100644
> --- a/arch/arm/include/asm/atomic.h
> +++ b/arch/arm/include/asm/atomic.h
> @@ -17,18 +17,52 @@
>  #include <linux/irqflags.h>
>  #include <asm/barrier.h>
>  #include <asm/cmpxchg.h>
> +#include <linux/bug.h>
>
>  #define ATOMIC_INIT(i) { (i) }
>
>  #ifdef __KERNEL__
>
> +#ifdef CONFIG_HARDENED_ATOMIC
> +#define HARDENED_ATOMIC_INSN "bkpt 0xf103"

In PaX, I see a check for THUMB2 config:

#ifdef CONFIG_THUMB2_KERNEL
#define REFCOUNT_TRAP_INSN "bkpt        0xf1"
#else
#define REFCOUNT_TRAP_INSN "bkpt        0xf103"
#endif

That should probably stay unless I'm misunderstanding something. Also,
for your new INSN define name, I'd leave "TRAP" in the name, since
that describes more clearly what it does.

Beyond these things, it looks great to me -- though I'm hardly an ARM expert. :)

Were you able to test on ARM with this for overflow with the lkdtm tests?

-Kees
Colin Vidal Oct. 19, 2016, 8:45 a.m. UTC | #2
Hi Kees,

> > This adds arm-specific code in order to support HARDENED_ATOMIC
> > feature. When overflow is detected in atomic_t, atomic64_t or
> > atomic_long_t, an exception is raised and call
> > hardened_atomic_overflow.
> 
> Can you include some notes that this was originally in PaX/grsecurity,
> and detail what is different from their implemention?

Of course. I will add it in the next version.

> > Signed-off-by: Colin Vidal <colin@cvidal.org>
> > ---
> >  arch/arm/Kconfig              |   1 +
> >  arch/arm/include/asm/atomic.h | 434 +++++++++++++++++++++++++++++-------------
> >  arch/arm/mm/fault.c           |  15 ++
> >  3 files changed, 320 insertions(+), 130 deletions(-)
> > 
> > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> > index b5d529f..fcf4a64 100644
> > --- a/arch/arm/Kconfig
> > +++ b/arch/arm/Kconfig
> > @@ -36,6 +36,7 @@ config ARM
> >         select HAVE_ARCH_AUDITSYSCALL if (AEABI && !OABI_COMPAT)
> >         select HAVE_ARCH_BITREVERSE if (CPU_32v7M || CPU_32v7) && !CPU_32v6
> >         select HAVE_ARCH_HARDENED_USERCOPY
> > +       select HAVE_ARCH_HARDENED_ATOMIC
> >         select HAVE_ARCH_JUMP_LABEL if !XIP_KERNEL && !CPU_ENDIAN_BE32 && MMU
> >         select HAVE_ARCH_KGDB if !CPU_ENDIAN_BE32 && MMU
> >         select HAVE_ARCH_MMAP_RND_BITS if MMU
> > diff --git a/arch/arm/include/asm/atomic.h b/arch/arm/include/asm/atomic.h
> > index 66d0e21..fdaee17 100644
> > --- a/arch/arm/include/asm/atomic.h
> > +++ b/arch/arm/include/asm/atomic.h
> > @@ -17,18 +17,52 @@
> >  #include <linux/irqflags.h>
> >  #include <asm/barrier.h>
> >  #include <asm/cmpxchg.h>
> > +#include <linux/bug.h>
> > 
> >  #define ATOMIC_INIT(i) { (i) }
> > 
> >  #ifdef __KERNEL__
> > 
> > +#ifdef CONFIG_HARDENED_ATOMIC
> > +#define HARDENED_ATOMIC_INSN "bkpt 0xf103"
> 
> In PaX, I see a check for THUMB2 config:
> 
> #ifdef CONFIG_THUMB2_KERNEL
> #define REFCOUNT_TRAP_INSN "bkpt        0xf1"
> #else
> #define REFCOUNT_TRAP_INSN "bkpt        0xf103"
> #endif
> 
> That should probably stay unless I'm misunderstanding something. Also,
> for your new ISNS define name, I'd leave "TRAP" in the name, since
> that describes more clearly what it does.

Oh yeah. I will add it. Actually I did not add it at first since I do
not really understand why there is a special case for Thumb2 (as far
as I understand, instruction sizes can also be 4 bytes). If ARM
experts are around, I would appreciate pointers about it :-)

> Beyond these things, it looks great to me -- though I'm hardly an ARM expert. :)
> 
> Were you able to test on ARM with this for overflow with the lkdtm tests?

Yep. I have to make more thorough tests, but things like

    echo HARDENED_ATOMIC_OVERFLOW > <debugfsmountpoint>/provoke-crash/DIRECT

raise the exception as expected (with the hardened_atomic message and
stack trace). I tested it with armv7/qemu.
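
For reference, this is roughly what I run inside the guest (the
debugfs mount point and paths may differ on other setups, so take
this only as an example):

    mount -t debugfs none /sys/kernel/debug
    echo HARDENED_ATOMIC_OVERFLOW > /sys/kernel/debug/provoke-crash/DIRECT
    dmesg | tail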

Thanks!

Colin
Kees Cook Oct. 19, 2016, 8:11 p.m. UTC | #3
On Wed, Oct 19, 2016 at 1:45 AM, Colin Vidal <colin@cvidal.org> wrote:
> Hi Kees,
>
>> > This adds arm-specific code in order to support HARDENED_ATOMIC
>> > feature. When overflow is detected in atomic_t, atomic64_t or
>> > atomic_long_t, an exception is raised and call
>> > hardened_atomic_overflow.
>>
>> Can you include some notes that this was originally in PaX/grsecurity,
>> and detail what is different from their implemention?
>
> Of course. I add it in the next version.
>
>> > Signed-off-by: Colin Vidal <colin@cvidal.org>
>> > ---
>> >  arch/arm/Kconfig              |   1 +
>> >  arch/arm/include/asm/atomic.h | 434 +++++++++++++++++++++++++++++-------------
>> >  arch/arm/mm/fault.c           |  15 ++
>> >  3 files changed, 320 insertions(+), 130 deletions(-)
>> >
>> > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
>> > index b5d529f..fcf4a64 100644
>> > --- a/arch/arm/Kconfig
>> > +++ b/arch/arm/Kconfig
>> > @@ -36,6 +36,7 @@ config ARM
>> >         select HAVE_ARCH_AUDITSYSCALL if (AEABI && !OABI_COMPAT)
>> >         select HAVE_ARCH_BITREVERSE if (CPU_32v7M || CPU_32v7) && !CPU_32v6
>> >         select HAVE_ARCH_HARDENED_USERCOPY
>> > +       select HAVE_ARCH_HARDENED_ATOMIC
>> >         select HAVE_ARCH_JUMP_LABEL if !XIP_KERNEL && !CPU_ENDIAN_BE32 && MMU
>> >         select HAVE_ARCH_KGDB if !CPU_ENDIAN_BE32 && MMU
>> >         select HAVE_ARCH_MMAP_RND_BITS if MMU
>> > diff --git a/arch/arm/include/asm/atomic.h b/arch/arm/include/asm/atomic.h
>> > index 66d0e21..fdaee17 100644
>> > --- a/arch/arm/include/asm/atomic.h
>> > +++ b/arch/arm/include/asm/atomic.h
>> > @@ -17,18 +17,52 @@
>> >  #include <linux/irqflags.h>
>> >  #include <asm/barrier.h>
>> >  #include <asm/cmpxchg.h>
>> > +#include <linux/bug.h>
>> >
>> >  #define ATOMIC_INIT(i) { (i) }
>> >
>> >  #ifdef __KERNEL__
>> >
>> > +#ifdef CONFIG_HARDENED_ATOMIC
>> > +#define HARDENED_ATOMIC_INSN "bkpt 0xf103"
>>
>> In PaX, I see a check for THUMB2 config:
>>
>> #ifdef CONFIG_THUMB2_KERNEL
>> #define REFCOUNT_TRAP_INSN "bkpt        0xf1"
>> #else
>> #define REFCOUNT_TRAP_INSN "bkpt        0xf103"
>> #endif
>>
>> That should probably stay unless I'm misunderstanding something. Also,
>> for your new ISNS define name, I'd leave "TRAP" in the name, since
>> that describes more clearly what it does.
>
> Oh yeah. I will add it. Actually I does not add it at first since I
> does not really understand why there is a special case for Thumbs2 (as
> far I understand, instructions size can also be 4 bytes). If ARM
> experts are around, I would appreciate pointers about it :-)

Cool. Perhaps Takahiro or Mark (now on CC) can comment on it.

>> Beyond these things, it looks great to me -- though I'm hardly an ARM expert. :)
>>
>> Were you able to test on ARM with this for overflow with the lkdtm tests?
>
> Yep. I have to make more thorough tests, but things like
>
>     echo HARDENED_ATOMIC_OVERFLOW > <debugfsmountpoint>/provoke-crash/DIRECT
>
> raise the exception as well (with hardened message and stack trace). I
> tested it with armv7/qemu.

Great!

-Kees
AKASHI Takahiro Oct. 20, 2016, 5:58 a.m. UTC | #4
On Wed, Oct 19, 2016 at 01:11:46PM -0700, Kees Cook wrote:
> On Wed, Oct 19, 2016 at 1:45 AM, Colin Vidal <colin@cvidal.org> wrote:
> > Hi Kees,
> >
> >> > This adds arm-specific code in order to support HARDENED_ATOMIC
> >> > feature. When overflow is detected in atomic_t, atomic64_t or
> >> > atomic_long_t, an exception is raised and call
> >> > hardened_atomic_overflow.
> >>
> >> Can you include some notes that this was originally in PaX/grsecurity,
> >> and detail what is different from their implemention?
> >
> > Of course. I add it in the next version.
> >
> >> > Signed-off-by: Colin Vidal <colin@cvidal.org>
> >> > ---
> >> >  arch/arm/Kconfig              |   1 +
> >> >  arch/arm/include/asm/atomic.h | 434 +++++++++++++++++++++++++++++-------------
> >> >  arch/arm/mm/fault.c           |  15 ++
> >> >  3 files changed, 320 insertions(+), 130 deletions(-)
> >> >
> >> > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> >> > index b5d529f..fcf4a64 100644
> >> > --- a/arch/arm/Kconfig
> >> > +++ b/arch/arm/Kconfig
> >> > @@ -36,6 +36,7 @@ config ARM
> >> >         select HAVE_ARCH_AUDITSYSCALL if (AEABI && !OABI_COMPAT)
> >> >         select HAVE_ARCH_BITREVERSE if (CPU_32v7M || CPU_32v7) && !CPU_32v6
> >> >         select HAVE_ARCH_HARDENED_USERCOPY
> >> > +       select HAVE_ARCH_HARDENED_ATOMIC
> >> >         select HAVE_ARCH_JUMP_LABEL if !XIP_KERNEL && !CPU_ENDIAN_BE32 && MMU
> >> >         select HAVE_ARCH_KGDB if !CPU_ENDIAN_BE32 && MMU
> >> >         select HAVE_ARCH_MMAP_RND_BITS if MMU
> >> > diff --git a/arch/arm/include/asm/atomic.h b/arch/arm/include/asm/atomic.h
> >> > index 66d0e21..fdaee17 100644
> >> > --- a/arch/arm/include/asm/atomic.h
> >> > +++ b/arch/arm/include/asm/atomic.h
> >> > @@ -17,18 +17,52 @@
> >> >  #include <linux/irqflags.h>
> >> >  #include <asm/barrier.h>
> >> >  #include <asm/cmpxchg.h>
> >> > +#include <linux/bug.h>
> >> >
> >> >  #define ATOMIC_INIT(i) { (i) }
> >> >
> >> >  #ifdef __KERNEL__
> >> >
> >> > +#ifdef CONFIG_HARDENED_ATOMIC
> >> > +#define HARDENED_ATOMIC_INSN "bkpt 0xf103"
> >>
> >> In PaX, I see a check for THUMB2 config:
> >>
> >> #ifdef CONFIG_THUMB2_KERNEL
> >> #define REFCOUNT_TRAP_INSN "bkpt        0xf1"
> >> #else
> >> #define REFCOUNT_TRAP_INSN "bkpt        0xf103"
> >> #endif
> >>
> >> That should probably stay unless I'm misunderstanding something. Also,
> >> for your new ISNS define name, I'd leave "TRAP" in the name, since
> >> that describes more clearly what it does.
> >
> > Oh yeah. I will add it. Actually I does not add it at first since I
> > does not really understand why there is a special case for Thumbs2 (as
> > far I understand, instructions size can also be 4 bytes). If ARM
> > experts are around, I would appreciate pointers about it :-)
> 
> Cool. Perhaps Takahiro or Mark (now on CC) can comment on it.

"bkpt" is a 16-bit instruction with an 8-bit immediate value in thumb2.

-Takahiro AKASHI

> >> Beyond these things, it looks great to me -- though I'm hardly an ARM expert. :)
> >>
> >> Were you able to test on ARM with this for overflow with the lkdtm tests?
> >
> > Yep. I have to make more thorough tests, but things like
> >
> >     echo HARDENED_ATOMIC_OVERFLOW > <debugfsmountpoint>/provoke-crash/DIRECT
> >
> > raise the exception as well (with hardened message and stack trace). I
> > tested it with armv7/qemu.
> 
> Great!
> 
> -Kees
> 
> -- 
> Kees Cook
> Nexus Security
Colin Vidal Oct. 20, 2016, 8:30 a.m. UTC | #5
On Thu, 2016-10-20 at 14:58 +0900, AKASHI Takahiro wrote:
> On Wed, Oct 19, 2016 at 01:11:46PM -0700, Kees Cook wrote:
> > 
> > On Wed, Oct 19, 2016 at 1:45 AM, Colin Vidal <colin@cvidal.org> wrote:
> > > 
> > > Hi Kees,
> > > 
> > > > 
> > > > > 
> > > > > This adds arm-specific code in order to support HARDENED_ATOMIC
> > > > > feature. When overflow is detected in atomic_t, atomic64_t or
> > > > > atomic_long_t, an exception is raised and call
> > > > > hardened_atomic_overflow.
> > > > 
> > > > Can you include some notes that this was originally in PaX/grsecurity,
> > > > and detail what is different from their implemention?
> > > 
> > > Of course. I add it in the next version.
> > > 
> > > > 
> > > > > 
> > > > > Signed-off-by: Colin Vidal <colin@cvidal.org>
> > > > > ---
> > > > >  arch/arm/Kconfig              |   1 +
> > > > >  arch/arm/include/asm/atomic.h | 434 +++++++++++++++++++++++++++++-------------
> > > > >  arch/arm/mm/fault.c           |  15 ++
> > > > >  3 files changed, 320 insertions(+), 130 deletions(-)
> > > > > 
> > > > > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> > > > > index b5d529f..fcf4a64 100644
> > > > > --- a/arch/arm/Kconfig
> > > > > +++ b/arch/arm/Kconfig
> > > > > @@ -36,6 +36,7 @@ config ARM
> > > > >         select HAVE_ARCH_AUDITSYSCALL if (AEABI && !OABI_COMPAT)
> > > > >         select HAVE_ARCH_BITREVERSE if (CPU_32v7M || CPU_32v7) && !CPU_32v6
> > > > >         select HAVE_ARCH_HARDENED_USERCOPY
> > > > > +       select HAVE_ARCH_HARDENED_ATOMIC
> > > > >         select HAVE_ARCH_JUMP_LABEL if !XIP_KERNEL && !CPU_ENDIAN_BE32 && MMU
> > > > >         select HAVE_ARCH_KGDB if !CPU_ENDIAN_BE32 && MMU
> > > > >         select HAVE_ARCH_MMAP_RND_BITS if MMU
> > > > > diff --git a/arch/arm/include/asm/atomic.h b/arch/arm/include/asm/atomic.h
> > > > > index 66d0e21..fdaee17 100644
> > > > > --- a/arch/arm/include/asm/atomic.h
> > > > > +++ b/arch/arm/include/asm/atomic.h
> > > > > @@ -17,18 +17,52 @@
> > > > >  #include <linux/irqflags.h>
> > > > >  #include <asm/barrier.h>
> > > > >  #include <asm/cmpxchg.h>
> > > > > +#include <linux/bug.h>
> > > > > 
> > > > >  #define ATOMIC_INIT(i) { (i) }
> > > > > 
> > > > >  #ifdef __KERNEL__
> > > > > 
> > > > > +#ifdef CONFIG_HARDENED_ATOMIC
> > > > > +#define HARDENED_ATOMIC_INSN "bkpt 0xf103"
> > > > 
> > > > In PaX, I see a check for THUMB2 config:
> > > > 
> > > > #ifdef CONFIG_THUMB2_KERNEL
> > > > #define REFCOUNT_TRAP_INSN "bkpt        0xf1"
> > > > #else
> > > > #define REFCOUNT_TRAP_INSN "bkpt        0xf103"
> > > > #endif
> > > > 
> > > > That should probably stay unless I'm misunderstanding something. Also,
> > > > for your new ISNS define name, I'd leave "TRAP" in the name, since
> > > > that describes more clearly what it does.
> > > 
> > > Oh yeah. I will add it. Actually I does not add it at first since I
> > > does not really understand why there is a special case for Thumbs2 (as
> > > far I understand, instructions size can also be 4 bytes). If ARM
> > > experts are around, I would appreciate pointers about it :-)
> > 
> > Cool. Perhaps Takahiro or Mark (now on CC) can comment on it.
> 
> "bkpt" is a 16-bit instruction with an 8-bit immediate value in thumb2.

Ok, I better understand why PaX uses this guard now.
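
So, if I keep the PaX immediates, together with Kees' suggestion to
keep "TRAP" in the define name, it would probably end up looking
something like this (just a sketch, the exact name is still open):

    #ifdef CONFIG_THUMB2_KERNEL
    #define HARDENED_ATOMIC_TRAP_INSN "bkpt 0xf1"
    #else
    #define HARDENED_ATOMIC_TRAP_INSN "bkpt 0xf103"
    #endif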

Thanks!

Colin
AKASHI Takahiro Oct. 25, 2016, 9:18 a.m. UTC | #6
On Tue, Oct 18, 2016 at 04:59:21PM +0200, Colin Vidal wrote:
> This adds arm-specific code in order to support HARDENED_ATOMIC
> feature. When overflow is detected in atomic_t, atomic64_t or
> atomic_long_t, an exception is raised and call
> hardened_atomic_overflow.
> 
> Signed-off-by: Colin Vidal <colin@cvidal.org>
> ---
>  arch/arm/Kconfig              |   1 +
>  arch/arm/include/asm/atomic.h | 434 +++++++++++++++++++++++++++++-------------
>  arch/arm/mm/fault.c           |  15 ++
>  3 files changed, 320 insertions(+), 130 deletions(-)
> 
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index b5d529f..fcf4a64 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -36,6 +36,7 @@ config ARM
>  	select HAVE_ARCH_AUDITSYSCALL if (AEABI && !OABI_COMPAT)
>  	select HAVE_ARCH_BITREVERSE if (CPU_32v7M || CPU_32v7) && !CPU_32v6
>  	select HAVE_ARCH_HARDENED_USERCOPY
> +	select HAVE_ARCH_HARDENED_ATOMIC
>  	select HAVE_ARCH_JUMP_LABEL if !XIP_KERNEL && !CPU_ENDIAN_BE32 && MMU
>  	select HAVE_ARCH_KGDB if !CPU_ENDIAN_BE32 && MMU
>  	select HAVE_ARCH_MMAP_RND_BITS if MMU
> diff --git a/arch/arm/include/asm/atomic.h b/arch/arm/include/asm/atomic.h
> index 66d0e21..fdaee17 100644
> --- a/arch/arm/include/asm/atomic.h
> +++ b/arch/arm/include/asm/atomic.h
> @@ -17,18 +17,52 @@
>  #include <linux/irqflags.h>
>  #include <asm/barrier.h>
>  #include <asm/cmpxchg.h>
> +#include <linux/bug.h>
>  
>  #define ATOMIC_INIT(i)	{ (i) }
>  
>  #ifdef __KERNEL__
>  
> +#ifdef CONFIG_HARDENED_ATOMIC
> +#define HARDENED_ATOMIC_INSN "bkpt 0xf103"
> +#define _ASM_EXTABLE(from, to)			\
> +	".pushsection __ex_table,\"a\"\n"	\
> +	".align 3\n"				\
> +	".long "#from","#to"\n"			\
> +	".popsection"
> +#define __OVERFLOW_POST				\
> +	"bvc 3f\n"				\
> +	"2: "HARDENED_ATOMIC_INSN"\n"		\
> +	"3:\n"
> +#define __OVERFLOW_POST_RETURN			\
> +	"bvc 3f\n"				\
> +	"mov %0,%1\n"				\
> +	"2: "HARDENED_ATOMIC_INSN"\n"		\
> +	"3:\n"
> +#define __OVERFLOW_EXTABLE			\
> +	"4:\n"					\
> +	_ASM_EXTABLE(2b, 4b)
> +#else
> +#define __OVERFLOW_POST
> +#define __OVERFLOW_POST_RETURN
> +#define __OVERFLOW_EXTABLE
> +#endif
> +
>  /*
>   * On ARM, ordinary assignment (str instruction) doesn't clear the local
>   * strex/ldrex monitor on some implementations. The reason we can use it for
>   * atomic_set() is the clrex or dummy strex done on every exception return.
>   */
>  #define atomic_read(v)	READ_ONCE((v)->counter)
> +static inline int atomic_read_wrap(const atomic_wrap_t *v)
> +{
> +	return atomic_read(v);
> +}
>  #define atomic_set(v,i)	WRITE_ONCE(((v)->counter), (i))
> +static inline void atomic_set_wrap(atomic_wrap_t *v, int i)
> +{
> +	atomic_set(v, i);
> +}
>  
>  #if __LINUX_ARM_ARCH__ >= 6
>  
> @@ -38,38 +72,46 @@
>   * to ensure that the update happens.
>   */
>  
> -#define ATOMIC_OP(op, c_op, asm_op)					\
> -static inline void atomic_##op(int i, atomic_t *v)			\
> +#define __ATOMIC_OP(op, suffix, c_op, asm_op, post_op, extable)		\
> +static inline void atomic_##op##suffix(int i, atomic##suffix##_t *v)	\
>  {									\
>  	unsigned long tmp;						\
>  	int result;							\
>  									\
>  	prefetchw(&v->counter);						\
> -	__asm__ __volatile__("@ atomic_" #op "\n"			\
> +	__asm__ __volatile__("@ atomic_" #op #suffix "\n"		\
>  "1:	ldrex	%0, [%3]\n"						\
>  "	" #asm_op "	%0, %0, %4\n"					\
> +        post_op                 					\
>  "	strex	%1, %0, [%3]\n"						\
>  "	teq	%1, #0\n"						\
> -"	bne	1b"							\
> +"	bne	1b\n"							\
> +        extable                 					\
>  	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)		\
>  	: "r" (&v->counter), "Ir" (i)					\
>  	: "cc");							\
>  }									\
>  
> -#define ATOMIC_OP_RETURN(op, c_op, asm_op)				\
> -static inline int atomic_##op##_return_relaxed(int i, atomic_t *v)	\
> +#define ATOMIC_OP(op, c_op, asm_op)		                        \
> +	__ATOMIC_OP(op, _wrap, c_op, asm_op, , )			\
> +	__ATOMIC_OP(op, , c_op, asm_op##s, __OVERFLOW_POST, __OVERFLOW_EXTABLE)
> +
> +#define __ATOMIC_OP_RETURN(op, suffix, c_op, asm_op, post_op, extable)	\
> +static inline int atomic_##op##_return##suffix##_relaxed(int i, atomic##suffix##_t *v) \
>  {									\
>  	unsigned long tmp;						\
>  	int result;							\
>  									\
>  	prefetchw(&v->counter);						\
>  									\
> -	__asm__ __volatile__("@ atomic_" #op "_return\n"		\
> +	__asm__ __volatile__("@ atomic_" #op "_return" #suffix "\n"	\
>  "1:	ldrex	%0, [%3]\n"						\
>  "	" #asm_op "	%0, %0, %4\n"					\
> +        post_op                 					\
>  "	strex	%1, %0, [%3]\n"						\
>  "	teq	%1, #0\n"						\
> -"	bne	1b"							\
> +"	bne	1b\n"							\
> +        extable                 					\
>  	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)		\
>  	: "r" (&v->counter), "Ir" (i)					\
>  	: "cc");							\
> @@ -77,6 +119,11 @@ static inline int atomic_##op##_return_relaxed(int i, atomic_t *v)	\
>  	return result;							\
>  }
>  
> +#define ATOMIC_OP_RETURN(op, c_op, asm_op)	                        \
> +	__ATOMIC_OP_RETURN(op, _wrap, c_op, asm_op, , )			\
> +	__ATOMIC_OP_RETURN(op, , c_op, asm_op##s,			\
> +			   __OVERFLOW_POST_RETURN, __OVERFLOW_EXTABLE)
> +

This definition will create atomic_add_return_wrap_relaxed(),
but should the name be atomic_add_return_relaxed_wrap()?

(I don't know whether we need a _wrap version of the _relaxed functions.
See Elena's atomic-long.h.)

Thanks,
-Takahiro AKASHI

>  #define ATOMIC_FETCH_OP(op, c_op, asm_op)				\
>  static inline int atomic_fetch_##op##_relaxed(int i, atomic_t *v)	\
>  {									\
> @@ -108,26 +155,34 @@ static inline int atomic_fetch_##op##_relaxed(int i, atomic_t *v)	\
>  #define atomic_fetch_or_relaxed		atomic_fetch_or_relaxed
>  #define atomic_fetch_xor_relaxed	atomic_fetch_xor_relaxed
>  
> -static inline int atomic_cmpxchg_relaxed(atomic_t *ptr, int old, int new)
> -{
> -	int oldval;
> -	unsigned long res;
> -
> -	prefetchw(&ptr->counter);
> -
> -	do {
> -		__asm__ __volatile__("@ atomic_cmpxchg\n"
> -		"ldrex	%1, [%3]\n"
> -		"mov	%0, #0\n"
> -		"teq	%1, %4\n"
> -		"strexeq %0, %5, [%3]\n"
> -		    : "=&r" (res), "=&r" (oldval), "+Qo" (ptr->counter)
> -		    : "r" (&ptr->counter), "Ir" (old), "r" (new)
> -		    : "cc");
> -	} while (res);
> -
> -	return oldval;
> +#define __ATOMIC_CMPXCHG_RELAXED(suffix)			       	\
> +static inline int atomic_cmpxchg##suffix##_relaxed(atomic##suffix##_t *ptr, \
> +						   int old, int new)	\
> +{									\
> +	int oldval;                                                     \
> +	unsigned long res;                                              \
> +									\
> +	prefetchw(&ptr->counter);					\
> +									\
> +	do {								\
> +	        __asm__ __volatile__("@ atomic_cmpxchg" #suffix "\n"	\
> +		"ldrex	%1, [%3]\n"					\
> +		"mov	%0, #0\n"					\
> +		"teq	%1, %4\n"					\
> +		"strexeq %0, %5, [%3]\n"				\
> +		    : "=&r" (res), "=&r" (oldval), "+Qo" (ptr->counter) \
> +		    : "r" (&ptr->counter), "Ir" (old), "r" (new)        \
> +		    : "cc");                                            \
> +	} while (res);							\
> +									\
> +	return oldval;							\
>  }
> +
> +__ATOMIC_CMPXCHG_RELAXED()
> +__ATOMIC_CMPXCHG_RELAXED(_wrap)
> +
> +#undef __ATOMIC_CMPXCHG_RELAXED
> +
>  #define atomic_cmpxchg_relaxed		atomic_cmpxchg_relaxed
>  
>  static inline int __atomic_add_unless(atomic_t *v, int a, int u)
> @@ -141,12 +196,21 @@ static inline int __atomic_add_unless(atomic_t *v, int a, int u)
>  	__asm__ __volatile__ ("@ atomic_add_unless\n"
>  "1:	ldrex	%0, [%4]\n"
>  "	teq	%0, %5\n"
> -"	beq	2f\n"
> -"	add	%1, %0, %6\n"
> +"	beq	4f\n"
> +"	adds	%1, %0, %6\n"
> +
> +#ifdef CONFIG_HARDENED_ATOMIC
> +"       bvc 3f\n"
> +"2:     "HARDENED_ATOMIC_INSN"\n"
> +"3:\n"
> +#endif
>  "	strex	%2, %1, [%4]\n"
>  "	teq	%2, #0\n"
>  "	bne	1b\n"
> -"2:"
> +"4:"
> +#ifdef CONFIG_HARDENED_ATOMIC
> +        _ASM_EXTABLE(2b, 4b)
> +#endif
>  	: "=&r" (oldval), "=&r" (newval), "=&r" (tmp), "+Qo" (v->counter)
>  	: "r" (&v->counter), "r" (u), "r" (a)
>  	: "cc");
> @@ -163,8 +227,8 @@ static inline int __atomic_add_unless(atomic_t *v, int a, int u)
>  #error SMP not supported on pre-ARMv6 CPUs
>  #endif
>  
> -#define ATOMIC_OP(op, c_op, asm_op)					\
> -static inline void atomic_##op(int i, atomic_t *v)			\
> +#define __ATOMIC_OP(op, suffix, c_op, asm_op)				\
> +static inline void atomic_##op##suffix(int i, atomic##suffix##_t *v)	\
>  {									\
>  	unsigned long flags;						\
>  									\
> @@ -173,8 +237,12 @@ static inline void atomic_##op(int i, atomic_t *v)			\
>  	raw_local_irq_restore(flags);					\
>  }									\
>  
> -#define ATOMIC_OP_RETURN(op, c_op, asm_op)				\
> -static inline int atomic_##op##_return(int i, atomic_t *v)		\
> +#define ATOMIC_OP(op, c_op, asm_op)		                        \
> +	__ATOMIC_OP(op, _wrap, c_op, asm_op)				\
> +	__ATOMIC_OP(op, , c_op, asm_op)
> +
> +#define __ATOMIC_OP_RETURN(op, suffix, c_op, asm_op)			\
> +static inline int atomic_##op##_return##suffix(int i, atomic##suffix##_t *v) \
>  {									\
>  	unsigned long flags;						\
>  	int val;							\
> @@ -187,6 +255,10 @@ static inline int atomic_##op##_return(int i, atomic_t *v)		\
>  	return val;							\
>  }
>  
> +#define ATOMIC_OP_RETURN(op, c_op, asm_op)	                        \
> +	__ATOMIC_OP_RETURN(op, wrap, c_op, asm_op)			\
> +	__ATOMIC_OP_RETURN(op, , c_op, asm_op)
> +
>  #define ATOMIC_FETCH_OP(op, c_op, asm_op)				\
>  static inline int atomic_fetch_##op(int i, atomic_t *v)			\
>  {									\
> @@ -215,6 +287,11 @@ static inline int atomic_cmpxchg(atomic_t *v, int old, int new)
>  	return ret;
>  }
>  
> +static inline int atomic_cmpxchg_wrap(atomic_wrap_t *v, int old, int new)
> +{
> +	return atomic_cmpxchg((atomic_t *)v, old, new);
> +}
> +
>  static inline int __atomic_add_unless(atomic_t *v, int a, int u)
>  {
>  	int c, old;
> @@ -227,6 +304,11 @@ static inline int __atomic_add_unless(atomic_t *v, int a, int u)
>  
>  #endif /* __LINUX_ARM_ARCH__ */
>  
> +static inline int __atomic_add_unless_wrap(atomic_wrap_t *v, int a, int u)
> +{
> +	return __atomic_add_unless((atomic_t *)v, a, u);
> +}
> +
>  #define ATOMIC_OPS(op, c_op, asm_op)					\
>  	ATOMIC_OP(op, c_op, asm_op)					\
>  	ATOMIC_OP_RETURN(op, c_op, asm_op)				\
> @@ -250,18 +332,30 @@ ATOMIC_OPS(xor, ^=, eor)
>  #undef ATOMIC_OPS
>  #undef ATOMIC_FETCH_OP
>  #undef ATOMIC_OP_RETURN
> +#undef __ATOMIC_OP_RETURN
>  #undef ATOMIC_OP
> +#undef __ATOMIC_OP
>  
>  #define atomic_xchg(v, new) (xchg(&((v)->counter), new))
> -
> +#define atomic_xchg_wrap(v, new) atomic_xchg(v, new)
>  #define atomic_inc(v)		atomic_add(1, v)
> +static inline void atomic_inc_wrap(atomic_wrap_t *v)
> +{
> +	atomic_add_wrap(1, v);
> +}
>  #define atomic_dec(v)		atomic_sub(1, v)
> +static inline void atomic_dec_wrap(atomic_wrap_t *v)
> +{
> +	atomic_sub_wrap(1, v);
> +}
>  
>  #define atomic_inc_and_test(v)	(atomic_add_return(1, v) == 0)
>  #define atomic_dec_and_test(v)	(atomic_sub_return(1, v) == 0)
>  #define atomic_inc_return_relaxed(v)    (atomic_add_return_relaxed(1, v))
> +#define atomic_inc_return_wrap_relaxed(v) (atomic_add_return_wrap_relaxed(1, v))
>  #define atomic_dec_return_relaxed(v)    (atomic_sub_return_relaxed(1, v))
>  #define atomic_sub_and_test(i, v) (atomic_sub_return(i, v) == 0)
> +#define atomic_sub_and_test_wrap(i, v) (atomic_sub_return_wrap(i, v) == 0)
>  
>  #define atomic_add_negative(i,v) (atomic_add_return(i, v) < 0)
>  
> @@ -270,62 +364,81 @@ typedef struct {
>  	long long counter;
>  } atomic64_t;
>  
> +#ifdef CONFIG_HARDENED_ATOMIC
> +typedef struct {
> +	long long counter;
> +} atomic64_wrap_t;
> +#else
> +typedef atomic64_t atomic64_wrap_t;
> +#endif
> +
>  #define ATOMIC64_INIT(i) { (i) }
>  
> -#ifdef CONFIG_ARM_LPAE
> -static inline long long atomic64_read(const atomic64_t *v)
> -{
> -	long long result;
> +#define __ATOMIC64_READ(suffix, asm_op)					\
> +static inline long long						        \
> +atomic64_read##suffix(const atomic64##suffix##_t *v)			\
> +{						                        \
> +	long long result;                                               \
> +									\
> +	__asm__ __volatile__("@ atomic64_read" #suffix "\n"		\
> +"	" #asm_op " %0, %H0, [%1]"					        \
> +	: "=&r" (result)						\
> +	: "r" (&v->counter), "Qo" (v->counter)	                        \
> +	);         							\
> +									\
> +	return result;							\
> +}
>  
> -	__asm__ __volatile__("@ atomic64_read\n"
> -"	ldrd	%0, %H0, [%1]"
> -	: "=&r" (result)
> -	: "r" (&v->counter), "Qo" (v->counter)
> -	);
> +#ifdef CONFIG_ARM_LPAE
> +__ATOMIC64_READ(, ldrd)
> +__ATOMIC64_READ(wrap, ldrd)
>  
> -	return result;
> +#define __ATOMIC64_SET(suffix)					        \
> +static inline void atomic64_set##suffix(atomic64##suffix##_t *v, long long i) \
> +{									\
> +        __asm__ __volatile__("@ atomic64_set" #suffix "\n"		\
> +"	strd	%2, %H2, [%1]"					        \
> +	: "=Qo" (v->counter)						\
> +	: "r" (&v->counter), "r" (i)		                        \
> +	);							        \
>  }
>  
> -static inline void atomic64_set(atomic64_t *v, long long i)
> -{
> -	__asm__ __volatile__("@ atomic64_set\n"
> -"	strd	%2, %H2, [%1]"
> -	: "=Qo" (v->counter)
> -	: "r" (&v->counter), "r" (i)
> -	);
> -}
> -#else
> -static inline long long atomic64_read(const atomic64_t *v)
> -{
> -	long long result;
> +__ATOMIC64_SET()
> +__ATOMIC64_SET(_wrap)
>  
> -	__asm__ __volatile__("@ atomic64_read\n"
> -"	ldrexd	%0, %H0, [%1]"
> -	: "=&r" (result)
> -	: "r" (&v->counter), "Qo" (v->counter)
> -	);
> +#undef __ATOMIC64
>  
> -	return result;
> +#else
> +__ATOMIC64_READ(, ldrexd)
> +__ATOMIC64_READ(_wrap, ldrexd)
> +
> +#define __ATOMIC64_SET(suffix)					        \
> +static inline void atomic64_set##suffix(atomic64##suffix##_t *v, long long i) \
> +{									\
> +	long long tmp;                                                  \
> +									\
> +	prefetchw(&v->counter);						\
> +	__asm__ __volatile__("@ atomic64_set" #suffix"\n"               \
> +"1:	ldrexd	%0, %H0, [%2]\n"                                        \
> +"	strexd	%0, %3, %H3, [%2]\n"                                    \
> +"	teq	%0, #0\n"                                               \
> +"	bne	1b"                                                     \
> +	: "=&r" (tmp), "=Qo" (v->counter)				\
> +	: "r" (&v->counter), "r" (i)		                        \
> +	: "cc");                                                        \
>  }
>  
> -static inline void atomic64_set(atomic64_t *v, long long i)
> -{
> -	long long tmp;
> +__ATOMIC64_SET()
> +__ATOMIC64_SET(_wrap)
> +
> +#undef __ATOMIC64_SET
>  
> -	prefetchw(&v->counter);
> -	__asm__ __volatile__("@ atomic64_set\n"
> -"1:	ldrexd	%0, %H0, [%2]\n"
> -"	strexd	%0, %3, %H3, [%2]\n"
> -"	teq	%0, #0\n"
> -"	bne	1b"
> -	: "=&r" (tmp), "=Qo" (v->counter)
> -	: "r" (&v->counter), "r" (i)
> -	: "cc");
> -}
>  #endif
>  
> -#define ATOMIC64_OP(op, op1, op2)					\
> -static inline void atomic64_##op(long long i, atomic64_t *v)		\
> +#undef __ATOMIC64_READ
> +
> +#define __ATOMIC64_OP(op, suffix, op1, op2, post_op, extable)		\
> +static inline void atomic64_##op##suffix(long long i, atomic64##suffix##_t *v) \
>  {									\
>  	long long result;						\
>  	unsigned long tmp;						\
> @@ -335,17 +448,31 @@ static inline void atomic64_##op(long long i, atomic64_t *v)		\
>  "1:	ldrexd	%0, %H0, [%3]\n"					\
>  "	" #op1 " %Q0, %Q0, %Q4\n"					\
>  "	" #op2 " %R0, %R0, %R4\n"					\
> +        post_op					                        \
>  "	strexd	%1, %0, %H0, [%3]\n"					\
>  "	teq	%1, #0\n"						\
> -"	bne	1b"							\
> +"	bne	1b\n"							\
> +	extable                                                         \
>  	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)		\
>  	: "r" (&v->counter), "r" (i)					\
>  	: "cc");							\
> -}									\
> +}
>  
> -#define ATOMIC64_OP_RETURN(op, op1, op2)				\
> +#define ATOMIC64_OP(op, op1, op2)		                        \
> +	__ATOMIC64_OP(op, _wrap, op1, op2, , )				\
> +	__ATOMIC64_OP(op, , op1, op2##s, __OVERFLOW_POST, __OVERFLOW_EXTABLE)
> +
> +#undef __OVERFLOW_POST_RETURN
> +#define __OVERFLOW_POST_RETURN			\
> +	"bvc 3f\n"				\
> +	"mov %0, %1\n"				\
> +	"mov %H0, %H1\n"			\
> +	"2: "HARDENED_ATOMIC_INSN"\n"		\
> +	"3:\n"
> +
> +#define __ATOMIC64_OP_RETURN(op, suffix, op1, op2, post_op, extable)	\
>  static inline long long							\
> -atomic64_##op##_return_relaxed(long long i, atomic64_t *v)		\
> +atomic64_##op##_return##suffix##_relaxed(long long i, atomic64##suffix##_t *v) \
>  {									\
>  	long long result;						\
>  	unsigned long tmp;						\
> @@ -356,9 +483,11 @@ atomic64_##op##_return_relaxed(long long i, atomic64_t *v)		\
>  "1:	ldrexd	%0, %H0, [%3]\n"					\
>  "	" #op1 " %Q0, %Q0, %Q4\n"					\
>  "	" #op2 " %R0, %R0, %R4\n"					\
> +	post_op                                                         \
>  "	strexd	%1, %0, %H0, [%3]\n"					\
>  "	teq	%1, #0\n"						\
> -"	bne	1b"							\
> +"	bne	1b\n"							\
> +	extable                                                         \
>  	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)		\
>  	: "r" (&v->counter), "r" (i)					\
>  	: "cc");							\
> @@ -366,6 +495,11 @@ atomic64_##op##_return_relaxed(long long i, atomic64_t *v)		\
>  	return result;							\
>  }
>  
> +#define ATOMIC64_OP_RETURN(op, op1, op2)	                        \
> +	__ATOMIC64_OP_RETURN(op, _wrap, op1, op2, , )			\
> +	__ATOMIC64_OP_RETURN(op, , op1, op2##s, __OVERFLOW_POST_RETURN, \
> +			     __OVERFLOW_EXTABLE)
> +
>  #define ATOMIC64_FETCH_OP(op, op1, op2)					\
>  static inline long long							\
>  atomic64_fetch_##op##_relaxed(long long i, atomic64_t *v)		\
> @@ -422,70 +556,98 @@ ATOMIC64_OPS(xor, eor, eor)
>  #undef ATOMIC64_OPS
>  #undef ATOMIC64_FETCH_OP
>  #undef ATOMIC64_OP_RETURN
> +#undef __ATOMIC64_OP_RETURN
>  #undef ATOMIC64_OP
> -
> -static inline long long
> -atomic64_cmpxchg_relaxed(atomic64_t *ptr, long long old, long long new)
> -{
> -	long long oldval;
> -	unsigned long res;
> -
> -	prefetchw(&ptr->counter);
> -
> -	do {
> -		__asm__ __volatile__("@ atomic64_cmpxchg\n"
> -		"ldrexd		%1, %H1, [%3]\n"
> -		"mov		%0, #0\n"
> -		"teq		%1, %4\n"
> -		"teqeq		%H1, %H4\n"
> -		"strexdeq	%0, %5, %H5, [%3]"
> -		: "=&r" (res), "=&r" (oldval), "+Qo" (ptr->counter)
> -		: "r" (&ptr->counter), "r" (old), "r" (new)
> -		: "cc");
> -	} while (res);
> -
> -	return oldval;
> +#undef __ATOMIC64_OP
> +#undef __OVERFLOW_EXTABLE
> +#undef __OVERFLOW_POST_RETURN
> +#undef __OVERFLOW_RETURN
> +
> +#define __ATOMIC64_CMPXCHG_RELAXED(suffix)	                        \
> +static inline long long	atomic64_cmpxchg##suffix##_relaxed(             \
> +	atomic64##suffix##_t *ptr, long long old, long long new)	\
> +{									\
> +	long long oldval;                                               \
> +	unsigned long res;						\
> +									\
> +	prefetchw(&ptr->counter);					\
> +									\
> +	do {								\
> +		__asm__ __volatile__("@ atomic64_cmpxchg" #suffix "\n"  \
> +		"ldrexd		%1, %H1, [%3]\n"			\
> +		"mov		%0, #0\n"				\
> +		"teq		%1, %4\n"				\
> +		"teqeq		%H1, %H4\n"				\
> +		"strexdeq	%0, %5, %H5, [%3]"			\
> +		: "=&r" (res), "=&r" (oldval), "+Qo" (ptr->counter)     \
> +		: "r" (&ptr->counter), "r" (old), "r" (new)             \
> +		: "cc");                                                \
> +	} while (res);							\
> +									\
> +	return oldval;							\
>  }
> -#define atomic64_cmpxchg_relaxed	atomic64_cmpxchg_relaxed
>  
> -static inline long long atomic64_xchg_relaxed(atomic64_t *ptr, long long new)
> -{
> -	long long result;
> -	unsigned long tmp;
> -
> -	prefetchw(&ptr->counter);
> +__ATOMIC64_CMPXCHG_RELAXED()
> +__ATOMIC64_CMPXCHG_RELAXED(_wrap)
> +#define atomic64_cmpxchg_relaxed	atomic64_cmpxchg_relaxed
>  
> -	__asm__ __volatile__("@ atomic64_xchg\n"
> -"1:	ldrexd	%0, %H0, [%3]\n"
> -"	strexd	%1, %4, %H4, [%3]\n"
> -"	teq	%1, #0\n"
> -"	bne	1b"
> -	: "=&r" (result), "=&r" (tmp), "+Qo" (ptr->counter)
> -	: "r" (&ptr->counter), "r" (new)
> -	: "cc");
> +#undef __ATOMIC64_CMPXCHG_RELAXED
>  
> -	return result;
> +#define __ATOMIC64_XCHG_RELAXED(suffix)					\
> +static inline long long atomic64_xchg##suffix##_relaxed(                \
> +	atomic64##suffix##_t *ptr, long long new)			\
> +{									\
> +	long long result;                                               \
> +	unsigned long tmp;						\
> +									\
> +	prefetchw(&ptr->counter);					\
> +									\
> +	__asm__ __volatile__("@ atomic64_xchg" #suffix "\n"		\
> +"1:	ldrexd	%0, %H0, [%3]\n"                                        \
> +"	strexd	%1, %4, %H4, [%3]\n"                                    \
> +"	teq	%1, #0\n"                                               \
> +"	bne	1b"                                                     \
> +	: "=&r" (result), "=&r" (tmp), "+Qo" (ptr->counter)             \
> +	: "r" (&ptr->counter), "r" (new)                                \
> +	: "cc");                                                        \
> +									\
> +	return result;							\
>  }
> +
> +__ATOMIC64_XCHG_RELAXED()
> +__ATOMIC64_XCHG_RELAXED(_wrap)
>  #define atomic64_xchg_relaxed		atomic64_xchg_relaxed
>  
> +#undef __ATOMIC64_XCHG_RELAXED
> +
>  static inline long long atomic64_dec_if_positive(atomic64_t *v)
>  {
>  	long long result;
> -	unsigned long tmp;
> +	u64 tmp;
>  
>  	smp_mb();
>  	prefetchw(&v->counter);
>  
>  	__asm__ __volatile__("@ atomic64_dec_if_positive\n"
> -"1:	ldrexd	%0, %H0, [%3]\n"
> -"	subs	%Q0, %Q0, #1\n"
> -"	sbc	%R0, %R0, #0\n"
> +"1:	ldrexd	%1, %H1, [%3]\n"
> +"	subs	%Q0, %Q1, #1\n"
> +"	sbcs	%R0, %R1, #0\n"
> +#ifdef CONFIG_HARDENED_ATOMIC
> +"	bvc	3f\n"
> +"	mov	%Q0, %Q1\n"
> +"	mov	%R0, %R1\n"
> +"2:	"HARDENED_ATOMIC_INSN"\n"
> +"3:\n"
> +#endif
>  "	teq	%R0, #0\n"
> -"	bmi	2f\n"
> +"	bmi	4f\n"
>  "	strexd	%1, %0, %H0, [%3]\n"
>  "	teq	%1, #0\n"
>  "	bne	1b\n"
> -"2:"
> +"4:\n"
> +#ifdef CONFIG_HARDENED_ATOMIC
> +       _ASM_EXTABLE(2b, 4b)
> +#endif
>  	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
>  	: "r" (&v->counter)
>  	: "cc");
> @@ -509,13 +671,21 @@ static inline int atomic64_add_unless(atomic64_t *v, long long a, long long u)
>  "	teq	%0, %5\n"
>  "	teqeq	%H0, %H5\n"
>  "	moveq	%1, #0\n"
> -"	beq	2f\n"
> +"	beq	4f\n"
>  "	adds	%Q0, %Q0, %Q6\n"
> -"	adc	%R0, %R0, %R6\n"
> +"	adcs	%R0, %R0, %R6\n"
> +#ifdef CONFIG_HARDENED_ATOMIC
> +"       bvc     3f\n"
> +"2:     "HARDENED_ATOMIC_INSN"\n"
> +"3:\n"
> +#endif
>  "	strexd	%2, %0, %H0, [%4]\n"
>  "	teq	%2, #0\n"
>  "	bne	1b\n"
> -"2:"
> +"4:\n"
> +#ifdef CONFIG_HARDENED_ATOMIC
> +       _ASM_EXTABLE(2b, 4b)
> +#endif
>  	: "=&r" (val), "+r" (ret), "=&r" (tmp), "+Qo" (v->counter)
>  	: "r" (&v->counter), "r" (u), "r" (a)
>  	: "cc");
> @@ -529,6 +699,7 @@ static inline int atomic64_add_unless(atomic64_t *v, long long a, long long u)
>  #define atomic64_add_negative(a, v)	(atomic64_add_return((a), (v)) < 0)
>  #define atomic64_inc(v)			atomic64_add(1LL, (v))
>  #define atomic64_inc_return_relaxed(v)	atomic64_add_return_relaxed(1LL, (v))
> +#define atomic64_inc_return_wrap_relaxed(v) atomic64_add_return_wrap_relaxed(1LL, v)
>  #define atomic64_inc_and_test(v)	(atomic64_inc_return(v) == 0)
>  #define atomic64_sub_and_test(a, v)	(atomic64_sub_return((a), (v)) == 0)
>  #define atomic64_dec(v)			atomic64_sub(1LL, (v))
> @@ -536,6 +707,9 @@ static inline int atomic64_add_unless(atomic64_t *v, long long a, long long u)
>  #define atomic64_dec_and_test(v)	(atomic64_dec_return((v)) == 0)
>  #define atomic64_inc_not_zero(v)	atomic64_add_unless((v), 1LL, 0LL)
>  
> +#define atomic64_inc_wrap(v) atomic64_add_wrap(1LL, v)
> +#define atomic64_dec_wrap(v) atomic64_sub_wrap(1LL, v)
> +
>  #endif /* !CONFIG_GENERIC_ATOMIC64 */
>  #endif
>  #endif
> diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
> index 3a2e678..ce8ee00 100644
> --- a/arch/arm/mm/fault.c
> +++ b/arch/arm/mm/fault.c
> @@ -580,6 +580,21 @@ do_PrefetchAbort(unsigned long addr, unsigned int ifsr, struct pt_regs *regs)
>  	const struct fsr_info *inf = ifsr_info + fsr_fs(ifsr);
>  	struct siginfo info;
>  
> +#ifdef CONFIG_HARDENED_ATOMIC
> +	if (fsr_fs(ifsr) == FAULT_CODE_DEBUG) {
> +		unsigned long pc = instruction_pointer(regs);
> +		unsigned int bkpt;
> +
> +		if (!probe_kernel_address((const unsigned int *)pc, bkpt) &&
> +		    cpu_to_le32(bkpt) == 0xe12f1073) {
> +			current->thread.error_code = ifsr;
> +			current->thread.trap_no = 0;
> +			hardened_atomic_overflow(regs);
> +			fixup_exception(regs);
> +			return;
> +		}
> +	}
> +#endif
>  	if (!inf->fn(addr, ifsr | FSR_LNX_PF, regs))
>  		return;
>  
> -- 
> 2.7.4
>
Colin Vidal Oct. 25, 2016, 3:02 p.m. UTC | #7
On Tue, 2016-10-25 at 18:18 +0900, AKASHI Takahiro wrote:
> On Tue, Oct 18, 2016 at 04:59:21PM +0200, Colin Vidal wrote:
> > 
> > This adds arm-specific code in order to support HARDENED_ATOMIC
> > feature. When overflow is detected in atomic_t, atomic64_t or
> > atomic_long_t, an exception is raised and call
> > hardened_atomic_overflow.
> > 
> > Signed-off-by: Colin Vidal <colin@cvidal.org>
> > ---
> >  arch/arm/Kconfig              |   1 +
> >  arch/arm/include/asm/atomic.h | 434 +++++++++++++++++++++++++++++-------------
> >  arch/arm/mm/fault.c           |  15 ++
> >  3 files changed, 320 insertions(+), 130 deletions(-)
> > 
> > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> > index b5d529f..fcf4a64 100644
> > --- a/arch/arm/Kconfig
> > +++ b/arch/arm/Kconfig
> > @@ -36,6 +36,7 @@ config ARM
> >  	select HAVE_ARCH_AUDITSYSCALL if (AEABI && !OABI_COMPAT)
> >  	select HAVE_ARCH_BITREVERSE if (CPU_32v7M || CPU_32v7) && !CPU_32v6
> >  	select HAVE_ARCH_HARDENED_USERCOPY
> > +	select HAVE_ARCH_HARDENED_ATOMIC
> >  	select HAVE_ARCH_JUMP_LABEL if !XIP_KERNEL && !CPU_ENDIAN_BE32 && MMU
> >  	select HAVE_ARCH_KGDB if !CPU_ENDIAN_BE32 && MMU
> >  	select HAVE_ARCH_MMAP_RND_BITS if MMU
> > diff --git a/arch/arm/include/asm/atomic.h b/arch/arm/include/asm/atomic.h
> > index 66d0e21..fdaee17 100644
> > --- a/arch/arm/include/asm/atomic.h
> > +++ b/arch/arm/include/asm/atomic.h
> > @@ -17,18 +17,52 @@
> >  #include <linux/irqflags.h>
> >  #include <asm/barrier.h>
> >  #include <asm/cmpxchg.h>
> > +#include <linux/bug.h>
> >  
> >  #define ATOMIC_INIT(i)	{ (i) }
> >  
> >  #ifdef __KERNEL__
> >  
> > +#ifdef CONFIG_HARDENED_ATOMIC
> > +#define HARDENED_ATOMIC_INSN "bkpt 0xf103"
> > +#define _ASM_EXTABLE(from, to)			\
> > +	".pushsection __ex_table,\"a\"\n"	\
> > +	".align 3\n"				\
> > +	".long "#from","#to"\n"			\
> > +	".popsection"
> > +#define __OVERFLOW_POST				\
> > +	"bvc 3f\n"				\
> > +	"2: "HARDENED_ATOMIC_INSN"\n"		\
> > +	"3:\n"
> > +#define __OVERFLOW_POST_RETURN			\
> > +	"bvc 3f\n"				\
> > +	"mov %0,%1\n"				\
> > +	"2: "HARDENED_ATOMIC_INSN"\n"		\
> > +	"3:\n"
> > +#define __OVERFLOW_EXTABLE			\
> > +	"4:\n"					\
> > +	_ASM_EXTABLE(2b, 4b)
> > +#else
> > +#define __OVERFLOW_POST
> > +#define __OVERFLOW_POST_RETURN
> > +#define __OVERFLOW_EXTABLE
> > +#endif
> > +
> >  /*
> >   * On ARM, ordinary assignment (str instruction) doesn't clear the local
> >   * strex/ldrex monitor on some implementations. The reason we can use it for
> >   * atomic_set() is the clrex or dummy strex done on every exception return.
> >   */
> >  #define atomic_read(v)	READ_ONCE((v)->counter)
> > +static inline int atomic_read_wrap(const atomic_wrap_t *v)
> > +{
> > +	return atomic_read(v);
> > +}
> >  #define atomic_set(v,i)	WRITE_ONCE(((v)->counter), (i))
> > +static inline void atomic_set_wrap(atomic_wrap_t *v, int i)
> > +{
> > +	atomic_set(v, i);
> > +}
> >  
> >  #if __LINUX_ARM_ARCH__ >= 6
> >  
> > @@ -38,38 +72,46 @@
> >   * to ensure that the update happens.
> >   */
> >  
> > -#define ATOMIC_OP(op, c_op, asm_op)					\
> > -static inline void atomic_##op(int i, atomic_t *v)			\
> > +#define __ATOMIC_OP(op, suffix, c_op, asm_op, post_op, extable)		\
> > +static inline void atomic_##op##suffix(int i, atomic##suffix##_t *v)	\
> >  {									\
> >  	unsigned long tmp;						\
> >  	int result;							\
> >  									\
> >  	prefetchw(&v->counter);						\
> > -	__asm__ __volatile__("@ atomic_" #op "\n"			\
> > +	__asm__ __volatile__("@ atomic_" #op #suffix "\n"		\
> >  "1:	ldrex	%0, [%3]\n"						\
> >  "	" #asm_op "	%0, %0, %4\n"					\
> > +        post_op                 					\
> >  "	strex	%1, %0, [%3]\n"						\
> >  "	teq	%1, #0\n"						\
> > -"	bne	1b"							\
> > +"	bne	1b\n"							\
> > +        extable                 					\
> >  	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)		\
> >  	: "r" (&v->counter), "Ir" (i)					\
> >  	: "cc");							\
> >  }									\
> >  
> > -#define ATOMIC_OP_RETURN(op, c_op, asm_op)				\
> > -static inline int atomic_##op##_return_relaxed(int i, atomic_t *v)	\
> > +#define ATOMIC_OP(op, c_op, asm_op)		                        \
> > +	__ATOMIC_OP(op, _wrap, c_op, asm_op, , )			\
> > +	__ATOMIC_OP(op, , c_op, asm_op##s, __OVERFLOW_POST, __OVERFLOW_EXTABLE)
> > +
> > +#define __ATOMIC_OP_RETURN(op, suffix, c_op, asm_op, post_op, extable)	\
> > +static inline int atomic_##op##_return##suffix##_relaxed(int i, atomic##suffix##_t *v) \
> >  {									\
> >  	unsigned long tmp;						\
> >  	int result;							\
> >  									\
> >  	prefetchw(&v->counter);						\
> >  									\
> > -	__asm__ __volatile__("@ atomic_" #op "_return\n"		\
> > +	__asm__ __volatile__("@ atomic_" #op "_return" #suffix "\n"	\
> >  "1:	ldrex	%0, [%3]\n"						\
> >  "	" #asm_op "	%0, %0, %4\n"					\
> > +        post_op                 					\
> >  "	strex	%1, %0, [%3]\n"						\
> >  "	teq	%1, #0\n"						\
> > -"	bne	1b"							\
> > +"	bne	1b\n"							\
> > +        extable                 					\
> >  	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)		\
> >  	: "r" (&v->counter), "Ir" (i)					\
> >  	: "cc");							\
> > @@ -77,6 +119,11 @@ static inline int atomic_##op##_return_relaxed(int i, atomic_t *v)	\
> >  	return result;							\
> >  }
> >  
> > +#define ATOMIC_OP_RETURN(op, c_op, asm_op)	                        \
> > +	__ATOMIC_OP_RETURN(op, _wrap, c_op, asm_op, , )			\
> > +	__ATOMIC_OP_RETURN(op, , c_op, asm_op##s,			\
> > +			   __OVERFLOW_POST_RETURN, __OVERFLOW_EXTABLE)
> > +
> 
> This definition will create atomic_add_return_wrap_relaxed(),
> but should the name be atomic_add_return_relaxed_wrap()?
> 
> (I don't know we need _wrap version of _relaxed functions.
> See Elena's atomic-long.h.)
> 
> Thanks,
> -Takahiro AKASHI

Hi Takahiro,

as far as I understand, the name should be
atomic_add_return_wrap_relaxed, since atomic_add_return_wrap is created
by __atomic_op_fence (in order to put memory barriers around the call
to atomic_add_return_wrap_relaxed) in include/linux/atomic.h.
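
For reference, the fenced wrapper in include/linux/atomic.h looks
roughly like this (quoting from memory, so treat it as a sketch):

    #define __atomic_op_fence(op, args...)			\
    ({								\
    	typeof(op##_relaxed(args)) __ret;			\
    	smp_mb__before_atomic();				\
    	__ret = op##_relaxed(args);				\
    	smp_mb__after_atomic();					\
    	__ret;							\
    })

so atomic_add_return_wrap() would be generated by something like
__atomic_op_fence(atomic_add_return_wrap, ...), which expands to a
call to atomic_add_return_wrap_relaxed() between the two barriers.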

Am I wrong?

Thanks!

Colin
AKASHI Takahiro Oct. 26, 2016, 7:24 a.m. UTC | #8
On Tue, Oct 25, 2016 at 05:02:47PM +0200, Colin Vidal wrote:
> On Tue, 2016-10-25 at 18:18 +0900, AKASHI Takahiro wrote:
> > On Tue, Oct 18, 2016 at 04:59:21PM +0200, Colin Vidal wrote:
> > > 
> > > This adds arm-specific code in order to support HARDENED_ATOMIC
> > > feature. When overflow is detected in atomic_t, atomic64_t or
> > > atomic_long_t, an exception is raised and call
> > > hardened_atomic_overflow.
> > > 
> > > Signed-off-by: Colin Vidal <colin@cvidal.org>
> > > ---
> > >  arch/arm/Kconfig              |   1 +
> > >  arch/arm/include/asm/atomic.h | 434 +++++++++++++++++++++++++++++-------------
> > >  arch/arm/mm/fault.c           |  15 ++
> > >  3 files changed, 320 insertions(+), 130 deletions(-)
> > > 
> > > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> > > index b5d529f..fcf4a64 100644
> > > --- a/arch/arm/Kconfig
> > > +++ b/arch/arm/Kconfig
> > > @@ -36,6 +36,7 @@ config ARM
> > >  	select HAVE_ARCH_AUDITSYSCALL if (AEABI && !OABI_COMPAT)
> > >  	select HAVE_ARCH_BITREVERSE if (CPU_32v7M || CPU_32v7) && !CPU_32v6
> > >  	select HAVE_ARCH_HARDENED_USERCOPY
> > > +	select HAVE_ARCH_HARDENED_ATOMIC
> > >  	select HAVE_ARCH_JUMP_LABEL if !XIP_KERNEL && !CPU_ENDIAN_BE32 && MMU
> > >  	select HAVE_ARCH_KGDB if !CPU_ENDIAN_BE32 && MMU
> > >  	select HAVE_ARCH_MMAP_RND_BITS if MMU
> > > diff --git a/arch/arm/include/asm/atomic.h b/arch/arm/include/asm/atomic.h
> > > index 66d0e21..fdaee17 100644
> > > --- a/arch/arm/include/asm/atomic.h
> > > +++ b/arch/arm/include/asm/atomic.h
> > > @@ -17,18 +17,52 @@
> > >  #include <linux/irqflags.h>
> > >  #include <asm/barrier.h>
> > >  #include <asm/cmpxchg.h>
> > > +#include <linux/bug.h>
> > >  
> > >  #define ATOMIC_INIT(i)	{ (i) }
> > >  
> > >  #ifdef __KERNEL__
> > >  
> > > +#ifdef CONFIG_HARDENED_ATOMIC
> > > +#define HARDENED_ATOMIC_INSN "bkpt 0xf103"
> > > +#define _ASM_EXTABLE(from, to)			\
> > > +	".pushsection __ex_table,\"a\"\n"	\
> > > +	".align 3\n"				\
> > > +	".long "#from","#to"\n"			\
> > > +	".popsection"
> > > +#define __OVERFLOW_POST				\
> > > +	"bvc 3f\n"				\
> > > +	"2: "HARDENED_ATOMIC_INSN"\n"		\
> > > +	"3:\n"
> > > +#define __OVERFLOW_POST_RETURN			\
> > > +	"bvc 3f\n"				\
> > > +	"mov %0,%1\n"				\
> > > +	"2: "HARDENED_ATOMIC_INSN"\n"		\
> > > +	"3:\n"
> > > +#define __OVERFLOW_EXTABLE			\
> > > +	"4:\n"					\
> > > +	_ASM_EXTABLE(2b, 4b)
> > > +#else
> > > +#define __OVERFLOW_POST
> > > +#define __OVERFLOW_POST_RETURN
> > > +#define __OVERFLOW_EXTABLE
> > > +#endif
> > > +
> > >  /*
> > >   * On ARM, ordinary assignment (str instruction) doesn't clear the local
> > >   * strex/ldrex monitor on some implementations. The reason we can use it for
> > >   * atomic_set() is the clrex or dummy strex done on every exception return.
> > >   */
> > >  #define atomic_read(v)	READ_ONCE((v)->counter)
> > > +static inline int atomic_read_wrap(const atomic_wrap_t *v)
> > > +{
> > > +	return atomic_read(v);
> > > +}
> > >  #define atomic_set(v,i)	WRITE_ONCE(((v)->counter), (i))
> > > +static inline void atomic_set_wrap(atomic_wrap_t *v, int i)
> > > +{
> > > +	atomic_set(v, i);
> > > +}
> > >  
> > >  #if __LINUX_ARM_ARCH__ >= 6
> > >  
> > > @@ -38,38 +72,46 @@
> > >   * to ensure that the update happens.
> > >   */
> > >  
> > > -#define ATOMIC_OP(op, c_op, asm_op)					\
> > > -static inline void atomic_##op(int i, atomic_t *v)			\
> > > +#define __ATOMIC_OP(op, suffix, c_op, asm_op, post_op, extable)		\
> > > +static inline void atomic_##op##suffix(int i, atomic##suffix##_t *v)	\
> > >  {									\
> > >  	unsigned long tmp;						\
> > >  	int result;							\
> > >  									\
> > >  	prefetchw(&v->counter);						\
> > > -	__asm__ __volatile__("@ atomic_" #op "\n"			\
> > > +	__asm__ __volatile__("@ atomic_" #op #suffix "\n"		\
> > >  "1:	ldrex	%0, [%3]\n"						\
> > >  "	" #asm_op "	%0, %0, %4\n"					\
> > > +        post_op                 					\
> > >  "	strex	%1, %0, [%3]\n"						\
> > >  "	teq	%1, #0\n"						\
> > > -"	bne	1b"							\
> > > +"	bne	1b\n"							\
> > > +        extable                 					\
> > >  	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)		\
> > >  	: "r" (&v->counter), "Ir" (i)					\
> > >  	: "cc");							\
> > >  }									\
> > >  
> > > -#define ATOMIC_OP_RETURN(op, c_op, asm_op)				\
> > > -static inline int atomic_##op##_return_relaxed(int i, atomic_t *v)	\
> > > +#define ATOMIC_OP(op, c_op, asm_op)		                        \
> > > +	__ATOMIC_OP(op, _wrap, c_op, asm_op, , )			\
> > > +	__ATOMIC_OP(op, , c_op, asm_op##s, __OVERFLOW_POST, __OVERFLOW_EXTABLE)
> > > +
> > > +#define __ATOMIC_OP_RETURN(op, suffix, c_op, asm_op, post_op, extable)	\
> > > +static inline int atomic_##op##_return##suffix##_relaxed(int i, atomic##suffix##_t *v) \
> > >  {									\
> > >  	unsigned long tmp;						\
> > >  	int result;							\
> > >  									\
> > >  	prefetchw(&v->counter);						\
> > >  									\
> > > -	__asm__ __volatile__("@ atomic_" #op "_return\n"		\
> > > +	__asm__ __volatile__("@ atomic_" #op "_return" #suffix "\n"	\
> > >  "1:	ldrex	%0, [%3]\n"						\
> > >  "	" #asm_op "	%0, %0, %4\n"					\
> > > +        post_op                 					\
> > >  "	strex	%1, %0, [%3]\n"						\
> > >  "	teq	%1, #0\n"						\
> > > -"	bne	1b"							\
> > > +"	bne	1b\n"							\
> > > +        extable                 					\
> > >  	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)		\
> > >  	: "r" (&v->counter), "Ir" (i)					\
> > >  	: "cc");							\
> > > @@ -77,6 +119,11 @@ static inline int atomic_##op##_return_relaxed(int i, atomic_t *v)	\
> > >  	return result;							\
> > >  }
> > >  
> > > +#define ATOMIC_OP_RETURN(op, c_op, asm_op)	                        \
> > > +	__ATOMIC_OP_RETURN(op, _wrap, c_op, asm_op, , )			\
> > > +	__ATOMIC_OP_RETURN(op, , c_op, asm_op##s,			\
> > > +			   __OVERFLOW_POST_RETURN, __OVERFLOW_EXTABLE)
> > > +
> > 
> > This definition will create atomic_add_return_wrap_relaxed(),
> > but should the name be atomic_add_return_relaxed_wrap()?
> > 
> > (I don't know we need _wrap version of _relaxed functions.
> > See Elena's atomic-long.h.)
> > 
> > Thanks,
> > -Takahiro AKASHI
> 
> Hi Takahiro,
> 
> as far I understand, the name should be atomic_add_return_wrap_relaxed
> since atomic_add_return_wrap is created by __atomic_op_fence (in order
> to put memory barrier around the call to
> atomic_add_return_wrap_relaxed) in include/linux/atomic.h.

# I know that this is not a "technical" issue.

My point is that, for consistency, _wrap should be the last suffix in
the names of all the functions, including the _relaxed variants.

Say, Elena's atomic-long.h defines:
===8<===
#define ATOMIC_LONG_ADD_SUB_OP(op, mo, suffix)                          \
static inline long                                                      \
atomic_long_##op##_return##mo##suffix(long i, atomic_long##suffix##_t *l)\
{                                                                       \
        ATOMIC_LONG_PFX(suffix##_t) *v = (ATOMIC_LONG_PFX(suffix##_t) *)l;\
                                                                        \
        return (long)ATOMIC_LONG_PFX(_##op##_return##mo##suffix)(i, v);\
}
ATOMIC_LONG_ADD_SUB_OP(add,,)
ATOMIC_LONG_ADD_SUB_OP(add, _relaxed,)
  ...
#ifdef CONFIG_HARDENED_ATOMIC
ATOMIC_LONG_ADD_SUB_OP(add,,_wrap)
  ...
===>8===

It would naturally lead to "atomic_long_add_relaxed_wrap" although
this function is not actually defined here.

> Am I wrong?

No, I'm not saying that you are wrong. It's a matter of the naming scheme.
We need some consensus.

Thanks,
-Takahiro AKASHI

> Thanks!
> 
> Colin
Colin Vidal Oct. 26, 2016, 8:20 a.m. UTC | #9
Hi Takahiro,

> > > > +#define ATOMIC_OP_RETURN(op, c_op, asm_op)	                        \
> > > > +	__ATOMIC_OP_RETURN(op, _wrap, c_op, asm_op, , )			\
> > > > +	__ATOMIC_OP_RETURN(op, , c_op, asm_op##s,			\
> > > > +			   __OVERFLOW_POST_RETURN, __OVERFLOW_EXTABLE)
> > > > +
> > > 
> > > This definition will create atomic_add_return_wrap_relaxed(),
> > > but should the name be atomic_add_return_relaxed_wrap()?
> > > 
> > > (I don't know we need _wrap version of _relaxed functions.
> > > See Elena's atomic-long.h.)
> > > 
> > > Thanks,
> > > -Takahiro AKASHI
> > 
> > Hi Takahiro,
> > 
> > as far I understand, the name should be atomic_add_return_wrap_relaxed
> > since atomic_add_return_wrap is created by __atomic_op_fence (in order
> > to put memory barrier around the call to
> > atomic_add_return_wrap_relaxed) in include/linux/atomic.h.
> 
> # I know that this is not a "technical" issue.

Oh ok, sorry, I misunderstood and I was confused.

> 
> My point is that _wrap would be the last suffix of the names of all
> the functions including _relaxed variants for consistency.
> 
> Say, Elena's atomic-long.h defines:
> ===8<===
> #define ATOMIC_LONG_ADD_SUB_OP(op, mo, suffix)                          \
> static inline long                                                      \
> atomic_long_##op##_return##mo##suffix(long i, atomic_long##suffix##_t *l)\
> {                                                                       \
>         ATOMIC_LONG_PFX(suffix##_t) *v = (ATOMIC_LONG_PFX(suffix##_t) *)l;\
>                                                                         \
>         return (long)ATOMIC_LONG_PFX(_##op##_return##mo##suffix)(i, v);\
> }
> ATOMIC_LONG_ADD_SUB_OP(add,,)
> ATOMIC_LONG_ADD_SUB_OP(add, _relaxed,)
>   ...
> #ifdef CONFIG_HARDENED_ATOMIC
> ATOMIC_LONG_ADD_SUB_OP(add,,_wrap)
>   ...
> ===>8===
> 
> It would naturally lead to "atomic_long_add_relaxed_wrap" although
> this function is not actually defined here.
> 

I understand your point, and the need for consistency. "_wrap" is
appended to any "user" function of the atomic API, and it is
mandatory to have a consistent name for it, I fully agree.

However, in the internal implementation of the atomics, "_relaxed" is
appended to any function that will be wrapped with memory barriers.
Therefore, a name like "atomic_add_return_wrap_relaxed" makes it easy
to see that this function will be embedded in another one. On the
other hand, "atomic_add_return_relaxed_wrap" is less clear, since it
suggests that it is a "user" function. It would introduce a kind of
inconsistency in the naming of the relaxed functions.
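
To make it concrete (hypothetical, just mirroring the way the fenced
non-wrap variants are built in include/linux/atomic.h):

    #ifndef atomic_add_return_wrap
    #define atomic_add_return_wrap(...)				\
    	__atomic_op_fence(atomic_add_return_wrap, __VA_ARGS__)
    #endif

This only resolves if the architecture provides
atomic_add_return_wrap_relaxed(), which is why the _wrap-before-
_relaxed ordering falls out naturally here.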

That said, I really don't know which is the "best" naming... :-)

Thanks!

Colin
Mark Rutland Oct. 27, 2016, 11:08 a.m. UTC | #10
On Wed, Oct 26, 2016 at 10:20:16AM +0200, Colin Vidal wrote:
> > My point is that _wrap would be the last suffix of the names of all
> > the functions including _relaxed variants for consistency.
> > 
> > Say, Elena's atomic-long.h defines:
> > ===8<===
> > #define ATOMIC_LONG_ADD_SUB_OP(op, mo, suffix)                          \
> > static inline long                                                      \
> > atomic_long_##op##_return##mo##suffix(long i, atomic_long##suffix##_t *l)\
> > {                                                                       \
> >         ATOMIC_LONG_PFX(suffix##_t) *v = (ATOMIC_LONG_PFX(suffix##_t) *)l;\
> >                                                                         \
> >         return (long)ATOMIC_LONG_PFX(_##op##_return##mo##suffix)(i, v);\
> > }
> > ATOMIC_LONG_ADD_SUB_OP(add,,)
> > ATOMIC_LONG_ADD_SUB_OP(add, _relaxed,)
> >   ...
> > #ifdef CONFIG_HARDENED_ATOMIC
> > ATOMIC_LONG_ADD_SUB_OP(add,,_wrap)
> >   ...
> > ===>8===
> > 
> > It would naturally lead to "atomic_long_add_relaxed_wrap" although
> > this function is not actually defined here.
> > 
> 
> I understand your point, and the need of consistency. "_wrap" is
> appended to any "user" function of the atomic system, and it is
> mandatory to have a consistant name for it, I fully agree.
> 
> However, in the internal implementation of atomic, "_relaxed" is
> appended to any function that will be bounded by memory barriers.
> Therefore, a name like "atomic_add_return_wrap_relaxed" makes it easy
> to see that this function will be embedded in another one. On the other
> hand "atomic_add_return_relaxed_wrap" is less clear, since it suggest
> that is a "user" function. I would involve a kind on inconsistency on
> the naming of relaxed functions.
> 
> That said, I really don't know which is the "best" naming... :-)

Given it looks like there's at least one necessary round of bikeshedding, it
would be best to discuss this with the usual atomic maintainers included, so as
to avoid several more rounds over subsequent postings. i.e.

[mark@leverpostej:~/src/linux]% ./scripts/get_maintainer.pl -f \
	include/asm-generic/atomic.h \
	include/asm-generic/atomic-long.h \
	include/linux/atomic.h
Arnd Bergmann <arnd@arndb.de> (maintainer:GENERIC INCLUDE/ASM HEADER FILES)
Ingo Molnar <mingo@kernel.org> (commit_signer:8/11=73%)
Peter Zijlstra <peterz@infradead.org> (commit_signer:6/11=55%,authored:5/11=45%,added_lines:491/671=73%,removed_lines:186/225=83%)
Frederic Weisbecker <fweisbec@gmail.com> (commit_signer:3/11=27%,authored:3/11=27%,added_lines:42/671=6%,removed_lines:21/225=9%)
Boqun Feng <boqun.feng@gmail.com> (commit_signer:1/11=9%,authored:1/11=9%)
Chris Metcalf <cmetcalf@ezchip.com> (commit_signer:1/11=9%)
Davidlohr Bueso <dave@stgolabs.net> (authored:1/11=9%,added_lines:128/671=19%)
linux-arch@vger.kernel.org (open list:GENERIC INCLUDE/ASM HEADER FILES)
linux-kernel@vger.kernel.org (open list)

Thanks,
Mark.
Mark Rutland Oct. 27, 2016, 1:24 p.m. UTC | #11
Hi,

On Tue, Oct 18, 2016 at 04:59:21PM +0200, Colin Vidal wrote:
> This adds arm-specific code in order to support HARDENED_ATOMIC
> feature. When overflow is detected in atomic_t, atomic64_t or
> atomic_long_t, an exception is raised and call
> hardened_atomic_overflow.

I have some comments below, but for real review this needs to go via the
linux-arm-kernel list.

> diff --git a/arch/arm/include/asm/atomic.h b/arch/arm/include/asm/atomic.h
> index 66d0e21..fdaee17 100644
> --- a/arch/arm/include/asm/atomic.h
> +++ b/arch/arm/include/asm/atomic.h
> @@ -17,18 +17,52 @@
>  #include <linux/irqflags.h>
>  #include <asm/barrier.h>
>  #include <asm/cmpxchg.h>
> +#include <linux/bug.h>
>  
>  #define ATOMIC_INIT(i)	{ (i) }
>  
>  #ifdef __KERNEL__
>  
> +#ifdef CONFIG_HARDENED_ATOMIC
> +#define HARDENED_ATOMIC_INSN "bkpt 0xf103"

Please put the immediate in a #define somewhere.

What about thumb2 kernels?

> +#define _ASM_EXTABLE(from, to)			\
> +	".pushsection __ex_table,\"a\"\n"	\
> +	".align 3\n"				\
> +	".long "#from","#to"\n"			\
> +	".popsection"
> +#define __OVERFLOW_POST				\
> +	"bvc 3f\n"				\
> +	"2: "HARDENED_ATOMIC_INSN"\n"		\
> +	"3:\n"
> +#define __OVERFLOW_POST_RETURN			\
> +	"bvc 3f\n"				\
> +	"mov %0,%1\n"				\
> +	"2: "HARDENED_ATOMIC_INSN"\n"		\
> +	"3:\n"
> +#define __OVERFLOW_EXTABLE			\
> +	"4:\n"					\
> +	_ASM_EXTABLE(2b, 4b)
> +#else
> +#define __OVERFLOW_POST
> +#define __OVERFLOW_POST_RETURN
> +#define __OVERFLOW_EXTABLE
> +#endif
> +

All this should live close to the assembly using it, to make it possible
to follow.

This may also not be the best way of structuring this code. The
additional indirection of passing this in at a high level makes it hard
to read and potentially fragile. For single instructions it was ok, but
I'm not so sure that it's ok for larger sequences like this.

> diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
> index 3a2e678..ce8ee00 100644
> --- a/arch/arm/mm/fault.c
> +++ b/arch/arm/mm/fault.c
> @@ -580,6 +580,21 @@ do_PrefetchAbort(unsigned long addr, unsigned int ifsr, struct pt_regs *regs)
>  	const struct fsr_info *inf = ifsr_info + fsr_fs(ifsr);
>  	struct siginfo info;
>  
> +#ifdef CONFIG_HARDENED_ATOMIC
> +	if (fsr_fs(ifsr) == FAULT_CODE_DEBUG) {

You'll need to justify why this isn't in the ifsr_info table. It has the
same basic shape as the usual set of handlers.

I note that we don't seem to use SW breakpoints at all currently, and I
suspect there's a reason for that which we need to consider.

Also, if this *must* live here, please make it a static inline with an
empty stub, rather than an ifdef'd block.

> +		unsigned long pc = instruction_pointer(regs);
> +		unsigned int bkpt;
> +
> +		if (!probe_kernel_address((const unsigned int *)pc, bkpt) &&
> +		    cpu_to_le32(bkpt) == 0xe12f1073) {

This appears to be the A1 encoding from the ARM ARM. What about the T1
encoding, i.e. thumb?

Regardless, *please* de-magic the number using a #define.

Also, this should be le32_to_cpu -- in the end we're treating the
converted value as cpu-native. The variable itself should be a __le32.

Thanks,
Mark.

> +			current->thread.error_code = ifsr;
> +			current->thread.trap_no = 0;
> +			hardened_atomic_overflow(regs);
> +			fixup_exception(regs);
> +			return;
> +		}
> +	}
> +#endif
>  	if (!inf->fn(addr, ifsr | FSR_LNX_PF, regs))
>  		return;
>  
> -- 
> 2.7.4
>
Kees Cook Oct. 27, 2016, 9:37 p.m. UTC | #12
On Thu, Oct 27, 2016 at 4:08 AM, Mark Rutland <mark.rutland@arm.com> wrote:
> On Wed, Oct 26, 2016 at 10:20:16AM +0200, Colin Vidal wrote:
> Given it looks like there's at least one necessary round of bikeshedding, it
> would be best to discuss this with the usual atomic maintainers included, so as
> to avoid several more rounds over subsequent postings. i.e.
>
> [mark@leverpostej:~/src/linux]% ./scripts/get_maintainer.pl -f \
>         include/asm-generic/atomic.h \
>         include/asm-generic/atomic-long.h \
>         include/linux/atomic.h
> Arnd Bergmann <arnd@arndb.de> (maintainer:GENERIC INCLUDE/ASM HEADER FILES)
> Ingo Molnar <mingo@kernel.org> (commit_signer:8/11=73%)
> Peter Zijlstra <peterz@infradead.org> (commit_signer:6/11=55%,authored:5/11=45%,added_lines:491/671=73%,removed_lines:186/225=83%)
> Frederic Weisbecker <fweisbec@gmail.com> (commit_signer:3/11=27%,authored:3/11=27%,added_lines:42/671=6%,removed_lines:21/225=9%)
> Boqun Feng <boqun.feng@gmail.com> (commit_signer:1/11=9%,authored:1/11=9%)
> Chris Metcalf <cmetcalf@ezchip.com> (commit_signer:1/11=9%)
> Davidlohr Bueso <dave@stgolabs.net> (authored:1/11=9%,added_lines:128/671=19%)
> linux-arch@vger.kernel.org (open list:GENERIC INCLUDE/ASM HEADER FILES)
> linux-kernel@vger.kernel.org (open list)

Yeah, I'll be bringing the hardened atomic patch up during kernel
summit, but I think the RFC has gotten to the point where we need to
loop in more maintainers.

Thanks!

-Kees
AKASHI Takahiro Oct. 28, 2016, 5:18 a.m. UTC | #13
On Thu, Oct 27, 2016 at 02:24:36PM +0100, Mark Rutland wrote:
> Hi,
> 
> On Tue, Oct 18, 2016 at 04:59:21PM +0200, Colin Vidal wrote:
> > This adds arm-specific code in order to support HARDENED_ATOMIC
> > feature. When overflow is detected in atomic_t, atomic64_t or
> > atomic_long_t, an exception is raised and call
> > hardened_atomic_overflow.
> 
> I have some comments below, but for real review this needs to go via the
> linux-arm-kernel list.

Yeah, definitely, but

> > diff --git a/arch/arm/include/asm/atomic.h b/arch/arm/include/asm/atomic.h
> > index 66d0e21..fdaee17 100644
> > --- a/arch/arm/include/asm/atomic.h
> > +++ b/arch/arm/include/asm/atomic.h
> > @@ -17,18 +17,52 @@
> >  #include <linux/irqflags.h>
> >  #include <asm/barrier.h>
> >  #include <asm/cmpxchg.h>
> > +#include <linux/bug.h>
> >  
> >  #define ATOMIC_INIT(i)	{ (i) }
> >  
> >  #ifdef __KERNEL__
> >  
> > +#ifdef CONFIG_HARDENED_ATOMIC
> > +#define HARDENED_ATOMIC_INSN "bkpt 0xf103"
> 
> Please put the immediate in a #define somewhere.
> 
> What about thumb2 kernels?
> 
> > +#define _ASM_EXTABLE(from, to)			\
> > +	".pushsection __ex_table,\"a\"\n"	\
> > +	".align 3\n"				\
> > +	".long "#from","#to"\n"			\
> > +	".popsection"
> > +#define __OVERFLOW_POST				\
> > +	"bvc 3f\n"				\
> > +	"2: "HARDENED_ATOMIC_INSN"\n"		\
> > +	"3:\n"
> > +#define __OVERFLOW_POST_RETURN			\
> > +	"bvc 3f\n"				\
> > +	"mov %0,%1\n"				\
> > +	"2: "HARDENED_ATOMIC_INSN"\n"		\
> > +	"3:\n"
> > +#define __OVERFLOW_EXTABLE			\
> > +	"4:\n"					\
> > +	_ASM_EXTABLE(2b, 4b)
> > +#else
> > +#define __OVERFLOW_POST
> > +#define __OVERFLOW_POST_RETURN
> > +#define __OVERFLOW_EXTABLE
> > +#endif
> > +
> 
> All this should live close to the assembly using it, to make it possible
> to follow.
> 
> This may also not be the best way of structuring this code. The
> additional indirection of passing this in at a high level makes it hard
> to read and potentially fragile. For single instructions it was ok, but
> I'm not so sure that it's ok for larger sequences like this.

I did a similar thing for the arm64 port (ll/sc version), as the current
macros are already complicated and I have no better idea for generating
the definitions for the protected and _wrap versions equally.

See below. The differences from Colin's arm port are:
  * control __HARDENED_ATOMIC_CHECK/__CL* directly instead of passing them
    as an argument
  * modify regs->pc directly in a handler to skip "brk" (not shown in this
    hunk) instead of using _ASM_EXTABLE (& fixup_exception())

Anyway, I'm putting off posting my arm64 port while the discussions
around Elena's x86 patch are still going on.

Thanks,
-Takahiro AKASHI
===8<===
#define ATOMIC_OP(op, asm_op, wrap, cl)					\
  ...

#define ATOMIC_OP_RETURN(name, mb, acq, rel, op, asm_op, wrap, cl)	\
__LL_SC_INLINE int							\
__LL_SC_PREFIX(atomic_##op##_return##wrap##name(int i, atomic##wrap##_t *v))\
{									\
	unsigned long tmp;						\
	int result;							\
									\
	asm volatile("// atomic_" #op "_return" #name "\n"		\
"	prfm	pstl1strm, %2\n"					\
"1:	ld" #acq "xr	%w0, %2\n"					\
"	" #asm_op "	%w0, %w0, %w3\n"				\
	__HARDENED_ATOMIC_CHECK						\
"	st" #rel "xr	%w1, %w0, %2\n"					\
"	cbnz	%w1, 1b\n"						\
"	" #mb "\n"							\
"4:"									\
	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)		\
	: "Ir" (i)							\
	: cl);								\
									\
	return result;							\
}									\
__LL_SC_EXPORT(atomic_##op##_return##wrap##name);

#define ATOMIC_FETCH_OP(name, mb, acq, rel, op, asm_op, wrap, cl)	\
  ...

#define ATOMIC_OPS(...)							\
	ATOMIC_OP(__VA_ARGS__, __CL)					\
	ATOMIC_OP_RETURN(        , dmb ish,  , l, __VA_ARGS__, __CL_MEM)\
	ATOMIC_OP_RETURN(_relaxed,        ,  ,  , __VA_ARGS__, __CL)	\
	ATOMIC_OP_RETURN(_acquire,        , a,  , __VA_ARGS__, __CL_MEM)\
	ATOMIC_OP_RETURN(_release,        ,  , l, __VA_ARGS__, __CL_MEM)\
	ATOMIC_FETCH_OP (        , dmb ish,  , l, __VA_ARGS__, __CL_MEM)\
	ATOMIC_FETCH_OP (_relaxed,        ,  ,  , __VA_ARGS__, __CL)	\
	ATOMIC_FETCH_OP (_acquire,        , a,  , __VA_ARGS__, __CL_MEM)\
	ATOMIC_FETCH_OP (_release,        ,  , l, __VA_ARGS__, __CL_MEM)

#ifdef CONFIG_HARDENED_ATOMIC
#define __HARDENED_ATOMIC_CHECK						\
"	bvc	3f\n"							\
"2:	brk	" __stringify(BUG_ATOMIC_OVERFLOW_BRK_IMM) "\n"		\
"	b	4f\n"							\
"3:"
#define __CL     "cc"
#define __CL_MEM "cc", "memory"

ATOMIC_OPS(add, adds,      )
ATOMIC_OPS(sub, subs,      )
#else
#define __HARDENED_ATOMIC_CHECK
#define __CL
#define __CL_MEM

ATOMIC_OPS(add,  add,      )
ATOMIC_OPS(sub,  sub,      )
#endif

#undef __HARDENED_ATOMIC_CHECK
#define __HARDENED_ATOMIC_CHECK
#undef __CL
#undef __CL_MEM
#define __CL
#define __CL_MEM "memory"
ATOMIC_OPS(add,  add, _wrap)
ATOMIC_OPS(sub,  sub, _wrap)
===>8===


> > diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
> > index 3a2e678..ce8ee00 100644
> > --- a/arch/arm/mm/fault.c
> > +++ b/arch/arm/mm/fault.c
> > @@ -580,6 +580,21 @@ do_PrefetchAbort(unsigned long addr, unsigned int ifsr, struct pt_regs *regs)
> >  	const struct fsr_info *inf = ifsr_info + fsr_fs(ifsr);
> >  	struct siginfo info;
> >  
> > +#ifdef CONFIG_HARDENED_ATOMIC
> > +	if (fsr_fs(ifsr) == FAULT_CODE_DEBUG) {
> 
> You'll need to justify why this isn't in the ifsr_info table. It has the
> same basic shape as the usual set of handlers.
> 
> I note that we don't seem to use SW breakpoints at all currently, and I
> suspect there's a reason for that which we need to consider.
> 
> Also, if this *must* live here, please make it a static inline with an
> empty stub, rather than an ifdef'd block.
> 
> > +		unsigned long pc = instruction_pointer(regs);
> > +		unsigned int bkpt;
> > +
> > +		if (!probe_kernel_address((const unsigned int *)pc, bkpt) &&
> > +		    cpu_to_le32(bkpt) == 0xe12f1073) {
> 
> This appears to be the A1 encoding from the ARM ARM. What about the T1
> encoding, i.e. thumb?
> 
> Regardless, *please* de-magic the number using a #define.
> 
> Also, this should be le32_to_cpu -- in the end we're treating the
> coverted value as cpu-native. The variable itself should be a __le32.
> 
> Thanks,
> Mark.
> 
> > +			current->thread.error_code = ifsr;
> > +			current->thread.trap_no = 0;
> > +			hardened_atomic_overflow(regs);
> > +			fixup_exception(regs);
> > +			return;
> > +		}
> > +	}
> > +#endif
> >  	if (!inf->fn(addr, ifsr | FSR_LNX_PF, regs))
> >  		return;
> >  
> > -- 
> > 2.7.4
> >
Colin Vidal Oct. 28, 2016, 8:33 a.m. UTC | #14
Hi Mark,

On Thu, 2016-10-27 at 14:24 +0100, Mark Rutland wrote:
> Hi,
> 
> On Tue, Oct 18, 2016 at 04:59:21PM +0200, Colin Vidal wrote:
> > 
> > This adds arm-specific code in order to support HARDENED_ATOMIC
> > feature. When overflow is detected in atomic_t, atomic64_t or
> > atomic_long_t, an exception is raised and call
> > hardened_atomic_overflow.
> 
> I have some comments below, but for real review this needs to go via the
> linux-arm-kernel list.

Thanks a lot for that! Yep, I will CC linux-arm-kernel, linux-kernel
and maintainers of atomic/arm for the next RFC.

I will take your remarks into account for the next RFC, but I have some
questions about some of them below.

> > 
> > diff --git a/arch/arm/include/asm/atomic.h b/arch/arm/include/asm/atomic.h
> > index 66d0e21..fdaee17 100644
> > --- a/arch/arm/include/asm/atomic.h
> > +++ b/arch/arm/include/asm/atomic.h
> > @@ -17,18 +17,52 @@
> >  #include <linux/irqflags.h>
> >  #include <asm/barrier.h>
> >  #include <asm/cmpxchg.h>
> > +#include <linux/bug.h>
> >  
> >  #define ATOMIC_INIT(i)	{ (i) }
> >  
> >  #ifdef __KERNEL__
> >  
> > +#ifdef CONFIG_HARDENED_ATOMIC
> > +#define HARDENED_ATOMIC_INSN "bkpt 0xf103"
> 
> Please put the immediate in a #define somewhere.

I am not sure I get what you mean here. Should 0xf103 be in a
#define? (as for the A1-encoded version, for sure).

> 
> What about thumb2 kernels?
> 
> > 
> > +#define _ASM_EXTABLE(from, to)			\
> > +	".pushsection __ex_table,\"a\"\n"	\
> > +	".align 3\n"				\
> > +	".long "#from","#to"\n"			\
> > +	".popsection"
> > +#define __OVERFLOW_POST				\
> > +	"bvc 3f\n"				\
> > +	"2: "HARDENED_ATOMIC_INSN"\n"		\
> > +	"3:\n"
> > +#define __OVERFLOW_POST_RETURN			\
> > +	"bvc 3f\n"				\
> > +	"mov %0,%1\n"				\
> > +	"2: "HARDENED_ATOMIC_INSN"\n"		\
> > +	"3:\n"
> > +#define __OVERFLOW_EXTABLE			\
> > +	"4:\n"					\
> > +	_ASM_EXTABLE(2b, 4b)
> > +#else
> > +#define __OVERFLOW_POST
> > +#define __OVERFLOW_POST_RETURN
> > +#define __OVERFLOW_EXTABLE
> > +#endif
> > +
> All this should live close to the assembly using it, to make it possible
> to follow.
> 
> This may also not be the best way of structuring this code. The
> additional indirection of passing this in at a high level makes it hard
> to read and potentially fragile. For single instructions it was ok, but
> I'm not so sure that it's ok for larger sequences like this.
> 

I agree that it is quite difficult to follow, but as Takahiro said, the
current macros are already complex. Still, I will try to put those
definitions closer to the code using them (for instance, just before
the macro expansions?), but I am not sure that would change anything:
the code using them is broken up across different areas of the file anyway.

Besides, I would really avoid using an extable entry: since the fixup
address is never used, I am not sure it is mandatory (and it seems
Takahiro does not use it either).

> > 
> > diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
> > index 3a2e678..ce8ee00 100644
> > --- a/arch/arm/mm/fault.c
> > +++ b/arch/arm/mm/fault.c
> > @@ -580,6 +580,21 @@ do_PrefetchAbort(unsigned long addr, unsigned int ifsr, struct pt_regs *regs)
> >  	const struct fsr_info *inf = ifsr_info + fsr_fs(ifsr);
> >  	struct siginfo info;
> >  
> > +#ifdef CONFIG_HARDENED_ATOMIC
> > +	if (fsr_fs(ifsr) == FAULT_CODE_DEBUG) {
> 
> You'll need to justify why this isn't in the ifsr_info table. It has the
> same basic shape as the usual set of handlers.
> 
> I note that we don't seem to use SW breakpoints at all currently, and I
> suspect there's a reason for that which we need to consider.
> 
> Also, if this *must* live here, please make it a static inline with an
> empty stub, rather than an ifdef'd block.
> 
> > 
> > +		unsigned long pc = instruction_pointer(regs);
> > +		unsigned int bkpt;
> > +
> > +		if (!probe_kernel_address((const unsigned int *)pc, bkpt) &&
> > +		    cpu_to_le32(bkpt) == 0xe12f1073) {
> 
> This appears to be the A1 encoding from the ARM ARM. What about the T1
> encoding, i.e. thumb?
> 
> Regardless, *please* de-magic the number using a #define.
> 
> Also, this should be le32_to_cpu -- in the end we're treating the
> coverted value as cpu-native. The variable itself should be a __le32.
> 

I must confess that I am still very confused about the code in fault.c
(and the ifsr_info table). I will take time to try to understand that
code a little better before submitting a new RFC. Still, if you have
some pointers/documentation about it, I would be grateful.

Thanks!

Colin
Mark Rutland Oct. 28, 2016, 10:20 a.m. UTC | #15
On Fri, Oct 28, 2016 at 10:33:15AM +0200, Colin Vidal wrote:
> On Thu, 2016-10-27 at 14:24 +0100, Mark Rutland wrote:
> > On Tue, Oct 18, 2016 at 04:59:21PM +0200, Colin Vidal wrote:
> > > +#define HARDENED_ATOMIC_INSN "bkpt 0xf103"
> > 
> > Please put the immediate in a #define somewhere.
> 
> I an not sure to get what do you mean here. 0xf103 should be in a
> #define? (as for the A1 encoded version, for sure).

I mean have something like:

#ifdef CONFIG_THUMB2_KERNEL
#define ATOMIC_BKPT_IMM 0xf1
#else
#define ATOMIC_BKPT_IMM	0xf103
#endif

With code having something like:

HARDENED_ATOMIC_INSN "bkpt" #ATOMIC_BKPT_IMM

... or passing that in via an %i template to the asm.

That way, we can also build the encoding for the exception path from the
same immediate. That will keep the two in sync, making the relationship
between the two obvious. e.g. first add something like arm_get_bkpt(imm)
to <asm/insn.h>.
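
For instance (untested and purely illustrative -- the helper names are
only a suggestion), the helpers could build the full opcodes from that
same immediate:

/* BKPT A1 encoding: cond=AL, imm16 split as imm12:0b0111:imm4 */
#define arm_get_bkpt(imm16)						\
	(0xe1200070 | (((imm16) & 0xfff0) << 4) | ((imm16) & 0x000f))

/* BKPT T1 encoding: 0xbe00 | imm8 */
#define thumb_get_bkpt(imm8)	(0xbe00 | ((imm8) & 0x00ff))

e.g. arm_get_bkpt(0xf103) gives 0xe12f1073, which is the magic number
currently open-coded in fault.c.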

> > > +#define _ASM_EXTABLE(from, to)			\
> > > +	".pushsection __ex_table,\"a\"\n"	\
> > > +	".align 3\n"				\
> > > +	".long "#from","#to"\n"			\
> > > +	".popsection"
> > > +#define __OVERFLOW_POST				\
> > > +	"bvc 3f\n"				\
> > > +	"2: "HARDENED_ATOMIC_INSN"\n"		\
> > > +	"3:\n"
> > > +#define __OVERFLOW_POST_RETURN			\
> > > +	"bvc 3f\n"				\
> > > +	"mov %0,%1\n"				\
> > > +	"2: "HARDENED_ATOMIC_INSN"\n"		\
> > > +	"3:\n"
> > > +#define __OVERFLOW_EXTABLE			\
> > > +	"4:\n"					\
> > > +	_ASM_EXTABLE(2b, 4b)
> > > +#else
> > > +#define __OVERFLOW_POST
> > > +#define __OVERFLOW_POST_RETURN
> > > +#define __OVERFLOW_EXTABLE
> > > +#endif
> > > +
> > All this should live close to the assembly using it, to make it possible
> > to follow.
> > 
> > This may also not be the best way of structuring this code. The
> > additional indirection of passing this in at a high level makes it hard
> > to read and potentially fragile. For single instructions it was ok, but
> > I'm not so sure that it's ok for larger sequences like this.
> 
> I agree that is quite difficult to follow, but as Takahiro said, the
> current macro are already complex. Still, I will try to put those
> definitions closer of the code using them (for instance, just before
> the macro expansions?), but I am not sure that would change anything:
> the code using it is broke up in different area of the file anyways.

I'm not sure that's true. IIRC each of the cases above is only used by
one template case, and we could instead fold that into the template.
For example, if we had something like the ALTERNATIVE macros, but
working on some boolean passed in, you could have:

#define __foo_asm_template(__bool, params ...)			\
	asm(							\
	"insn reg, whatever"					\
	TEMPLATE(__bool, "code_if_bool", "code if_not_bool")	\
	"more insns..."						\
	clobbers_etc)

That way all the code is in one place, and easier to follow, and we can
generate both the instrumented and non-instrumented variants from the
same template.
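
For completeness, TEMPLATE() itself only needs a trivial bit of token
pasting, e.g. something like (untested, just to show the idea):

#define __TEMPLATE_0(if_code, else_code)	else_code
#define __TEMPLATE_1(if_code, else_code)	if_code
#define __TEMPLATE(cond, if_code, else_code)	__TEMPLATE_##cond(if_code, else_code)
/* extra level of indirection so 'cond' is expanded before pasting */
#define TEMPLATE(cond, if_code, else_code)	__TEMPLATE(cond, if_code, else_code)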

> Besides, I really would avoid using an extable entry, since the fixup
> address is never used, I am not sure it is mandatory (and it seems
> Takahiro does not using it, too).

From the grsecurity blog post [1], per the comments in the ARM detection
logic, this is used to continue after warning (and not incrementing) in
the overflow case.

What do we do differently in the overflow case?

> > > diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
> > > index 3a2e678..ce8ee00 100644
> > > --- a/arch/arm/mm/fault.c
> > > +++ b/arch/arm/mm/fault.c
> > > @@ -580,6 +580,21 @@ do_PrefetchAbort(unsigned long addr, unsigned int ifsr, struct pt_regs *regs)
> > >  	const struct fsr_info *inf = ifsr_info + fsr_fs(ifsr);
> > >  	struct siginfo info;
> > >  
> > > +#ifdef CONFIG_HARDENED_ATOMIC
> > > +	if (fsr_fs(ifsr) == FAULT_CODE_DEBUG) {
> > 
> > You'll need to justify why this isn't in the ifsr_info table. It has the
> > same basic shape as the usual set of handlers.
> > 
> > I note that we don't seem to use SW breakpoints at all currently, and I
> > suspect there's a reason for that which we need to consider.
> > 
> > Also, if this *must* live here, please make it a static inline with an
> > empty stub, rather than an ifdef'd block.
> > 
> > > 
> > > +		unsigned long pc = instruction_pointer(regs);
> > > +		unsigned int bkpt;
> > > +
> > > +		if (!probe_kernel_address((const unsigned int *)pc, bkpt) &&
> > > +		    cpu_to_le32(bkpt) == 0xe12f1073) {
> > 
> > This appears to be the A1 encoding from the ARM ARM. What about the T1
> > encoding, i.e. thumb?
> > 
> > Regardless, *please* de-magic the number using a #define.
> > 
> > Also, this should be le32_to_cpu -- in the end we're treating the
> > coverted value as cpu-native. The variable itself should be a __le32.
> 
> I must confess that I am still very confused about the code in fault.c
> (and the ifsr_info table). I will take time to try to understand a
> little bit more that code before submitting a new RFC. Still, it you
> have some pointers/documentations about it, I would be grateful.

Unfortunately, I'm not aware of any documentation for this code. The
ifsr_info table is just a set of handlers indexed by the ifsr value.

Ignoring the conflict with the HW breakpoint handler, you'd be able to
install this in the FAULT_CODE_DEBUG slot of ifsr_info.
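
i.e. very roughly (untested; HARDENED_ATOMIC_BKPT_A1 below stands for
the 0xe12f1073 encoding, and the hw_breakpoint conflict is ignored):

static int hardened_atomic_break(unsigned long addr, unsigned int ifsr,
				 struct pt_regs *regs)
{
	unsigned long pc = instruction_pointer(regs);
	__le32 bkpt;

	/* not our breakpoint: report the fault as usual */
	if (probe_kernel_address((const void *)pc, bkpt) ||
	    le32_to_cpu(bkpt) != HARDENED_ATOMIC_BKPT_A1)
		return 1;

	hardened_atomic_overflow(regs);
	fixup_exception(regs);
	return 0;
}

static int __init hardened_atomic_hook(void)
{
	/* same slot that hw_breakpoint.c claims today */
	hook_ifault_code(FAULT_CODE_DEBUG, hardened_atomic_break, SIGTRAP,
			 TRAP_BRKPT, "hardened atomic overflow");
	return 0;
}
arch_initcall(hardened_atomic_hook);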

Thanks,
Mark.

[1] https://forums.grsecurity.net/viewtopic.php?f=7&t=4173
David Windsor Oct. 28, 2016, 10:59 a.m. UTC | #16
On Fri, Oct 28, 2016 at 6:20 AM, Mark Rutland <mark.rutland@arm.com> wrote:
> On Fri, Oct 28, 2016 at 10:33:15AM +0200, Colin Vidal wrote:
>> On Thu, 2016-10-27 at 14:24 +0100, Mark Rutland wrote:
>> > On Tue, Oct 18, 2016 at 04:59:21PM +0200, Colin Vidal wrote:
>> > > +#define HARDENED_ATOMIC_INSN "bkpt 0xf103"
>> >
>> > Please put the immediate in a #define somewhere.
>>
>> I an not sure to get what do you mean here. 0xf103 should be in a
>> #define? (as for the A1 encoded version, for sure).
>
> I mean have something like:
>
> #ifdef CONFIG_THUMB2_KERNEL
> #define ATOMIC_BKPT_IMM 0xf1
> #else
> #define ATOMIC_BKPT_IMM 0xf103
> #endif
>
> With code having something like:
>
> HARDENED_ATOMIC_INSN "bkpt" #ATOMIC_BKPT_IMM
>
> ... or passing that in via an %i template to the asm.
>
> That way, we can also build the encoding for the exception path from the
> same immediate. That will keep the two in sync, making the relationship
> between the two obvious. e.g. first add something like arm_get_bkpt(imm)
> to <asm/insn.h>.
>
>> > > +#define _ASM_EXTABLE(from, to)                   \
>> > > + ".pushsection __ex_table,\"a\"\n"       \
>> > > + ".align 3\n"                            \
>> > > + ".long "#from","#to"\n"                 \
>> > > + ".popsection"
>> > > +#define __OVERFLOW_POST                          \
>> > > + "bvc 3f\n"                              \
>> > > + "2: "HARDENED_ATOMIC_INSN"\n"           \
>> > > + "3:\n"
>> > > +#define __OVERFLOW_POST_RETURN                   \
>> > > + "bvc 3f\n"                              \
>> > > + "mov %0,%1\n"                           \
>> > > + "2: "HARDENED_ATOMIC_INSN"\n"           \
>> > > + "3:\n"
>> > > +#define __OVERFLOW_EXTABLE                       \
>> > > + "4:\n"                                  \
>> > > + _ASM_EXTABLE(2b, 4b)
>> > > +#else
>> > > +#define __OVERFLOW_POST
>> > > +#define __OVERFLOW_POST_RETURN
>> > > +#define __OVERFLOW_EXTABLE
>> > > +#endif
>> > > +
>> > All this should live close to the assembly using it, to make it possible
>> > to follow.
>> >
>> > This may also not be the best way of structuring this code. The
>> > additional indirection of passing this in at a high level makes it hard
>> > to read and potentially fragile. For single instructions it was ok, but
>> > I'm not so sure that it's ok for larger sequences like this.
>>
>> I agree that is quite difficult to follow, but as Takahiro said, the
>> current macro are already complex. Still, I will try to put those
>> definitions closer of the code using them (for instance, just before
>> the macro expansions?), but I am not sure that would change anything:
>> the code using it is broke up in different area of the file anyways.
>
> I'm not sure that's true. IIRC each of the cases above is only used by
> one template case, and we could instead fold that into the template.
> For example, if we had something like the ALTERNATIVE macros that worked
> on some boolean passed in, so you could have:
>
> #define __foo_asm_template(__bool, params ...)                  \
>         asm(                                                    \
>         "insn reg, whatever"                                    \
>         TEMPLATE(__bool, "code_if_bool", "code if_not_bool")    \
>         "more insns..."                                         \
>         clobbers_etc)
>
> That way all the code is in one place, and easier to follow, and we can
> generate both the instrumented and non-instrumented variants from the
> same template.
>
>> Besides, I really would avoid using an extable entry, since the fixup
>> address is never used, I am not sure it is mandatory (and it seems
>> Takahiro does not using it, too).
>
> From the grsecurity blog post [1], per the comments in the ARM detection
> logic, this is used to continue after warning (and not incrementing) in
> the overflow case.
>
> What do we do differently in the overflow case?
>

In the overflow case, exception/trap code is called, which invokes
hardened_atomic_report_overflow(), which then calls BUG().  At least
on x86.
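
Roughly speaking (paraphrasing the generic patch from memory, so the
exact message and names may differ), the report side boils down to:

void hardened_atomic_overflow(struct pt_regs *regs)
{
	pr_emerg("HARDENED_ATOMIC: overflow detected in: %s:%d\n",
		 current->comm, task_pid_nr(current));
	BUG();
}

If the policy were ever relaxed to warn-and-continue, the extable fixup
is what would let us skip the trapping instruction; with BUG() the fixup
path is never actually reached.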

>> > > diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
>> > > index 3a2e678..ce8ee00 100644
>> > > --- a/arch/arm/mm/fault.c
>> > > +++ b/arch/arm/mm/fault.c
>> > > @@ -580,6 +580,21 @@ do_PrefetchAbort(unsigned long addr, unsigned int ifsr, struct pt_regs *regs)
>> > >   const struct fsr_info *inf = ifsr_info + fsr_fs(ifsr);
>> > >   struct siginfo info;
>> > >
>> > > +#ifdef CONFIG_HARDENED_ATOMIC
>> > > + if (fsr_fs(ifsr) == FAULT_CODE_DEBUG) {
>> >
>> > You'll need to justify why this isn't in the ifsr_info table. It has the
>> > same basic shape as the usual set of handlers.
>> >
>> > I note that we don't seem to use SW breakpoints at all currently, and I
>> > suspect there's a reason for that which we need to consider.
>> >
>> > Also, if this *must* live here, please make it a static inline with an
>> > empty stub, rather than an ifdef'd block.
>> >
>> > >
>> > > +         unsigned long pc = instruction_pointer(regs);
>> > > +         unsigned int bkpt;
>> > > +
>> > > +         if (!probe_kernel_address((const unsigned int *)pc, bkpt) &&
>> > > +             cpu_to_le32(bkpt) == 0xe12f1073) {
>> >
>> > This appears to be the A1 encoding from the ARM ARM. What about the T1
>> > encoding, i.e. thumb?
>> >
>> > Regardless, *please* de-magic the number using a #define.
>> >
>> > Also, this should be le32_to_cpu -- in the end we're treating the
>> > coverted value as cpu-native. The variable itself should be a __le32.
>>
>> I must confess that I am still very confused about the code in fault.c
>> (and the ifsr_info table). I will take time to try to understand a
>> little bit more that code before submitting a new RFC. Still, it you
>> have some pointers/documentations about it, I would be grateful.
>
> Unfortuantely, I'm not aware of any documentation for this code. The
> ifsr_info table is just a set of handlers indexed by the ifsr value.
>
> Ignoring the conflict with the HW breakpoint handler, you'd be able to
> install this in the FAULT_CODE_DEBUG slot of ifsr_info.
>
> Thanks,
> Mark.
>
> [1] https://forums.grsecurity.net/viewtopic.php?f=7&t=4173
diff mbox

Patch

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index b5d529f..fcf4a64 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -36,6 +36,7 @@  config ARM
 	select HAVE_ARCH_AUDITSYSCALL if (AEABI && !OABI_COMPAT)
 	select HAVE_ARCH_BITREVERSE if (CPU_32v7M || CPU_32v7) && !CPU_32v6
 	select HAVE_ARCH_HARDENED_USERCOPY
+	select HAVE_ARCH_HARDENED_ATOMIC
 	select HAVE_ARCH_JUMP_LABEL if !XIP_KERNEL && !CPU_ENDIAN_BE32 && MMU
 	select HAVE_ARCH_KGDB if !CPU_ENDIAN_BE32 && MMU
 	select HAVE_ARCH_MMAP_RND_BITS if MMU
diff --git a/arch/arm/include/asm/atomic.h b/arch/arm/include/asm/atomic.h
index 66d0e21..fdaee17 100644
--- a/arch/arm/include/asm/atomic.h
+++ b/arch/arm/include/asm/atomic.h
@@ -17,18 +17,52 @@ 
 #include <linux/irqflags.h>
 #include <asm/barrier.h>
 #include <asm/cmpxchg.h>
+#include <linux/bug.h>
 
 #define ATOMIC_INIT(i)	{ (i) }
 
 #ifdef __KERNEL__
 
+#ifdef CONFIG_HARDENED_ATOMIC
+#define HARDENED_ATOMIC_INSN "bkpt 0xf103"
+#define _ASM_EXTABLE(from, to)			\
+	".pushsection __ex_table,\"a\"\n"	\
+	".align 3\n"				\
+	".long "#from","#to"\n"			\
+	".popsection"
+#define __OVERFLOW_POST				\
+	"bvc 3f\n"				\
+	"2: "HARDENED_ATOMIC_INSN"\n"		\
+	"3:\n"
+#define __OVERFLOW_POST_RETURN			\
+	"bvc 3f\n"				\
+	"mov %0,%1\n"				\
+	"2: "HARDENED_ATOMIC_INSN"\n"		\
+	"3:\n"
+#define __OVERFLOW_EXTABLE			\
+	"4:\n"					\
+	_ASM_EXTABLE(2b, 4b)
+#else
+#define __OVERFLOW_POST
+#define __OVERFLOW_POST_RETURN
+#define __OVERFLOW_EXTABLE
+#endif
+
 /*
  * On ARM, ordinary assignment (str instruction) doesn't clear the local
  * strex/ldrex monitor on some implementations. The reason we can use it for
  * atomic_set() is the clrex or dummy strex done on every exception return.
  */
 #define atomic_read(v)	READ_ONCE((v)->counter)
+static inline int atomic_read_wrap(const atomic_wrap_t *v)
+{
+	return atomic_read(v);
+}
 #define atomic_set(v,i)	WRITE_ONCE(((v)->counter), (i))
+static inline void atomic_set_wrap(atomic_wrap_t *v, int i)
+{
+	atomic_set(v, i);
+}
 
 #if __LINUX_ARM_ARCH__ >= 6
 
@@ -38,38 +72,46 @@ 
  * to ensure that the update happens.
  */
 
-#define ATOMIC_OP(op, c_op, asm_op)					\
-static inline void atomic_##op(int i, atomic_t *v)			\
+#define __ATOMIC_OP(op, suffix, c_op, asm_op, post_op, extable)		\
+static inline void atomic_##op##suffix(int i, atomic##suffix##_t *v)	\
 {									\
 	unsigned long tmp;						\
 	int result;							\
 									\
 	prefetchw(&v->counter);						\
-	__asm__ __volatile__("@ atomic_" #op "\n"			\
+	__asm__ __volatile__("@ atomic_" #op #suffix "\n"		\
 "1:	ldrex	%0, [%3]\n"						\
 "	" #asm_op "	%0, %0, %4\n"					\
+        post_op                 					\
 "	strex	%1, %0, [%3]\n"						\
 "	teq	%1, #0\n"						\
-"	bne	1b"							\
+"	bne	1b\n"							\
+        extable                 					\
 	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)		\
 	: "r" (&v->counter), "Ir" (i)					\
 	: "cc");							\
 }									\
 
-#define ATOMIC_OP_RETURN(op, c_op, asm_op)				\
-static inline int atomic_##op##_return_relaxed(int i, atomic_t *v)	\
+#define ATOMIC_OP(op, c_op, asm_op)		                        \
+	__ATOMIC_OP(op, _wrap, c_op, asm_op, , )			\
+	__ATOMIC_OP(op, , c_op, asm_op##s, __OVERFLOW_POST, __OVERFLOW_EXTABLE)
+
+#define __ATOMIC_OP_RETURN(op, suffix, c_op, asm_op, post_op, extable)	\
+static inline int atomic_##op##_return##suffix##_relaxed(int i, atomic##suffix##_t *v) \
 {									\
 	unsigned long tmp;						\
 	int result;							\
 									\
 	prefetchw(&v->counter);						\
 									\
-	__asm__ __volatile__("@ atomic_" #op "_return\n"		\
+	__asm__ __volatile__("@ atomic_" #op "_return" #suffix "\n"	\
 "1:	ldrex	%0, [%3]\n"						\
 "	" #asm_op "	%0, %0, %4\n"					\
+        post_op                 					\
 "	strex	%1, %0, [%3]\n"						\
 "	teq	%1, #0\n"						\
-"	bne	1b"							\
+"	bne	1b\n"							\
+        extable                 					\
 	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)		\
 	: "r" (&v->counter), "Ir" (i)					\
 	: "cc");							\
@@ -77,6 +119,11 @@  static inline int atomic_##op##_return_relaxed(int i, atomic_t *v)	\
 	return result;							\
 }
 
+#define ATOMIC_OP_RETURN(op, c_op, asm_op)	                        \
+	__ATOMIC_OP_RETURN(op, _wrap, c_op, asm_op, , )			\
+	__ATOMIC_OP_RETURN(op, , c_op, asm_op##s,			\
+			   __OVERFLOW_POST_RETURN, __OVERFLOW_EXTABLE)
+
 #define ATOMIC_FETCH_OP(op, c_op, asm_op)				\
 static inline int atomic_fetch_##op##_relaxed(int i, atomic_t *v)	\
 {									\
@@ -108,26 +155,34 @@  static inline int atomic_fetch_##op##_relaxed(int i, atomic_t *v)	\
 #define atomic_fetch_or_relaxed		atomic_fetch_or_relaxed
 #define atomic_fetch_xor_relaxed	atomic_fetch_xor_relaxed
 
-static inline int atomic_cmpxchg_relaxed(atomic_t *ptr, int old, int new)
-{
-	int oldval;
-	unsigned long res;
-
-	prefetchw(&ptr->counter);
-
-	do {
-		__asm__ __volatile__("@ atomic_cmpxchg\n"
-		"ldrex	%1, [%3]\n"
-		"mov	%0, #0\n"
-		"teq	%1, %4\n"
-		"strexeq %0, %5, [%3]\n"
-		    : "=&r" (res), "=&r" (oldval), "+Qo" (ptr->counter)
-		    : "r" (&ptr->counter), "Ir" (old), "r" (new)
-		    : "cc");
-	} while (res);
-
-	return oldval;
+#define __ATOMIC_CMPXCHG_RELAXED(suffix)			       	\
+static inline int atomic_cmpxchg##suffix##_relaxed(atomic##suffix##_t *ptr, \
+						   int old, int new)	\
+{									\
+	int oldval;                                                     \
+	unsigned long res;                                              \
+									\
+	prefetchw(&ptr->counter);					\
+									\
+	do {								\
+	        __asm__ __volatile__("@ atomic_cmpxchg" #suffix "\n"	\
+		"ldrex	%1, [%3]\n"					\
+		"mov	%0, #0\n"					\
+		"teq	%1, %4\n"					\
+		"strexeq %0, %5, [%3]\n"				\
+		    : "=&r" (res), "=&r" (oldval), "+Qo" (ptr->counter) \
+		    : "r" (&ptr->counter), "Ir" (old), "r" (new)        \
+		    : "cc");                                            \
+	} while (res);							\
+									\
+	return oldval;							\
 }
+
+__ATOMIC_CMPXCHG_RELAXED()
+__ATOMIC_CMPXCHG_RELAXED(_wrap)
+
+#undef __ATOMIC_CMPXCHG_RELAXED
+
 #define atomic_cmpxchg_relaxed		atomic_cmpxchg_relaxed
 
 static inline int __atomic_add_unless(atomic_t *v, int a, int u)
@@ -141,12 +196,21 @@  static inline int __atomic_add_unless(atomic_t *v, int a, int u)
 	__asm__ __volatile__ ("@ atomic_add_unless\n"
 "1:	ldrex	%0, [%4]\n"
 "	teq	%0, %5\n"
-"	beq	2f\n"
-"	add	%1, %0, %6\n"
+"	beq	4f\n"
+"	adds	%1, %0, %6\n"
+
+#ifdef CONFIG_HARDENED_ATOMIC
+"       bvc 3f\n"
+"2:     "HARDENED_ATOMIC_INSN"\n"
+"3:\n"
+#endif
 "	strex	%2, %1, [%4]\n"
 "	teq	%2, #0\n"
 "	bne	1b\n"
-"2:"
+"4:"
+#ifdef CONFIG_HARDENED_ATOMIC
+        _ASM_EXTABLE(2b, 4b)
+#endif
 	: "=&r" (oldval), "=&r" (newval), "=&r" (tmp), "+Qo" (v->counter)
 	: "r" (&v->counter), "r" (u), "r" (a)
 	: "cc");
@@ -163,8 +227,8 @@  static inline int __atomic_add_unless(atomic_t *v, int a, int u)
 #error SMP not supported on pre-ARMv6 CPUs
 #endif
 
-#define ATOMIC_OP(op, c_op, asm_op)					\
-static inline void atomic_##op(int i, atomic_t *v)			\
+#define __ATOMIC_OP(op, suffix, c_op, asm_op)				\
+static inline void atomic_##op##suffix(int i, atomic##suffix##_t *v)	\
 {									\
 	unsigned long flags;						\
 									\
@@ -173,8 +237,12 @@  static inline void atomic_##op(int i, atomic_t *v)			\
 	raw_local_irq_restore(flags);					\
 }									\
 
-#define ATOMIC_OP_RETURN(op, c_op, asm_op)				\
-static inline int atomic_##op##_return(int i, atomic_t *v)		\
+#define ATOMIC_OP(op, c_op, asm_op)		                        \
+	__ATOMIC_OP(op, _wrap, c_op, asm_op)				\
+	__ATOMIC_OP(op, , c_op, asm_op)
+
+#define __ATOMIC_OP_RETURN(op, suffix, c_op, asm_op)			\
+static inline int atomic_##op##_return##suffix(int i, atomic##suffix##_t *v) \
 {									\
 	unsigned long flags;						\
 	int val;							\
@@ -187,6 +255,10 @@  static inline int atomic_##op##_return(int i, atomic_t *v)		\
 	return val;							\
 }
 
+#define ATOMIC_OP_RETURN(op, c_op, asm_op)	                        \
+	__ATOMIC_OP_RETURN(op, wrap, c_op, asm_op)			\
+	__ATOMIC_OP_RETURN(op, , c_op, asm_op)
+
 #define ATOMIC_FETCH_OP(op, c_op, asm_op)				\
 static inline int atomic_fetch_##op(int i, atomic_t *v)			\
 {									\
@@ -215,6 +287,11 @@  static inline int atomic_cmpxchg(atomic_t *v, int old, int new)
 	return ret;
 }
 
+static inline int atomic_cmpxchg_wrap(atomic_wrap_t *v, int old, int new)
+{
+	return atomic_cmpxchg((atomic_t *)v, old, new);
+}
+
 static inline int __atomic_add_unless(atomic_t *v, int a, int u)
 {
 	int c, old;
@@ -227,6 +304,11 @@  static inline int __atomic_add_unless(atomic_t *v, int a, int u)
 
 #endif /* __LINUX_ARM_ARCH__ */
 
+static inline int __atomic_add_unless_wrap(atomic_wrap_t *v, int a, int u)
+{
+	return __atomic_add_unless((atomic_t *)v, a, u);
+}
+
 #define ATOMIC_OPS(op, c_op, asm_op)					\
 	ATOMIC_OP(op, c_op, asm_op)					\
 	ATOMIC_OP_RETURN(op, c_op, asm_op)				\
@@ -250,18 +332,30 @@  ATOMIC_OPS(xor, ^=, eor)
 #undef ATOMIC_OPS
 #undef ATOMIC_FETCH_OP
 #undef ATOMIC_OP_RETURN
+#undef __ATOMIC_OP_RETURN
 #undef ATOMIC_OP
+#undef __ATOMIC_OP
 
 #define atomic_xchg(v, new) (xchg(&((v)->counter), new))
-
+#define atomic_xchg_wrap(v, new) atomic_xchg(v, new)
 #define atomic_inc(v)		atomic_add(1, v)
+static inline void atomic_inc_wrap(atomic_wrap_t *v)
+{
+	atomic_add_wrap(1, v);
+}
 #define atomic_dec(v)		atomic_sub(1, v)
+static inline void atomic_dec_wrap(atomic_wrap_t *v)
+{
+	atomic_sub_wrap(1, v);
+}
 
 #define atomic_inc_and_test(v)	(atomic_add_return(1, v) == 0)
 #define atomic_dec_and_test(v)	(atomic_sub_return(1, v) == 0)
 #define atomic_inc_return_relaxed(v)    (atomic_add_return_relaxed(1, v))
+#define atomic_inc_return_wrap_relaxed(v) (atomic_add_return_wrap_relaxed(1, v))
 #define atomic_dec_return_relaxed(v)    (atomic_sub_return_relaxed(1, v))
 #define atomic_sub_and_test(i, v) (atomic_sub_return(i, v) == 0)
+#define atomic_sub_and_test_wrap(i, v) (atomic_sub_return_wrap(i, v) == 0)
 
 #define atomic_add_negative(i,v) (atomic_add_return(i, v) < 0)
 
@@ -270,62 +364,81 @@  typedef struct {
 	long long counter;
 } atomic64_t;
 
+#ifdef CONFIG_HARDENED_ATOMIC
+typedef struct {
+	long long counter;
+} atomic64_wrap_t;
+#else
+typedef atomic64_t atomic64_wrap_t;
+#endif
+
 #define ATOMIC64_INIT(i) { (i) }
 
-#ifdef CONFIG_ARM_LPAE
-static inline long long atomic64_read(const atomic64_t *v)
-{
-	long long result;
+#define __ATOMIC64_READ(suffix, asm_op)					\
+static inline long long						        \
+atomic64_read##suffix(const atomic64##suffix##_t *v)			\
+{						                        \
+	long long result;                                               \
+									\
+	__asm__ __volatile__("@ atomic64_read" #suffix "\n"		\
+"	" #asm_op " %0, %H0, [%1]"					        \
+	: "=&r" (result)						\
+	: "r" (&v->counter), "Qo" (v->counter)	                        \
+	);         							\
+									\
+	return result;							\
+}
 
-	__asm__ __volatile__("@ atomic64_read\n"
-"	ldrd	%0, %H0, [%1]"
-	: "=&r" (result)
-	: "r" (&v->counter), "Qo" (v->counter)
-	);
+#ifdef CONFIG_ARM_LPAE
+__ATOMIC64_READ(, ldrd)
+__ATOMIC64_READ(wrap, ldrd)
 
-	return result;
+#define __ATOMIC64_SET(suffix)					        \
+static inline void atomic64_set##suffix(atomic64##suffix##_t *v, long long i) \
+{									\
+        __asm__ __volatile__("@ atomic64_set" #suffix "\n"		\
+"	strd	%2, %H2, [%1]"					        \
+	: "=Qo" (v->counter)						\
+	: "r" (&v->counter), "r" (i)		                        \
+	);							        \
 }
 
-static inline void atomic64_set(atomic64_t *v, long long i)
-{
-	__asm__ __volatile__("@ atomic64_set\n"
-"	strd	%2, %H2, [%1]"
-	: "=Qo" (v->counter)
-	: "r" (&v->counter), "r" (i)
-	);
-}
-#else
-static inline long long atomic64_read(const atomic64_t *v)
-{
-	long long result;
+__ATOMIC64_SET()
+__ATOMIC64_SET(_wrap)
 
-	__asm__ __volatile__("@ atomic64_read\n"
-"	ldrexd	%0, %H0, [%1]"
-	: "=&r" (result)
-	: "r" (&v->counter), "Qo" (v->counter)
-	);
+#undef __ATOMIC64
 
-	return result;
+#else
+__ATOMIC64_READ(, ldrexd)
+__ATOMIC64_READ(_wrap, ldrexd)
+
+#define __ATOMIC64_SET(suffix)					        \
+static inline void atomic64_set##suffix(atomic64##suffix##_t *v, long long i) \
+{									\
+	long long tmp;                                                  \
+									\
+	prefetchw(&v->counter);						\
+	__asm__ __volatile__("@ atomic64_set" #suffix"\n"               \
+"1:	ldrexd	%0, %H0, [%2]\n"                                        \
+"	strexd	%0, %3, %H3, [%2]\n"                                    \
+"	teq	%0, #0\n"                                               \
+"	bne	1b"                                                     \
+	: "=&r" (tmp), "=Qo" (v->counter)				\
+	: "r" (&v->counter), "r" (i)		                        \
+	: "cc");                                                        \
 }
 
-static inline void atomic64_set(atomic64_t *v, long long i)
-{
-	long long tmp;
+__ATOMIC64_SET()
+__ATOMIC64_SET(_wrap)
+
+#undef __ATOMIC64_SET
 
-	prefetchw(&v->counter);
-	__asm__ __volatile__("@ atomic64_set\n"
-"1:	ldrexd	%0, %H0, [%2]\n"
-"	strexd	%0, %3, %H3, [%2]\n"
-"	teq	%0, #0\n"
-"	bne	1b"
-	: "=&r" (tmp), "=Qo" (v->counter)
-	: "r" (&v->counter), "r" (i)
-	: "cc");
-}
 #endif
 
-#define ATOMIC64_OP(op, op1, op2)					\
-static inline void atomic64_##op(long long i, atomic64_t *v)		\
+#undef __ATOMIC64_READ
+
+#define __ATOMIC64_OP(op, suffix, op1, op2, post_op, extable)		\
+static inline void atomic64_##op##suffix(long long i, atomic64##suffix##_t *v) \
 {									\
 	long long result;						\
 	unsigned long tmp;						\
@@ -335,17 +448,31 @@  static inline void atomic64_##op(long long i, atomic64_t *v)		\
 "1:	ldrexd	%0, %H0, [%3]\n"					\
 "	" #op1 " %Q0, %Q0, %Q4\n"					\
 "	" #op2 " %R0, %R0, %R4\n"					\
+        post_op					                        \
 "	strexd	%1, %0, %H0, [%3]\n"					\
 "	teq	%1, #0\n"						\
-"	bne	1b"							\
+"	bne	1b\n"							\
+	extable                                                         \
 	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)		\
 	: "r" (&v->counter), "r" (i)					\
 	: "cc");							\
-}									\
+}
 
-#define ATOMIC64_OP_RETURN(op, op1, op2)				\
+#define ATOMIC64_OP(op, op1, op2)		                        \
+	__ATOMIC64_OP(op, _wrap, op1, op2, , )				\
+	__ATOMIC64_OP(op, , op1, op2##s, __OVERFLOW_POST, __OVERFLOW_EXTABLE)
+
+#undef __OVERFLOW_POST_RETURN
+#define __OVERFLOW_POST_RETURN			\
+	"bvc 3f\n"				\
+	"mov %0, %1\n"				\
+	"mov %H0, %H1\n"			\
+	"2: "HARDENED_ATOMIC_INSN"\n"		\
+	"3:\n"
+
+#define __ATOMIC64_OP_RETURN(op, suffix, op1, op2, post_op, extable)	\
 static inline long long							\
-atomic64_##op##_return_relaxed(long long i, atomic64_t *v)		\
+atomic64_##op##_return##suffix##_relaxed(long long i, atomic64##suffix##_t *v) \
 {									\
 	long long result;						\
 	unsigned long tmp;						\
@@ -356,9 +483,11 @@  atomic64_##op##_return_relaxed(long long i, atomic64_t *v)		\
 "1:	ldrexd	%0, %H0, [%3]\n"					\
 "	" #op1 " %Q0, %Q0, %Q4\n"					\
 "	" #op2 " %R0, %R0, %R4\n"					\
+	post_op                                                         \
 "	strexd	%1, %0, %H0, [%3]\n"					\
 "	teq	%1, #0\n"						\
-"	bne	1b"							\
+"	bne	1b\n"							\
+	extable                                                         \
 	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)		\
 	: "r" (&v->counter), "r" (i)					\
 	: "cc");							\
@@ -366,6 +495,11 @@  atomic64_##op##_return_relaxed(long long i, atomic64_t *v)		\
 	return result;							\
 }
 
+#define ATOMIC64_OP_RETURN(op, op1, op2)	                        \
+	__ATOMIC64_OP_RETURN(op, _wrap, op1, op2, , )			\
+	__ATOMIC64_OP_RETURN(op, , op1, op2##s, __OVERFLOW_POST_RETURN, \
+			     __OVERFLOW_EXTABLE)
+
 #define ATOMIC64_FETCH_OP(op, op1, op2)					\
 static inline long long							\
 atomic64_fetch_##op##_relaxed(long long i, atomic64_t *v)		\
@@ -422,70 +556,98 @@  ATOMIC64_OPS(xor, eor, eor)
 #undef ATOMIC64_OPS
 #undef ATOMIC64_FETCH_OP
 #undef ATOMIC64_OP_RETURN
+#undef __ATOMIC64_OP_RETURN
 #undef ATOMIC64_OP
-
-static inline long long
-atomic64_cmpxchg_relaxed(atomic64_t *ptr, long long old, long long new)
-{
-	long long oldval;
-	unsigned long res;
-
-	prefetchw(&ptr->counter);
-
-	do {
-		__asm__ __volatile__("@ atomic64_cmpxchg\n"
-		"ldrexd		%1, %H1, [%3]\n"
-		"mov		%0, #0\n"
-		"teq		%1, %4\n"
-		"teqeq		%H1, %H4\n"
-		"strexdeq	%0, %5, %H5, [%3]"
-		: "=&r" (res), "=&r" (oldval), "+Qo" (ptr->counter)
-		: "r" (&ptr->counter), "r" (old), "r" (new)
-		: "cc");
-	} while (res);
-
-	return oldval;
+#undef __ATOMIC64_OP
+#undef __OVERFLOW_EXTABLE
+#undef __OVERFLOW_POST_RETURN
+#undef __OVERFLOW_RETURN
+
+#define __ATOMIC64_CMPXCHG_RELAXED(suffix)	                        \
+static inline long long	atomic64_cmpxchg##suffix##_relaxed(             \
+	atomic64##suffix##_t *ptr, long long old, long long new)	\
+{									\
+	long long oldval;                                               \
+	unsigned long res;						\
+									\
+	prefetchw(&ptr->counter);					\
+									\
+	do {								\
+		__asm__ __volatile__("@ atomic64_cmpxchg" #suffix "\n"  \
+		"ldrexd		%1, %H1, [%3]\n"			\
+		"mov		%0, #0\n"				\
+		"teq		%1, %4\n"				\
+		"teqeq		%H1, %H4\n"				\
+		"strexdeq	%0, %5, %H5, [%3]"			\
+		: "=&r" (res), "=&r" (oldval), "+Qo" (ptr->counter)     \
+		: "r" (&ptr->counter), "r" (old), "r" (new)             \
+		: "cc");                                                \
+	} while (res);							\
+									\
+	return oldval;							\
 }
-#define atomic64_cmpxchg_relaxed	atomic64_cmpxchg_relaxed
 
-static inline long long atomic64_xchg_relaxed(atomic64_t *ptr, long long new)
-{
-	long long result;
-	unsigned long tmp;
-
-	prefetchw(&ptr->counter);
+__ATOMIC64_CMPXCHG_RELAXED()
+__ATOMIC64_CMPXCHG_RELAXED(_wrap)
+#define atomic64_cmpxchg_relaxed	atomic64_cmpxchg_relaxed
 
-	__asm__ __volatile__("@ atomic64_xchg\n"
-"1:	ldrexd	%0, %H0, [%3]\n"
-"	strexd	%1, %4, %H4, [%3]\n"
-"	teq	%1, #0\n"
-"	bne	1b"
-	: "=&r" (result), "=&r" (tmp), "+Qo" (ptr->counter)
-	: "r" (&ptr->counter), "r" (new)
-	: "cc");
+#undef __ATOMIC64_CMPXCHG_RELAXED
 
-	return result;
+#define __ATOMIC64_XCHG_RELAXED(suffix)					\
+static inline long long atomic64_xchg##suffix##_relaxed(                \
+	atomic64##suffix##_t *ptr, long long new)			\
+{									\
+	long long result;                                               \
+	unsigned long tmp;						\
+									\
+	prefetchw(&ptr->counter);					\
+									\
+	__asm__ __volatile__("@ atomic64_xchg" #suffix "\n"		\
+"1:	ldrexd	%0, %H0, [%3]\n"                                        \
+"	strexd	%1, %4, %H4, [%3]\n"                                    \
+"	teq	%1, #0\n"                                               \
+"	bne	1b"                                                     \
+	: "=&r" (result), "=&r" (tmp), "+Qo" (ptr->counter)             \
+	: "r" (&ptr->counter), "r" (new)                                \
+	: "cc");                                                        \
+									\
+	return result;							\
 }
+
+__ATOMIC64_XCHG_RELAXED()
+__ATOMIC64_XCHG_RELAXED(_wrap)
 #define atomic64_xchg_relaxed		atomic64_xchg_relaxed
 
+#undef __ATOMIC64_XCHG_RELAXED
+
 static inline long long atomic64_dec_if_positive(atomic64_t *v)
 {
 	long long result;
-	unsigned long tmp;
+	u64 tmp;
 
 	smp_mb();
 	prefetchw(&v->counter);
 
 	__asm__ __volatile__("@ atomic64_dec_if_positive\n"
-"1:	ldrexd	%0, %H0, [%3]\n"
-"	subs	%Q0, %Q0, #1\n"
-"	sbc	%R0, %R0, #0\n"
+"1:	ldrexd	%1, %H1, [%3]\n"
+"	subs	%Q0, %Q1, #1\n"
+"	sbcs	%R0, %R1, #0\n"
+#ifdef CONFIG_HARDENED_ATOMIC
+"	bvc	3f\n"
+"	mov	%Q0, %Q1\n"
+"	mov	%R0, %R1\n"
+"2:	"HARDENED_ATOMIC_INSN"\n"
+"3:\n"
+#endif
 "	teq	%R0, #0\n"
-"	bmi	2f\n"
+"	bmi	4f\n"
 "	strexd	%1, %0, %H0, [%3]\n"
 "	teq	%1, #0\n"
 "	bne	1b\n"
-"2:"
+"4:\n"
+#ifdef CONFIG_HARDENED_ATOMIC
+       _ASM_EXTABLE(2b, 4b)
+#endif
 	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
 	: "r" (&v->counter)
 	: "cc");
@@ -509,13 +671,21 @@  static inline int atomic64_add_unless(atomic64_t *v, long long a, long long u)
 "	teq	%0, %5\n"
 "	teqeq	%H0, %H5\n"
 "	moveq	%1, #0\n"
-"	beq	2f\n"
+"	beq	4f\n"
 "	adds	%Q0, %Q0, %Q6\n"
-"	adc	%R0, %R0, %R6\n"
+"	adcs	%R0, %R0, %R6\n"
+#ifdef CONFIG_HARDENED_ATOMIC
+"       bvc     3f\n"
+"2:     "HARDENED_ATOMIC_INSN"\n"
+"3:\n"
+#endif
 "	strexd	%2, %0, %H0, [%4]\n"
 "	teq	%2, #0\n"
 "	bne	1b\n"
-"2:"
+"4:\n"
+#ifdef CONFIG_HARDENED_ATOMIC
+       _ASM_EXTABLE(2b, 4b)
+#endif
 	: "=&r" (val), "+r" (ret), "=&r" (tmp), "+Qo" (v->counter)
 	: "r" (&v->counter), "r" (u), "r" (a)
 	: "cc");
@@ -529,6 +699,7 @@  static inline int atomic64_add_unless(atomic64_t *v, long long a, long long u)
 #define atomic64_add_negative(a, v)	(atomic64_add_return((a), (v)) < 0)
 #define atomic64_inc(v)			atomic64_add(1LL, (v))
 #define atomic64_inc_return_relaxed(v)	atomic64_add_return_relaxed(1LL, (v))
+#define atomic64_inc_return_wrap_relaxed(v) atomic64_add_return_wrap_relaxed(1LL, v)
 #define atomic64_inc_and_test(v)	(atomic64_inc_return(v) == 0)
 #define atomic64_sub_and_test(a, v)	(atomic64_sub_return((a), (v)) == 0)
 #define atomic64_dec(v)			atomic64_sub(1LL, (v))
@@ -536,6 +707,9 @@  static inline int atomic64_add_unless(atomic64_t *v, long long a, long long u)
 #define atomic64_dec_and_test(v)	(atomic64_dec_return((v)) == 0)
 #define atomic64_inc_not_zero(v)	atomic64_add_unless((v), 1LL, 0LL)
 
+#define atomic64_inc_wrap(v) atomic64_add_wrap(1LL, v)
+#define atomic64_dec_wrap(v) atomic64_sub_wrap(1LL, v)
+
 #endif /* !CONFIG_GENERIC_ATOMIC64 */
 #endif
 #endif
diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index 3a2e678..ce8ee00 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -580,6 +580,21 @@  do_PrefetchAbort(unsigned long addr, unsigned int ifsr, struct pt_regs *regs)
 	const struct fsr_info *inf = ifsr_info + fsr_fs(ifsr);
 	struct siginfo info;
 
+#ifdef CONFIG_HARDENED_ATOMIC
+	if (fsr_fs(ifsr) == FAULT_CODE_DEBUG) {
+		unsigned long pc = instruction_pointer(regs);
+		unsigned int bkpt;
+
+		if (!probe_kernel_address((const unsigned int *)pc, bkpt) &&
+		    cpu_to_le32(bkpt) == 0xe12f1073) {
+			current->thread.error_code = ifsr;
+			current->thread.trap_no = 0;
+			hardened_atomic_overflow(regs);
+			fixup_exception(regs);
+			return;
+		}
+	}
+#endif
 	if (!inf->fn(addr, ifsr | FSR_LNX_PF, regs))
 		return;