
arm64: percpu: Make this_cpu accessors pre-empt safe

Message ID 1426776751-20526-1-git-send-email-steve.capper@linaro.org (mailing list archive)
State New, archived

Commit Message

Steve Capper March 19, 2015, 2:52 p.m. UTC
this_cpu operations were implemented for arm64 in:
 5284e1b arm64: xchg: Implement cmpxchg_double
 f97fc81 arm64: percpu: Implement this_cpu operations

Unfortunately, it is possible for pre-emption to take place between
address generation and data access. This can lead to cases where data
is being manipulated by this_cpu for a different CPU than it was
called on. Which effectively breaks the spec.

This patch disables pre-emption for the this_cpu operations
guaranteeing that address generation and data manipulation.

Fixes: 5284e1b ("arm64: xchg: Implement cmpxchg_double")
Fixes: f97fc81 ("arm64: percpu: Implement this_cpu operations")
Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Steve Capper <steve.capper@linaro.org>
---
 arch/arm64/include/asm/cmpxchg.h | 32 ++++++++++++++++------
 arch/arm64/include/asm/percpu.h  | 57 ++++++++++++++++++++++++++++++++--------
 2 files changed, 70 insertions(+), 19 deletions(-)
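
For illustration, the race being described boils down to something like the following sketch (hypothetical code, not part of the patch; raw_cpu_ptr() is the real kernel helper, the function around it is made up):

/*
 * Hypothetical sketch of the unprotected pattern: the per-cpu address
 * is generated first and only dereferenced afterwards, so a preemption
 * in between can migrate the task to another CPU.
 */
static void unsafe_this_cpu_inc(unsigned long __percpu *counter)
{
	unsigned long *p;

	p = raw_cpu_ptr(counter);	/* address of CPU A's copy */

	/*
	 * If the task is preempted and migrated to CPU B here, 'p'
	 * still points at CPU A's copy of the counter ...
	 */

	(*p)++;				/* ... so CPU A's data is written from CPU B */
}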

Comments

Mark Rutland March 19, 2015, 3:44 p.m. UTC | #1
Hi Steve,

Thanks for putting this together!

On Thu, Mar 19, 2015 at 02:52:31PM +0000, Steve Capper wrote:
> this_cpu operations were implemented for arm64 in:
>  5284e1b arm64: xchg: Implement cmpxchg_double
>  f97fc81 arm64: percpu: Implement this_cpu operations
> 
> Unfortunately, it is possible for pre-emption to take place between
> address generation and data access. This can lead to cases where data
> is being manipulated by this_cpu for a different CPU than it was
> called on. Which effectively breaks the spec.
> 
> This patch disables pre-emption for the this_cpu operations
> guaranteeing that address generation and data manipulation.

Shouldn't that last sentence end with "occur on the same CPU", or
something like that?

[...]

> +/*
> + * Modules aren't allowed to use preempt_enable_no_resched, and it is
> + * undef'ed. If we are unable to use preempt_enable_no_resched, then
> + * fallback to the standard preempt_enable.
> + */
> +#ifdef preempt_enable_no_resched
> +#define __pcp_preempt_enable()	preempt_enable_no_resched()
> +#else
> +#define __pcp_preempt_enable()	preempt_enable()
> +#endif /* preempt_enable_no_resched */

I think it would be worth mentioning in the comment why we want to use
preempt_enable_no_resched where possible (e.g. read-modify-cmpxchg
sequences where we want to have as few retries as possible).

Other than those points, the patch looks good to me, feel free to add:

Reviewed-by: Mark Rutland <mark.rutland@arm.com>

It's a shame there don't seem to be any this_cpu_* self-tests; I've
booted a kernel with this applied, but I didn't have anything that
exploded without this, so I'd feel uneasy giving a Tested-by.

Mark.
Steve Capper March 19, 2015, 3:55 p.m. UTC | #2
On 19 March 2015 at 15:44, Mark Rutland <mark.rutland@arm.com> wrote:
> Hi Steve,
>

Hey Mark,

> Thanks for putting this together!
>
> On Thu, Mar 19, 2015 at 02:52:31PM +0000, Steve Capper wrote:
>> this_cpu operations were implemented for arm64 in:
>>  5284e1b arm64: xchg: Implement cmpxchg_double
>>  f97fc81 arm64: percpu: Implement this_cpu operations
>>
>> Unfortunately, it is possible for pre-emption to take place between
>> address generation and data access. This can lead to cases where data
>> is being manipulated by this_cpu for a different CPU than it was
>> called on. Which effectively breaks the spec.
>>
>> This patch disables pre-emption for the this_cpu operations
>> guaranteeing that address generation and data manipulation.
>
> Shouldn't that last sentence end with "occur on the same CPU", or
> something like that?

Gah! Yes it should, thank you.

>
> [...]
>
>> +/*
>> + * Modules aren't allowed to use preempt_enable_no_resched, and it is
>> + * undef'ed. If we are unable to use preempt_enable_no_resched, then
>> + * fallback to the standard preempt_enable.
>> + */
>> +#ifdef preempt_enable_no_resched
>> +#define __pcp_preempt_enable()       preempt_enable_no_resched()
>> +#else
>> +#define __pcp_preempt_enable()       preempt_enable()
>> +#endif /* preempt_enable_no_resched */
>
> I think it would be worth mentioning in the comment why we want to use
> preempt_enable_no_resched where possible (e.g. read-modify-cmpxchg
> sequences where we want to have as few retries as possible).

Good point, I'll add something.


>
> Other than those points, the patch looks good to me, feel free to add:
>
> Reviewed-by: Mark Rutland <mark.rutland@arm.com>

Thanks!

>
> It's a shame there don't seem to be any this_cpu_* self-tests; I've
> booted a kernel with this applied, but I didn't have anything that
> exploded without this, so I'd feel uneasy giving a Tested-by.

There is the percpu_test module, I ran this and it appeared to pass.
Also, I ran the traditional hackbench tests.

Thanks Mark, I'll send out a V2 shortly with the amended commit log
and expanded comment.

Cheers,
Will Deacon March 19, 2015, 4 p.m. UTC | #3
On Thu, Mar 19, 2015 at 03:44:36PM +0000, Mark Rutland wrote:
> On Thu, Mar 19, 2015 at 02:52:31PM +0000, Steve Capper wrote:
> > this_cpu operations were implemented for arm64 in:
> >  5284e1b arm64: xchg: Implement cmpxchg_double
> >  f97fc81 arm64: percpu: Implement this_cpu operations
> > 
> > Unfortunately, it is possible for pre-emption to take place between
> > address generation and data access. This can lead to cases where data
> > is being manipulated by this_cpu for a different CPU than it was
> > called on. Which effectively breaks the spec.
> > 
> > This patch disables pre-emption for the this_cpu operations
> > guaranteeing that address generation and data manipulation.
> 
> Shouldn't that last sentence end with "occur on the same CPU", or
> something like that?
> 
> [...]
> 
> > +/*
> > + * Modules aren't allowed to use preempt_enable_no_resched, and it is
> > + * undef'ed. If we are unable to use preempt_enable_no_resched, then
> > + * fallback to the standard preempt_enable.
> > + */
> > +#ifdef preempt_enable_no_resched
> > +#define __pcp_preempt_enable()	preempt_enable_no_resched()
> > +#else
> > +#define __pcp_preempt_enable()	preempt_enable()
> > +#endif /* preempt_enable_no_resched */
> 
> I think it would be worth mentioning in the comment why we want to use
> preempt_enable_no_resched where possible (e.g. read-modify-cmpxchg
> sequences where we want to have as few retries as possible).

Hmm, I'm not sure I agree with that. In the interest of throughput, I can
understand that you want to minimise the retries but since preempt kernels
are all about minimising latency then actually scheduling when a cmpxchg
loop fails sounds pretty ideal to me.

Why can't we just use preempt_enable?

Will
Mark Rutland March 19, 2015, 4:11 p.m. UTC | #4
On Thu, Mar 19, 2015 at 04:00:09PM +0000, Will Deacon wrote:
> On Thu, Mar 19, 2015 at 03:44:36PM +0000, Mark Rutland wrote:
> > On Thu, Mar 19, 2015 at 02:52:31PM +0000, Steve Capper wrote:
> > > this_cpu operations were implemented for arm64 in:
> > >  5284e1b arm64: xchg: Implement cmpxchg_double
> > >  f97fc81 arm64: percpu: Implement this_cpu operations
> > > 
> > > Unfortunately, it is possible for pre-emption to take place between
> > > address generation and data access. This can lead to cases where data
> > > is being manipulated by this_cpu for a different CPU than it was
> > > called on. Which effectively breaks the spec.
> > > 
> > > This patch disables pre-emption for the this_cpu operations
> > > guaranteeing that address generation and data manipulation.
> > 
> > Shouldn't that last sentence end with "occur on the same CPU", or
> > something like that?
> > 
> > [...]
> > 
> > > +/*
> > > + * Modules aren't allowed to use preempt_enable_no_resched, and it is
> > > + * undef'ed. If we are unable to use preempt_enable_no_resched, then
> > > + * fallback to the standard preempt_enable.
> > > + */
> > > +#ifdef preempt_enable_no_resched
> > > +#define __pcp_preempt_enable()	preempt_enable_no_resched()
> > > +#else
> > > +#define __pcp_preempt_enable()	preempt_enable()
> > > +#endif /* preempt_enable_no_resched */
> > 
> > I think it would be worth mentioning in the comment why we want to use
> > preempt_enable_no_resched where possible (e.g. read-modify-cmpxchg
> > sequences where we want to have as few retries as possible).
> 
> Hmm, I'm not sure I agree with that. In the interest of throughput, I can
> understand that you want to minimise the retries but since preempt kernels
> are all about minimising latency then actually scheduling when a cmpxchg
> loop fails sounds pretty ideal to me.

I'm on about scheduling at the end of the read, before the cmpxchg. It's
basically asking for another thread to make the read stale (and hence
the cmpxchg is very likely to fail).

Scheduling after the cmpxchg is fine.

Mark.
Mark Rutland March 19, 2015, 4:23 p.m. UTC | #5
> > It's a shame there don't seem to be any this_cpu_* self-tests; I've
> > booted a kernel with this applied, but I didn't have anything that
> > exploded without this, so I'd feel uneasy giving a Tested-by.
> 
> There is the percpu_test module, I ran this and it appeared to pass.
> Also, I ran the traditional hackbench tests.

I hadn't spotted the percpu_test module, but looking at it, it seems to
be effectively useless. The body of the test is:

static int __init percpu_test_init(void)
{
	pr_info("percpu test start\n");
	preempt_disable();

	...
	/*
	 * test each this_cpu operation in turn
	 */
	...

	preempt_enable();
	pr_info("percpu test done\n");

	return -EAGAIN;
}

Which doesn't stress the this_cpu operations at all, unless you consider
the hard part to be the maths ;)

To test this patch thoroughly we need something that has a few threads
which perform some common use patterns (while getting migrated),
checking that they end up with consistent results and the system doesn't lock up
or explode.
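
A rough sketch of what such a harness could look like (hypothetical, not an existing kernel test; the names are made up, and the final counter-sum check is just one possible notion of "consistent results"):

#include <linux/delay.h>
#include <linux/err.h>
#include <linux/kernel.h>
#include <linux/kthread.h>
#include <linux/module.h>
#include <linux/percpu.h>
#include <linux/sched.h>

#define NR_THREADS	4
#define NR_ITERS	1000000

static DEFINE_PER_CPU(unsigned long, test_counter);
static unsigned long thread_incs[NR_THREADS];
static struct task_struct *threads[NR_THREADS];

static int stress_fn(void *arg)
{
	unsigned long *my_incs = arg;
	int i;

	for (i = 0; i < NR_ITERS; i++) {
		this_cpu_inc(test_counter);	/* should always hit the current CPU's copy */
		(*my_incs)++;
		if (!(i % 1024))
			cond_resched();		/* invite preemption/migration */
	}

	/* park until kthread_stop() */
	while (!kthread_should_stop())
		schedule_timeout_interruptible(HZ / 10);

	return 0;
}

static int __init pcp_stress_init(void)
{
	unsigned long sum = 0, expected = 0;
	int i, cpu;

	for (i = 0; i < NR_THREADS; i++)
		threads[i] = kthread_run(stress_fn, &thread_incs[i],
					 "pcp_stress/%d", i);

	msleep(5000);				/* let the threads run and migrate */

	for (i = 0; i < NR_THREADS; i++)
		if (!IS_ERR_OR_NULL(threads[i]))
			kthread_stop(threads[i]);

	for_each_possible_cpu(cpu)
		sum += per_cpu(test_counter, cpu);
	for (i = 0; i < NR_THREADS; i++)
		expected += thread_incs[i];

	pr_info("pcp_stress: sum=%lu expected=%lu\n", sum, expected);

	return sum == expected ? 0 : -EINVAL;
}
module_init(pcp_stress_init);

MODULE_LICENSE("GPL");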

Mark.
Will Deacon March 19, 2015, 4:27 p.m. UTC | #6
On Thu, Mar 19, 2015 at 04:11:44PM +0000, Mark Rutland wrote:
> On Thu, Mar 19, 2015 at 04:00:09PM +0000, Will Deacon wrote:
> > On Thu, Mar 19, 2015 at 03:44:36PM +0000, Mark Rutland wrote:
> > > On Thu, Mar 19, 2015 at 02:52:31PM +0000, Steve Capper wrote:
> > > > +/*
> > > > + * Modules aren't allowed to use preempt_enable_no_resched, and it is
> > > > + * undef'ed. If we are unable to use preempt_enable_no_resched, then
> > > > + * fallback to the standard preempt_enable.
> > > > + */
> > > > +#ifdef preempt_enable_no_resched
> > > > +#define __pcp_preempt_enable()	preempt_enable_no_resched()
> > > > +#else
> > > > +#define __pcp_preempt_enable()	preempt_enable()
> > > > +#endif /* preempt_enable_no_resched */
> > > 
> > > I think it would be worth mentioning in the comment why we want to use
> > > preempt_enable_no_resched where possible (e.g. read-modify-cmpxchg
> > > sequences where we want to have as few retries as possible).
> > 
> > Hmm, I'm not sure I agree with that. In the interest of throughput, I can
> > understand that you want to minimise the retries but since preempt kernels
> > are all about minimising latency then actually scheduling when a cmpxchg
> > loop fails sounds pretty ideal to me.
> 
> I'm on about scheduling at the end of the read, before the cmpxchg. It's
> basically asking for another thread to make the read stale (and hence
> the cmpxchg is very likely to fail).

/me gets introduced to SLUB's slab_alloc_node.

> Scheduling after the cmpxchg is fine.

I still don't think the slub code warrants using preempt_enable_no_resched,
for a number of reasons:

  (1) s390 uses preempt_enable, so it doesn't appear to be the end of the
      world

  (2) The slub code is well aware of what it's doing, but doesn't consider
      it an issue:

      * [...] We may switch back and forth between cpus while
      * reading from one cpu area. That does not matter as long
      * as we end up on the original cpu again when doing
      * the cmpxchg.

  (3) Preemption isn't actually an issue here -- CPU migration is. I'd
      expect that to be a lot rarer.

  (4) Having different preempt behaviour depending on whether or not
      something is built as a module is bloody horrible

If we wanted to change anything, SLUB is probably a better candidate than
the pcpu accessors!

Will
Mark Rutland March 19, 2015, 4:39 p.m. UTC | #7
On Thu, Mar 19, 2015 at 04:27:53PM +0000, Will Deacon wrote:
> On Thu, Mar 19, 2015 at 04:11:44PM +0000, Mark Rutland wrote:
> > On Thu, Mar 19, 2015 at 04:00:09PM +0000, Will Deacon wrote:
> > > On Thu, Mar 19, 2015 at 03:44:36PM +0000, Mark Rutland wrote:
> > > > On Thu, Mar 19, 2015 at 02:52:31PM +0000, Steve Capper wrote:
> > > > > +/*
> > > > > + * Modules aren't allowed to use preempt_enable_no_resched, and it is
> > > > > + * undef'ed. If we are unable to use preempt_enable_no_resched, then
> > > > > + * fallback to the standard preempt_enable.
> > > > > + */
> > > > > +#ifdef preempt_enable_no_resched
> > > > > +#define __pcp_preempt_enable()	preempt_enable_no_resched()
> > > > > +#else
> > > > > +#define __pcp_preempt_enable()	preempt_enable()
> > > > > +#endif /* preempt_enable_no_resched */
> > > > 
> > > > I think it would be worth mentioning in the comment why we want to use
> > > > preempt_enable_no_resched where possible (e.g. read-modify-cmpxchg
> > > > sequences where we want to have as few retries as possible).
> > > 
> > > Hmm, I'm not sure I agree with that. In the interest of throughput, I can
> > > understand that you want to minimise the retries but since preempt kernels
> > > are all about minimising latency then actually scheduling when a cmpxchg
> > > loop fails sounds pretty ideal to me.
> > 
> > I'm on about scheduling at the end of the read, before the cmpxchg. It's
> > basically asking for another thread to make the read stale (and hence
> > the cmpxchg is very likely to fail).
> 
> /me gets introduced to SLUB's slab_alloc_node.
> 
> > Scheduling after the cmpxchg is fine.
> 
> I still don't think the slub code warrants using preempt_enable_no_resched,
> for a number of reasons:
> 
>   (1) s390 uses preempt_enable, so it doesn't appear to be the end of the
>       world
> 
>   (2) The slub code is well aware of what it's doing, but doesn't consider
>       it an issue:
> 
>       * [...] We may switch back and forth between cpus while
>       * reading from one cpu area. That does not matter as long
>       * as we end up on the original cpu again when doing
>       * the cmpxchg.

Sure, the code is correct regardless, on both of these points.

>   (3) Preemption isn't actually an issue here -- CPU migration is. I'd
>       expect that to be a lot rarer.

That's not true: in a read-modify-cmpxchg loop (even on a UP system) if
you get preempted at the end of the read and another thread modifies
your datastructure, then your subsequent cmpxchg will fail. So any
preemption could cause you to have to retry, so you want to minimise
potential preemptions.

Note that this is a performance argument rather than a correctness
argument.
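
For concreteness, the loop shape in question is roughly the following (illustrative only; my_var and compute_new_value() are stand-ins, not real kernel symbols):

/*
 * Illustrative read-modify-cmpxchg loop. Any preemption between the
 * this_cpu_read() and the this_cpu_cmpxchg() gives another thread a
 * chance to update the variable, so the cmpxchg fails and the loop
 * has to go round again.
 */
do {
	old = this_cpu_read(my_var);
	new = compute_new_value(old);		/* hypothetical helper */
} while (this_cpu_cmpxchg(my_var, old, new) != old);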

>   (4) Having different preempt behaviour depending on whether or not
>       something is built as a module is bloody horrible

Agreed.

Mark.
Will Deacon March 20, 2015, 6:02 p.m. UTC | #8
On Thu, Mar 19, 2015 at 04:39:55PM +0000, Mark Rutland wrote:
> On Thu, Mar 19, 2015 at 04:27:53PM +0000, Will Deacon wrote:
> >   (3) Preemption isn't actually an issue here -- CPU migration is. I'd
> >       expect that to be a lot rarer.
> 
> That's not true: in a read-modify-cmpxchg loop (even on a UP system) if
> you get preempted at the end of the read and another thread modifies
> your datastructure, then your subsequent cmpxchg will fail. So any
> preemption could cause you to have to retry, so you want to minimise
> potential preemptions.
> 
> Note that this is a performance argument rather than a correctness
> argument.

My argument is more that a migration will always break the cmpxchg, whereas
preemption doesn't necessarily.

Steve -- can you respin this using preempt_enable everywhere please?

Will

Patch

diff --git a/arch/arm64/include/asm/cmpxchg.h b/arch/arm64/include/asm/cmpxchg.h
index cb95930..0ff749b 100644
--- a/arch/arm64/include/asm/cmpxchg.h
+++ b/arch/arm64/include/asm/cmpxchg.h
@@ -246,14 +246,30 @@  static inline unsigned long __cmpxchg_mb(volatile void *ptr, unsigned long old,
 	__ret; \
 })
 
-#define this_cpu_cmpxchg_1(ptr, o, n) cmpxchg_local(raw_cpu_ptr(&(ptr)), o, n)
-#define this_cpu_cmpxchg_2(ptr, o, n) cmpxchg_local(raw_cpu_ptr(&(ptr)), o, n)
-#define this_cpu_cmpxchg_4(ptr, o, n) cmpxchg_local(raw_cpu_ptr(&(ptr)), o, n)
-#define this_cpu_cmpxchg_8(ptr, o, n) cmpxchg_local(raw_cpu_ptr(&(ptr)), o, n)
-
-#define this_cpu_cmpxchg_double_8(ptr1, ptr2, o1, o2, n1, n2) \
-	cmpxchg_double_local(raw_cpu_ptr(&(ptr1)), raw_cpu_ptr(&(ptr2)), \
-				o1, o2, n1, n2)
+#define _protect_cmpxchg_local(pcp, o, n)			\
+({								\
+	typeof(*raw_cpu_ptr(&(pcp))) __ret;			\
+	__pcp_preempt_disable();				\
+	__ret = cmpxchg_local(raw_cpu_ptr(&(pcp)), o, n);	\
+	__pcp_preempt_enable();					\
+	__ret;							\
+})
+
+#define this_cpu_cmpxchg_1(ptr, o, n) _protect_cmpxchg_local(ptr, o, n)
+#define this_cpu_cmpxchg_2(ptr, o, n) _protect_cmpxchg_local(ptr, o, n)
+#define this_cpu_cmpxchg_4(ptr, o, n) _protect_cmpxchg_local(ptr, o, n)
+#define this_cpu_cmpxchg_8(ptr, o, n) _protect_cmpxchg_local(ptr, o, n)
+
+#define this_cpu_cmpxchg_double_8(ptr1, ptr2, o1, o2, n1, n2)		\
+({									\
+	int __ret;							\
+	__pcp_preempt_disable();					\
+	__ret = cmpxchg_double_local(	raw_cpu_ptr(&(ptr1)),		\
+					raw_cpu_ptr(&(ptr2)),		\
+					o1, o2, n1, n2);		\
+	__pcp_preempt_enable();						\
+	__ret;								\
+})
 
 #define cmpxchg64(ptr,o,n)		cmpxchg((ptr),(o),(n))
 #define cmpxchg64_local(ptr,o,n)	cmpxchg_local((ptr),(o),(n))
diff --git a/arch/arm64/include/asm/percpu.h b/arch/arm64/include/asm/percpu.h
index 09da25b..94f118e 100644
--- a/arch/arm64/include/asm/percpu.h
+++ b/arch/arm64/include/asm/percpu.h
@@ -204,25 +204,60 @@  static inline unsigned long __percpu_xchg(void *ptr, unsigned long val,
 	return ret;
 }
 
+/*
+ * Modules aren't allowed to use preempt_enable_no_resched, and it is
+ * undef'ed. If we are unable to use preempt_enable_no_resched, then
+ * fallback to the standard preempt_enable.
+ */
+#ifdef preempt_enable_no_resched
+#define __pcp_preempt_enable()	preempt_enable_no_resched()
+#else
+#define __pcp_preempt_enable()	preempt_enable()
+#endif /* preempt_enable_no_resched */
+
+#define	__pcp_preempt_disable()	preempt_disable()
+
+#define _percpu_read(pcp)						\
+({									\
+	typeof(pcp) __retval;						\
+	__pcp_preempt_disable();					\
+	__retval = (typeof(pcp)) __percpu_read(raw_cpu_ptr(&(pcp)), 	\
+					sizeof(pcp));			\
+	__pcp_preempt_enable();						\
+	__retval;							\
+})
+
+#define _percpu_write(pcp, val)						\
+do {									\
+	__pcp_preempt_disable();					\
+	__percpu_write(raw_cpu_ptr(&(pcp)), (unsigned long) (val), 	\
+				sizeof(pcp));				\
+	__pcp_preempt_enable();						\
+} while(0)								\
+
+#define _pcp_protect(operation, pcp, val)			\
+({								\
+	typeof(pcp) __retval;					\
+	__pcp_preempt_disable();				\
+	__retval = (typeof(pcp)) operation(raw_cpu_ptr(&(pcp)),	\
+				(val), sizeof(pcp));		\
+	__pcp_preempt_enable();					\
+	__retval;						\
+})
+
 #define _percpu_add(pcp, val) \
-	__percpu_add(raw_cpu_ptr(&(pcp)), val, sizeof(pcp))
+	_pcp_protect(__percpu_add, pcp, val)
 
-#define _percpu_add_return(pcp, val) (typeof(pcp)) (_percpu_add(pcp, val))
+#define _percpu_add_return(pcp, val) _percpu_add(pcp, val)
 
 #define _percpu_and(pcp, val) \
-	__percpu_and(raw_cpu_ptr(&(pcp)), val, sizeof(pcp))
+	_pcp_protect(__percpu_and, pcp, val)
 
 #define _percpu_or(pcp, val) \
-	__percpu_or(raw_cpu_ptr(&(pcp)), val, sizeof(pcp))
-
-#define _percpu_read(pcp) (typeof(pcp))	\
-	(__percpu_read(raw_cpu_ptr(&(pcp)), sizeof(pcp)))
-
-#define _percpu_write(pcp, val) \
-	__percpu_write(raw_cpu_ptr(&(pcp)), (unsigned long)(val), sizeof(pcp))
+	_pcp_protect(__percpu_or, pcp, val)
 
 #define _percpu_xchg(pcp, val) (typeof(pcp)) \
-	(__percpu_xchg(raw_cpu_ptr(&(pcp)), (unsigned long)(val), sizeof(pcp)))
+	_pcp_protect(__percpu_xchg, pcp, (unsigned long)(val))
 
 #define this_cpu_add_1(pcp, val) _percpu_add(pcp, val)
 #define this_cpu_add_2(pcp, val) _percpu_add(pcp, val)