diff mbox

[14/15] futex: convert futex_pi_state.refcount to refcount_t

Message ID 1504095773-22895-15-git-send-email-elena.reshetova@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Reshetova, Elena Aug. 30, 2017, 12:22 p.m. UTC
atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable futex_pi_state.refcount is used as pure
reference counter. Convert it to refcount_t and fix up
the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
---
 kernel/futex.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

Comments

Thomas Gleixner Sept. 1, 2017, 7:39 a.m. UTC | #1
On Wed, 30 Aug 2017, Elena Reshetova wrote:
> atomic_t variables are currently used to implement reference
> counters with the following properties:
>  - counter is initialized to 1 using atomic_set()
>  - a resource is freed upon counter reaching zero
>  - once counter reaches zero, its further
>    increments aren't allowed
>  - counter schema uses basic atomic operations
>    (set, inc, inc_not_zero, dec_and_test, etc.)
> 
> Such atomic variables should be converted to a newly provided
> refcount_t type and API that prevents accidental counter overflows
> and underflows. This is important since overflows and underflows
> can lead to use-after-free situation and be exploitable.
> 
> The variable futex_pi_state.refcount is used as pure
> reference counter. Convert it to refcount_t and fix up
> the operations.
> 
> Suggested-by: Kees Cook <keescook@chromium.org>
> Reviewed-by: David Windsor <dwindsor@gmail.com>
> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Peter Zijlstra Sept. 1, 2017, 9:38 a.m. UTC | #2
On Fri, Sep 01, 2017 at 09:39:50AM +0200, Thomas Gleixner wrote:
> On Wed, 30 Aug 2017, Elena Reshetova wrote:
> > atomic_t variables are currently used to implement reference
> > counters with the following properties:
> >  - counter is initialized to 1 using atomic_set()
> >  - a resource is freed upon counter reaching zero
> >  - once counter reaches zero, its further
> >    increments aren't allowed
> >  - counter schema uses basic atomic operations
> >    (set, inc, inc_not_zero, dec_and_test, etc.)
> > 
> > Such atomic variables should be converted to a newly provided
> > refcount_t type and API that prevents accidental counter overflows
> > and underflows. This is important since overflows and underflows
> > can lead to use-after-free situation and be exploitable.
> > 
> > The variable futex_pi_state.refcount is used as pure
> > reference counter. Convert it to refcount_t and fix up
> > the operations.
> > 
> > Suggested-by: Kees Cook <keescook@chromium.org>
> > Reviewed-by: David Windsor <dwindsor@gmail.com>
> > Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
> > Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
> 
> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

So the thing to be careful with for things like futex and some of the
other core kernel code is the memory ordering.

atomic_dec_and_test() provides a full smp_mb() before and after,
refcount_dec_and_test() only provides release semantics.

This is typically sufficient, and I would argue that if we rely on more
than that, there _should_ be a comment, however reality isn't always as
nice.

That said, I think this conversion is OK, pi_state->refcount isn't
relied upon to provide additional memory ordering above and beyond what
refcounting requires.
Thomas Gleixner Sept. 1, 2017, 9:43 a.m. UTC | #3
On Fri, 1 Sep 2017, Peter Zijlstra wrote:
> On Fri, Sep 01, 2017 at 09:39:50AM +0200, Thomas Gleixner wrote:
> > On Wed, 30 Aug 2017, Elena Reshetova wrote:
> > > atomic_t variables are currently used to implement reference
> > > counters with the following properties:
> > >  - counter is initialized to 1 using atomic_set()
> > >  - a resource is freed upon counter reaching zero
> > >  - once counter reaches zero, its further
> > >    increments aren't allowed
> > >  - counter schema uses basic atomic operations
> > >    (set, inc, inc_not_zero, dec_and_test, etc.)
> > > 
> > > Such atomic variables should be converted to a newly provided
> > > refcount_t type and API that prevents accidental counter overflows
> > > and underflows. This is important since overflows and underflows
> > > can lead to use-after-free situation and be exploitable.
> > > 
> > > The variable futex_pi_state.refcount is used as pure
> > > reference counter. Convert it to refcount_t and fix up
> > > the operations.
> > > 
> > > Suggested-by: Kees Cook <keescook@chromium.org>
> > > Reviewed-by: David Windsor <dwindsor@gmail.com>
> > > Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
> > > Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
> > 
> > Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
> 
> So the thing to be careful with for things like futex and some of the
> other core kernel code is the memory ordering.
> 
> atomic_dec_and_test() provides a full smp_mb() before and after,
> refcount_dec_and_test() only provides release semantics.
> 
> This is typically sufficient, and I would argue that if we rely on more
> than that, there _should_ be a comment, however reality isn't always as
> nice.
> 
> That said, I think this conversion is OK, pi_state->refcount isn't
> relied upon to provide additional memory ordering above and beyond what
> refcounting requires.

So the changelogs should reflect that. The current one suggests that this
is a one to one replacement for atomic_t just with the extra sanity checks
added.

Thanks,

	tglx
Reshetova, Elena Sept. 1, 2017, 10:52 a.m. UTC | #4
> On Fri, 1 Sep 2017, Peter Zijlstra wrote:
> > On Fri, Sep 01, 2017 at 09:39:50AM +0200, Thomas Gleixner wrote:
> > > On Wed, 30 Aug 2017, Elena Reshetova wrote:
> > > > atomic_t variables are currently used to implement reference
> > > > counters with the following properties:
> > > >  - counter is initialized to 1 using atomic_set()
> > > >  - a resource is freed upon counter reaching zero
> > > >  - once counter reaches zero, its further
> > > >    increments aren't allowed
> > > >  - counter schema uses basic atomic operations
> > > >    (set, inc, inc_not_zero, dec_and_test, etc.)
> > > >
> > > > Such atomic variables should be converted to a newly provided
> > > > refcount_t type and API that prevents accidental counter overflows
> > > > and underflows. This is important since overflows and underflows
> > > > can lead to use-after-free situation and be exploitable.
> > > >
> > > > The variable futex_pi_state.refcount is used as pure
> > > > reference counter. Convert it to refcount_t and fix up
> > > > the operations.
> > > >
> > > > Suggested-by: Kees Cook <keescook@chromium.org>
> > > > Reviewed-by: David Windsor <dwindsor@gmail.com>
> > > > Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
> > > > Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
> > >
> > > Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
> >
> > So the thing to be careful with for things like futex and some of the
> > other core kernel code is the memory ordering.
> >
> > atomic_dec_and_test() provides a full smp_mb() before and after,
> > refcount_dec_and_test() only provides release semantics.
> >
> > This is typically sufficient, and I would argue that if we rely on more
> > than that, there _should_ be a comment, however reality isn't always as
> > nice.
> >
> > That said, I think this conversion is OK, pi_state->refcount isn't
> > relied upon to provide additional memory ordering above and beyond what
> > refcounting requires.
> 
> So the changelogs should reflect that. The current one suggests that this
> is a one to one replacement for atomic_t just with the extra sanity checks
> added.

I will update the commit texts accordingly and resend the whole series since
this should be then mentioned in every commit to make sure it is not missed.

Best Regards,
Elena.

> 
> Thanks,
> 
> 	tglx
Reshetova, Elena Sept. 1, 2017, 11:05 a.m. UTC | #5
> On Fri, 1 Sep 2017, Peter Zijlstra wrote:
> > > On Fri, Sep 01, 2017 at 09:39:50AM +0200, Thomas Gleixner wrote:
> > > > On Wed, 30 Aug 2017, Elena Reshetova wrote:
> > > > > atomic_t variables are currently used to implement reference
> > > > > counters with the following properties:
> > > > >  - counter is initialized to 1 using atomic_set()
> > > > >  - a resource is freed upon counter reaching zero
> > > > >  - once counter reaches zero, its further
> > > > >    increments aren't allowed
> > > > >  - counter schema uses basic atomic operations
> > > > >    (set, inc, inc_not_zero, dec_and_test, etc.)
> > > > >
> > > > > Such atomic variables should be converted to a newly provided
> > > > > refcount_t type and API that prevents accidental counter overflows
> > > > > and underflows. This is important since overflows and underflows
> > > > > can lead to use-after-free situation and be exploitable.
> > > > >
> > > > > The variable futex_pi_state.refcount is used as pure
> > > > > reference counter. Convert it to refcount_t and fix up
> > > > > the operations.
> > > > >
> > > > > Suggested-by: Kees Cook <keescook@chromium.org>
> > > > > Reviewed-by: David Windsor <dwindsor@gmail.com>
> > > > > Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
> > > > > Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
> > > >
> > > > Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
> > >
> > > So the thing to be careful with for things like futex and some of the
> > > other core kernel code is the memory ordering.
> > >
> > > atomic_dec_and_test() provides a full smp_mb() before and after,
> > > refcount_dec_and_test() only provides release semantics.
> > >
> > > This is typically sufficient, and I would argue that if we rely on more
> > > than that, there _should_ be a comment, however reality isn't always as
> > > nice.
> > >
> > > That said, I think this conversion is OK, pi_state->refcount isn't
> > > relied upon to provide additional memory ordering above and beyond what
> > > refcounting requires.
> >
> > So the changelogs should reflect that. The current one suggests that this
> > is a one to one replacement for atomic_t just with the extra sanity checks
> > added.
> 
> I will update the commit texts accordingly and resend the whole series since
> this should be then mentioned in every commit to make sure it is not missed.

Actually on the second thought: does the above memory ordering differences
really apply when  we have ARCH_HAS_REFCOUNT? To me it looks like the way
how it is currently implemented for x86 is the same way as it is for atomic cases.


> 
> Best Regards,
> Elena.
> 
> >
> > Thanks,
> >
> > 	tglx
Peter Zijlstra Sept. 1, 2017, 12:34 p.m. UTC | #6
On Fri, Sep 01, 2017 at 11:05:33AM +0000, Reshetova, Elena wrote:
> Actually on the second thought: does the above memory ordering differences
> really apply when  we have ARCH_HAS_REFCOUNT? To me it looks like the way
> how it is currently implemented for x86 is the same way as it is for atomic cases.

Never look to x86 for memory ordering, its boring.

And yes, for the ARM implementation it can certainly make a difference.
Reshetova, Elena Sept. 1, 2017, 1:24 p.m. UTC | #7
> On Fri, Sep 01, 2017 at 11:05:33AM +0000, Reshetova, Elena wrote:
> > Actually on the second thought: does the above memory ordering differences
> > really apply when  we have ARCH_HAS_REFCOUNT? To me it looks like the way
> > how it is currently implemented for x86 is the same way as it is for atomic cases.
> 
> Never look to x86 for memory ordering, its boring.
> 
> And yes, for the ARM implementation it can certainly make a difference.

So, yes, what I am trying to say is that it can really depend if you have ARCH_HAS_REFCOUNT
enabled or not and then also based on architecture. Thus I believe is also true for atomic: there
might be differences when you use arch. dependent version of function or not. 

So, I guess if I rewrite the commits, I should only include the statement on relaxed memory order
for REFCOUNT_FULL and tell that arch. specific implementations may vary on their properties
(as they do now).
Peter Zijlstra Sept. 1, 2017, 1:36 p.m. UTC | #8
On Fri, Sep 01, 2017 at 01:24:16PM +0000, Reshetova, Elena wrote:
> 
> > On Fri, Sep 01, 2017 at 11:05:33AM +0000, Reshetova, Elena wrote:
> > > Actually on the second thought: does the above memory ordering differences
> > > really apply when  we have ARCH_HAS_REFCOUNT? To me it looks like the way
> > > how it is currently implemented for x86 is the same way as it is for atomic cases.
> > 
> > Never look to x86 for memory ordering, its boring.
> > 
> > And yes, for the ARM implementation it can certainly make a difference.
> 
> So, yes, what I am trying to say is that it can really depend if you have ARCH_HAS_REFCOUNT
> enabled or not and then also based on architecture. Thus I believe is also true for atomic: there
> might be differences when you use arch. dependent version of function or not. 

So the generic one in lib/refcount.c is already weaker on ARM, they
don't need to do a ARCH specific 'fast' implementation for the
difference to show up.
Reshetova, Elena Sept. 1, 2017, 5:03 p.m. UTC | #9
> On Fri, Sep 01, 2017 at 01:24:16PM +0000, Reshetova, Elena wrote:
> >
> > > On Fri, Sep 01, 2017 at 11:05:33AM +0000, Reshetova, Elena wrote:
> > > > Actually on the second thought: does the above memory ordering differences
> > > > really apply when  we have ARCH_HAS_REFCOUNT? To me it looks like the way
> > > > how it is currently implemented for x86 is the same way as it is for atomic
> cases.
> > >
> > > Never look to x86 for memory ordering, its boring.
> > >
> > > And yes, for the ARM implementation it can certainly make a difference.
> >
> > So, yes, what I am trying to say is that it can really depend if you have
> ARCH_HAS_REFCOUNT
> > enabled or not and then also based on architecture. Thus I believe is also true for
> atomic: there
> > might be differences when you use arch. dependent version of function or not.
> 
> So the generic one in lib/refcount.c is already weaker on ARM, they
> don't need to do a ARCH specific 'fast' implementation for the
> difference to show up.

But can they make "fast" implementation on ARM that would give stronger memory guarantees?
Peter Zijlstra Sept. 1, 2017, 7:12 p.m. UTC | #10
On Fri, Sep 01, 2017 at 05:03:55PM +0000, Reshetova, Elena wrote:
> > On Fri, Sep 01, 2017 at 01:24:16PM +0000, Reshetova, Elena wrote:
> > >
> > > > On Fri, Sep 01, 2017 at 11:05:33AM +0000, Reshetova, Elena wrote:
> > > > > Actually on the second thought: does the above memory ordering differences
> > > > > really apply when  we have ARCH_HAS_REFCOUNT? To me it looks like the way
> > > > > how it is currently implemented for x86 is the same way as it is for atomic
> > cases.
> > > >
> > > > Never look to x86 for memory ordering, its boring.
> > > >
> > > > And yes, for the ARM implementation it can certainly make a difference.
> > >
> > > So, yes, what I am trying to say is that it can really depend if you have
> > ARCH_HAS_REFCOUNT
> > > enabled or not and then also based on architecture. Thus I believe is also true for
> > atomic: there
> > > might be differences when you use arch. dependent version of function or not.
> > 
> > So the generic one in lib/refcount.c is already weaker on ARM, they
> > don't need to do a ARCH specific 'fast' implementation for the
> > difference to show up.
> 
> But can they make "fast" implementation on ARM that would give stronger memory guarantees?

Whatever for?
Reshetova, Elena Sept. 4, 2017, 10:31 a.m. UTC | #11
> -----Original Message-----
> From: Peter Zijlstra [mailto:peterz@infradead.org]
> Sent: Friday, September 1, 2017 10:13 PM
> To: Reshetova, Elena <elena.reshetova@intel.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>; linux-kernel@vger.kernel.org; linux-
> fsdevel@vger.kernel.org; gregkh@linuxfoundation.org; viro@zeniv.linux.org.uk;
> tj@kernel.org; mingo@redhat.com; hannes@cmpxchg.org; lizefan@huawei.com;
> acme@kernel.org; alexander.shishkin@linux.intel.com; eparis@redhat.com;
> akpm@linux-foundation.org; arnd@arndb.de; luto@kernel.org;
> keescook@chromium.org; dvhart@infradead.org; ebiederm@xmission.com
> Subject: Re: [PATCH 14/15] futex: convert futex_pi_state.refcount to refcount_t
> 
> On Fri, Sep 01, 2017 at 05:03:55PM +0000, Reshetova, Elena wrote:
> > > On Fri, Sep 01, 2017 at 01:24:16PM +0000, Reshetova, Elena wrote:
> > > >
> > > > > On Fri, Sep 01, 2017 at 11:05:33AM +0000, Reshetova, Elena wrote:
> > > > > > Actually on the second thought: does the above memory ordering
> differences
> > > > > > really apply when  we have ARCH_HAS_REFCOUNT? To me it looks like the
> way
> > > > > > how it is currently implemented for x86 is the same way as it is for atomic
> > > cases.
> > > > >
> > > > > Never look to x86 for memory ordering, its boring.
> > > > >
> > > > > And yes, for the ARM implementation it can certainly make a difference.
> > > >
> > > > So, yes, what I am trying to say is that it can really depend if you have
> > > ARCH_HAS_REFCOUNT
> > > > enabled or not and then also based on architecture. Thus I believe is also true
> for
> > > atomic: there
> > > > might be differences when you use arch. dependent version of function or not.
> > >
> > > So the generic one in lib/refcount.c is already weaker on ARM, they
> > > don't need to do a ARCH specific 'fast' implementation for the
> > > difference to show up.
> >
> > But can they make "fast" implementation on ARM that would give stronger
> memory guarantees?
> 
> Whatever for?

Well, maybe just by default when arch.-specific implementation is done. But I was just trying to speculate
to understand. I will resend this one with new comment added. 

Still not sure if I need to resend the whole series with updated commits
or break this up by individual patches further for the separate merges.
diff mbox

Patch

diff --git a/kernel/futex.c b/kernel/futex.c
index 0939255..65460b4 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -67,6 +67,7 @@ 
 #include <linux/freezer.h>
 #include <linux/bootmem.h>
 #include <linux/fault-inject.h>
+#include <linux/refcount.h>
 
 #include <asm/futex.h>
 
@@ -209,7 +210,7 @@  struct futex_pi_state {
 	struct rt_mutex pi_mutex;
 
 	struct task_struct *owner;
-	atomic_t refcount;
+	refcount_t refcount;
 
 	union futex_key key;
 } __randomize_layout;
@@ -795,7 +796,7 @@  static int refill_pi_state_cache(void)
 	INIT_LIST_HEAD(&pi_state->list);
 	/* pi_mutex gets initialized later */
 	pi_state->owner = NULL;
-	atomic_set(&pi_state->refcount, 1);
+	refcount_set(&pi_state->refcount, 1);
 	pi_state->key = FUTEX_KEY_INIT;
 
 	current->pi_state_cache = pi_state;
@@ -815,7 +816,7 @@  static struct futex_pi_state *alloc_pi_state(void)
 
 static void get_pi_state(struct futex_pi_state *pi_state)
 {
-	WARN_ON_ONCE(!atomic_inc_not_zero(&pi_state->refcount));
+	WARN_ON_ONCE(!refcount_inc_not_zero(&pi_state->refcount));
 }
 
 /*
@@ -829,7 +830,7 @@  static void put_pi_state(struct futex_pi_state *pi_state)
 	if (!pi_state)
 		return;
 
-	if (!atomic_dec_and_test(&pi_state->refcount))
+	if (!refcount_dec_and_test(&pi_state->refcount))
 		return;
 
 	/*
@@ -853,7 +854,7 @@  static void put_pi_state(struct futex_pi_state *pi_state)
 		 * refcount is at 0 - put it back to 1.
 		 */
 		pi_state->owner = NULL;
-		atomic_set(&pi_state->refcount, 1);
+		refcount_set(&pi_state->refcount, 1);
 		current->pi_state_cache = pi_state;
 	}
 }
@@ -1051,7 +1052,7 @@  static int attach_to_pi_state(u32 __user *uaddr, u32 uval,
 	 * and futex_wait_requeue_pi() as it cannot go to 0 and consequently
 	 * free pi_state before we can take a reference ourselves.
 	 */
-	WARN_ON(!atomic_read(&pi_state->refcount));
+	WARN_ON(!refcount_read(&pi_state->refcount));
 
 	/*
 	 * Now that we have a pi_state, we can acquire wait_lock