Message ID | 1504095773-22895-15-git-send-email-elena.reshetova@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, 30 Aug 2017, Elena Reshetova wrote: > atomic_t variables are currently used to implement reference > counters with the following properties: > - counter is initialized to 1 using atomic_set() > - a resource is freed upon counter reaching zero > - once counter reaches zero, its further > increments aren't allowed > - counter schema uses basic atomic operations > (set, inc, inc_not_zero, dec_and_test, etc.) > > Such atomic variables should be converted to a newly provided > refcount_t type and API that prevents accidental counter overflows > and underflows. This is important since overflows and underflows > can lead to use-after-free situation and be exploitable. > > The variable futex_pi_state.refcount is used as pure > reference counter. Convert it to refcount_t and fix up > the operations. > > Suggested-by: Kees Cook <keescook@chromium.org> > Reviewed-by: David Windsor <dwindsor@gmail.com> > Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> > Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
On Fri, Sep 01, 2017 at 09:39:50AM +0200, Thomas Gleixner wrote: > On Wed, 30 Aug 2017, Elena Reshetova wrote: > > atomic_t variables are currently used to implement reference > > counters with the following properties: > > - counter is initialized to 1 using atomic_set() > > - a resource is freed upon counter reaching zero > > - once counter reaches zero, its further > > increments aren't allowed > > - counter schema uses basic atomic operations > > (set, inc, inc_not_zero, dec_and_test, etc.) > > > > Such atomic variables should be converted to a newly provided > > refcount_t type and API that prevents accidental counter overflows > > and underflows. This is important since overflows and underflows > > can lead to use-after-free situation and be exploitable. > > > > The variable futex_pi_state.refcount is used as pure > > reference counter. Convert it to refcount_t and fix up > > the operations. > > > > Suggested-by: Kees Cook <keescook@chromium.org> > > Reviewed-by: David Windsor <dwindsor@gmail.com> > > Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> > > Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> > > Reviewed-by: Thomas Gleixner <tglx@linutronix.de> So the thing to be careful with for things like futex and some of the other core kernel code is the memory ordering. atomic_dec_and_test() provides a full smp_mb() before and after, refcount_dec_and_test() only provides release semantics. This is typically sufficient, and I would argue that if we rely on more than that, there _should_ be a comment, however reality isn't always as nice. That said, I think this conversion is OK, pi_state->refcount isn't relied upon to provide additional memory ordering above and beyond what refcounting requires.
On Fri, 1 Sep 2017, Peter Zijlstra wrote: > On Fri, Sep 01, 2017 at 09:39:50AM +0200, Thomas Gleixner wrote: > > On Wed, 30 Aug 2017, Elena Reshetova wrote: > > > atomic_t variables are currently used to implement reference > > > counters with the following properties: > > > - counter is initialized to 1 using atomic_set() > > > - a resource is freed upon counter reaching zero > > > - once counter reaches zero, its further > > > increments aren't allowed > > > - counter schema uses basic atomic operations > > > (set, inc, inc_not_zero, dec_and_test, etc.) > > > > > > Such atomic variables should be converted to a newly provided > > > refcount_t type and API that prevents accidental counter overflows > > > and underflows. This is important since overflows and underflows > > > can lead to use-after-free situation and be exploitable. > > > > > > The variable futex_pi_state.refcount is used as pure > > > reference counter. Convert it to refcount_t and fix up > > > the operations. > > > > > > Suggested-by: Kees Cook <keescook@chromium.org> > > > Reviewed-by: David Windsor <dwindsor@gmail.com> > > > Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> > > > Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> > > > > Reviewed-by: Thomas Gleixner <tglx@linutronix.de> > > So the thing to be careful with for things like futex and some of the > other core kernel code is the memory ordering. > > atomic_dec_and_test() provides a full smp_mb() before and after, > refcount_dec_and_test() only provides release semantics. > > This is typically sufficient, and I would argue that if we rely on more > than that, there _should_ be a comment, however reality isn't always as > nice. > > That said, I think this conversion is OK, pi_state->refcount isn't > relied upon to provide additional memory ordering above and beyond what > refcounting requires. So the changelogs should reflect that. The current one suggests that this is a one to one replacement for atomic_t just with the extra sanity checks added. Thanks, tglx
> On Fri, 1 Sep 2017, Peter Zijlstra wrote: > > On Fri, Sep 01, 2017 at 09:39:50AM +0200, Thomas Gleixner wrote: > > > On Wed, 30 Aug 2017, Elena Reshetova wrote: > > > > atomic_t variables are currently used to implement reference > > > > counters with the following properties: > > > > - counter is initialized to 1 using atomic_set() > > > > - a resource is freed upon counter reaching zero > > > > - once counter reaches zero, its further > > > > increments aren't allowed > > > > - counter schema uses basic atomic operations > > > > (set, inc, inc_not_zero, dec_and_test, etc.) > > > > > > > > Such atomic variables should be converted to a newly provided > > > > refcount_t type and API that prevents accidental counter overflows > > > > and underflows. This is important since overflows and underflows > > > > can lead to use-after-free situation and be exploitable. > > > > > > > > The variable futex_pi_state.refcount is used as pure > > > > reference counter. Convert it to refcount_t and fix up > > > > the operations. > > > > > > > > Suggested-by: Kees Cook <keescook@chromium.org> > > > > Reviewed-by: David Windsor <dwindsor@gmail.com> > > > > Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> > > > > Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> > > > > > > Reviewed-by: Thomas Gleixner <tglx@linutronix.de> > > > > So the thing to be careful with for things like futex and some of the > > other core kernel code is the memory ordering. > > > > atomic_dec_and_test() provides a full smp_mb() before and after, > > refcount_dec_and_test() only provides release semantics. > > > > This is typically sufficient, and I would argue that if we rely on more > > than that, there _should_ be a comment, however reality isn't always as > > nice. > > > > That said, I think this conversion is OK, pi_state->refcount isn't > > relied upon to provide additional memory ordering above and beyond what > > refcounting requires. > > So the changelogs should reflect that. The current one suggests that this > is a one to one replacement for atomic_t just with the extra sanity checks > added. I will update the commit texts accordingly and resend the whole series since this should be then mentioned in every commit to make sure it is not missed. Best Regards, Elena. > > Thanks, > > tglx
> On Fri, 1 Sep 2017, Peter Zijlstra wrote: > > > On Fri, Sep 01, 2017 at 09:39:50AM +0200, Thomas Gleixner wrote: > > > > On Wed, 30 Aug 2017, Elena Reshetova wrote: > > > > > atomic_t variables are currently used to implement reference > > > > > counters with the following properties: > > > > > - counter is initialized to 1 using atomic_set() > > > > > - a resource is freed upon counter reaching zero > > > > > - once counter reaches zero, its further > > > > > increments aren't allowed > > > > > - counter schema uses basic atomic operations > > > > > (set, inc, inc_not_zero, dec_and_test, etc.) > > > > > > > > > > Such atomic variables should be converted to a newly provided > > > > > refcount_t type and API that prevents accidental counter overflows > > > > > and underflows. This is important since overflows and underflows > > > > > can lead to use-after-free situation and be exploitable. > > > > > > > > > > The variable futex_pi_state.refcount is used as pure > > > > > reference counter. Convert it to refcount_t and fix up > > > > > the operations. > > > > > > > > > > Suggested-by: Kees Cook <keescook@chromium.org> > > > > > Reviewed-by: David Windsor <dwindsor@gmail.com> > > > > > Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> > > > > > Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> > > > > > > > > Reviewed-by: Thomas Gleixner <tglx@linutronix.de> > > > > > > So the thing to be careful with for things like futex and some of the > > > other core kernel code is the memory ordering. > > > > > > atomic_dec_and_test() provides a full smp_mb() before and after, > > > refcount_dec_and_test() only provides release semantics. > > > > > > This is typically sufficient, and I would argue that if we rely on more > > > than that, there _should_ be a comment, however reality isn't always as > > > nice. > > > > > > That said, I think this conversion is OK, pi_state->refcount isn't > > > relied upon to provide additional memory ordering above and beyond what > > > refcounting requires. > > > > So the changelogs should reflect that. The current one suggests that this > > is a one to one replacement for atomic_t just with the extra sanity checks > > added. > > I will update the commit texts accordingly and resend the whole series since > this should be then mentioned in every commit to make sure it is not missed. Actually on the second thought: does the above memory ordering differences really apply when we have ARCH_HAS_REFCOUNT? To me it looks like the way how it is currently implemented for x86 is the same way as it is for atomic cases. > > Best Regards, > Elena. > > > > > Thanks, > > > > tglx
On Fri, Sep 01, 2017 at 11:05:33AM +0000, Reshetova, Elena wrote: > Actually on the second thought: does the above memory ordering differences > really apply when we have ARCH_HAS_REFCOUNT? To me it looks like the way > how it is currently implemented for x86 is the same way as it is for atomic cases. Never look to x86 for memory ordering, its boring. And yes, for the ARM implementation it can certainly make a difference.
> On Fri, Sep 01, 2017 at 11:05:33AM +0000, Reshetova, Elena wrote: > > Actually on the second thought: does the above memory ordering differences > > really apply when we have ARCH_HAS_REFCOUNT? To me it looks like the way > > how it is currently implemented for x86 is the same way as it is for atomic cases. > > Never look to x86 for memory ordering, its boring. > > And yes, for the ARM implementation it can certainly make a difference. So, yes, what I am trying to say is that it can really depend if you have ARCH_HAS_REFCOUNT enabled or not and then also based on architecture. Thus I believe is also true for atomic: there might be differences when you use arch. dependent version of function or not. So, I guess if I rewrite the commits, I should only include the statement on relaxed memory order for REFCOUNT_FULL and tell that arch. specific implementations may vary on their properties (as they do now).
On Fri, Sep 01, 2017 at 01:24:16PM +0000, Reshetova, Elena wrote: > > > On Fri, Sep 01, 2017 at 11:05:33AM +0000, Reshetova, Elena wrote: > > > Actually on the second thought: does the above memory ordering differences > > > really apply when we have ARCH_HAS_REFCOUNT? To me it looks like the way > > > how it is currently implemented for x86 is the same way as it is for atomic cases. > > > > Never look to x86 for memory ordering, its boring. > > > > And yes, for the ARM implementation it can certainly make a difference. > > So, yes, what I am trying to say is that it can really depend if you have ARCH_HAS_REFCOUNT > enabled or not and then also based on architecture. Thus I believe is also true for atomic: there > might be differences when you use arch. dependent version of function or not. So the generic one in lib/refcount.c is already weaker on ARM, they don't need to do a ARCH specific 'fast' implementation for the difference to show up.
> On Fri, Sep 01, 2017 at 01:24:16PM +0000, Reshetova, Elena wrote: > > > > > On Fri, Sep 01, 2017 at 11:05:33AM +0000, Reshetova, Elena wrote: > > > > Actually on the second thought: does the above memory ordering differences > > > > really apply when we have ARCH_HAS_REFCOUNT? To me it looks like the way > > > > how it is currently implemented for x86 is the same way as it is for atomic > cases. > > > > > > Never look to x86 for memory ordering, its boring. > > > > > > And yes, for the ARM implementation it can certainly make a difference. > > > > So, yes, what I am trying to say is that it can really depend if you have > ARCH_HAS_REFCOUNT > > enabled or not and then also based on architecture. Thus I believe is also true for > atomic: there > > might be differences when you use arch. dependent version of function or not. > > So the generic one in lib/refcount.c is already weaker on ARM, they > don't need to do a ARCH specific 'fast' implementation for the > difference to show up. But can they make "fast" implementation on ARM that would give stronger memory guarantees?
On Fri, Sep 01, 2017 at 05:03:55PM +0000, Reshetova, Elena wrote: > > On Fri, Sep 01, 2017 at 01:24:16PM +0000, Reshetova, Elena wrote: > > > > > > > On Fri, Sep 01, 2017 at 11:05:33AM +0000, Reshetova, Elena wrote: > > > > > Actually on the second thought: does the above memory ordering differences > > > > > really apply when we have ARCH_HAS_REFCOUNT? To me it looks like the way > > > > > how it is currently implemented for x86 is the same way as it is for atomic > > cases. > > > > > > > > Never look to x86 for memory ordering, its boring. > > > > > > > > And yes, for the ARM implementation it can certainly make a difference. > > > > > > So, yes, what I am trying to say is that it can really depend if you have > > ARCH_HAS_REFCOUNT > > > enabled or not and then also based on architecture. Thus I believe is also true for > > atomic: there > > > might be differences when you use arch. dependent version of function or not. > > > > So the generic one in lib/refcount.c is already weaker on ARM, they > > don't need to do a ARCH specific 'fast' implementation for the > > difference to show up. > > But can they make "fast" implementation on ARM that would give stronger memory guarantees? Whatever for?
> -----Original Message----- > From: Peter Zijlstra [mailto:peterz@infradead.org] > Sent: Friday, September 1, 2017 10:13 PM > To: Reshetova, Elena <elena.reshetova@intel.com> > Cc: Thomas Gleixner <tglx@linutronix.de>; linux-kernel@vger.kernel.org; linux- > fsdevel@vger.kernel.org; gregkh@linuxfoundation.org; viro@zeniv.linux.org.uk; > tj@kernel.org; mingo@redhat.com; hannes@cmpxchg.org; lizefan@huawei.com; > acme@kernel.org; alexander.shishkin@linux.intel.com; eparis@redhat.com; > akpm@linux-foundation.org; arnd@arndb.de; luto@kernel.org; > keescook@chromium.org; dvhart@infradead.org; ebiederm@xmission.com > Subject: Re: [PATCH 14/15] futex: convert futex_pi_state.refcount to refcount_t > > On Fri, Sep 01, 2017 at 05:03:55PM +0000, Reshetova, Elena wrote: > > > On Fri, Sep 01, 2017 at 01:24:16PM +0000, Reshetova, Elena wrote: > > > > > > > > > On Fri, Sep 01, 2017 at 11:05:33AM +0000, Reshetova, Elena wrote: > > > > > > Actually on the second thought: does the above memory ordering > differences > > > > > > really apply when we have ARCH_HAS_REFCOUNT? To me it looks like the > way > > > > > > how it is currently implemented for x86 is the same way as it is for atomic > > > cases. > > > > > > > > > > Never look to x86 for memory ordering, its boring. > > > > > > > > > > And yes, for the ARM implementation it can certainly make a difference. > > > > > > > > So, yes, what I am trying to say is that it can really depend if you have > > > ARCH_HAS_REFCOUNT > > > > enabled or not and then also based on architecture. Thus I believe is also true > for > > > atomic: there > > > > might be differences when you use arch. dependent version of function or not. > > > > > > So the generic one in lib/refcount.c is already weaker on ARM, they > > > don't need to do a ARCH specific 'fast' implementation for the > > > difference to show up. > > > > But can they make "fast" implementation on ARM that would give stronger > memory guarantees? > > Whatever for? Well, maybe just by default when arch.-specific implementation is done. But I was just trying to speculate to understand. I will resend this one with new comment added. Still not sure if I need to resend the whole series with updated commits or break this up by individual patches further for the separate merges.
diff --git a/kernel/futex.c b/kernel/futex.c index 0939255..65460b4 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -67,6 +67,7 @@ #include <linux/freezer.h> #include <linux/bootmem.h> #include <linux/fault-inject.h> +#include <linux/refcount.h> #include <asm/futex.h> @@ -209,7 +210,7 @@ struct futex_pi_state { struct rt_mutex pi_mutex; struct task_struct *owner; - atomic_t refcount; + refcount_t refcount; union futex_key key; } __randomize_layout; @@ -795,7 +796,7 @@ static int refill_pi_state_cache(void) INIT_LIST_HEAD(&pi_state->list); /* pi_mutex gets initialized later */ pi_state->owner = NULL; - atomic_set(&pi_state->refcount, 1); + refcount_set(&pi_state->refcount, 1); pi_state->key = FUTEX_KEY_INIT; current->pi_state_cache = pi_state; @@ -815,7 +816,7 @@ static struct futex_pi_state *alloc_pi_state(void) static void get_pi_state(struct futex_pi_state *pi_state) { - WARN_ON_ONCE(!atomic_inc_not_zero(&pi_state->refcount)); + WARN_ON_ONCE(!refcount_inc_not_zero(&pi_state->refcount)); } /* @@ -829,7 +830,7 @@ static void put_pi_state(struct futex_pi_state *pi_state) if (!pi_state) return; - if (!atomic_dec_and_test(&pi_state->refcount)) + if (!refcount_dec_and_test(&pi_state->refcount)) return; /* @@ -853,7 +854,7 @@ static void put_pi_state(struct futex_pi_state *pi_state) * refcount is at 0 - put it back to 1. */ pi_state->owner = NULL; - atomic_set(&pi_state->refcount, 1); + refcount_set(&pi_state->refcount, 1); current->pi_state_cache = pi_state; } } @@ -1051,7 +1052,7 @@ static int attach_to_pi_state(u32 __user *uaddr, u32 uval, * and futex_wait_requeue_pi() as it cannot go to 0 and consequently * free pi_state before we can take a reference ourselves. */ - WARN_ON(!atomic_read(&pi_state->refcount)); + WARN_ON(!refcount_read(&pi_state->refcount)); /* * Now that we have a pi_state, we can acquire wait_lock