Message ID | 20230818041740.gonna.513-kees@kernel.org (mailing list archive)
---|---
State | Not Applicable
Delegated to: | Netdev Maintainers
Series | [v2] creds: Convert cred.usage to refcount_t

Context | Check | Description
---|---|---
netdev/tree_selection | success | Not a local patch
On Thu, 17 Aug 2023 21:17:41 -0700 Kees Cook <keescook@chromium.org> wrote:

> From: Elena Reshetova <elena.reshetova@intel.com>
>
> atomic_t variables are currently used to implement reference counters
> with the following properties:
>  - counter is initialized to 1 using atomic_set()
>  - a resource is freed upon counter reaching zero
>  - once counter reaches zero, its further
>    increments aren't allowed
>  - counter schema uses basic atomic operations
>    (set, inc, inc_not_zero, dec_and_test, etc.)
>
> Such atomic variables should be converted to a newly provided
> refcount_t type and API that prevents accidental counter overflows and
> underflows. This is important since overflows and underflows can lead
> to use-after-free situation and be exploitable.

ie, if we have bugs which we have no reason to believe presently exist,
let's bloat and slow down the kernel just in case we add some in the
future?  Or something like that.  dangnabbit, that refcount_t.

x86_64 defconfig:

before:
   text	   data	    bss	    dec	    hex	filename
   3869	    552	      8	   4429	   114d	kernel/cred.o
   6140	    724	     16	   6880	   1ae0	net/sunrpc/auth.o

after:
   text	   data	    bss	    dec	    hex	filename
   4573	    552	      8	   5133	   140d	kernel/cred.o
   6236	    724	     16	   6976	   1b40	net/sunrpc/auth.o

Please explain, in a non handwavy and non cargoculty fashion why this
speed and space cost is justified.
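The refcounting pattern the commit message describes can be sketched in userspace C. This is purely illustrative and is NOT the kernel's implementation: the real refcount_t (in <linux/refcount.h> / lib/refcount.c) saturates the counter and WARNs on misuse rather than using this simple CAS loop, and the `sketch_` names are invented here.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Illustrative userspace sketch of the pattern described above;
 * not the kernel's refcount_t, which saturates instead of wrapping.
 */
typedef struct { atomic_int refs; } sketch_refcount_t;

static void sketch_refcount_set(sketch_refcount_t *r, int n)
{
	atomic_store(&r->refs, n);	/* "counter is initialized to 1" */
}

/* Take a reference only if the object is still alive (count > 0). */
static bool sketch_refcount_inc_not_zero(sketch_refcount_t *r)
{
	int old = atomic_load(&r->refs);

	do {
		if (old == 0)
			return false;	/* never resurrect a dead object */
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1));
	return true;
}

/* Drop a reference; true means the caller holds the last one and must free. */
static bool sketch_refcount_dec_and_test(sketch_refcount_t *r)
{
	return atomic_fetch_sub(&r->refs, 1) == 1;
}
```

The "once counter reaches zero, its further increments aren't allowed" property is what `inc_not_zero` enforces: after the final decrement, a concurrent lookup cannot revive the object.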
On Fri, Aug 18, 2023 at 7:56 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> On Thu, 17 Aug 2023 21:17:41 -0700 Kees Cook <keescook@chromium.org> wrote:
>
> > From: Elena Reshetova <elena.reshetova@intel.com>
> >
> > atomic_t variables are currently used to implement reference counters
> > with the following properties:
> >  - counter is initialized to 1 using atomic_set()
> >  - a resource is freed upon counter reaching zero
> >  - once counter reaches zero, its further
> >    increments aren't allowed
> >  - counter schema uses basic atomic operations
> >    (set, inc, inc_not_zero, dec_and_test, etc.)
> >
> > Such atomic variables should be converted to a newly provided
> > refcount_t type and API that prevents accidental counter overflows and
> > underflows. This is important since overflows and underflows can lead
> > to use-after-free situation and be exploitable.
>
> ie, if we have bugs which we have no reason to believe presently exist,
> let's bloat and slow down the kernel just in case we add some in the
> future?

Yeah. Or in case we currently have some that we missed.

Though really we don't *just* need refcount_t to catch bugs; on a
system with enough RAM you can also overflow many 32-bit refcounts by
simply creating 2^32 actual references to an object. Depending on the
structure of objects that hold such refcounts, that can start
happening at around 2^32 * 8 bytes = 32 GiB memory usage, and it
becomes increasingly practical to do this with more objects if you
have significantly more RAM. I suppose you could avoid such issues by
putting a hard limit of 32 GiB on the amount of slab memory and
requiring that kernel object references are stored as pointers in slab
memory, or by making all the refcounts 64-bit.
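The 32 GiB figure above is straight arithmetic: 2^32 references, each held as an 8-byte pointer somewhere in kernel memory. A quick check (the helper name is hypothetical, introduced only for illustration):

```c
#include <stdint.h>

/*
 * Minimum memory needed just to hold nrefs object references as
 * pointers, in GiB. 2^32 refs * 8 bytes = 2^35 bytes = 32 GiB,
 * matching the estimate in the mail above. Assumes 64-bit pointers.
 */
static uint64_t gib_to_hold_refs(uint64_t nrefs)
{
	return (nrefs * sizeof(void *)) >> 30;	/* bytes -> GiB */
}
```

The point being that on a sufficiently large machine, an attacker can legitimately take 2^32 references without any bug in the refcounting logic itself, which is why saturation (or a 64-bit count) matters.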
On Fri, Aug 18, 2023 at 10:55:42AM -0700, Andrew Morton wrote:
> On Thu, 17 Aug 2023 21:17:41 -0700 Kees Cook <keescook@chromium.org> wrote:
>
> > From: Elena Reshetova <elena.reshetova@intel.com>
> >
> > atomic_t variables are currently used to implement reference counters
> > with the following properties:
> >  - counter is initialized to 1 using atomic_set()
> >  - a resource is freed upon counter reaching zero
> >  - once counter reaches zero, its further
> >    increments aren't allowed
> >  - counter schema uses basic atomic operations
> >    (set, inc, inc_not_zero, dec_and_test, etc.)
> >
> > Such atomic variables should be converted to a newly provided
> > refcount_t type and API that prevents accidental counter overflows and
> > underflows. This is important since overflows and underflows can lead
> > to use-after-free situation and be exploitable.
>
> ie, if we have bugs which we have no reason to believe presently exist,
> let's bloat and slow down the kernel just in case we add some in the
> future?  Or something like that.  dangnabbit, that refcount_t.
>
> x86_64 defconfig:
>
> before:
>    text	   data	    bss	    dec	    hex	filename
>    3869	    552	      8	   4429	   114d	kernel/cred.o
>    6140	    724	     16	   6880	   1ae0	net/sunrpc/auth.o
>
> after:
>    text	   data	    bss	    dec	    hex	filename
>    4573	    552	      8	   5133	   140d	kernel/cred.o
>    6236	    724	     16	   6976	   1b40	net/sunrpc/auth.o
>
>
> Please explain, in a non handwavy and non cargoculty fashion why this
> speed and space cost is justified.

Since its introduction, refcount_t has found a lot of bugs. This is easy
to see even with a simplistic review of commits:

$ git log --date=short --pretty='format:%ad %C(auto)%h ("%s")' \
	--grep 'refcount_t:' | \
	cut -d- -f1 | sort | uniq -c
      1 2016
     15 2017
      9 2018
     23 2019
     24 2020
     18 2021
     24 2022
     10 2023

It's not really tapering off, either. All of these would have been silent
memory corruptions, etc.

In the face of _what_ is being protected, "cred" is pretty important for
enforcing security boundaries, etc, so having it still not protected is
a weird choice we've implicitly made. Given cred code is still seeing
changes and novel uses (e.g. io_uring), it's not unreasonable to protect
it from detectable (and _exploitable_) problems.

While the size differences look large in cred.o, it's basically limited
only to cred.o:

    text	    data	     bss	     dec	    hex	filename
30515570	12255806	17190916	59962292	392f3b4	vmlinux.before
30517500	12255838	17190916	59964254	392fb5e	vmlinux.after

And we've even seen performance _improvements_ in some conditions:
https://lore.kernel.org/lkml/20200615005732.GV12456@shao2-debian/

Looking at confirmed security flaws, exploitable reference counting
related bugs have dropped significantly. (The CVE database is irritating
to search, but most recent refcount-related CVEs are due to counts that
are _not_ using refcount_t.)

I'd rather ask the question, "Why should we _not_ protect cred lifetime
management?"

-Kees
On Fri, Aug 18, 2023 at 08:17:55PM +0200, Jann Horn wrote:
> On Fri, Aug 18, 2023 at 7:56 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> > On Thu, 17 Aug 2023 21:17:41 -0700 Kees Cook <keescook@chromium.org> wrote:
> > >
> > > From: Elena Reshetova <elena.reshetova@intel.com>
> > >
> > > atomic_t variables are currently used to implement reference counters
> > > with the following properties:
> > >  - counter is initialized to 1 using atomic_set()
> > >  - a resource is freed upon counter reaching zero
> > >  - once counter reaches zero, its further
> > >    increments aren't allowed
> > >  - counter schema uses basic atomic operations
> > >    (set, inc, inc_not_zero, dec_and_test, etc.)
> > >
> > > Such atomic variables should be converted to a newly provided
> > > refcount_t type and API that prevents accidental counter overflows and
> > > underflows. This is important since overflows and underflows can lead
> > > to use-after-free situation and be exploitable.
> >
> > ie, if we have bugs which we have no reason to believe presently exist,
> > let's bloat and slow down the kernel just in case we add some in the
> > future?
>
> Yeah. Or in case we currently have some that we missed.

Right, or to protect us against the _introduction_ of flaws.

> Though really we don't *just* need refcount_t to catch bugs; on a
> system with enough RAM you can also overflow many 32-bit refcounts by
> simply creating 2^32 actual references to an object. Depending on the
> structure of objects that hold such refcounts, that can start
> happening at around 2^32 * 8 bytes = 32 GiB memory usage, and it
> becomes increasingly practical to do this with more objects if you
> have significantly more RAM. I suppose you could avoid such issues by
> putting a hard limit of 32 GiB on the amount of slab memory and
> requiring that kernel object references are stored as pointers in slab
> memory, or by making all the refcounts 64-bit.

These problems are a different issue, and yes, the path out of it would
be to crank the size of refcount_t, etc.
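To put numbers on "crank the size": a 64-bit count cannot realistically be wrapped by taking references one at a time. Even at an optimistic billion increments per second, the wrap takes centuries (arithmetic sketch only; the helper name is invented here):

```c
#include <stdint.h>

/*
 * Years needed to wrap a 64-bit counter at a given increment rate.
 * At 10^9 increments/sec: 2^64 / 10^9 ~= 1.8e10 seconds ~= 584 years,
 * which is why a 64-bit refcount sidesteps the "create 2^32 real
 * references" overflow that a 32-bit count is exposed to.
 */
static uint64_t years_to_wrap_u64(uint64_t incs_per_sec)
{
	const uint64_t secs_per_year = 31557600ull;	/* 365.25 days */

	return UINT64_MAX / incs_per_sec / secs_per_year;
}
```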
On Fri, 18 Aug 2023 11:48:16 -0700 Kees Cook <keescook@chromium.org> wrote:

> On Fri, Aug 18, 2023 at 08:17:55PM +0200, Jann Horn wrote:
> > On Fri, Aug 18, 2023 at 7:56 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> > > On Thu, 17 Aug 2023 21:17:41 -0700 Kees Cook <keescook@chromium.org> wrote:
> > > >
> > > > From: Elena Reshetova <elena.reshetova@intel.com>
> > > >
> > > > atomic_t variables are currently used to implement reference counters
> > > > with the following properties:
> > > >  - counter is initialized to 1 using atomic_set()
> > > >  - a resource is freed upon counter reaching zero
> > > >  - once counter reaches zero, its further
> > > >    increments aren't allowed
> > > >  - counter schema uses basic atomic operations
> > > >    (set, inc, inc_not_zero, dec_and_test, etc.)
> > > >
> > > > Such atomic variables should be converted to a newly provided
> > > > refcount_t type and API that prevents accidental counter overflows and
> > > > underflows. This is important since overflows and underflows can lead
> > > > to use-after-free situation and be exploitable.
> > >
> > > ie, if we have bugs which we have no reason to believe presently exist,
> > > let's bloat and slow down the kernel just in case we add some in the
> > > future?
> >
> > Yeah. Or in case we currently have some that we missed.
>
> Right, or to protect us against the _introduction_ of flaws.

We could cheerfully add vast amounts of code to the kernel to check for
the future addition of bugs.  But we don't do that, because it would be
insane.

> > Though really we don't *just* need refcount_t to catch bugs; on a
> > system with enough RAM you can also overflow many 32-bit refcounts by
> > simply creating 2^32 actual references to an object. Depending on the
> > structure of objects that hold such refcounts, that can start
> > happening at around 2^32 * 8 bytes = 32 GiB memory usage, and it
> > becomes increasingly practical to do this with more objects if you
> > have significantly more RAM. I suppose you could avoid such issues by
> > putting a hard limit of 32 GiB on the amount of slab memory and
> > requiring that kernel object references are stored as pointers in slab
> > memory, or by making all the refcounts 64-bit.
>
> These problems are a different issue, and yes, the path out of it would
> be to crank the size of refcount_t, etc.

Is it possible for such overflows to occur in the cred code?  If so,
that's a bug.  Can we fix that cred bug without all this overhead?
With a cc:stable backport.  If not then, again, what is the non
handwavy, non cargoculty justification for adding this overhead to
the kernel?
On Fri, 2023-08-18 at 12:31 -0700, Andrew Morton wrote:
> On Fri, 18 Aug 2023 11:48:16 -0700 Kees Cook <keescook@chromium.org> wrote:
>
> > On Fri, Aug 18, 2023 at 08:17:55PM +0200, Jann Horn wrote:
> > > On Fri, Aug 18, 2023 at 7:56 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> > > > On Thu, 17 Aug 2023 21:17:41 -0700 Kees Cook <keescook@chromium.org> wrote:
> > > > >
> > > > > From: Elena Reshetova <elena.reshetova@intel.com>
> > > > >
> > > > > atomic_t variables are currently used to implement reference counters
> > > > > with the following properties:
> > > > >  - counter is initialized to 1 using atomic_set()
> > > > >  - a resource is freed upon counter reaching zero
> > > > >  - once counter reaches zero, its further
> > > > >    increments aren't allowed
> > > > >  - counter schema uses basic atomic operations
> > > > >    (set, inc, inc_not_zero, dec_and_test, etc.)
> > > > >
> > > > > Such atomic variables should be converted to a newly provided
> > > > > refcount_t type and API that prevents accidental counter overflows and
> > > > > underflows. This is important since overflows and underflows can lead
> > > > > to use-after-free situation and be exploitable.
> > > >
> > > > ie, if we have bugs which we have no reason to believe presently exist,
> > > > let's bloat and slow down the kernel just in case we add some in the
> > > > future?
> > >
> > > Yeah. Or in case we currently have some that we missed.
> >
> > Right, or to protect us against the _introduction_ of flaws.
>
> We could cheerfully add vast amounts of code to the kernel to check for
> the future addition of bugs.  But we don't do that, because it would be
> insane.
>
> > > Though really we don't *just* need refcount_t to catch bugs; on a
> > > system with enough RAM you can also overflow many 32-bit refcounts by
> > > simply creating 2^32 actual references to an object. Depending on the
> > > structure of objects that hold such refcounts, that can start
> > > happening at around 2^32 * 8 bytes = 32 GiB memory usage, and it
> > > becomes increasingly practical to do this with more objects if you
> > > have significantly more RAM. I suppose you could avoid such issues by
> > > putting a hard limit of 32 GiB on the amount of slab memory and
> > > requiring that kernel object references are stored as pointers in slab
> > > memory, or by making all the refcounts 64-bit.
> >
> > These problems are a different issue, and yes, the path out of it would
> > be to crank the size of refcount_t, etc.
>
> Is it possible for such overflows to occur in the cred code?  If so,
> that's a bug.  Can we fix that cred bug without all this overhead?
> With a cc:stable backport.  If not then, again, what is the non
> handwavy, non cargoculty justification for adding this overhead to
> the kernel?

It's not so much that the cred code itself is buggy, but the users of it
often have to deal with refcounting directly. Cred refcounting bugs can
be quite hard to even notice in the first place and are often hard to
track down.

That said... With something like lockdep, you can turn it off at compile
time and the extra checks (supposedly) compile down to nothing. It
should be possible to build alternate refcount_t handling functions that
are just wrappers around atomic_t with no extra checks, for folks who
want to really run "fast and loose".
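The "fast and loose" compile-time switch suggested here could look something like the following. This is a hypothetical sketch (the config name and types are invented, and no such kernel option exists; Kees's reply elsewhere in the thread notes that a fast/full split was tried years ago and dropped):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * HYPOTHETICAL switch in the spirit of CONFIG_LOCKDEP: with it
 * defined to 0, the op below collapses to a bare atomic decrement.
 */
#define CONFIG_REFCOUNT_CHECKS 1

typedef struct { atomic_int refs; } loose_refcount_t;

static bool loose_refcount_dec_and_test(loose_refcount_t *r)
{
	int old = atomic_fetch_sub(&r->refs, 1);

#if CONFIG_REFCOUNT_CHECKS
	/* Underflow: a put without a matching get, i.e. a likely UAF. */
	if (old <= 0)
		fprintf(stderr, "loose_refcount_t: underflow detected\n");
#endif
	return old == 1;	/* caller dropped the last reference */
}
```

With checks compiled out, the function body is exactly the atomic_t pattern being replaced; with them in, the slow path only fires on the (buggy) underflow case.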
On Thu, 2023-08-17 at 21:17 -0700, Kees Cook wrote:
> From: Elena Reshetova <elena.reshetova@intel.com>
>
> atomic_t variables are currently used to implement reference counters
> with the following properties:
>  - counter is initialized to 1 using atomic_set()
>  - a resource is freed upon counter reaching zero
>  - once counter reaches zero, its further
>    increments aren't allowed
>  - counter schema uses basic atomic operations
>    (set, inc, inc_not_zero, dec_and_test, etc.)
>
> Such atomic variables should be converted to a newly provided
> refcount_t type and API that prevents accidental counter overflows and
> underflows. This is important since overflows and underflows can lead
> to use-after-free situation and be exploitable.
>
> The variable cred.usage is used as pure reference counter. Convert it
> to refcount_t and fix up the operations.
>
> **Important note for maintainers:
>
> Some functions from refcount_t API defined in refcount.h have different
> memory ordering guarantees than their atomic counterparts. Please check
> Documentation/core-api/refcount-vs-atomic.rst for more information.
>
> Normally the differences should not matter since refcount_t provides
> enough guarantees to satisfy the refcounting use cases, but in some
> rare cases it might matter. Please double check that you don't have
> some undocumented memory guarantees for this variable usage.
>
> For the cred.usage it might make a difference in following places:
>  - get_task_cred(): increment in refcount_inc_not_zero() only
>    guarantees control dependency on success vs. fully ordered atomic
>    counterpart
>  - put_cred(): decrement in refcount_dec_and_test() only
>    provides RELEASE ordering and ACQUIRE ordering on success vs. fully
>    ordered atomic counterpart
>
> Suggested-by: Kees Cook <keescook@chromium.org>
> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
> Reviewed-by: David Windsor <dwindsor@gmail.com>
> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
> v2: rebase
> v1: https://lore.kernel.org/lkml/20200612183450.4189588-4-keescook@chromium.org/
> ---
>  include/linux/cred.h |  8 ++++----
>  kernel/cred.c        | 42 +++++++++++++++++++++---------------------
>  net/sunrpc/auth.c    |  2 +-
>  3 files changed, 26 insertions(+), 26 deletions(-)
>
> diff --git a/include/linux/cred.h b/include/linux/cred.h
> index 8661f6294ad4..bf1c142afcec 100644
> --- a/include/linux/cred.h
> +++ b/include/linux/cred.h
> @@ -109,7 +109,7 @@ static inline int groups_search(const struct group_info *group_info, kgid_t grp)
>   * same context as task->real_cred.
>   */
>  struct cred {
> -	atomic_t	usage;
> +	refcount_t	usage;
>  #ifdef CONFIG_DEBUG_CREDENTIALS
>  	atomic_t	subscribers;	/* number of processes subscribed */
>  	void		*put_addr;
> @@ -229,7 +229,7 @@ static inline bool cap_ambient_invariant_ok(const struct cred *cred)
>   */
>  static inline struct cred *get_new_cred(struct cred *cred)
>  {
> -	atomic_inc(&cred->usage);
> +	refcount_inc(&cred->usage);
>  	return cred;
>  }
>
> @@ -261,7 +261,7 @@ static inline const struct cred *get_cred_rcu(const struct cred *cred)
>  	struct cred *nonconst_cred = (struct cred *) cred;
>  	if (!cred)
>  		return NULL;
> -	if (!atomic_inc_not_zero(&nonconst_cred->usage))
> +	if (!refcount_inc_not_zero(&nonconst_cred->usage))
>  		return NULL;
>  	validate_creds(cred);
>  	nonconst_cred->non_rcu = 0;
> @@ -285,7 +285,7 @@ static inline void put_cred(const struct cred *_cred)
>
>  	if (cred) {
>  		validate_creds(cred);
> -		if (atomic_dec_and_test(&(cred)->usage))
> +		if (refcount_dec_and_test(&(cred)->usage))
>  			__put_cred(cred);
>  	}
>  }
> diff --git a/kernel/cred.c b/kernel/cred.c
> index bed458cfb812..33090c43bcac 100644
> --- a/kernel/cred.c
> +++ b/kernel/cred.c
> @@ -39,7 +39,7 @@ static struct group_info init_groups = { .usage = REFCOUNT_INIT(2) };
>   * The initial credentials for the initial task
>   */
>  struct cred init_cred = {
> -	.usage			= ATOMIC_INIT(4),
> +	.usage			= REFCOUNT_INIT(4),
>  #ifdef CONFIG_DEBUG_CREDENTIALS
>  	.subscribers		= ATOMIC_INIT(2),
>  	.magic			= CRED_MAGIC,
> @@ -99,17 +99,17 @@ static void put_cred_rcu(struct rcu_head *rcu)
>
>  #ifdef CONFIG_DEBUG_CREDENTIALS
>  	if (cred->magic != CRED_MAGIC_DEAD ||
> -	    atomic_read(&cred->usage) != 0 ||
> +	    refcount_read(&cred->usage) != 0 ||
>  	    read_cred_subscribers(cred) != 0)
>  		panic("CRED: put_cred_rcu() sees %p with"
>  		      " mag %x, put %p, usage %d, subscr %d\n",
>  		      cred, cred->magic, cred->put_addr,
> -		      atomic_read(&cred->usage),
> +		      refcount_read(&cred->usage),
>  		      read_cred_subscribers(cred));
>  #else
> -	if (atomic_read(&cred->usage) != 0)
> +	if (refcount_read(&cred->usage) != 0)
>  		panic("CRED: put_cred_rcu() sees %p with usage %d\n",
> -		      cred, atomic_read(&cred->usage));
> +		      cred, refcount_read(&cred->usage));
>  #endif
>
>  	security_cred_free(cred);
> @@ -135,10 +135,10 @@ static void put_cred_rcu(struct rcu_head *rcu)
>  void __put_cred(struct cred *cred)
>  {
>  	kdebug("__put_cred(%p{%d,%d})", cred,
> -	       atomic_read(&cred->usage),
> +	       refcount_read(&cred->usage),
>  	       read_cred_subscribers(cred));
>
> -	BUG_ON(atomic_read(&cred->usage) != 0);
> +	BUG_ON(refcount_read(&cred->usage) != 0);
>  #ifdef CONFIG_DEBUG_CREDENTIALS
>  	BUG_ON(read_cred_subscribers(cred) != 0);
>  	cred->magic = CRED_MAGIC_DEAD;
> @@ -162,7 +162,7 @@ void exit_creds(struct task_struct *tsk)
>  	struct cred *cred;
>
>  	kdebug("exit_creds(%u,%p,%p,{%d,%d})", tsk->pid, tsk->real_cred, tsk->cred,
> -	       atomic_read(&tsk->cred->usage),
> +	       refcount_read(&tsk->cred->usage),
>  	       read_cred_subscribers(tsk->cred));
>
>  	cred = (struct cred *) tsk->real_cred;
> @@ -221,7 +221,7 @@ struct cred *cred_alloc_blank(void)
>  	if (!new)
>  		return NULL;
>
> -	atomic_set(&new->usage, 1);
> +	refcount_set(&new->usage, 1);
>  #ifdef CONFIG_DEBUG_CREDENTIALS
>  	new->magic = CRED_MAGIC;
>  #endif
> @@ -267,7 +267,7 @@ struct cred *prepare_creds(void)
>  	memcpy(new, old, sizeof(struct cred));
>
>  	new->non_rcu = 0;
> -	atomic_set(&new->usage, 1);
> +	refcount_set(&new->usage, 1);
>  	set_cred_subscribers(new, 0);
>  	get_group_info(new->group_info);
>  	get_uid(new->user);
> @@ -356,7 +356,7 @@ int copy_creds(struct task_struct *p, unsigned long clone_flags)
>  	get_cred(p->cred);
>  	alter_cred_subscribers(p->cred, 2);
>  	kdebug("share_creds(%p{%d,%d})",
> -	       p->cred, atomic_read(&p->cred->usage),
> +	       p->cred, refcount_read(&p->cred->usage),
>  	       read_cred_subscribers(p->cred));
>  	inc_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
>  	return 0;
> @@ -450,7 +450,7 @@ int commit_creds(struct cred *new)
>  	const struct cred *old = task->real_cred;
>
>  	kdebug("commit_creds(%p{%d,%d})", new,
> -	       atomic_read(&new->usage),
> +	       refcount_read(&new->usage),
>  	       read_cred_subscribers(new));
>
>  	BUG_ON(task->cred != old);
> @@ -459,7 +459,7 @@ int commit_creds(struct cred *new)
>  	validate_creds(old);
>  	validate_creds(new);
>  #endif
> -	BUG_ON(atomic_read(&new->usage) < 1);
> +	BUG_ON(refcount_read(&new->usage) < 1);
>
>  	get_cred(new); /* we will require a ref for the subj creds too */
>
> @@ -533,13 +533,13 @@ EXPORT_SYMBOL(commit_creds);
>  void abort_creds(struct cred *new)
>  {
>  	kdebug("abort_creds(%p{%d,%d})", new,
> -	       atomic_read(&new->usage),
> +	       refcount_read(&new->usage),
>  	       read_cred_subscribers(new));
>
>  #ifdef CONFIG_DEBUG_CREDENTIALS
>  	BUG_ON(read_cred_subscribers(new) != 0);
>  #endif
> -	BUG_ON(atomic_read(&new->usage) < 1);
> +	BUG_ON(refcount_read(&new->usage) < 1);
>  	put_cred(new);
>  }
>  EXPORT_SYMBOL(abort_creds);
> @@ -556,7 +556,7 @@ const struct cred *override_creds(const struct cred *new)
>  	const struct cred *old = current->cred;
>
>  	kdebug("override_creds(%p{%d,%d})", new,
> -	       atomic_read(&new->usage),
> +	       refcount_read(&new->usage),
>  	       read_cred_subscribers(new));
>
>  	validate_creds(old);
> @@ -579,7 +579,7 @@ const struct cred *override_creds(const struct cred *new)
>  	alter_cred_subscribers(old, -1);
>
>  	kdebug("override_creds() = %p{%d,%d}", old,
> -	       atomic_read(&old->usage),
> +	       refcount_read(&old->usage),
>  	       read_cred_subscribers(old));
>  	return old;
>  }
> @@ -597,7 +597,7 @@ void revert_creds(const struct cred *old)
>  	const struct cred *override = current->cred;
>
>  	kdebug("revert_creds(%p{%d,%d})", old,
> -	       atomic_read(&old->usage),
> +	       refcount_read(&old->usage),
>  	       read_cred_subscribers(old));
>
>  	validate_creds(old);
> @@ -728,7 +728,7 @@ struct cred *prepare_kernel_cred(struct task_struct *daemon)
>
>  	*new = *old;
>  	new->non_rcu = 0;
> -	atomic_set(&new->usage, 1);
> +	refcount_set(&new->usage, 1);
>  	set_cred_subscribers(new, 0);
>  	get_uid(new->user);
>  	get_user_ns(new->user_ns);
> @@ -843,7 +843,7 @@ static void dump_invalid_creds(const struct cred *cred, const char *label,
>  	printk(KERN_ERR "CRED: ->magic=%x, put_addr=%p\n",
>  	       cred->magic, cred->put_addr);
>  	printk(KERN_ERR "CRED: ->usage=%d, subscr=%d\n",
> -	       atomic_read(&cred->usage),
> +	       refcount_read(&cred->usage),
>  	       read_cred_subscribers(cred));
>  	printk(KERN_ERR "CRED: ->*uid = { %d,%d,%d,%d }\n",
>  	       from_kuid_munged(&init_user_ns, cred->uid),
> @@ -917,7 +917,7 @@ void validate_creds_for_do_exit(struct task_struct *tsk)
>  {
>  	kdebug("validate_creds_for_do_exit(%p,%p{%d,%d})",
>  	       tsk->real_cred, tsk->cred,
> -	       atomic_read(&tsk->cred->usage),
> +	       refcount_read(&tsk->cred->usage),
>  	       read_cred_subscribers(tsk->cred));
>
>  	__validate_process_creds(tsk, __FILE__, __LINE__);
> diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c
> index 2f16f9d17966..f9f406249e7d 100644
> --- a/net/sunrpc/auth.c
> +++ b/net/sunrpc/auth.c
> @@ -39,7 +39,7 @@ static LIST_HEAD(cred_unused);
>  static unsigned long number_cred_unused;
>
>  static struct cred machine_cred = {
> -	.usage = ATOMIC_INIT(1),
> +	.usage = REFCOUNT_INIT(1),
>  #ifdef CONFIG_DEBUG_CREDENTIALS
>  	.magic = CRED_MAGIC,
>  #endif

I wonder what sort of bugs this will uncover.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
On Fri, Aug 18, 2023 at 12:31:48PM -0700, Andrew Morton wrote:
> On Fri, 18 Aug 2023 11:48:16 -0700 Kees Cook <keescook@chromium.org> wrote:
>
> > On Fri, Aug 18, 2023 at 08:17:55PM +0200, Jann Horn wrote:
> > > On Fri, Aug 18, 2023 at 7:56 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> > > > On Thu, 17 Aug 2023 21:17:41 -0700 Kees Cook <keescook@chromium.org> wrote:
> > > > >
> > > > > From: Elena Reshetova <elena.reshetova@intel.com>
> > > > >
> > > > > atomic_t variables are currently used to implement reference counters
> > > > > with the following properties:
> > > > >  - counter is initialized to 1 using atomic_set()
> > > > >  - a resource is freed upon counter reaching zero
> > > > >  - once counter reaches zero, its further
> > > > >    increments aren't allowed
> > > > >  - counter schema uses basic atomic operations
> > > > >    (set, inc, inc_not_zero, dec_and_test, etc.)
> > > > >
> > > > > Such atomic variables should be converted to a newly provided
> > > > > refcount_t type and API that prevents accidental counter overflows and
> > > > > underflows. This is important since overflows and underflows can lead
> > > > > to use-after-free situation and be exploitable.
> > > >
> > > > ie, if we have bugs which we have no reason to believe presently exist,
> > > > let's bloat and slow down the kernel just in case we add some in the
> > > > future?
> > >
> > > Yeah. Or in case we currently have some that we missed.
> >
> > Right, or to protect us against the _introduction_ of flaws.
>
> We could cheerfully add vast amounts of code to the kernel to check for
> the future addition of bugs.  But we don't do that, because it would be
> insane.

This is a slippery-slope fallacy and doesn't apply. Yes, we don't add
vast amounts of code for that and that isn't the case here. This is
fixing a known weakness of using atomic reference counts, with a long
history of exploitation, on a struct used for enforcing security
boundaries, solved with the kernel's standard reference counting type.

As I mentioned in the other arm[1] of this thread, I think the question
is better "Why is this NOT refcount_t? What is the benefit, and why does
that make struct cred special?"

> > > Though really we don't *just* need refcount_t to catch bugs; on a
> > > system with enough RAM you can also overflow many 32-bit refcounts by
> > > simply creating 2^32 actual references to an object. Depending on the
> > > structure of objects that hold such refcounts, that can start
> > > happening at around 2^32 * 8 bytes = 32 GiB memory usage, and it
> > > becomes increasingly practical to do this with more objects if you
> > > have significantly more RAM. I suppose you could avoid such issues by
> > > putting a hard limit of 32 GiB on the amount of slab memory and
> > > requiring that kernel object references are stored as pointers in slab
> > > memory, or by making all the refcounts 64-bit.
> >
> > These problems are a different issue, and yes, the path out of it would
> > be to crank the size of refcount_t, etc.
>
> Is it possible for such overflows to occur in the cred code?  If so,
> that's a bug.  Can we fix that cred bug without all this overhead?
> With a cc:stable backport.  If not then, again, what is the non
> handwavy, non cargoculty justification for adding this overhead to
> the kernel?

The only overhead is on slow-path for the error conditions. There is no
_known_ bug in the cred code today, but there might be unknown flaws, or
new flaws or unexpected reachability may be introduced in the future.
That's the whole point of making kernel code defensive.

I've talked about this (with lots of data to support it) at length
before[2], mainly around the lifetime of exploitable flaws: average
lifetime is more than 5 years and we keep introducing them in code that
uses fragile types or ambiguous language features. But I _haven't_ had
to talk much about reference counting since 2016 when we grew a proper
type for it. :) Let's get the stragglers fixed.

-Kees

[1] https://lore.kernel.org/lkml/202308181131.045F806@keescook/
[2] https://outflux.net/slides/2021/lss/kspp.pdf (see slides 4, 5, 6)
    https://outflux.net/slides/2019/lss/kspp.pdf (see slides 4, 5, 6)
    https://outflux.net/slides/2018/lss/kspp.pdf (see slides 3, 4)
    https://outflux.net/slides/2017/lss/kspp.pdf (see slides 5, 6, 13)
    https://outflux.net/slides/2017/ks/kspp.pdf (see slides 3, 4, 12)
    https://outflux.net/slides/2016/lss/kspp.pdf (see slides 5, 6, 12, 20)
    https://outflux.net/slides/2016/ks/kspp.pdf (see slides 17, 21)
    https://outflux.net/slides/2015/ks/security.pdf (see slides 4, 13)
Given this, I have no idea why this discussion is even being continued
further. These features have more than justified themselves. Shall we
speculate what may have happened had these self-protections not been
present? Of course not.

Also, with respect to a switch for turning this off, nobody is going to
want it. If they haven't yet requested it (this feature has been in
mainline for years), I seriously doubt anyone will care.

On Fri, Aug 18, 2023 at 2:46 PM Kees Cook <keescook@chromium.org> wrote:
>
> On Fri, Aug 18, 2023 at 10:55:42AM -0700, Andrew Morton wrote:
> > On Thu, 17 Aug 2023 21:17:41 -0700 Kees Cook <keescook@chromium.org> wrote:
> >
> > > From: Elena Reshetova <elena.reshetova@intel.com>
> > >
> > > atomic_t variables are currently used to implement reference counters
> > > with the following properties:
> > >  - counter is initialized to 1 using atomic_set()
> > >  - a resource is freed upon counter reaching zero
> > >  - once counter reaches zero, its further
> > >    increments aren't allowed
> > >  - counter schema uses basic atomic operations
> > >    (set, inc, inc_not_zero, dec_and_test, etc.)
> > >
> > > Such atomic variables should be converted to a newly provided
> > > refcount_t type and API that prevents accidental counter overflows and
> > > underflows. This is important since overflows and underflows can lead
> > > to use-after-free situation and be exploitable.
> >
> > ie, if we have bugs which we have no reason to believe presently exist,
> > let's bloat and slow down the kernel just in case we add some in the
> > future?  Or something like that.  dangnabbit, that refcount_t.
> >
> > x86_64 defconfig:
> >
> > before:
> >    text	   data	    bss	    dec	    hex	filename
> >    3869	    552	      8	   4429	   114d	kernel/cred.o
> >    6140	    724	     16	   6880	   1ae0	net/sunrpc/auth.o
> >
> > after:
> >    text	   data	    bss	    dec	    hex	filename
> >    4573	    552	      8	   5133	   140d	kernel/cred.o
> >    6236	    724	     16	   6976	   1b40	net/sunrpc/auth.o
> >
> >
> > Please explain, in a non handwavy and non cargoculty fashion why this
> > speed and space cost is justified.
>
> Since its introduction, refcount_t has found a lot of bugs. This is easy
> to see even with a simplistic review of commits:
>
> $ git log --date=short --pretty='format:%ad %C(auto)%h ("%s")' \
> 	--grep 'refcount_t:' | \
> 	cut -d- -f1 | sort | uniq -c
>       1 2016
>      15 2017
>       9 2018
>      23 2019
>      24 2020
>      18 2021
>      24 2022
>      10 2023
>
> It's not really tapering off, either. All of these would have been silent
> memory corruptions, etc.
>
> In the face of _what_ is being protected, "cred" is pretty important for
> enforcing security boundaries, etc, so having it still not protected is
> a weird choice we've implicitly made. Given cred code is still seeing
> changes and novel uses (e.g. io_uring), it's not unreasonable to protect
> it from detectable (and _exploitable_) problems.
>
> While the size differences look large in cred.o, it's basically limited
> only to cred.o:
>
>     text	    data	     bss	     dec	    hex	filename
> 30515570	12255806	17190916	59962292	392f3b4	vmlinux.before
> 30517500	12255838	17190916	59964254	392fb5e	vmlinux.after
>
> And we've even seen performance _improvements_ in some conditions:
> https://lore.kernel.org/lkml/20200615005732.GV12456@shao2-debian/
>
> Looking at confirmed security flaws, exploitable reference counting
> related bugs have dropped significantly. (The CVE database is irritating
> to search, but most recent refcount-related CVEs are due to counts that
> are _not_ using refcount_t.)
>
> I'd rather ask the question, "Why should we _not_ protect cred lifetime
> management?"
>
> -Kees
>
> --
> Kees Cook
On Fri, Aug 18, 2023 at 04:10:49PM -0400, Jeff Layton wrote:
> [...]
> extra checks (supposedly) compile down to nothing. It should be possible
> to build alternate refcount_t handling functions that are just wrappers
> around atomic_t with no extra checks, for folks who want to really run
> "fast and loose".

No -- there's no benefit for this. We already did all this work years ago
with the fast vs full break-down. All that got tossed out since it didn't
matter. We did all the performance benchmarking and there was no
meaningful difference -- refcount _is_ atomic with an added check that is
branch-predicted away. Peter Zijlstra and Will Deacon spent a lot of time
making it run smoothly. :)

-Kees
On Fri, Aug 18, 2023 at 9:31 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> On Fri, 18 Aug 2023 11:48:16 -0700 Kees Cook <keescook@chromium.org> wrote:
>
> > On Fri, Aug 18, 2023 at 08:17:55PM +0200, Jann Horn wrote:
> > > Though really we don't *just* need refcount_t to catch bugs; on a
> > > system with enough RAM you can also overflow many 32-bit refcounts by
> > > simply creating 2^32 actual references to an object. Depending on the
> > > structure of objects that hold such refcounts, that can start
> > > happening at around 2^32 * 8 bytes = 32 GiB memory usage, and it
> > > becomes increasingly practical to do this with more objects if you
> > > have significantly more RAM. I suppose you could avoid such issues by
> > > putting a hard limit of 32 GiB on the amount of slab memory and
> > > requiring that kernel object references are stored as pointers in slab
> > > memory, or by making all the refcounts 64-bit.
> >
> > These problems are a different issue, and yes, the path out of it would
> > be to crank the size of refcount_t, etc.
>
> Is it possible for such overflows to occur in the cred code? If so,
> that's a bug. Can we fix that cred bug without all this overhead?

Dunno, it probably depends on how much RAM you have and how the system is
configured. It should get pretty easy to hit if you have around 44 TB of
RAM, since I think the kernel will let you create around 2^32 instances of
"struct file" at that point, and each file holds a reference to the
creator's "struct cred". If RLIMIT_NOFILE and /proc/sys/kernel/pid_max are
high enough, you could probably store 2^32 files in file descriptor table
entries, spread out over a few tens of thousands of processes but all
pointing to the same struct cred, and trigger an overflow of a cred
refcount that way. But I haven't tried that, and there might be some other
limit that prevents this somewhere.
If you have less RAM, you'd have to try harder to find some data structure
where the kernel doesn't impose allocation limits as strict as the ones on
files. io_uring requests can carry references to creds, and I think you can
probably make them block indefinitely through dependencies; I don't know
how many io_uring requests you can have in flight at a time. Eyeballing the
io_uring code, it looks like this might work at somewhere around 1 TB of
slab memory usage, if there isn't some limit somewhere.

My point is that it's really hard to figure out how many references can be
taken on an object that is referenced from all over the kernel, unless
there is a hard cap on the amount of memory in which such references are
stored, or unless you can simply refuse to increment the refcount when it
gets too high. And so, in my opinion, it makes sense to use a refcount type
that can warn and (depending on configuration) continue execution safely
(except for leaking a little bit of memory) even when it reaches its limit.
Kees Cook <keescook@chromium.org> writes:

> On Fri, Aug 18, 2023 at 04:10:49PM -0400, Jeff Layton wrote:
>> [...]
>> extra checks (supposedly) compile down to nothing. It should be possible
>> to build alternate refcount_t handling functions that are just wrappers
>> around atomic_t with no extra checks, for folks who want to really run
>> "fast and loose".
>
> No -- there's no benefit for this. We already did all this work years
> ago with the fast vs full break-down. All that got tossed out since it
> didn't matter. We did all the performance benchmarking and there was no
> meaningful difference -- refcount _is_ atomic with an added check that
> is branch-predicted away. Peter Zijlstra and Will Deacon spent a lot of
> time making it run smoothly. :)

Given that you did all of that work, should the text size be growing by a
kilobyte for this change? Is that expected? It is a valid concern with
this change, and it really should be justified in the changelog now that
someone has brought it up.

Eric
From: Kees Cook
> Sent: Friday, August 18, 2023 9:25 PM
>
> On Fri, Aug 18, 2023 at 04:10:49PM -0400, Jeff Layton wrote:
> > [...]
> > extra checks (supposedly) compile down to nothing. It should be possible
> > to build alternate refcount_t handling functions that are just wrappers
> > around atomic_t with no extra checks, for folks who want to really run
> > "fast and loose".
>
> No -- there's no benefit for this. We already did all this work years
> ago with the fast vs full break-down. All that got tossed out since it
> didn't matter. We did all the performance benchmarking and there was no
> meaningful difference -- refcount _is_ atomic with an added check that
> is branch-predicted away.

Hmmm... IIRC, recent Intel x86 CPUs never do static branch prediction, so
you can't avoid mispredicted branches in cold code.

	David

> Peter Zijlstra and Will Deacon spent a lot of
> time making it run smoothly. :)
>
> -Kees
>
> --
> Kees Cook

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
diff --git a/include/linux/cred.h b/include/linux/cred.h
index 8661f6294ad4..bf1c142afcec 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -109,7 +109,7 @@ static inline int groups_search(const struct group_info *group_info, kgid_t grp)
  * same context as task->real_cred.
  */
 struct cred {
-	atomic_t	usage;
+	refcount_t	usage;
 #ifdef CONFIG_DEBUG_CREDENTIALS
 	atomic_t	subscribers;	/* number of processes subscribed */
 	void		*put_addr;
@@ -229,7 +229,7 @@ static inline bool cap_ambient_invariant_ok(const struct cred *cred)
  */
 static inline struct cred *get_new_cred(struct cred *cred)
 {
-	atomic_inc(&cred->usage);
+	refcount_inc(&cred->usage);
 	return cred;
 }
@@ -261,7 +261,7 @@ static inline const struct cred *get_cred_rcu(const struct cred *cred)
 	struct cred *nonconst_cred = (struct cred *) cred;
 	if (!cred)
 		return NULL;
-	if (!atomic_inc_not_zero(&nonconst_cred->usage))
+	if (!refcount_inc_not_zero(&nonconst_cred->usage))
 		return NULL;
 	validate_creds(cred);
 	nonconst_cred->non_rcu = 0;
@@ -285,7 +285,7 @@ static inline void put_cred(const struct cred *_cred)
 
 	if (cred) {
 		validate_creds(cred);
-		if (atomic_dec_and_test(&(cred)->usage))
+		if (refcount_dec_and_test(&(cred)->usage))
 			__put_cred(cred);
 	}
 }
diff --git a/kernel/cred.c b/kernel/cred.c
index bed458cfb812..33090c43bcac 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -39,7 +39,7 @@ static struct group_info init_groups = { .usage = REFCOUNT_INIT(2) };
  * The initial credentials for the initial task
  */
 struct cred init_cred = {
-	.usage			= ATOMIC_INIT(4),
+	.usage			= REFCOUNT_INIT(4),
 #ifdef CONFIG_DEBUG_CREDENTIALS
 	.subscribers		= ATOMIC_INIT(2),
 	.magic			= CRED_MAGIC,
@@ -99,17 +99,17 @@ static void put_cred_rcu(struct rcu_head *rcu)
 
 #ifdef CONFIG_DEBUG_CREDENTIALS
 	if (cred->magic != CRED_MAGIC_DEAD ||
-	    atomic_read(&cred->usage) != 0 ||
+	    refcount_read(&cred->usage) != 0 ||
 	    read_cred_subscribers(cred) != 0)
 		panic("CRED: put_cred_rcu() sees %p with"
 		      " mag %x, put %p, usage %d, subscr %d\n",
 		      cred, cred->magic, cred->put_addr,
-		      atomic_read(&cred->usage),
+		      refcount_read(&cred->usage),
 		      read_cred_subscribers(cred));
 #else
-	if (atomic_read(&cred->usage) != 0)
+	if (refcount_read(&cred->usage) != 0)
 		panic("CRED: put_cred_rcu() sees %p with usage %d\n",
-		      cred, atomic_read(&cred->usage));
+		      cred, refcount_read(&cred->usage));
 #endif
 
 	security_cred_free(cred);
@@ -135,10 +135,10 @@ static void put_cred_rcu(struct rcu_head *rcu)
 void __put_cred(struct cred *cred)
 {
 	kdebug("__put_cred(%p{%d,%d})", cred,
-	       atomic_read(&cred->usage),
+	       refcount_read(&cred->usage),
 	       read_cred_subscribers(cred));
 
-	BUG_ON(atomic_read(&cred->usage) != 0);
+	BUG_ON(refcount_read(&cred->usage) != 0);
 #ifdef CONFIG_DEBUG_CREDENTIALS
 	BUG_ON(read_cred_subscribers(cred) != 0);
 	cred->magic = CRED_MAGIC_DEAD;
@@ -162,7 +162,7 @@ void exit_creds(struct task_struct *tsk)
 	struct cred *cred;
 
 	kdebug("exit_creds(%u,%p,%p,{%d,%d})", tsk->pid, tsk->real_cred, tsk->cred,
-	       atomic_read(&tsk->cred->usage),
+	       refcount_read(&tsk->cred->usage),
 	       read_cred_subscribers(tsk->cred));
 
 	cred = (struct cred *) tsk->real_cred;
@@ -221,7 +221,7 @@ struct cred *cred_alloc_blank(void)
 	if (!new)
 		return NULL;
 
-	atomic_set(&new->usage, 1);
+	refcount_set(&new->usage, 1);
 #ifdef CONFIG_DEBUG_CREDENTIALS
 	new->magic = CRED_MAGIC;
 #endif
@@ -267,7 +267,7 @@ struct cred *prepare_creds(void)
 	memcpy(new, old, sizeof(struct cred));
 	new->non_rcu = 0;
-	atomic_set(&new->usage, 1);
+	refcount_set(&new->usage, 1);
 	set_cred_subscribers(new, 0);
 	get_group_info(new->group_info);
 	get_uid(new->user);
@@ -356,7 +356,7 @@ int copy_creds(struct task_struct *p, unsigned long clone_flags)
 		get_cred(p->cred);
 		alter_cred_subscribers(p->cred, 2);
 		kdebug("share_creds(%p{%d,%d})",
-		       p->cred, atomic_read(&p->cred->usage),
+		       p->cred, refcount_read(&p->cred->usage),
 		       read_cred_subscribers(p->cred));
 		inc_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
 		return 0;
@@ -450,7 +450,7 @@ int commit_creds(struct cred *new)
 	const struct cred *old = task->real_cred;
 
 	kdebug("commit_creds(%p{%d,%d})", new,
-	       atomic_read(&new->usage),
+	       refcount_read(&new->usage),
 	       read_cred_subscribers(new));
 
 	BUG_ON(task->cred != old);
@@ -459,7 +459,7 @@ int commit_creds(struct cred *new)
 	validate_creds(old);
 	validate_creds(new);
 #endif
-	BUG_ON(atomic_read(&new->usage) < 1);
+	BUG_ON(refcount_read(&new->usage) < 1);
 
 	get_cred(new); /* we will require a ref for the subj creds too */
@@ -533,13 +533,13 @@ EXPORT_SYMBOL(commit_creds);
 void abort_creds(struct cred *new)
 {
 	kdebug("abort_creds(%p{%d,%d})", new,
-	       atomic_read(&new->usage),
+	       refcount_read(&new->usage),
 	       read_cred_subscribers(new));
 
 #ifdef CONFIG_DEBUG_CREDENTIALS
 	BUG_ON(read_cred_subscribers(new) != 0);
 #endif
-	BUG_ON(atomic_read(&new->usage) < 1);
+	BUG_ON(refcount_read(&new->usage) < 1);
 	put_cred(new);
 }
 EXPORT_SYMBOL(abort_creds);
@@ -556,7 +556,7 @@ const struct cred *override_creds(const struct cred *new)
 	const struct cred *old = current->cred;
 
 	kdebug("override_creds(%p{%d,%d})", new,
-	       atomic_read(&new->usage),
+	       refcount_read(&new->usage),
 	       read_cred_subscribers(new));
 
 	validate_creds(old);
@@ -579,7 +579,7 @@ const struct cred *override_creds(const struct cred *new)
 	alter_cred_subscribers(old, -1);
 	kdebug("override_creds() = %p{%d,%d}", old,
-	       atomic_read(&old->usage),
+	       refcount_read(&old->usage),
 	       read_cred_subscribers(old));
 	return old;
 }
@@ -597,7 +597,7 @@ void revert_creds(const struct cred *old)
 	const struct cred *override = current->cred;
 
 	kdebug("revert_creds(%p{%d,%d})", old,
-	       atomic_read(&old->usage),
+	       refcount_read(&old->usage),
 	       read_cred_subscribers(old));
 
 	validate_creds(old);
@@ -728,7 +728,7 @@ struct cred *prepare_kernel_cred(struct task_struct *daemon)
 	*new = *old;
 	new->non_rcu = 0;
-	atomic_set(&new->usage, 1);
+	refcount_set(&new->usage, 1);
 	set_cred_subscribers(new, 0);
 	get_uid(new->user);
 	get_user_ns(new->user_ns);
@@ -843,7 +843,7 @@ static void dump_invalid_creds(const struct cred *cred, const char *label,
 	printk(KERN_ERR "CRED: ->magic=%x, put_addr=%p\n",
 	       cred->magic, cred->put_addr);
 	printk(KERN_ERR "CRED: ->usage=%d, subscr=%d\n",
-	       atomic_read(&cred->usage),
+	       refcount_read(&cred->usage),
 	       read_cred_subscribers(cred));
 	printk(KERN_ERR "CRED: ->*uid = { %d,%d,%d,%d }\n",
 	       from_kuid_munged(&init_user_ns, cred->uid),
@@ -917,7 +917,7 @@ void validate_creds_for_do_exit(struct task_struct *tsk)
 {
 	kdebug("validate_creds_for_do_exit(%p,%p{%d,%d})",
 	       tsk->real_cred, tsk->cred,
-	       atomic_read(&tsk->cred->usage),
+	       refcount_read(&tsk->cred->usage),
 	       read_cred_subscribers(tsk->cred));
 
 	__validate_process_creds(tsk, __FILE__, __LINE__);
diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c
index 2f16f9d17966..f9f406249e7d 100644
--- a/net/sunrpc/auth.c
+++ b/net/sunrpc/auth.c
@@ -39,7 +39,7 @@ static LIST_HEAD(cred_unused);
 static unsigned long number_cred_unused;
 
 static struct cred machine_cred = {
-	.usage = ATOMIC_INIT(1),
+	.usage = REFCOUNT_INIT(1),
 #ifdef CONFIG_DEBUG_CREDENTIALS
 	.magic = CRED_MAGIC,
 #endif