Message ID | 20230818041740.gonna.513-kees@kernel.org (mailing list archive)
---|---
State | Not Applicable
Delegated to: | Netdev Maintainers
Series | [v2] creds: Convert cred.usage to refcount_t

Context | Check | Description
---|---|---
netdev/tree_selection | success | Not a local patch
On Thu, 17 Aug 2023 21:17:41 -0700 Kees Cook <keescook@chromium.org> wrote:

> From: Elena Reshetova <elena.reshetova@intel.com>
>
> atomic_t variables are currently used to implement reference counters
> with the following properties:
>  - counter is initialized to 1 using atomic_set()
>  - a resource is freed upon counter reaching zero
>  - once counter reaches zero, its further
>    increments aren't allowed
>  - counter schema uses basic atomic operations
>    (set, inc, inc_not_zero, dec_and_test, etc.)
>
> Such atomic variables should be converted to a newly provided
> refcount_t type and API that prevents accidental counter overflows and
> underflows. This is important since overflows and underflows can lead
> to use-after-free situation and be exploitable.

ie, if we have bugs which we have no reason to believe presently exist,
let's bloat and slow down the kernel just in case we add some in the
future?  Or something like that.  dangnabbit, that refcount_t.

x86_64 defconfig:

before:
   text	   data	    bss	    dec	    hex	filename
   3869	    552	      8	   4429	   114d	kernel/cred.o
   6140	    724	     16	   6880	   1ae0	net/sunrpc/auth.o

after:
   text	   data	    bss	    dec	    hex	filename
   4573	    552	      8	   5133	   140d	kernel/cred.o
   6236	    724	     16	   6976	   1b40	net/sunrpc/auth.o

Please explain, in a non handwavy and non cargoculty fashion why this
speed and space cost is justified.
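The refcounting pattern the commit message describes can be sketched in userspace C. This is purely illustrative and is NOT the kernel's implementation: the real refcount_t (in <linux/refcount.h> / lib/refcount.c) saturates the counter and WARNs on misuse rather than using this simple CAS loop, and the `sketch_` names are invented here.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Illustrative userspace sketch of the pattern described above;
 * not the kernel's refcount_t, which saturates instead of wrapping.
 */
typedef struct { atomic_int refs; } sketch_refcount_t;

static void sketch_refcount_set(sketch_refcount_t *r, int n)
{
	atomic_store(&r->refs, n);	/* "counter is initialized to 1" */
}

/* Take a reference only if the object is still alive (count > 0). */
static bool sketch_refcount_inc_not_zero(sketch_refcount_t *r)
{
	int old = atomic_load(&r->refs);

	do {
		if (old == 0)
			return false;	/* never resurrect a dead object */
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1));
	return true;
}

/* Drop a reference; true means the caller holds the last one and must free. */
static bool sketch_refcount_dec_and_test(sketch_refcount_t *r)
{
	return atomic_fetch_sub(&r->refs, 1) == 1;
}
```

The "once counter reaches zero, its further increments aren't allowed" property is what `inc_not_zero` enforces: after the final decrement, a concurrent lookup cannot revive the object.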
On Fri, Aug 18, 2023 at 7:56 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> On Thu, 17 Aug 2023 21:17:41 -0700 Kees Cook <keescook@chromium.org> wrote:
>
> > From: Elena Reshetova <elena.reshetova@intel.com>
> >
> > atomic_t variables are currently used to implement reference counters
> > with the following properties:
> >  - counter is initialized to 1 using atomic_set()
> >  - a resource is freed upon counter reaching zero
> >  - once counter reaches zero, its further
> >    increments aren't allowed
> >  - counter schema uses basic atomic operations
> >    (set, inc, inc_not_zero, dec_and_test, etc.)
> >
> > Such atomic variables should be converted to a newly provided
> > refcount_t type and API that prevents accidental counter overflows and
> > underflows. This is important since overflows and underflows can lead
> > to use-after-free situation and be exploitable.
>
> ie, if we have bugs which we have no reason to believe presently exist,
> let's bloat and slow down the kernel just in case we add some in the
> future?

Yeah. Or in case we currently have some that we missed.

Though really we don't *just* need refcount_t to catch bugs; on a
system with enough RAM you can also overflow many 32-bit refcounts by
simply creating 2^32 actual references to an object. Depending on the
structure of objects that hold such refcounts, that can start
happening at around 2^32 * 8 bytes = 32 GiB memory usage, and it
becomes increasingly practical to do this with more objects if you
have significantly more RAM. I suppose you could avoid such issues by
putting a hard limit of 32 GiB on the amount of slab memory and
requiring that kernel object references are stored as pointers in slab
memory, or by making all the refcounts 64-bit.
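The 32 GiB figure above is straight arithmetic: 2^32 references, each held as an 8-byte pointer somewhere in kernel memory. A quick check (the helper name is hypothetical, introduced only for illustration):

```c
#include <stdint.h>

/*
 * Minimum memory needed just to hold nrefs object references as
 * pointers, in GiB. 2^32 refs * 8 bytes = 2^35 bytes = 32 GiB,
 * matching the estimate in the mail above. Assumes 64-bit pointers.
 */
static uint64_t gib_to_hold_refs(uint64_t nrefs)
{
	return (nrefs * sizeof(void *)) >> 30;	/* bytes -> GiB */
}
```

The point being that on a sufficiently large machine, an attacker can legitimately take 2^32 references without any bug in the refcounting logic itself, which is why saturation (or a 64-bit count) matters.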
On Fri, Aug 18, 2023 at 10:55:42AM -0700, Andrew Morton wrote:
> On Thu, 17 Aug 2023 21:17:41 -0700 Kees Cook <keescook@chromium.org> wrote:
>
> > From: Elena Reshetova <elena.reshetova@intel.com>
> >
> > atomic_t variables are currently used to implement reference counters
> > with the following properties:
> >  - counter is initialized to 1 using atomic_set()
> >  - a resource is freed upon counter reaching zero
> >  - once counter reaches zero, its further
> >    increments aren't allowed
> >  - counter schema uses basic atomic operations
> >    (set, inc, inc_not_zero, dec_and_test, etc.)
> >
> > Such atomic variables should be converted to a newly provided
> > refcount_t type and API that prevents accidental counter overflows and
> > underflows. This is important since overflows and underflows can lead
> > to use-after-free situation and be exploitable.
>
> ie, if we have bugs which we have no reason to believe presently exist,
> let's bloat and slow down the kernel just in case we add some in the
> future?  Or something like that.  dangnabbit, that refcount_t.
>
> x86_64 defconfig:
>
> before:
>    text	   data	    bss	    dec	    hex	filename
>    3869	    552	      8	   4429	   114d	kernel/cred.o
>    6140	    724	     16	   6880	   1ae0	net/sunrpc/auth.o
>
> after:
>    text	   data	    bss	    dec	    hex	filename
>    4573	    552	      8	   5133	   140d	kernel/cred.o
>    6236	    724	     16	   6976	   1b40	net/sunrpc/auth.o
>
>
> Please explain, in a non handwavy and non cargoculty fashion why this
> speed and space cost is justified.

Since its introduction, refcount_t has found a lot of bugs. This is easy
to see even with a simplistic review of commits:

$ git log --date=short --pretty='format:%ad %C(auto)%h ("%s")' \
	--grep 'refcount_t:' | \
	cut -d- -f1 | sort | uniq -c
      1 2016
     15 2017
      9 2018
     23 2019
     24 2020
     18 2021
     24 2022
     10 2023

It's not really tapering off, either. All of these would have been silent
memory corruptions, etc.

In the face of _what_ is being protected, "cred" is pretty important for
enforcing security boundaries, etc, so having it still not protected is
a weird choice we've implicitly made. Given cred code is still seeing
changes and novel uses (e.g. io_uring), it's not unreasonable to protect
it from detectable (and _exploitable_) problems.

While the size differences look large in cred.o, it's basically limited
only to cred.o:

    text	    data	     bss	     dec	    hex	filename
30515570	12255806	17190916	59962292	392f3b4	vmlinux.before
30517500	12255838	17190916	59964254	392fb5e	vmlinux.after

And we've even seen performance _improvements_ in some conditions:
https://lore.kernel.org/lkml/20200615005732.GV12456@shao2-debian/

Looking at confirmed security flaws, exploitable reference counting
related bugs have dropped significantly. (The CVE database is irritating
to search, but most recent refcount-related CVEs are due to counts that
are _not_ using refcount_t.)

I'd rather ask the question, "Why should we _not_ protect cred lifetime
management?"

-Kees
On Fri, Aug 18, 2023 at 08:17:55PM +0200, Jann Horn wrote:
> On Fri, Aug 18, 2023 at 7:56 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> > On Thu, 17 Aug 2023 21:17:41 -0700 Kees Cook <keescook@chromium.org> wrote:
> > >
> > > From: Elena Reshetova <elena.reshetova@intel.com>
> > >
> > > atomic_t variables are currently used to implement reference counters
> > > with the following properties:
> > >  - counter is initialized to 1 using atomic_set()
> > >  - a resource is freed upon counter reaching zero
> > >  - once counter reaches zero, its further
> > >    increments aren't allowed
> > >  - counter schema uses basic atomic operations
> > >    (set, inc, inc_not_zero, dec_and_test, etc.)
> > >
> > > Such atomic variables should be converted to a newly provided
> > > refcount_t type and API that prevents accidental counter overflows and
> > > underflows. This is important since overflows and underflows can lead
> > > to use-after-free situation and be exploitable.
> >
> > ie, if we have bugs which we have no reason to believe presently exist,
> > let's bloat and slow down the kernel just in case we add some in the
> > future?
>
> Yeah. Or in case we currently have some that we missed.

Right, or to protect us against the _introduction_ of flaws.

> Though really we don't *just* need refcount_t to catch bugs; on a
> system with enough RAM you can also overflow many 32-bit refcounts by
> simply creating 2^32 actual references to an object. Depending on the
> structure of objects that hold such refcounts, that can start
> happening at around 2^32 * 8 bytes = 32 GiB memory usage, and it
> becomes increasingly practical to do this with more objects if you
> have significantly more RAM. I suppose you could avoid such issues by
> putting a hard limit of 32 GiB on the amount of slab memory and
> requiring that kernel object references are stored as pointers in slab
> memory, or by making all the refcounts 64-bit.

These problems are a different issue, and yes, the path out of it would
be to crank the size of refcount_t, etc.
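To put numbers on "crank the size": a 64-bit count cannot realistically be wrapped by taking references one at a time. Even at an optimistic billion increments per second, the wrap takes centuries (arithmetic sketch only; the helper name is invented here):

```c
#include <stdint.h>

/*
 * Years needed to wrap a 64-bit counter at a given increment rate.
 * At 10^9 increments/sec: 2^64 / 10^9 ~= 1.8e10 seconds ~= 584 years,
 * which is why a 64-bit refcount sidesteps the "create 2^32 real
 * references" overflow that a 32-bit count is exposed to.
 */
static uint64_t years_to_wrap_u64(uint64_t incs_per_sec)
{
	const uint64_t secs_per_year = 31557600ull;	/* 365.25 days */

	return UINT64_MAX / incs_per_sec / secs_per_year;
}
```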
On Fri, 18 Aug 2023 11:48:16 -0700 Kees Cook <keescook@chromium.org> wrote:

> On Fri, Aug 18, 2023 at 08:17:55PM +0200, Jann Horn wrote:
> > On Fri, Aug 18, 2023 at 7:56 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> > > On Thu, 17 Aug 2023 21:17:41 -0700 Kees Cook <keescook@chromium.org> wrote:
> > > >
> > > > From: Elena Reshetova <elena.reshetova@intel.com>
> > > >
> > > > atomic_t variables are currently used to implement reference counters
> > > > with the following properties:
> > > >  - counter is initialized to 1 using atomic_set()
> > > >  - a resource is freed upon counter reaching zero
> > > >  - once counter reaches zero, its further
> > > >    increments aren't allowed
> > > >  - counter schema uses basic atomic operations
> > > >    (set, inc, inc_not_zero, dec_and_test, etc.)
> > > >
> > > > Such atomic variables should be converted to a newly provided
> > > > refcount_t type and API that prevents accidental counter overflows and
> > > > underflows. This is important since overflows and underflows can lead
> > > > to use-after-free situation and be exploitable.
> > >
> > > ie, if we have bugs which we have no reason to believe presently exist,
> > > let's bloat and slow down the kernel just in case we add some in the
> > > future?
> >
> > Yeah. Or in case we currently have some that we missed.
>
> Right, or to protect us against the _introduction_ of flaws.

We could cheerfully add vast amounts of code to the kernel to check for
the future addition of bugs.  But we don't do that, because it would be
insane.

> > Though really we don't *just* need refcount_t to catch bugs; on a
> > system with enough RAM you can also overflow many 32-bit refcounts by
> > simply creating 2^32 actual references to an object. Depending on the
> > structure of objects that hold such refcounts, that can start
> > happening at around 2^32 * 8 bytes = 32 GiB memory usage, and it
> > becomes increasingly practical to do this with more objects if you
> > have significantly more RAM. I suppose you could avoid such issues by
> > putting a hard limit of 32 GiB on the amount of slab memory and
> > requiring that kernel object references are stored as pointers in slab
> > memory, or by making all the refcounts 64-bit.
>
> These problems are a different issue, and yes, the path out of it would
> be to crank the size of refcount_t, etc.

Is it possible for such overflows to occur in the cred code?  If so,
that's a bug.  Can we fix that cred bug without all this overhead?
With a cc:stable backport.  If not then, again, what is the non
handwavy, non cargoculty justification for adding this overhead to
the kernel?
On Fri, 2023-08-18 at 12:31 -0700, Andrew Morton wrote:
> On Fri, 18 Aug 2023 11:48:16 -0700 Kees Cook <keescook@chromium.org> wrote:
>
> > On Fri, Aug 18, 2023 at 08:17:55PM +0200, Jann Horn wrote:
> > > On Fri, Aug 18, 2023 at 7:56 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> > > > On Thu, 17 Aug 2023 21:17:41 -0700 Kees Cook <keescook@chromium.org> wrote:
> > > > >
> > > > > From: Elena Reshetova <elena.reshetova@intel.com>
> > > > >
> > > > > atomic_t variables are currently used to implement reference counters
> > > > > with the following properties:
> > > > >  - counter is initialized to 1 using atomic_set()
> > > > >  - a resource is freed upon counter reaching zero
> > > > >  - once counter reaches zero, its further
> > > > >    increments aren't allowed
> > > > >  - counter schema uses basic atomic operations
> > > > >    (set, inc, inc_not_zero, dec_and_test, etc.)
> > > > >
> > > > > Such atomic variables should be converted to a newly provided
> > > > > refcount_t type and API that prevents accidental counter overflows and
> > > > > underflows. This is important since overflows and underflows can lead
> > > > > to use-after-free situation and be exploitable.
> > > >
> > > > ie, if we have bugs which we have no reason to believe presently exist,
> > > > let's bloat and slow down the kernel just in case we add some in the
> > > > future?
> > >
> > > Yeah. Or in case we currently have some that we missed.
> >
> > Right, or to protect us against the _introduction_ of flaws.
>
> We could cheerfully add vast amounts of code to the kernel to check for
> the future addition of bugs.  But we don't do that, because it would be
> insane.
>
> > > Though really we don't *just* need refcount_t to catch bugs; on a
> > > system with enough RAM you can also overflow many 32-bit refcounts by
> > > simply creating 2^32 actual references to an object. Depending on the
> > > structure of objects that hold such refcounts, that can start
> > > happening at around 2^32 * 8 bytes = 32 GiB memory usage, and it
> > > becomes increasingly practical to do this with more objects if you
> > > have significantly more RAM. I suppose you could avoid such issues by
> > > putting a hard limit of 32 GiB on the amount of slab memory and
> > > requiring that kernel object references are stored as pointers in slab
> > > memory, or by making all the refcounts 64-bit.
> >
> > These problems are a different issue, and yes, the path out of it would
> > be to crank the size of refcount_t, etc.
>
> Is it possible for such overflows to occur in the cred code?  If so,
> that's a bug.  Can we fix that cred bug without all this overhead?
> With a cc:stable backport.  If not then, again, what is the non
> handwavy, non cargoculty justification for adding this overhead to
> the kernel?

It's not so much that the cred code itself is buggy, but the users of it
often have to deal with refcounting directly. Cred refcounting bugs can
be quite hard to even notice in the first place and are often hard to
track down.

That said... With something like lockdep, you can turn it off at compile
time and the extra checks (supposedly) compile down to nothing. It
should be possible to build alternate refcount_t handling functions that
are just wrappers around atomic_t with no extra checks, for folks who
want to really run "fast and loose".
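The "fast and loose" compile-time switch suggested here could look something like the following. This is a hypothetical sketch (the config name and types are invented, and no such kernel option exists; Kees's reply elsewhere in the thread notes that a fast/full split was tried years ago and dropped):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * HYPOTHETICAL switch in the spirit of CONFIG_LOCKDEP: with it
 * defined to 0, the op below collapses to a bare atomic decrement.
 */
#define CONFIG_REFCOUNT_CHECKS 1

typedef struct { atomic_int refs; } loose_refcount_t;

static bool loose_refcount_dec_and_test(loose_refcount_t *r)
{
	int old = atomic_fetch_sub(&r->refs, 1);

#if CONFIG_REFCOUNT_CHECKS
	/* Underflow: a put without a matching get, i.e. a likely UAF. */
	if (old <= 0)
		fprintf(stderr, "loose_refcount_t: underflow detected\n");
#endif
	return old == 1;	/* caller dropped the last reference */
}
```

With checks compiled out, the function body is exactly the atomic_t pattern being replaced; with them in, the slow path only fires on the (buggy) underflow case.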
On Thu, 2023-08-17 at 21:17 -0700, Kees Cook wrote:
> From: Elena Reshetova <elena.reshetova@intel.com>
>
> atomic_t variables are currently used to implement reference counters
> with the following properties:
>  - counter is initialized to 1 using atomic_set()
>  - a resource is freed upon counter reaching zero
>  - once counter reaches zero, its further
>    increments aren't allowed
>  - counter schema uses basic atomic operations
>    (set, inc, inc_not_zero, dec_and_test, etc.)
>
> Such atomic variables should be converted to a newly provided
> refcount_t type and API that prevents accidental counter overflows and
> underflows. This is important since overflows and underflows can lead
> to use-after-free situation and be exploitable.
>
> The variable cred.usage is used as pure reference counter. Convert it
> to refcount_t and fix up the operations.
>
> **Important note for maintainers:
>
> Some functions from refcount_t API defined in refcount.h have different
> memory ordering guarantees than their atomic counterparts. Please check
> Documentation/core-api/refcount-vs-atomic.rst for more information.
>
> Normally the differences should not matter since refcount_t provides
> enough guarantees to satisfy the refcounting use cases, but in some
> rare cases it might matter. Please double check that you don't have
> some undocumented memory guarantees for this variable usage.
>
> For the cred.usage it might make a difference in following places:
>  - get_task_cred(): increment in refcount_inc_not_zero() only
>    guarantees control dependency on success vs. fully ordered atomic
>    counterpart
>  - put_cred(): decrement in refcount_dec_and_test() only
>    provides RELEASE ordering and ACQUIRE ordering on success vs. fully
>    ordered atomic counterpart
>
> Suggested-by: Kees Cook <keescook@chromium.org>
> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
> Reviewed-by: David Windsor <dwindsor@gmail.com>
> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
> v2: rebase
> v1: https://lore.kernel.org/lkml/20200612183450.4189588-4-keescook@chromium.org/
> ---
>  include/linux/cred.h |  8 ++++----
>  kernel/cred.c        | 42 +++++++++++++++++++++---------------------
>  net/sunrpc/auth.c    |  2 +-
>  3 files changed, 26 insertions(+), 26 deletions(-)
>
> diff --git a/include/linux/cred.h b/include/linux/cred.h
> index 8661f6294ad4..bf1c142afcec 100644
> --- a/include/linux/cred.h
> +++ b/include/linux/cred.h
> @@ -109,7 +109,7 @@ static inline int groups_search(const struct group_info *group_info, kgid_t grp)
>   * same context as task->real_cred.
>   */
>  struct cred {
> -	atomic_t	usage;
> +	refcount_t	usage;
>  #ifdef CONFIG_DEBUG_CREDENTIALS
>  	atomic_t	subscribers;	/* number of processes subscribed */
>  	void		*put_addr;
> @@ -229,7 +229,7 @@ static inline bool cap_ambient_invariant_ok(const struct cred *cred)
>   */
>  static inline struct cred *get_new_cred(struct cred *cred)
>  {
> -	atomic_inc(&cred->usage);
> +	refcount_inc(&cred->usage);
>  	return cred;
>  }
>
> @@ -261,7 +261,7 @@ static inline const struct cred *get_cred_rcu(const struct cred *cred)
>  	struct cred *nonconst_cred = (struct cred *) cred;
>  	if (!cred)
>  		return NULL;
> -	if (!atomic_inc_not_zero(&nonconst_cred->usage))
> +	if (!refcount_inc_not_zero(&nonconst_cred->usage))
>  		return NULL;
>  	validate_creds(cred);
>  	nonconst_cred->non_rcu = 0;
> @@ -285,7 +285,7 @@ static inline void put_cred(const struct cred *_cred)
>
>  	if (cred) {
>  		validate_creds(cred);
> -		if (atomic_dec_and_test(&(cred)->usage))
> +		if (refcount_dec_and_test(&(cred)->usage))
>  			__put_cred(cred);
>  	}
>  }
> diff --git a/kernel/cred.c b/kernel/cred.c
> index bed458cfb812..33090c43bcac 100644
> --- a/kernel/cred.c
> +++ b/kernel/cred.c
> @@ -39,7 +39,7 @@ static struct group_info init_groups = { .usage = REFCOUNT_INIT(2) };
>   * The initial credentials for the initial task
>   */
>  struct cred init_cred = {
> -	.usage			= ATOMIC_INIT(4),
> +	.usage			= REFCOUNT_INIT(4),
>  #ifdef CONFIG_DEBUG_CREDENTIALS
>  	.subscribers		= ATOMIC_INIT(2),
>  	.magic			= CRED_MAGIC,
> @@ -99,17 +99,17 @@ static void put_cred_rcu(struct rcu_head *rcu)
>
>  #ifdef CONFIG_DEBUG_CREDENTIALS
>  	if (cred->magic != CRED_MAGIC_DEAD ||
> -	    atomic_read(&cred->usage) != 0 ||
> +	    refcount_read(&cred->usage) != 0 ||
>  	    read_cred_subscribers(cred) != 0)
>  		panic("CRED: put_cred_rcu() sees %p with"
>  		      " mag %x, put %p, usage %d, subscr %d\n",
>  		      cred, cred->magic, cred->put_addr,
> -		      atomic_read(&cred->usage),
> +		      refcount_read(&cred->usage),
>  		      read_cred_subscribers(cred));
>  #else
> -	if (atomic_read(&cred->usage) != 0)
> +	if (refcount_read(&cred->usage) != 0)
>  		panic("CRED: put_cred_rcu() sees %p with usage %d\n",
> -		      cred, atomic_read(&cred->usage));
> +		      cred, refcount_read(&cred->usage));
>  #endif
>
>  	security_cred_free(cred);
> @@ -135,10 +135,10 @@ static void put_cred_rcu(struct rcu_head *rcu)
>  void __put_cred(struct cred *cred)
>  {
>  	kdebug("__put_cred(%p{%d,%d})", cred,
> -	       atomic_read(&cred->usage),
> +	       refcount_read(&cred->usage),
>  	       read_cred_subscribers(cred));
>
> -	BUG_ON(atomic_read(&cred->usage) != 0);
> +	BUG_ON(refcount_read(&cred->usage) != 0);
>  #ifdef CONFIG_DEBUG_CREDENTIALS
>  	BUG_ON(read_cred_subscribers(cred) != 0);
>  	cred->magic = CRED_MAGIC_DEAD;
> @@ -162,7 +162,7 @@ void exit_creds(struct task_struct *tsk)
>  	struct cred *cred;
>
>  	kdebug("exit_creds(%u,%p,%p,{%d,%d})", tsk->pid, tsk->real_cred, tsk->cred,
> -	       atomic_read(&tsk->cred->usage),
> +	       refcount_read(&tsk->cred->usage),
>  	       read_cred_subscribers(tsk->cred));
>
>  	cred = (struct cred *) tsk->real_cred;
> @@ -221,7 +221,7 @@ struct cred *cred_alloc_blank(void)
>  	if (!new)
>  		return NULL;
>
> -	atomic_set(&new->usage, 1);
> +	refcount_set(&new->usage, 1);
>  #ifdef CONFIG_DEBUG_CREDENTIALS
>  	new->magic = CRED_MAGIC;
>  #endif
> @@ -267,7 +267,7 @@ struct cred *prepare_creds(void)
>  	memcpy(new, old, sizeof(struct cred));
>
>  	new->non_rcu = 0;
> -	atomic_set(&new->usage, 1);
> +	refcount_set(&new->usage, 1);
>  	set_cred_subscribers(new, 0);
>  	get_group_info(new->group_info);
>  	get_uid(new->user);
> @@ -356,7 +356,7 @@ int copy_creds(struct task_struct *p, unsigned long clone_flags)
>  	get_cred(p->cred);
>  	alter_cred_subscribers(p->cred, 2);
>  	kdebug("share_creds(%p{%d,%d})",
> -	       p->cred, atomic_read(&p->cred->usage),
> +	       p->cred, refcount_read(&p->cred->usage),
>  	       read_cred_subscribers(p->cred));
>  	inc_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
>  	return 0;
> @@ -450,7 +450,7 @@ int commit_creds(struct cred *new)
>  	const struct cred *old = task->real_cred;
>
>  	kdebug("commit_creds(%p{%d,%d})", new,
> -	       atomic_read(&new->usage),
> +	       refcount_read(&new->usage),
>  	       read_cred_subscribers(new));
>
>  	BUG_ON(task->cred != old);
> @@ -459,7 +459,7 @@ int commit_creds(struct cred *new)
>  	validate_creds(old);
>  	validate_creds(new);
>  #endif
> -	BUG_ON(atomic_read(&new->usage) < 1);
> +	BUG_ON(refcount_read(&new->usage) < 1);
>
>  	get_cred(new); /* we will require a ref for the subj creds too */
>
> @@ -533,13 +533,13 @@ EXPORT_SYMBOL(commit_creds);
>  void abort_creds(struct cred *new)
>  {
>  	kdebug("abort_creds(%p{%d,%d})", new,
> -	       atomic_read(&new->usage),
> +	       refcount_read(&new->usage),
>  	       read_cred_subscribers(new));
>
>  #ifdef CONFIG_DEBUG_CREDENTIALS
>  	BUG_ON(read_cred_subscribers(new) != 0);
>  #endif
> -	BUG_ON(atomic_read(&new->usage) < 1);
> +	BUG_ON(refcount_read(&new->usage) < 1);
>  	put_cred(new);
>  }
>  EXPORT_SYMBOL(abort_creds);
> @@ -556,7 +556,7 @@ const struct cred *override_creds(const struct cred *new)
>  	const struct cred *old = current->cred;
>
>  	kdebug("override_creds(%p{%d,%d})", new,
> -	       atomic_read(&new->usage),
> +	       refcount_read(&new->usage),
>  	       read_cred_subscribers(new));
>
>  	validate_creds(old);
> @@ -579,7 +579,7 @@ const struct cred *override_creds(const struct cred *new)
>  	alter_cred_subscribers(old, -1);
>
>  	kdebug("override_creds() = %p{%d,%d}", old,
> -	       atomic_read(&old->usage),
> +	       refcount_read(&old->usage),
>  	       read_cred_subscribers(old));
>  	return old;
>  }
> @@ -597,7 +597,7 @@ void revert_creds(const struct cred *old)
>  	const struct cred *override = current->cred;
>
>  	kdebug("revert_creds(%p{%d,%d})", old,
> -	       atomic_read(&old->usage),
> +	       refcount_read(&old->usage),
>  	       read_cred_subscribers(old));
>
>  	validate_creds(old);
> @@ -728,7 +728,7 @@ struct cred *prepare_kernel_cred(struct task_struct *daemon)
>
>  	*new = *old;
>  	new->non_rcu = 0;
> -	atomic_set(&new->usage, 1);
> +	refcount_set(&new->usage, 1);
>  	set_cred_subscribers(new, 0);
>  	get_uid(new->user);
>  	get_user_ns(new->user_ns);
> @@ -843,7 +843,7 @@ static void dump_invalid_creds(const struct cred *cred, const char *label,
>  	printk(KERN_ERR "CRED: ->magic=%x, put_addr=%p\n",
>  	       cred->magic, cred->put_addr);
>  	printk(KERN_ERR "CRED: ->usage=%d, subscr=%d\n",
> -	       atomic_read(&cred->usage),
> +	       refcount_read(&cred->usage),
>  	       read_cred_subscribers(cred));
>  	printk(KERN_ERR "CRED: ->*uid = { %d,%d,%d,%d }\n",
>  	       from_kuid_munged(&init_user_ns, cred->uid),
> @@ -917,7 +917,7 @@ void validate_creds_for_do_exit(struct task_struct *tsk)
>  {
>  	kdebug("validate_creds_for_do_exit(%p,%p{%d,%d})",
>  	       tsk->real_cred, tsk->cred,
> -	       atomic_read(&tsk->cred->usage),
> +	       refcount_read(&tsk->cred->usage),
>  	       read_cred_subscribers(tsk->cred));
>
>  	__validate_process_creds(tsk, __FILE__, __LINE__);
> diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c
> index 2f16f9d17966..f9f406249e7d 100644
> --- a/net/sunrpc/auth.c
> +++ b/net/sunrpc/auth.c
> @@ -39,7 +39,7 @@ static LIST_HEAD(cred_unused);
>  static unsigned long number_cred_unused;
>
>  static struct cred machine_cred = {
> -	.usage = ATOMIC_INIT(1),
> +	.usage = REFCOUNT_INIT(1),
>  #ifdef CONFIG_DEBUG_CREDENTIALS
>  	.magic = CRED_MAGIC,
>  #endif

I wonder what sort of bugs this will uncover.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
On Fri, Aug 18, 2023 at 12:31:48PM -0700, Andrew Morton wrote:
> On Fri, 18 Aug 2023 11:48:16 -0700 Kees Cook <keescook@chromium.org> wrote:
>
> > On Fri, Aug 18, 2023 at 08:17:55PM +0200, Jann Horn wrote:
> > > On Fri, Aug 18, 2023 at 7:56 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> > > > On Thu, 17 Aug 2023 21:17:41 -0700 Kees Cook <keescook@chromium.org> wrote:
> > > > >
> > > > > From: Elena Reshetova <elena.reshetova@intel.com>
> > > > >
> > > > > atomic_t variables are currently used to implement reference counters
> > > > > with the following properties:
> > > > >  - counter is initialized to 1 using atomic_set()
> > > > >  - a resource is freed upon counter reaching zero
> > > > >  - once counter reaches zero, its further
> > > > >    increments aren't allowed
> > > > >  - counter schema uses basic atomic operations
> > > > >    (set, inc, inc_not_zero, dec_and_test, etc.)
> > > > >
> > > > > Such atomic variables should be converted to a newly provided
> > > > > refcount_t type and API that prevents accidental counter overflows and
> > > > > underflows. This is important since overflows and underflows can lead
> > > > > to use-after-free situation and be exploitable.
> > > >
> > > > ie, if we have bugs which we have no reason to believe presently exist,
> > > > let's bloat and slow down the kernel just in case we add some in the
> > > > future?
> > >
> > > Yeah. Or in case we currently have some that we missed.
> >
> > Right, or to protect us against the _introduction_ of flaws.
>
> We could cheerfully add vast amounts of code to the kernel to check for
> the future addition of bugs.  But we don't do that, because it would be
> insane.

This is a slippery-slope fallacy and doesn't apply. Yes, we don't add
vast amounts of code for that and that isn't the case here. This is
fixing a known weakness of using atomic reference counts, with a long
history of exploitation, on a struct used for enforcing security
boundaries, solved with the kernel's standard reference counting type.

As I mentioned in the other arm[1] of this thread, I think the question
is better "Why is this NOT refcount_t? What is the benefit, and why does
that make struct cred special?"

> > > Though really we don't *just* need refcount_t to catch bugs; on a
> > > system with enough RAM you can also overflow many 32-bit refcounts by
> > > simply creating 2^32 actual references to an object. Depending on the
> > > structure of objects that hold such refcounts, that can start
> > > happening at around 2^32 * 8 bytes = 32 GiB memory usage, and it
> > > becomes increasingly practical to do this with more objects if you
> > > have significantly more RAM. I suppose you could avoid such issues by
> > > putting a hard limit of 32 GiB on the amount of slab memory and
> > > requiring that kernel object references are stored as pointers in slab
> > > memory, or by making all the refcounts 64-bit.
> >
> > These problems are a different issue, and yes, the path out of it would
> > be to crank the size of refcount_t, etc.
>
> Is it possible for such overflows to occur in the cred code?  If so,
> that's a bug.  Can we fix that cred bug without all this overhead?
> With a cc:stable backport.  If not then, again, what is the non
> handwavy, non cargoculty justification for adding this overhead to
> the kernel?

The only overhead is on slow-path for the error conditions. There is no
_known_ bug in the cred code today, but there might be unknown flaws, or
new flaws or unexpected reachability may be introduced in the future.
That's the whole point of making kernel code defensive.

I've talked about this (with lots of data to support it) at length
before[2], mainly around the lifetime of exploitable flaws: average
lifetime is more than 5 years and we keep introducing them in code that
uses fragile types or ambiguous language features. But I _haven't_ had
to talk much about reference counting since 2016 when we grew a proper
type for it. :) Let's get the stragglers fixed.

-Kees

[1] https://lore.kernel.org/lkml/202308181131.045F806@keescook/
[2] https://outflux.net/slides/2021/lss/kspp.pdf (see slides 4, 5, 6)
    https://outflux.net/slides/2019/lss/kspp.pdf (see slides 4, 5, 6)
    https://outflux.net/slides/2018/lss/kspp.pdf (see slides 3, 4)
    https://outflux.net/slides/2017/lss/kspp.pdf (see slides 5, 6, 13)
    https://outflux.net/slides/2017/ks/kspp.pdf (see slides 3, 4, 12)
    https://outflux.net/slides/2016/lss/kspp.pdf (see slides 5, 6, 12, 20)
    https://outflux.net/slides/2016/ks/kspp.pdf (see slides 17, 21)
    https://outflux.net/slides/2015/ks/security.pdf (see slides 4, 13)
Given this, I have no idea why this discussion is even being continued
further. These features have more than justified themselves. Shall we
speculate what may have happened had these self-protections not been
present? Of course not.

Also, with respect to a switch for turning this off, nobody is going to
want it. If they haven't yet requested it (this feature has been in
mainline for years), I seriously doubt anyone will care.

On Fri, Aug 18, 2023 at 2:46 PM Kees Cook <keescook@chromium.org> wrote:
>
> On Fri, Aug 18, 2023 at 10:55:42AM -0700, Andrew Morton wrote:
> > On Thu, 17 Aug 2023 21:17:41 -0700 Kees Cook <keescook@chromium.org> wrote:
> >
> > > From: Elena Reshetova <elena.reshetova@intel.com>
> > >
> > > atomic_t variables are currently used to implement reference counters
> > > with the following properties:
> > >  - counter is initialized to 1 using atomic_set()
> > >  - a resource is freed upon counter reaching zero
> > >  - once counter reaches zero, its further
> > >    increments aren't allowed
> > >  - counter schema uses basic atomic operations
> > >    (set, inc, inc_not_zero, dec_and_test, etc.)
> > >
> > > Such atomic variables should be converted to a newly provided
> > > refcount_t type and API that prevents accidental counter overflows and
> > > underflows. This is important since overflows and underflows can lead
> > > to use-after-free situation and be exploitable.
> >
> > ie, if we have bugs which we have no reason to believe presently exist,
> > let's bloat and slow down the kernel just in case we add some in the
> > future?  Or something like that.  dangnabbit, that refcount_t.
> >
> > x86_64 defconfig:
> >
> > before:
> >    text	   data	    bss	    dec	    hex	filename
> >    3869	    552	      8	   4429	   114d	kernel/cred.o
> >    6140	    724	     16	   6880	   1ae0	net/sunrpc/auth.o
> >
> > after:
> >    text	   data	    bss	    dec	    hex	filename
> >    4573	    552	      8	   5133	   140d	kernel/cred.o
> >    6236	    724	     16	   6976	   1b40	net/sunrpc/auth.o
> >
> >
> > Please explain, in a non handwavy and non cargoculty fashion why this
> > speed and space cost is justified.
>
> Since its introduction, refcount_t has found a lot of bugs. This is easy
> to see even with a simplistic review of commits:
>
> $ git log --date=short --pretty='format:%ad %C(auto)%h ("%s")' \
> 	--grep 'refcount_t:' | \
> 	cut -d- -f1 | sort | uniq -c
>       1 2016
>      15 2017
>       9 2018
>      23 2019
>      24 2020
>      18 2021
>      24 2022
>      10 2023
>
> It's not really tapering off, either. All of these would have been silent
> memory corruptions, etc.
>
> In the face of _what_ is being protected, "cred" is pretty important for
> enforcing security boundaries, etc, so having it still not protected is
> a weird choice we've implicitly made. Given cred code is still seeing
> changes and novel uses (e.g. io_uring), it's not unreasonable to protect
> it from detectable (and _exploitable_) problems.
>
> While the size differences look large in cred.o, it's basically limited
> only to cred.o:
>
>     text	    data	     bss	     dec	    hex	filename
> 30515570	12255806	17190916	59962292	392f3b4	vmlinux.before
> 30517500	12255838	17190916	59964254	392fb5e	vmlinux.after
>
> And we've even seen performance _improvements_ in some conditions:
> https://lore.kernel.org/lkml/20200615005732.GV12456@shao2-debian/
>
> Looking at confirmed security flaws, exploitable reference counting
> related bugs have dropped significantly. (The CVE database is irritating
> to search, but most recent refcount-related CVEs are due to counts that
> are _not_ using refcount_t.)
>
> I'd rather ask the question, "Why should we _not_ protect cred lifetime
> management?"
>
> -Kees
>
> --
> Kees Cook
On Fri, Aug 18, 2023 at 04:10:49PM -0400, Jeff Layton wrote:
> [...]
> extra checks (supposedly) compile down to nothing. It should be possible
> to build alternate refcount_t handling functions that are just wrappers
> around atomic_t with no extra checks, for folks who want to really run
> "fast and loose".

No -- there's no benefit for this. We already did all this work years ago
with the fast vs full break-down. All that got tossed out since it didn't
matter. We did all the performance benchmarking and there was no
meaningful difference -- refcount _is_ atomic with an added check that is
branch-predicted away. Peter Zijlstra and Will Deacon spent a lot of time
making it run smoothly. :)

-Kees
On Fri, Aug 18, 2023 at 9:31 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> On Fri, 18 Aug 2023 11:48:16 -0700 Kees Cook <keescook@chromium.org> wrote:
>
> > On Fri, Aug 18, 2023 at 08:17:55PM +0200, Jann Horn wrote:
> > > Though really we don't *just* need refcount_t to catch bugs; on a
> > > system with enough RAM you can also overflow many 32-bit refcounts by
> > > simply creating 2^32 actual references to an object. Depending on the
> > > structure of objects that hold such refcounts, that can start
> > > happening at around 2^32 * 8 bytes = 32 GiB memory usage, and it
> > > becomes increasingly practical to do this with more objects if you
> > > have significantly more RAM. I suppose you could avoid such issues by
> > > putting a hard limit of 32 GiB on the amount of slab memory and
> > > requiring that kernel object references are stored as pointers in slab
> > > memory, or by making all the refcounts 64-bit.
> >
> > These problems are a different issue, and yes, the path out of it would
> > be to crank the size of refcount_t, etc.
>
> Is it possible for such overflows to occur in the cred code? If so,
> that's a bug. Can we fix that cred bug without all this overhead?

Dunno, it probably depends on how much RAM you have and how the system is
configured. It should get pretty easy to hit if you have around 44 TB of
RAM, since I think the kernel will let you create around 2^32 instances of
"struct file" at that point, and each file holds a reference to the
creator's "struct cred". If RLIMIT_NOFILE and /proc/sys/kernel/pid_max are
high enough, you could probably store 2^32 files in file descriptor table
entries, spread out over a few tens of thousands of processes but all
pointing to the same struct cred, and trigger an overflow of a cred
refcount that way. But I haven't tried that, and there might be some other
limit that prevents this somewhere.
If you have less RAM, you'd have to try harder to find some data structure
where the kernel doesn't impose allocation limits as strict as the ones on
files. io_uring requests can carry references to creds, and I think you can
probably make them block indefinitely through dependencies; I don't know
how many io_uring requests you can have in flight at a time. Eyeballing the
io_uring code, it looks like this might work at somewhere around 1 TB of
slab memory usage, if there isn't some limit somewhere.

My point is that it's really hard to figure out how many references can be
taken on an object that is referenced from all over the kernel, unless
there is a hard cap on the amount of memory in which such references are
stored, or unless you can simply refuse to increment the refcount when it
gets too high. And so, in my opinion, it makes sense to use a refcount type
that can warn and (depending on configuration) continue execution safely
(except for leaking a little bit of memory) even when it reaches its limit.
Kees Cook <keescook@chromium.org> writes:

> On Fri, Aug 18, 2023 at 04:10:49PM -0400, Jeff Layton wrote:
>> [...]
>> extra checks (supposedly) compile down to nothing. It should be possible
>> to build alternate refcount_t handling functions that are just wrappers
>> around atomic_t with no extra checks, for folks who want to really run
>> "fast and loose".
>
> No -- there's no benefit for this. We already did all this work years
> ago with the fast vs full break-down. All that got tossed out since it
> didn't matter. We did all the performance benchmarking and there was no
> meaningful difference -- refcount _is_ atomic with an added check that
> is branch-predicted away. Peter Zijlstra and Will Deacon spent a lot of
> time making it run smoothly. :)

Given that you did all of that work, should the text size be growing by a
kilobyte for this change? Is that expected? It is a valid concern with
this change, and it really should be justified in the changelog now that
someone has brought it up.

Eric
From: Kees Cook
> Sent: Friday, August 18, 2023 9:25 PM
>
> On Fri, Aug 18, 2023 at 04:10:49PM -0400, Jeff Layton wrote:
> > [...]
> > extra checks (supposedly) compile down to nothing. It should be possible
> > to build alternate refcount_t handling functions that are just wrappers
> > around atomic_t with no extra checks, for folks who want to really run
> > "fast and loose".
>
> No -- there's no benefit for this. We already did all this work years
> ago with the fast vs full break-down. All that got tossed out since it
> didn't matter. We did all the performance benchmarking and there was no
> meaningful difference -- refcount _is_ atomic with an added check that
> is branch-predicted away.

Hmmm... IIRC, recent Intel x86 CPUs never do static branch prediction, so
you can't avoid mispredicted branches in cold code.

	David

> Peter Zijlstra and Will Deacon spent a lot of
> time making it run smoothly. :)
>
> -Kees
>
> --
> Kees Cook

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
diff --git a/include/linux/cred.h b/include/linux/cred.h
index 8661f6294ad4..bf1c142afcec 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -109,7 +109,7 @@ static inline int groups_search(const struct group_info *group_info, kgid_t grp)
  * same context as task->real_cred.
  */
 struct cred {
-	atomic_t	usage;
+	refcount_t	usage;
 #ifdef CONFIG_DEBUG_CREDENTIALS
 	atomic_t	subscribers;	/* number of processes subscribed */
 	void		*put_addr;
@@ -229,7 +229,7 @@ static inline bool cap_ambient_invariant_ok(const struct cred *cred)
  */
 static inline struct cred *get_new_cred(struct cred *cred)
 {
-	atomic_inc(&cred->usage);
+	refcount_inc(&cred->usage);
 	return cred;
 }
@@ -261,7 +261,7 @@ static inline const struct cred *get_cred_rcu(const struct cred *cred)
 	struct cred *nonconst_cred = (struct cred *) cred;
 	if (!cred)
 		return NULL;
-	if (!atomic_inc_not_zero(&nonconst_cred->usage))
+	if (!refcount_inc_not_zero(&nonconst_cred->usage))
 		return NULL;
 	validate_creds(cred);
 	nonconst_cred->non_rcu = 0;
@@ -285,7 +285,7 @@ static inline void put_cred(const struct cred *_cred)
 
 	if (cred) {
 		validate_creds(cred);
-		if (atomic_dec_and_test(&(cred)->usage))
+		if (refcount_dec_and_test(&(cred)->usage))
 			__put_cred(cred);
 	}
 }
diff --git a/kernel/cred.c b/kernel/cred.c
index bed458cfb812..33090c43bcac 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -39,7 +39,7 @@ static struct group_info init_groups = { .usage = REFCOUNT_INIT(2) };
  * The initial credentials for the initial task
  */
 struct cred init_cred = {
-	.usage			= ATOMIC_INIT(4),
+	.usage			= REFCOUNT_INIT(4),
 #ifdef CONFIG_DEBUG_CREDENTIALS
 	.subscribers		= ATOMIC_INIT(2),
 	.magic			= CRED_MAGIC,
@@ -99,17 +99,17 @@ static void put_cred_rcu(struct rcu_head *rcu)
 
 #ifdef CONFIG_DEBUG_CREDENTIALS
 	if (cred->magic != CRED_MAGIC_DEAD ||
-	    atomic_read(&cred->usage) != 0 ||
+	    refcount_read(&cred->usage) != 0 ||
 	    read_cred_subscribers(cred) != 0)
 		panic("CRED: put_cred_rcu() sees %p with"
 		      " mag %x, put %p, usage %d, subscr %d\n",
 		      cred, cred->magic, cred->put_addr,
-		      atomic_read(&cred->usage),
+		      refcount_read(&cred->usage),
 		      read_cred_subscribers(cred));
 #else
-	if (atomic_read(&cred->usage) != 0)
+	if (refcount_read(&cred->usage) != 0)
 		panic("CRED: put_cred_rcu() sees %p with usage %d\n",
-		      cred, atomic_read(&cred->usage));
+		      cred, refcount_read(&cred->usage));
 #endif
 
 	security_cred_free(cred);
@@ -135,10 +135,10 @@ static void put_cred_rcu(struct rcu_head *rcu)
 void __put_cred(struct cred *cred)
 {
 	kdebug("__put_cred(%p{%d,%d})", cred,
-	       atomic_read(&cred->usage),
+	       refcount_read(&cred->usage),
 	       read_cred_subscribers(cred));
 
-	BUG_ON(atomic_read(&cred->usage) != 0);
+	BUG_ON(refcount_read(&cred->usage) != 0);
 #ifdef CONFIG_DEBUG_CREDENTIALS
 	BUG_ON(read_cred_subscribers(cred) != 0);
 	cred->magic = CRED_MAGIC_DEAD;
@@ -162,7 +162,7 @@ void exit_creds(struct task_struct *tsk)
 	struct cred *cred;
 
 	kdebug("exit_creds(%u,%p,%p,{%d,%d})", tsk->pid, tsk->real_cred, tsk->cred,
-	       atomic_read(&tsk->cred->usage),
+	       refcount_read(&tsk->cred->usage),
 	       read_cred_subscribers(tsk->cred));
 
 	cred = (struct cred *) tsk->real_cred;
@@ -221,7 +221,7 @@ struct cred *cred_alloc_blank(void)
 	if (!new)
 		return NULL;
 
-	atomic_set(&new->usage, 1);
+	refcount_set(&new->usage, 1);
 #ifdef CONFIG_DEBUG_CREDENTIALS
 	new->magic = CRED_MAGIC;
 #endif
@@ -267,7 +267,7 @@ struct cred *prepare_creds(void)
 	memcpy(new, old, sizeof(struct cred));
 	new->non_rcu = 0;
-	atomic_set(&new->usage, 1);
+	refcount_set(&new->usage, 1);
 	set_cred_subscribers(new, 0);
 	get_group_info(new->group_info);
 	get_uid(new->user);
@@ -356,7 +356,7 @@ int copy_creds(struct task_struct *p, unsigned long clone_flags)
 		get_cred(p->cred);
 		alter_cred_subscribers(p->cred, 2);
 		kdebug("share_creds(%p{%d,%d})",
-		       p->cred, atomic_read(&p->cred->usage),
+		       p->cred, refcount_read(&p->cred->usage),
 		       read_cred_subscribers(p->cred));
 		inc_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
 		return 0;
@@ -450,7 +450,7 @@ int commit_creds(struct cred *new)
 	const struct cred *old = task->real_cred;
 
 	kdebug("commit_creds(%p{%d,%d})", new,
-	       atomic_read(&new->usage),
+	       refcount_read(&new->usage),
 	       read_cred_subscribers(new));
 
 	BUG_ON(task->cred != old);
@@ -459,7 +459,7 @@ int commit_creds(struct cred *new)
 	validate_creds(old);
 	validate_creds(new);
 #endif
-	BUG_ON(atomic_read(&new->usage) < 1);
+	BUG_ON(refcount_read(&new->usage) < 1);
 
 	get_cred(new); /* we will require a ref for the subj creds too */
@@ -533,13 +533,13 @@ EXPORT_SYMBOL(commit_creds);
 void abort_creds(struct cred *new)
 {
 	kdebug("abort_creds(%p{%d,%d})", new,
-	       atomic_read(&new->usage),
+	       refcount_read(&new->usage),
 	       read_cred_subscribers(new));
 
 #ifdef CONFIG_DEBUG_CREDENTIALS
 	BUG_ON(read_cred_subscribers(new) != 0);
 #endif
-	BUG_ON(atomic_read(&new->usage) < 1);
+	BUG_ON(refcount_read(&new->usage) < 1);
 	put_cred(new);
 }
 EXPORT_SYMBOL(abort_creds);
@@ -556,7 +556,7 @@ const struct cred *override_creds(const struct cred *new)
 	const struct cred *old = current->cred;
 
 	kdebug("override_creds(%p{%d,%d})", new,
-	       atomic_read(&new->usage),
+	       refcount_read(&new->usage),
 	       read_cred_subscribers(new));
 
 	validate_creds(old);
@@ -579,7 +579,7 @@ const struct cred *override_creds(const struct cred *new)
 	alter_cred_subscribers(old, -1);
 	kdebug("override_creds() = %p{%d,%d}", old,
-	       atomic_read(&old->usage),
+	       refcount_read(&old->usage),
 	       read_cred_subscribers(old));
 	return old;
 }
@@ -597,7 +597,7 @@ void revert_creds(const struct cred *old)
 	const struct cred *override = current->cred;
 
 	kdebug("revert_creds(%p{%d,%d})", old,
-	       atomic_read(&old->usage),
+	       refcount_read(&old->usage),
 	       read_cred_subscribers(old));
 
 	validate_creds(old);
@@ -728,7 +728,7 @@ struct cred *prepare_kernel_cred(struct task_struct *daemon)
 	*new = *old;
 	new->non_rcu = 0;
-	atomic_set(&new->usage, 1);
+	refcount_set(&new->usage, 1);
 	set_cred_subscribers(new, 0);
 	get_uid(new->user);
 	get_user_ns(new->user_ns);
@@ -843,7 +843,7 @@ static void dump_invalid_creds(const struct cred *cred, const char *label,
 	printk(KERN_ERR "CRED: ->magic=%x, put_addr=%p\n",
 	       cred->magic, cred->put_addr);
 	printk(KERN_ERR "CRED: ->usage=%d, subscr=%d\n",
-	       atomic_read(&cred->usage),
+	       refcount_read(&cred->usage),
 	       read_cred_subscribers(cred));
 	printk(KERN_ERR "CRED: ->*uid = { %d,%d,%d,%d }\n",
 	       from_kuid_munged(&init_user_ns, cred->uid),
@@ -917,7 +917,7 @@ void validate_creds_for_do_exit(struct task_struct *tsk)
 {
 	kdebug("validate_creds_for_do_exit(%p,%p{%d,%d})",
 	       tsk->real_cred, tsk->cred,
-	       atomic_read(&tsk->cred->usage),
+	       refcount_read(&tsk->cred->usage),
 	       read_cred_subscribers(tsk->cred));
 
 	__validate_process_creds(tsk, __FILE__, __LINE__);
diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c
index 2f16f9d17966..f9f406249e7d 100644
--- a/net/sunrpc/auth.c
+++ b/net/sunrpc/auth.c
@@ -39,7 +39,7 @@ static LIST_HEAD(cred_unused);
 static unsigned long number_cred_unused;
 
 static struct cred machine_cred = {
-	.usage = ATOMIC_INIT(1),
+	.usage = REFCOUNT_INIT(1),
 #ifdef CONFIG_DEBUG_CREDENTIALS
 	.magic = CRED_MAGIC,
 #endif