Message ID | 20240805-remove-cred-transfer-v2-1-a2aa1d45e6b8@google.com (mailing list archive) |
---|---|
State | New |
Delegated to: | Paul Moore |
Headers | show |
Series | get rid of cred_transfer | expand |
On Mon Aug 5, 2024 at 2:54 PM EEST, Jann Horn wrote: > keyctl_session_to_parent() involves posting task work to the parent task, > with work function key_change_session_keyring. > Because the task work in the parent runs asynchronously, no errors can be > returned back to the caller of keyctl_session_to_parent(), and therefore > the work function key_change_session_keyring() can't be allowed to fail due > to things like memory allocation failure or permission checks - all > allocations and checks have to happen in the child. > > This is annoying for two reasons: > > - It is the only reason why cred_alloc_blank() and > security_transfer_creds() are necessary. > - It means we can't do synchronous permission checks. I agree with this premise. Also I think the code change is reasonable. I'd like to see a comment from David tho. BR, Jarkko
Jann Horn <jannh@google.com> wrote: > Rewrite keyctl_session_to_parent() to run task work on the parent > synchronously, so that any errors that happen in the task work can be > plumbed back into the syscall return value in the child. The main thing I worry about is if there's a way to deadlock the child and the parent against each other. vfork() for example. > + if (task_work_cancel(parent, &ctx.work)) { > + /* > + * We got interrupted and the task work was canceled before it > + * could execute. > + * Use -ERESTARTNOINTR instead of -ERESTARTSYS for > + * compatibility - the manpage does not list -EINTR as a > + * possible error for keyctl(). > + */ I think returning EINTR is fine, provided that if we return EINTR, the change didn't happen. KEYCTL_SESSION_TO_PARENT is only used by the aklog, dlog and klog* OpenAFS programs AFAIK, and only if "-setpag" is set as a command line option. It also won't be effective if you strace the program. Maybe the AFS people can say whether it's even worth keeping the functionality rather than just dropping KEYCTL_SESSION_TO_PARENT? David
On Thu, Aug 15, 2024 at 9:46 PM David Howells <dhowells@redhat.com> wrote: > Jann Horn <jannh@google.com> wrote: > > > Rewrite keyctl_session_to_parent() to run task work on the parent > > synchronously, so that any errors that happen in the task work can be > > plumbed back into the syscall return value in the child. > > The main thing I worry about is if there's a way to deadlock the child and the > parent against each other. vfork() for example. Yes - I think it would work fine for scenarios like using KEYCTL_SESSION_TO_PARENT from a helper binary against the shell that launched the helper (which I think is the intended usecase?), but there could theoretically be constellations where it would cause an (interruptible) hang if the parent is stuck in uninterruptible/killable sleep. I think vfork() is rather special in that it does a killable wait for the child to exit or execute; and based on my understanding of the intended usecase of KEYCTL_SESSION_TO_PARENT, I think normally KEYCTL_SESSION_TO_PARENT would only be used by a child that has gone through execve? > > + if (task_work_cancel(parent, &ctx.work)) { > > + /* > > + * We got interrupted and the task work was canceled before it > > + * could execute. > > + * Use -ERESTARTNOINTR instead of -ERESTARTSYS for > > + * compatibility - the manpage does not list -EINTR as a > > + * possible error for keyctl(). > > + */ > > I think returning EINTR is fine, provided that if we return EINTR, the change > didn't happen. KEYCTL_SESSION_TO_PARENT is only used by the aklog, dlog and > klog* OpenAFS programs AFAIK, and only if "-setpag" is set as a command line > option. It also won't be effective if you strace the program. Ah, I didn't even know about those. The users I knew of are the command-line tools "keyctl new_session" and "e4crypt new_session" (see https://codesearch.debian.net/search?q=KEYCTL_SESSION_TO_PARENT&literal=1, which indexes code that's part of Debian). > Maybe the AFS people can say whether it's even worth keeping the functionality > rather than just dropping KEYCTL_SESSION_TO_PARENT? I think this would break the tools "keyctl new_session" and "e4crypt new_session" - though I don't know if anyone actually uses those invocations.
On Thu Aug 15, 2024 at 10:59 PM EEST, Jann Horn wrote: > On Thu, Aug 15, 2024 at 9:46 PM David Howells <dhowells@redhat.com> wrote: > > Jann Horn <jannh@google.com> wrote: > > > > > Rewrite keyctl_session_to_parent() to run task work on the parent > > > synchronously, so that any errors that happen in the task work can be > > > plumbed back into the syscall return value in the child. > > > > The main thing I worry about is if there's a way to deadlock the child and the > > parent against each other. vfork() for example. > > Yes - I think it would work fine for scenarios like using > KEYCTL_SESSION_TO_PARENT from a helper binary against the shell that > launched the helper (which I think is the intended usecase?), but > there could theoretically be constellations where it would cause an > (interruptible) hang if the parent is stuck in > uninterruptible/killable sleep. > > I think vfork() is rather special in that it does a killable wait for > the child to exit or execute; and based on my understanding of the > intended usecase of KEYCTL_SESSION_TO_PARENT, I think normally > KEYCTL_SESSION_TO_PARENT would only be used by a child that has gone > through execve? Could this encapsulated to a kselftest? Like having host process that forks the payload and send SIGINT. That could be deployed e.g to tools/testing/selftests/keys. Would be nice to be able to try this out with a low barrier. Doing this type of testing is different axis than keyutils test suite IMHO. That would be also great starting point for adding concurrency tests in future. Could be done either in C or Python. BR, Jarkko
On Thu, Aug 15, 2024 at 4:00 PM Jann Horn <jannh@google.com> wrote: > On Thu, Aug 15, 2024 at 9:46 PM David Howells <dhowells@redhat.com> wrote: > > Jann Horn <jannh@google.com> wrote: > > > > > Rewrite keyctl_session_to_parent() to run task work on the parent > > > synchronously, so that any errors that happen in the task work can be > > > plumbed back into the syscall return value in the child. > > > > The main thing I worry about is if there's a way to deadlock the child and the > > parent against each other. vfork() for example. > > Yes - I think it would work fine for scenarios like using > KEYCTL_SESSION_TO_PARENT from a helper binary against the shell that > launched the helper (which I think is the intended usecase?), but > there could theoretically be constellations where it would cause an > (interruptible) hang if the parent is stuck in > uninterruptible/killable sleep. > > I think vfork() is rather special in that it does a killable wait for > the child to exit or execute; and based on my understanding of the > intended usecase of KEYCTL_SESSION_TO_PARENT, I think normally > KEYCTL_SESSION_TO_PARENT would only be used by a child that has gone > through execve? Where did we land on all of this? Unless I missed a thread somewhere, it looks like the discussion trailed off without any resolution on if we are okay with a potentially (interruptible) deadlock?
On Aug 5, 2024 Jann Horn <jannh@google.com> wrote: > > keyctl_session_to_parent() involves posting task work to the parent task, > with work function key_change_session_keyring. > Because the task work in the parent runs asynchronously, no errors can be > returned back to the caller of keyctl_session_to_parent(), and therefore > the work function key_change_session_keyring() can't be allowed to fail due > to things like memory allocation failure or permission checks - all > allocations and checks have to happen in the child. > > This is annoying for two reasons: > > - It is the only reason why cred_alloc_blank() and > security_transfer_creds() are necessary. > - It means we can't do synchronous permission checks. > > Rewrite keyctl_session_to_parent() to run task work on the parent > synchronously, so that any errors that happen in the task work can be > plumbed back into the syscall return value in the child. > This allows us to get rid of cred_alloc_blank() and > security_transfer_creds() in a later commit, and it will make it possible > to write more reliable security checks for this operation. > > Note that this requires using TWA_SIGNAL instead of TWA_RESUME, so the > parent might observe some spurious -EAGAIN syscall returns or such; but the > parent likely anyway has to be ready to deal with the side effects of > receiving signals (since it'll probably get SIGCHLD when the child dies), > so that probably isn't an issue. > > Signed-off-by: Jann Horn <jannh@google.com> > --- > security/keys/internal.h | 8 ++++ > security/keys/keyctl.c | 107 +++++++++++++------------------------------ > security/keys/process_keys.c | 86 ++++++++++++++++++---------------- > 3 files changed, 87 insertions(+), 114 deletions(-) ... > diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c > index ab927a142f51..e4cfe5c4594a 100644 > --- a/security/keys/keyctl.c > +++ b/security/keys/keyctl.c > @@ -1616,104 +1616,63 @@ long keyctl_get_security(key_serial_t keyid, > * parent process. > * > * The keyring must exist and must grant the caller LINK permission, and the > * parent process must be single-threaded and must have the same effective > * ownership as this process and mustn't be SUID/SGID. > * > - * The keyring will be emplaced on the parent when it next resumes userspace. > + * The keyring will be emplaced on the parent via a pseudo-signal. > * > * If successful, 0 will be returned. > */ > long keyctl_session_to_parent(void) > { > - struct task_struct *me, *parent; > - const struct cred *mycred, *pcred; > - struct callback_head *newwork, *oldwork; > + struct keyctl_session_to_parent_context ctx; > + struct task_struct *parent; > key_ref_t keyring_r; > - struct cred *cred; > int ret; > > keyring_r = lookup_user_key(KEY_SPEC_SESSION_KEYRING, 0, KEY_NEED_LINK); > if (IS_ERR(keyring_r)) > return PTR_ERR(keyring_r); > > - ret = -ENOMEM; > - > - /* our parent is going to need a new cred struct, a new tgcred struct > - * and new security data, so we allocate them here to prevent ENOMEM in > - * our parent */ > - cred = cred_alloc_blank(); > - if (!cred) > - goto error_keyring; > - newwork = &cred->rcu; > + write_lock_irq(&tasklist_lock); > + parent = get_task_struct(rcu_dereference_protected(current->real_parent, > + lockdep_is_held(&tasklist_lock))); > + write_unlock_irq(&tasklist_lock); > > - cred->session_keyring = key_ref_to_ptr(keyring_r); > - keyring_r = NULL; > - init_task_work(newwork, key_change_session_keyring); > + /* the parent mustn't be init and mustn't be a kernel thread */ > + if (is_global_init(parent) || (READ_ONCE(parent->flags) & PF_KTHREAD) != 0) > + goto put_task; I think we need to explicitly set @ret if we are failing here, yes? > - me = current; > - rcu_read_lock(); > - write_lock_irq(&tasklist_lock); > + ctx.new_session_keyring = key_ref_to_ptr(keyring_r); > + ctx.child_cred = current_cred(); > + init_completion(&ctx.done); > + init_task_work(&ctx.work, key_change_session_keyring); > + ret = task_work_add(parent, &ctx.work, TWA_SIGNAL); > + if (ret) > + goto put_task; > > - ret = -EPERM; > - oldwork = NULL; > - parent = rcu_dereference_protected(me->real_parent, > - lockdep_is_held(&tasklist_lock)); > + ret = wait_for_completion_interruptible(&ctx.done); > > - /* the parent mustn't be init and mustn't be a kernel thread */ > - if (parent->pid <= 1 || !parent->mm) > - goto unlock; > - > - /* the parent must be single threaded */ > - if (!thread_group_empty(parent)) > - goto unlock; > - > - /* the parent and the child must have different session keyrings or > - * there's no point */ > - mycred = current_cred(); > - pcred = __task_cred(parent); > - if (mycred == pcred || > - mycred->session_keyring == pcred->session_keyring) { > - ret = 0; > - goto unlock; > + if (task_work_cancel(parent, &ctx.work)) { > + /* > + * We got interrupted and the task work was canceled before it > + * could execute. > + * Use -ERESTARTNOINTR instead of -ERESTARTSYS for > + * compatibility - the manpage does not list -EINTR as a > + * possible error for keyctl(). > + */ > + ret = -ERESTARTNOINTR; > + } else { > + /* task work is running or has been executed */ > + wait_for_completion(&ctx.done); > + ret = ctx.result; > } > > - /* the parent must have the same effective ownership and mustn't be > - * SUID/SGID */ > - if (!uid_eq(pcred->uid, mycred->euid) || > - !uid_eq(pcred->euid, mycred->euid) || > - !uid_eq(pcred->suid, mycred->euid) || > - !gid_eq(pcred->gid, mycred->egid) || > - !gid_eq(pcred->egid, mycred->egid) || > - !gid_eq(pcred->sgid, mycred->egid)) > - goto unlock; > - > - /* the keyrings must have the same UID */ > - if ((pcred->session_keyring && > - !uid_eq(pcred->session_keyring->uid, mycred->euid)) || > - !uid_eq(mycred->session_keyring->uid, mycred->euid)) > - goto unlock; > - > - /* cancel an already pending keyring replacement */ > - oldwork = task_work_cancel_func(parent, key_change_session_keyring); > - > - /* the replacement session keyring is applied just prior to userspace > - * restarting */ > - ret = task_work_add(parent, newwork, TWA_RESUME); > - if (!ret) > - newwork = NULL; > -unlock: > - write_unlock_irq(&tasklist_lock); > - rcu_read_unlock(); > - if (oldwork) > - put_cred(container_of(oldwork, struct cred, rcu)); > - if (newwork) > - put_cred(cred); > - return ret; > - > -error_keyring: > +put_task: > + put_task_struct(parent); > key_ref_put(keyring_r); > return ret; > } -- paul-moore.com
On Tue, Sep 10, 2024 at 11:07 PM Paul Moore <paul@paul-moore.com> wrote: > On Aug 5, 2024 Jann Horn <jannh@google.com> wrote: > > - cred->session_keyring = key_ref_to_ptr(keyring_r); > > - keyring_r = NULL; > > - init_task_work(newwork, key_change_session_keyring); > > + /* the parent mustn't be init and mustn't be a kernel thread */ > > + if (is_global_init(parent) || (READ_ONCE(parent->flags) & PF_KTHREAD) != 0) > > + goto put_task; > > I think we need to explicitly set @ret if we are failing here, yes? Ah, yes. Thanks.
On Tue, Sep 10, 2024 at 4:49 PM Paul Moore <paul@paul-moore.com> wrote: > On Thu, Aug 15, 2024 at 4:00 PM Jann Horn <jannh@google.com> wrote: > > On Thu, Aug 15, 2024 at 9:46 PM David Howells <dhowells@redhat.com> wrote: > > > Jann Horn <jannh@google.com> wrote: > > > > > > > Rewrite keyctl_session_to_parent() to run task work on the parent > > > > synchronously, so that any errors that happen in the task work can be > > > > plumbed back into the syscall return value in the child. > > > > > > The main thing I worry about is if there's a way to deadlock the child and the > > > parent against each other. vfork() for example. > > > > Yes - I think it would work fine for scenarios like using > > KEYCTL_SESSION_TO_PARENT from a helper binary against the shell that > > launched the helper (which I think is the intended usecase?), but > > there could theoretically be constellations where it would cause an > > (interruptible) hang if the parent is stuck in > > uninterruptible/killable sleep. > > > > I think vfork() is rather special in that it does a killable wait for > > the child to exit or execute; and based on my understanding of the > > intended usecase of KEYCTL_SESSION_TO_PARENT, I think normally > > KEYCTL_SESSION_TO_PARENT would only be used by a child that has gone > > through execve? > > Where did we land on all of this? Unless I missed a thread somewhere, > it looks like the discussion trailed off without any resolution on if > we are okay with a potentially (interruptible) deadlock? As a potential tweak to this, what if we gave up on the idea of returning the error code so we could avoid the signal deadlock issue? I suppose there could be an issue if the parent was expecting/depending on keyring change from the child, but honestly, if the parent is relying on the kernel keyring and spawning a child process without restring the KEYCTL_SESSION_TO_PARENT then the parent really should be doing some sanity checks on the keyring after the child returns anyway. I'm conflicted on the best way to solve this problem, but I think we need to fix this somehow as I believe the current behavior is broken ...
On Mon, Sep 16, 2024 at 12:46 PM Paul Moore <paul@paul-moore.com> wrote: > On Tue, Sep 10, 2024 at 4:49 PM Paul Moore <paul@paul-moore.com> wrote: > > On Thu, Aug 15, 2024 at 4:00 PM Jann Horn <jannh@google.com> wrote: > > > On Thu, Aug 15, 2024 at 9:46 PM David Howells <dhowells@redhat.com> wrote: > > > > Jann Horn <jannh@google.com> wrote: > > > > > > > > > Rewrite keyctl_session_to_parent() to run task work on the parent > > > > > synchronously, so that any errors that happen in the task work can be > > > > > plumbed back into the syscall return value in the child. > > > > > > > > The main thing I worry about is if there's a way to deadlock the child and the > > > > parent against each other. vfork() for example. > > > > > > Yes - I think it would work fine for scenarios like using > > > KEYCTL_SESSION_TO_PARENT from a helper binary against the shell that > > > launched the helper (which I think is the intended usecase?), but > > > there could theoretically be constellations where it would cause an > > > (interruptible) hang if the parent is stuck in > > > uninterruptible/killable sleep. > > > > > > I think vfork() is rather special in that it does a killable wait for > > > the child to exit or execute; and based on my understanding of the > > > intended usecase of KEYCTL_SESSION_TO_PARENT, I think normally > > > KEYCTL_SESSION_TO_PARENT would only be used by a child that has gone > > > through execve? > > > > Where did we land on all of this? Unless I missed a thread somewhere, > > it looks like the discussion trailed off without any resolution on if > > we are okay with a potentially (interruptible) deadlock? > > As a potential tweak to this, what if we gave up on the idea of > returning the error code so we could avoid the signal deadlock issue? I'm still not convinced that there is a real danger of deadlocking here if the only way to deadlock involves the parent being in an uninterruptible wait. There aren't many places in the kernel that involve a parent uninterruptibly waiting for the child without locks being involved, especially when the parent is a shell that spawns the child with execve, as seems to be the intended use here. I really dislike the idea of silently ignoring an error - I kinda feel like if we give up on returning an error to the child that issued the keyctl, the next-best option is to SIGKILL the parent, so that we can say "hey, we technically ensured that all future syscalls in the parent will use the new creds, because the parent will no longer do _any_ syscalls". > I suppose there could be an issue if the parent was > expecting/depending on keyring change from the child, but honestly, if > the parent is relying on the kernel keyring and spawning a child > process without restring the KEYCTL_SESSION_TO_PARENT then the parent > really should be doing some sanity checks on the keyring after the > child returns anyway. > I'm conflicted on the best way to solve this problem, but I think we > need to fix this somehow as I believe the current behavior is broken > ...
diff --git a/security/keys/internal.h b/security/keys/internal.h index 2cffa6dc8255..2c5eadc04cf2 100644 --- a/security/keys/internal.h +++ b/security/keys/internal.h @@ -157,12 +157,20 @@ extern struct key *request_key_and_link(struct key_type *type, unsigned long flags); extern bool lookup_user_key_possessed(const struct key *key, const struct key_match_data *match_data); extern long join_session_keyring(const char *name); + +struct keyctl_session_to_parent_context { + struct callback_head work; + struct completion done; + struct key *new_session_keyring; + const struct cred *child_cred; + int result; +}; extern void key_change_session_keyring(struct callback_head *twork); extern struct work_struct key_gc_work; extern unsigned key_gc_delay; extern void keyring_gc(struct key *keyring, time64_t limit); extern void keyring_restriction_gc(struct key *keyring, diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c index ab927a142f51..e4cfe5c4594a 100644 --- a/security/keys/keyctl.c +++ b/security/keys/keyctl.c @@ -1616,104 +1616,63 @@ long keyctl_get_security(key_serial_t keyid, * parent process. * * The keyring must exist and must grant the caller LINK permission, and the * parent process must be single-threaded and must have the same effective * ownership as this process and mustn't be SUID/SGID. * - * The keyring will be emplaced on the parent when it next resumes userspace. + * The keyring will be emplaced on the parent via a pseudo-signal. * * If successful, 0 will be returned. */ long keyctl_session_to_parent(void) { - struct task_struct *me, *parent; - const struct cred *mycred, *pcred; - struct callback_head *newwork, *oldwork; + struct keyctl_session_to_parent_context ctx; + struct task_struct *parent; key_ref_t keyring_r; - struct cred *cred; int ret; keyring_r = lookup_user_key(KEY_SPEC_SESSION_KEYRING, 0, KEY_NEED_LINK); if (IS_ERR(keyring_r)) return PTR_ERR(keyring_r); - ret = -ENOMEM; - - /* our parent is going to need a new cred struct, a new tgcred struct - * and new security data, so we allocate them here to prevent ENOMEM in - * our parent */ - cred = cred_alloc_blank(); - if (!cred) - goto error_keyring; - newwork = &cred->rcu; + write_lock_irq(&tasklist_lock); + parent = get_task_struct(rcu_dereference_protected(current->real_parent, + lockdep_is_held(&tasklist_lock))); + write_unlock_irq(&tasklist_lock); - cred->session_keyring = key_ref_to_ptr(keyring_r); - keyring_r = NULL; - init_task_work(newwork, key_change_session_keyring); + /* the parent mustn't be init and mustn't be a kernel thread */ + if (is_global_init(parent) || (READ_ONCE(parent->flags) & PF_KTHREAD) != 0) + goto put_task; - me = current; - rcu_read_lock(); - write_lock_irq(&tasklist_lock); + ctx.new_session_keyring = key_ref_to_ptr(keyring_r); + ctx.child_cred = current_cred(); + init_completion(&ctx.done); + init_task_work(&ctx.work, key_change_session_keyring); + ret = task_work_add(parent, &ctx.work, TWA_SIGNAL); + if (ret) + goto put_task; - ret = -EPERM; - oldwork = NULL; - parent = rcu_dereference_protected(me->real_parent, - lockdep_is_held(&tasklist_lock)); + ret = wait_for_completion_interruptible(&ctx.done); - /* the parent mustn't be init and mustn't be a kernel thread */ - if (parent->pid <= 1 || !parent->mm) - goto unlock; - - /* the parent must be single threaded */ - if (!thread_group_empty(parent)) - goto unlock; - - /* the parent and the child must have different session keyrings or - * there's no point */ - mycred = current_cred(); - pcred = __task_cred(parent); - if (mycred == pcred || - mycred->session_keyring == pcred->session_keyring) { - ret = 0; - goto unlock; + if (task_work_cancel(parent, &ctx.work)) { + /* + * We got interrupted and the task work was canceled before it + * could execute. + * Use -ERESTARTNOINTR instead of -ERESTARTSYS for + * compatibility - the manpage does not list -EINTR as a + * possible error for keyctl(). + */ + ret = -ERESTARTNOINTR; + } else { + /* task work is running or has been executed */ + wait_for_completion(&ctx.done); + ret = ctx.result; } - /* the parent must have the same effective ownership and mustn't be - * SUID/SGID */ - if (!uid_eq(pcred->uid, mycred->euid) || - !uid_eq(pcred->euid, mycred->euid) || - !uid_eq(pcred->suid, mycred->euid) || - !gid_eq(pcred->gid, mycred->egid) || - !gid_eq(pcred->egid, mycred->egid) || - !gid_eq(pcred->sgid, mycred->egid)) - goto unlock; - - /* the keyrings must have the same UID */ - if ((pcred->session_keyring && - !uid_eq(pcred->session_keyring->uid, mycred->euid)) || - !uid_eq(mycred->session_keyring->uid, mycred->euid)) - goto unlock; - - /* cancel an already pending keyring replacement */ - oldwork = task_work_cancel_func(parent, key_change_session_keyring); - - /* the replacement session keyring is applied just prior to userspace - * restarting */ - ret = task_work_add(parent, newwork, TWA_RESUME); - if (!ret) - newwork = NULL; -unlock: - write_unlock_irq(&tasklist_lock); - rcu_read_unlock(); - if (oldwork) - put_cred(container_of(oldwork, struct cred, rcu)); - if (newwork) - put_cred(cred); - return ret; - -error_keyring: +put_task: + put_task_struct(parent); key_ref_put(keyring_r); return ret; } /* * Apply a restriction to a given keyring. diff --git a/security/keys/process_keys.c b/security/keys/process_keys.c index b5d5333ab330..199c5dd34792 100644 --- a/security/keys/process_keys.c +++ b/security/keys/process_keys.c @@ -902,59 +902,65 @@ long join_session_keyring(const char *name) error: abort_creds(new); return ret; } /* - * Replace a process's session keyring on behalf of one of its children when - * the target process is about to resume userspace execution. + * Replace a process's session keyring on behalf of one of its children. + * This function runs in task context, while the child is blocked in + * keyctl_session_to_parent(). */ -void key_change_session_keyring(struct callback_head *twork) +void key_change_session_keyring(struct callback_head *work) { - const struct cred *old = current_cred(); - struct cred *new = container_of(twork, struct cred, rcu); + struct keyctl_session_to_parent_context *ctx = + container_of(work, struct keyctl_session_to_parent_context, work); + const struct cred *pcred = current_cred(); + const struct cred *ccred = ctx->child_cred; + struct cred *new; - if (unlikely(current->flags & PF_EXITING)) { - put_cred(new); - return; - } + /* do checks */ + ctx->result = -EPERM; + if (unlikely(current->flags & PF_EXITING)) + goto out; - /* If get_ucounts fails more bits are needed in the refcount */ - if (unlikely(!get_ucounts(old->ucounts))) { - WARN_ONCE(1, "In %s get_ucounts failed\n", __func__); - put_cred(new); - return; - } + /* we must be single threaded */ + if (!thread_group_empty(current)) + goto out; + + /* + * the parent must have the same effective ownership and mustn't be + * SUID/SGID + */ + if (!uid_eq(pcred->uid, ccred->euid) || + !uid_eq(pcred->euid, ccred->euid) || + !uid_eq(pcred->suid, ccred->euid) || + !gid_eq(pcred->gid, ccred->egid) || + !gid_eq(pcred->egid, ccred->egid) || + !gid_eq(pcred->sgid, ccred->egid)) + goto out; + + /* the keyrings must have the same UID */ + if ((pcred->session_keyring && + !uid_eq(pcred->session_keyring->uid, ccred->euid)) || + !uid_eq(ctx->new_session_keyring->uid, ccred->euid)) + goto out; + + + /* okay, try to update creds */ + ctx->result = -ENOMEM; + new = prepare_creds(); + if (!new) + goto out; - new-> uid = old-> uid; - new-> euid = old-> euid; - new-> suid = old-> suid; - new->fsuid = old->fsuid; - new-> gid = old-> gid; - new-> egid = old-> egid; - new-> sgid = old-> sgid; - new->fsgid = old->fsgid; - new->user = get_uid(old->user); - new->ucounts = old->ucounts; - new->user_ns = get_user_ns(old->user_ns); - new->group_info = get_group_info(old->group_info); - - new->securebits = old->securebits; - new->cap_inheritable = old->cap_inheritable; - new->cap_permitted = old->cap_permitted; - new->cap_effective = old->cap_effective; - new->cap_ambient = old->cap_ambient; - new->cap_bset = old->cap_bset; - - new->jit_keyring = old->jit_keyring; - new->thread_keyring = key_get(old->thread_keyring); - new->process_keyring = key_get(old->process_keyring); - - security_transfer_creds(new, old); + key_put(new->session_keyring); + new->session_keyring = key_get(ctx->new_session_keyring); commit_creds(new); + ctx->result = 0; +out: + complete_all(&ctx->done); } /* * Make sure that root's user and user-session keyrings exist. */ static int __init init_root_keyring(void)
keyctl_session_to_parent() involves posting task work to the parent task, with work function key_change_session_keyring. Because the task work in the parent runs asynchronously, no errors can be returned back to the caller of keyctl_session_to_parent(), and therefore the work function key_change_session_keyring() can't be allowed to fail due to things like memory allocation failure or permission checks - all allocations and checks have to happen in the child. This is annoying for two reasons: - It is the only reason why cred_alloc_blank() and security_transfer_creds() are necessary. - It means we can't do synchronous permission checks. Rewrite keyctl_session_to_parent() to run task work on the parent synchronously, so that any errors that happen in the task work can be plumbed back into the syscall return value in the child. This allows us to get rid of cred_alloc_blank() and security_transfer_creds() in a later commit, and it will make it possible to write more reliable security checks for this operation. Note that this requires using TWA_SIGNAL instead of TWA_RESUME, so the parent might observe some spurious -EAGAIN syscall returns or such; but the parent likely anyway has to be ready to deal with the side effects of receiving signals (since it'll probably get SIGCHLD when the child dies), so that probably isn't an issue. Signed-off-by: Jann Horn <jannh@google.com> --- security/keys/internal.h | 8 ++++ security/keys/keyctl.c | 107 +++++++++++++------------------------------ security/keys/process_keys.c | 86 ++++++++++++++++++---------------- 3 files changed, 87 insertions(+), 114 deletions(-)