diff mbox

Keyrings, user namespaces and the user_struct

Message ID 17576.1477412418@warthog.procyon.org.uk (mailing list archive)
State New, archived
Headers show

Commit Message

David Howells Oct. 25, 2016, 4:20 p.m. UTC
I have some questions about user namespacing, with regard to making keyrings
namespaced.  My current idea is to follow the following method:

 (1) A new key/keyring records the user_namespace active when it is created.

 (2) If a process's user_namespace doesn't match that recorded in a key then
     it gets ENOKEY if it tries to refer to it or access it and can't see it
     in /proc/keys.

 (3) A process's keyring subscriptions are cleared if CLONE_NEWUSER is passed
     to clone() or to unshare() so that it doesn't retain a keyring it can't
     access.

 (4) Each user_namespace has its own separate register of persistent keyrings
     and KEYCTL_GET_PERSISTENT can only get from the register of the currently
     caller's user_namespace.  This is already upstream

as this seems the simplest solution.  I don't want to add a new CLONE_xxx flag
as there isn't exactly a whole lot of room left.

Whilst I've got this partially working, there is a problem because the
user_struct contains pointers to the user's user-keyring and user-session
keyrings - and these would need replacing when entering a new user_namespace.

However, the active user_struct is *not* replaced by create_user_ns().  Should
it be?

I'm not sure whether there's a need to use the user_struct inherited from
before the unsharing - certainly setresuid(), for example, doesn't seem to
keep the values.  Would it be possible to create a new user_struct with the
same kuid_t as the old one, but in the context of the new user_struct in case
it gets mapped?

I've attached the patch showing my current changes for reference.

David
---
--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Jann Horn Oct. 25, 2016, 4:41 p.m. UTC | #1
On Tue, Oct 25, 2016 at 05:20:18PM +0100, David Howells wrote:
> I have some questions about user namespacing, with regard to making keyrings
> namespaced.  My current idea is to follow the following method:
> 
>  (1) A new key/keyring records the user_namespace active when it is created.
> 
>  (2) If a process's user_namespace doesn't match that recorded in a key then
>      it gets ENOKEY if it tries to refer to it or access it and can't see it
>      in /proc/keys.
> 
>  (3) A process's keyring subscriptions are cleared if CLONE_NEWUSER is passed
>      to clone() or to unshare() so that it doesn't retain a keyring it can't
>      access.
> 
>  (4) Each user_namespace has its own separate register of persistent keyrings
>      and KEYCTL_GET_PERSISTENT can only get from the register of the currently
>      caller's user_namespace.  This is already upstream
> 
> as this seems the simplest solution.  I don't want to add a new CLONE_xxx flag
> as there isn't exactly a whole lot of room left.

I think this sounds sensible.


> Whilst I've got this partially working, there is a problem because the
> user_struct contains pointers to the user's user-keyring and user-session
> keyrings - and these would need replacing when entering a new user_namespace.
> 
> However, the active user_struct is *not* replaced by create_user_ns().  Should
> it be?
> 
> I'm not sure whether there's a need to use the user_struct inherited from
> before the unsharing - certainly setresuid(), for example, doesn't seem to
> keep the values.  Would it be possible to create a new user_struct with the
> same kuid_t as the old one, but in the context of the new user_struct in case
> it gets mapped?

Most things in user_struct should not be per-namespace, at least not without the
consent of root in the init namespace or so. Otherwise, someone could e.g. create
a bunch of new namespaces and lock the maximum permitted amount of memory in each
one, effectively allowing an arbitrary amount of memory to be pinned.

However, I think that there are people who would benefit from being able to have
the user_struct be per-namespace because they actually want separate limits for
multiple containers, but have too many containers to be able to allocate separate
kuid ranges for all of them. (I think James Bottomley said that in his Security
Summit talk?) Therefore, I think it might make sense to have separate user_structs
for user namespaces, but let them have a "struct user_struct *parent_pointer" to
the user_struct belonging to the same kuid in the parent namespace and a
resource_tracking_parent pointer that normally points to the user with the same
kuid in the init namespace.
Then, you could use the keyring stuff in the user_struct belonging to the current
namespace, all the existing users of user_struct could follow the ->parent_pointer
for now, and if someone wants to have per-ns resource tracking, they can add it in
later relatively easily.

But maybe I'm just overthinking this?
--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
James Bottomley Oct. 25, 2016, 4:49 p.m. UTC | #2
On Tue, 2016-10-25 at 17:20 +0100, David Howells wrote:
> I have some questions about user namespacing, with regard to making
> keyrings
> namespaced.  My current idea is to follow the following method:
> 
>  (1) A new key/keyring records the user_namespace active when it is
> created.
> 
>  (2) If a process's user_namespace doesn't match that recorded in a
> key then
>      it gets ENOKEY if it tries to refer to it or access it and can't
> see it
>      in /proc/keys.
> 
>  (3) A process's keyring subscriptions are cleared if CLONE_NEWUSER
> is passed
>      to clone() or to unshare() so that it doesn't retain a keyring
> it can't
>      access.
> 
>  (4) Each user_namespace has its own separate register of persistent
> keyrings
>      and KEYCTL_GET_PERSISTENT can only get from the register of the
> currently
>      caller's user_namespace.  This is already upstream
> 
> as this seems the simplest solution.  I don't want to add a new 
> CLONE_xxx flag as there isn't exactly a whole lot of room left.

OK, so the specific problem with this is that it fits perfectly the
container orchestration system use case but pretty much fails for any
non-orchestration use case.

For me it would fail in the architecture emulation use case: I actually
want an administerable container running as me, which means I use a
user_ns to escalate my privilege over the constrained system.  However,
doing this in your world would now divorce me from any keyring I had
access to, which isn't what I want.

Why not make it the job of the orchestration system to drop keyrings?
So by default CLONE_NEWUSER subscribes to the same keyrings the owning
user does?  That would mean it's configurable and you can add an extra
KEYCTL_ to give fine grained access, so subscriptions can be dropped
before the container is populated by its eventual end user, but the
default case is likely what someone using creation of a user_ns to
elevate their privilege wants.

James

--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Howells Oct. 25, 2016, 4:53 p.m. UTC | #3
David Howells <dhowells@redhat.com> wrote:

>  (2) If a process's user_namespace doesn't match that recorded in a key then
>      it gets ENOKEY if it tries to refer to it or access it and can't see it
>      in /proc/keys.

There's another possibility here - since user_namespaces are hierarchical,
does it make sense to let a process see keys that are in an ancestral
namespace?

David
--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
James Bottomley Oct. 25, 2016, 4:56 p.m. UTC | #4
On Tue, 2016-10-25 at 17:53 +0100, David Howells wrote:
> David Howells <dhowells@redhat.com> wrote:
> 
> >  (2) If a process's user_namespace doesn't match that recorded in a
> > key then
> >      it gets ENOKEY if it tries to refer to it or access it and
> > can't see it
> >      in /proc/keys.
> 
> There's another possibility here - since user_namespaces are 
> hierarchical, does it make sense to let a process see keys that are 
> in an ancestral namespace?

I think that should be the decision of the owner.  If you're creating a
userns to de-privilege the next user, likely you don't want this, but
if you're creating a userns to enhance it, then you do.

I think you want to behave exactly as the mount namespace does: on
initial clone, you get a fully cloned mount tree and then you customise
it by unmounting pieces.

James

--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jann Horn Oct. 25, 2016, 5:06 p.m. UTC | #5
On Tue, Oct 25, 2016 at 09:56:45AM -0700, James Bottomley wrote:
> On Tue, 2016-10-25 at 17:53 +0100, David Howells wrote:
> > David Howells <dhowells@redhat.com> wrote:
> > 
> > >  (2) If a process's user_namespace doesn't match that recorded in a
> > > key then
> > >      it gets ENOKEY if it tries to refer to it or access it and
> > > can't see it
> > >      in /proc/keys.
> > 
> > There's another possibility here - since user_namespaces are 
> > hierarchical, does it make sense to let a process see keys that are 
> > in an ancestral namespace?
> 
> I think that should be the decision of the owner.  If you're creating a
> userns to de-privilege the next user, likely you don't want this, but
> if you're creating a userns to enhance it, then you do.
> 
> I think you want to behave exactly as the mount namespace does: on
> initial clone, you get a fully cloned mount tree and then you customise
> it by unmounting pieces.

The issue with keys is that their interface is pretty broken when exposed
to user namespaces with different UID mappings, and this also causes
security issues where you can get access to other users' keyrings and
stuff like that. The proposed fix works around that brokenness by simply
never exposing keys to other namespaces.

Your suggestion would probably be more flexible, but it'll require some
more, moderately ugly kernel code where various keyname prefixes are
special-cased so that names with these prefixes are dynamically
translated into strings containing the KUID or so.
--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Howells Oct. 25, 2016, 5:30 p.m. UTC | #6
James Bottomley <James.Bottomley@HansenPartnership.com> wrote:

> > There's another possibility here - since user_namespaces are 
> > hierarchical, does it make sense to let a process see keys that are 
> > in an ancestral namespace?
> 
> I think that should be the decision of the owner.  If you're creating a
> userns to de-privilege the next user, likely you don't want this, but
> if you're creating a userns to enhance it, then you do.

Maybe the simplest is to put a 'stop here' flag on a user_namespace.  Then
when we look to see if a key is in the caller's namespace, we go up the tree
until we hit the flag.  If you don't find the key's ns within the caller's
permitted subtree, you don't get to see the key.

> I think you want to behave exactly as the mount namespace does: on
> initial clone, you get a fully cloned mount tree and then you customise
> it by unmounting pieces.

That adds quite a bit of complexity, given that:

 (1) we don't have an automatic updating view of a keyring as the cloned mount
     tree does;

 (2) there are quotas on the number of keys a user is permitted;

 (3) you may not have permission to 'dismount' part of the tree.

That said, it might be enough to just clone the *keyrings* and share the
non-keyring keys.

David
--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
James Bottomley Oct. 25, 2016, 6:05 p.m. UTC | #7
On Tue, 2016-10-25 at 19:06 +0200, Jann Horn wrote:
> On Tue, Oct 25, 2016 at 09:56:45AM -0700, James Bottomley wrote:
> > On Tue, 2016-10-25 at 17:53 +0100, David Howells wrote:
> > > David Howells <dhowells@redhat.com> wrote:
> > > 
> > > >  (2) If a process's user_namespace doesn't match that recorded
> > > > in a
> > > > key then
> > > >      it gets ENOKEY if it tries to refer to it or access it and
> > > > can't see it
> > > >      in /proc/keys.
> > > 
> > > There's another possibility here - since user_namespaces are 
> > > hierarchical, does it make sense to let a process see keys that
> > > are 
> > > in an ancestral namespace?
> > 
> > I think that should be the decision of the owner.  If you're
> > creating a
> > userns to de-privilege the next user, likely you don't want this,
> > but
> > if you're creating a userns to enhance it, then you do.
> > 
> > I think you want to behave exactly as the mount namespace does: on
> > initial clone, you get a fully cloned mount tree and then you
> > customise
> > it by unmounting pieces.
> 
> The issue with keys is that their interface is pretty broken when 
> exposed to user namespaces with different UID mappings, and this also 
> causes security issues where you can get access to other users' 
> keyrings and stuff like that. The proposed fix works around that 
> brokenness by simply never exposing keys to other namespaces.

Define how ... we can fix the keyring interface if it's hugely
problematic.

If you're simply thinking that it's a uid range mapping leak, then the
way I think of this is the same way the shadow utils do: the owning
user owns the exterior range of ids.  This seems to be the assumption
the unprivileged container systems are working on as well, so it
doesn't matter that there's a theoretical uid leak to the exterior
mapped ids.

> Your suggestion would probably be more flexible, but it'll require 
> some more, moderately ugly kernel code where various keyname prefixes 
> are special-cased so that names with these prefixes are dynamically
> translated into strings containing the KUID or so.

Can we not simply do it like mnt?  have a refcounted pointer structure
(struct mnt in its case) that points to a keyring.  This would
represent the subscription.  Once you detach it, you wouldn't be able
to recover it again.  So a user namespace wouldn't really be part of
the keyring prefix namespace.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
James Bottomley Oct. 25, 2016, 6:13 p.m. UTC | #8
On Tue, 2016-10-25 at 18:30 +0100, David Howells wrote:
> James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> 
> > > There's another possibility here - since user_namespaces are 
> > > hierarchical, does it make sense to let a process see keys that
> > > are 
> > > in an ancestral namespace?
> > 
> > I think that should be the decision of the owner.  If you're
> > creating a
> > userns to de-privilege the next user, likely you don't want this,
> > but
> > if you're creating a userns to enhance it, then you do.
> 
> Maybe the simplest is to put a 'stop here' flag on a user_namespace. 
>  Then when we look to see if a key is in the caller's namespace, we 
> go up the tree until we hit the flag.  If you don't find the key's ns 
> within the caller's permitted subtree, you don't get to see the key.

That sounds a bit nasty.  Even for nested user namespaces, what we
traditionally see is the interior uid and the exterior kuid; we don't
usually see the intermediary mappings unless you specifically traverse
the userns tree.

> > I think you want to behave exactly as the mount namespace does: on
> > initial clone, you get a fully cloned mount tree and then you 
> > customise it by unmounting pieces.
> 
> That adds quite a bit of complexity, given that:
> 
>  (1) we don't have an automatic updating view of a keyring as the 
> cloned mount tree does;

You're thinking of propagation?  The keyrings aren't currently
hierarchical in the same sense are they, so we really only need a top
level cloned, refcounted pointer.  Any change to the underlying keyring
could either COW the keyring or simply operate on the current one
(shared mount behaviour).  I'm betting the latter is far easier and if
you want the former, the orchestration system could simply copy all the
keys to a new keyring, release the old one and then rename the new one
to what the old one was, making it an orchestration problem to sort
out.

>  (2) there are quotas on the number of keys a user is permitted;

If this is counted per-id, then yes, mapping a range of ids with an
owning exterior non root user does allow you to escape the count ...
how important is it?  Could we not simply hand wave it away as yet
another consequence you have to think about when giving a user a range
of exterior ids (because if there are N, their key count could get up
to N * the limit).

>  (3) you may not have permission to 'dismount' part of the tree.

"root" in the userns should have.  I presume there's also some
capability we can give a non-root user.

> That said, it might be enough to just clone the *keyrings* and share
> the non-keyring keys.

Right ... do a refcounted clone structure and you can release the
keyring if the refcount goes to zero.

> David
> _______________________________________________
> Containers mailing list
> Containers@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/containers
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jann Horn Oct. 25, 2016, 6:17 p.m. UTC | #9
On Tue, Oct 25, 2016 at 11:05:08AM -0700, James Bottomley wrote:
> On Tue, 2016-10-25 at 19:06 +0200, Jann Horn wrote:
> > On Tue, Oct 25, 2016 at 09:56:45AM -0700, James Bottomley wrote:
> > > On Tue, 2016-10-25 at 17:53 +0100, David Howells wrote:
> > > > David Howells <dhowells@redhat.com> wrote:
> > > > 
> > > > >  (2) If a process's user_namespace doesn't match that recorded
> > > > > in a
> > > > > key then
> > > > >      it gets ENOKEY if it tries to refer to it or access it and
> > > > > can't see it
> > > > >      in /proc/keys.
> > > > 
> > > > There's another possibility here - since user_namespaces are 
> > > > hierarchical, does it make sense to let a process see keys that
> > > > are 
> > > > in an ancestral namespace?
> > > 
> > > I think that should be the decision of the owner.  If you're
> > > creating a
> > > userns to de-privilege the next user, likely you don't want this,
> > > but
> > > if you're creating a userns to enhance it, then you do.
> > > 
> > > I think you want to behave exactly as the mount namespace does: on
> > > initial clone, you get a fully cloned mount tree and then you
> > > customise
> > > it by unmounting pieces.
> > 
> > The issue with keys is that their interface is pretty broken when 
> > exposed to user namespaces with different UID mappings, and this also 
> > causes security issues where you can get access to other users' 
> > keyrings and stuff like that. The proposed fix works around that 
> > brokenness by simply never exposing keys to other namespaces.
> 
> Define how ... we can fix the keyring interface if it's hugely
> problematic.

See http://lkml.iu.edu/hypermail/linux/kernel/1606.2/06794.html .
Here's a copy of my explanation in that message:

| Not just questionable, completely wrong. The gist is that there is a
| *global* name -> key mapping for accessing keys by name, and user
| keyrings are stored in there under the name "_uid.%u", where %u
| refers to the *namespaced* UID. (See install_user_keyrings().)
| The result is that, if e.g. the user with UID 1000 has no running
| processes, a local attacker can enter a new user namespace, map UID
| 1000 in the namespace to some KUID he controls, do
| setresuid(1000, 1000, 1000), and now he owns user 1000's keyring.
| This ends up permitting the attacker to dump the contents of KUID
| 1000's keys after KUID 1000 signs in. I discovered this while going
| through the kuid->uid conversions when I thought about writing this
| feature the first time.

> If you're simply thinking that it's a uid range mapping leak, then the
> way I think of this is the same way the shadow utils do: the owning
> user owns the exterior range of ids.  This seems to be the assumption
> the unprivileged container systems are working on as well, so it
> doesn't matter that there's a theoretical uid leak to the exterior
> mapped ids.

Nope, that's not the issue.
--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
James Bottomley Oct. 25, 2016, 6:21 p.m. UTC | #10
On Tue, 2016-10-25 at 20:17 +0200, Jann Horn wrote:
> On Tue, Oct 25, 2016 at 11:05:08AM -0700, James Bottomley wrote:
> > On Tue, 2016-10-25 at 19:06 +0200, Jann Horn wrote:
> > > On Tue, Oct 25, 2016 at 09:56:45AM -0700, James Bottomley wrote:
> > > > On Tue, 2016-10-25 at 17:53 +0100, David Howells wrote:
> > > > > David Howells <dhowells@redhat.com> wrote:
> > > > > 
> > > > > >  (2) If a process's user_namespace doesn't match that
> > > > > > recorded
> > > > > > in a
> > > > > > key then
> > > > > >      it gets ENOKEY if it tries to refer to it or access it
> > > > > > and
> > > > > > can't see it
> > > > > >      in /proc/keys.
> > > > > 
> > > > > There's another possibility here - since user_namespaces are 
> > > > > hierarchical, does it make sense to let a process see keys
> > > > > that
> > > > > are 
> > > > > in an ancestral namespace?
> > > > 
> > > > I think that should be the decision of the owner.  If you're
> > > > creating a
> > > > userns to de-privilege the next user, likely you don't want
> > > > this,
> > > > but
> > > > if you're creating a userns to enhance it, then you do.
> > > > 
> > > > I think you want to behave exactly as the mount namespace does:
> > > > on
> > > > initial clone, you get a fully cloned mount tree and then you
> > > > customise
> > > > it by unmounting pieces.
> > > 
> > > The issue with keys is that their interface is pretty broken when
> > > exposed to user namespaces with different UID mappings, and this
> > > also 
> > > causes security issues where you can get access to other users' 
> > > keyrings and stuff like that. The proposed fix works around that 
> > > brokenness by simply never exposing keys to other namespaces.
> > 
> > Define how ... we can fix the keyring interface if it's hugely
> > problematic.
> 
> See http://lkml.iu.edu/hypermail/linux/kernel/1606.2/06794.html .
> Here's a copy of my explanation in that message:
> 
> > Not just questionable, completely wrong. The gist is that there is
> > a
> > *global* name -> key mapping for accessing keys by name, and user
> > keyrings are stored in there under the name "_uid.%u", where %u
> > refers to the *namespaced* UID. (See install_user_keyrings().)
> > The result is that, if e.g. the user with UID 1000 has no running
> > processes, a local attacker can enter a new user namespace, map UID
> > 1000 in the namespace to some KUID he controls, do
> > setresuid(1000, 1000, 1000), and now he owns user 1000's keyring.
> > This ends up permitting the attacker to dump the contents of KUID
> > 1000's keys after KUID 1000 signs in. I discovered this while going
> > through the kuid->uid conversions when I thought about writing this
> > feature the first time.

Well, that's obviously wrong and we fix it before adding namespaced
views of keyrings.  Originally I would say it needs to be kuid, like
filesystems, but now Eric is on with an additional namespace mapping at
the filesystem boundary, I'll defer to him whether he thinks we should
have something like it at the key naming one as well.

This is starting to sound like a useful late arriving discussion topic
for the Plumbers Containers MC ...

James

--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jann Horn Oct. 25, 2016, 6:22 p.m. UTC | #11
On Tue, Oct 25, 2016 at 11:13:24AM -0700, James Bottomley wrote:
> On Tue, 2016-10-25 at 18:30 +0100, David Howells wrote:
> > James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> > > I think you want to behave exactly as the mount namespace does: on
> > > initial clone, you get a fully cloned mount tree and then you 
> > > customise it by unmounting pieces.
> > 
> > That adds quite a bit of complexity, given that:
[...]
> >  (2) there are quotas on the number of keys a user is permitted;
> 
> If this is counted per-id, then yes, mapping a range of ids with an
> owning exterior non root user does allow you to escape the count ...
> how important is it?  Could we not simply hand wave it away as yet
> another consequence you have to think about when giving a user a range
> of exterior ids (because if there are N, their key count could get up
> to N * the limit).

Wouldn't it be N * normal_limit * number_of_namespaces, where
number_of_namespaces is how many namespaces you can create (afaik
unbounded)?
--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
James Bottomley Oct. 25, 2016, 6:25 p.m. UTC | #12
On Tue, 2016-10-25 at 20:22 +0200, Jann Horn wrote:
> On Tue, Oct 25, 2016 at 11:13:24AM -0700, James Bottomley wrote:
> > On Tue, 2016-10-25 at 18:30 +0100, David Howells wrote:
> > > James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> > > > I think you want to behave exactly as the mount namespace does:
> > > > on
> > > > initial clone, you get a fully cloned mount tree and then you 
> > > > customise it by unmounting pieces.
> > > 
> > > That adds quite a bit of complexity, given that:
> [...]
> > >  (2) there are quotas on the number of keys a user is permitted;
> > 
> > If this is counted per-id, then yes, mapping a range of ids with an
> > owning exterior non root user does allow you to escape the count 
> > ... how important is it?  Could we not simply hand wave it away as 
> > yet another consequence you have to think about when giving a user 
> > a range of exterior ids (because if there are N, their key count 
> > could get up to N * the limit).
> 
> Wouldn't it be N * normal_limit * number_of_namespaces, where
> number_of_namespaces is how many namespaces you can create (afaik
> unbounded)?

I need to understand why we have a limit first.  The above was assuming
it's something simple like an arbitrary limit to prevent resource
exhaustion.

but, assuming that again ... I was thinking enforce the limit by
exterior uid (kuid), so it becomes number of uids a user owns *
normal_limit.  It wouldn't vary based on the number of namespaces you
create.

James

--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andy Lutomirski Oct. 25, 2016, 7:34 p.m. UTC | #13
On Tue, Oct 25, 2016 at 11:17 AM, Jann Horn <jann@thejh.net> wrote:
> On Tue, Oct 25, 2016 at 11:05:08AM -0700, James Bottomley wrote:
>> On Tue, 2016-10-25 at 19:06 +0200, Jann Horn wrote:
>> > On Tue, Oct 25, 2016 at 09:56:45AM -0700, James Bottomley wrote:
>> > > On Tue, 2016-10-25 at 17:53 +0100, David Howells wrote:
>> > > > David Howells <dhowells@redhat.com> wrote:
>> > > >
>> > > > >  (2) If a process's user_namespace doesn't match that recorded
>> > > > > in a
>> > > > > key then
>> > > > >      it gets ENOKEY if it tries to refer to it or access it and
>> > > > > can't see it
>> > > > >      in /proc/keys.
>> > > >
>> > > > There's another possibility here - since user_namespaces are
>> > > > hierarchical, does it make sense to let a process see keys that
>> > > > are
>> > > > in an ancestral namespace?
>> > >
>> > > I think that should be the decision of the owner.  If you're
>> > > creating a
>> > > userns to de-privilege the next user, likely you don't want this,
>> > > but
>> > > if you're creating a userns to enhance it, then you do.
>> > >
>> > > I think you want to behave exactly as the mount namespace does: on
>> > > initial clone, you get a fully cloned mount tree and then you
>> > > customise
>> > > it by unmounting pieces.
>> >
>> > The issue with keys is that their interface is pretty broken when
>> > exposed to user namespaces with different UID mappings, and this also
>> > causes security issues where you can get access to other users'
>> > keyrings and stuff like that. The proposed fix works around that
>> > brokenness by simply never exposing keys to other namespaces.
>>
>> Define how ... we can fix the keyring interface if it's hugely
>> problematic.
>
> See http://lkml.iu.edu/hypermail/linux/kernel/1606.2/06794.html .
> Here's a copy of my explanation in that message:
>
> | Not just questionable, completely wrong. The gist is that there is a
> | *global* name -> key mapping for accessing keys by name, and user
> | keyrings are stored in there under the name "_uid.%u", where %u
> | refers to the *namespaced* UID. (See install_user_keyrings().)
> | The result is that, if e.g. the user with UID 1000 has no running
> | processes, a local attacker can enter a new user namespace, map UID
> | 1000 in the namespace to some KUID he controls, do
> | setresuid(1000, 1000, 1000), and now he owns user 1000's keyring.
> | This ends up permitting the attacker to dump the contents of KUID
> | 1000's keys after KUID 1000 signs in. I discovered this while going
> | through the kuid->uid conversions when I thought about writing this
> | feature the first time.

I found this issue, too, several years ago. ;-/

Can we use the namespacing of keys as an opportunity to fix things
like this?  Perhaps we could simply *remove* the concept of named keys
and keyrings.
--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Howells Oct. 25, 2016, 8:41 p.m. UTC | #14
Andy Lutomirski <luto@amacapital.net> wrote:

> ... Perhaps we could simply *remove* the concept of named keys and keyrings.

See Linus's dictum about breaking userspace.

The problem isn't named keys: keys have to be named - the description is how
they're looked up typically.  Further, non-keyring keys can't be looked up
directly by name - you have to search for them in a keyring.

The issue here is named keyrings and keyctl_join_session_keyring().  It might
well have been a bad idea - though I've seen some people arguing for a single
session keyring shared across all a user's logins, in which case, we might
want this after all (or use the user-default session).

One thing we perhaps do want to do, though, is restrict the names of keyrings
to the user_namespace in which the keyring was created.

David
--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
James Bottomley Oct. 25, 2016, 8:51 p.m. UTC | #15
On Tue, 2016-10-25 at 21:41 +0100, David Howells wrote:
> Andy Lutomirski <luto@amacapital.net> wrote:
> 
> > ... Perhaps we could simply *remove* the concept of named keys and
> > keyrings.
> 
> See Linus's dictum about breaking userspace.

We wouldn't have to ... we'd simply declare the current interface to be
a legacy one, which means it doesn't work with containers and
namespaces, give it a new compile time option so the distros could
remove it if they had no actual users, and then present a v2 which does
what we need and works with namespaces.

> The problem isn't named keys: keys have to be named - the description 
> is how they're looked up typically.  Further, non-keyring keys can't 
> be looked up directly by name - you have to search for them in a
> keyring.
> 
> The issue here is named keyrings and keyctl_join_session_keyring(). 
>  It might well have been a bad idea - though I've seen some people 
> arguing for a single session keyring shared across all a user's 
> logins, in which case, we might want this after all (or use the user
> -default session).
> 
> One thing we perhaps do want to do, though, is restrict the names of 
> keyrings to the user_namespace in which the keyring was created.

Sounds fine to me since most things created within a namespace either
get destroyed or placed in the parent when it is destroyed.

Perhaps we want to agree on what semantics we want first before we
start inventing the new API.

James

--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric W. Biederman Oct. 26, 2016, 4:38 a.m. UTC | #16
David Howells <dhowells@redhat.com> writes:

> I have some questions about user namespacing, with regard to making keyrings
> namespaced.  My current idea is to follow the following method:

I am not certain your stated goal is a proper description of what you
are trying to achieve.

>  (1) A new key/keyring records the user_namespace active when it is created.
>
>  (2) If a process's user_namespace doesn't match that recorded in a key then
>      it gets ENOKEY if it tries to refer to it or access it and can't see it
>      in /proc/keys.
>
>  (3) A process's keyring subscriptions are cleared if CLONE_NEWUSER is passed
>      to clone() or to unshare() so that it doesn't retain a keyring it can't
>      access.
>
>  (4) Each user_namespace has its own separate register of persistent keyrings
>      and KEYCTL_GET_PERSISTENT can only get from the register of the currently
>      caller's user_namespace.  This is already upstream
>
> as this seems the simplest solution.  I don't want to add a new CLONE_xxx flag
> as there isn't exactly a whole lot of room left.
>
> Whilst I've got this partially working, there is a problem because the
> user_struct contains pointers to the user's user-keyring and user-session
> keyrings - and these would need replacing when entering a new
> user_namespace.

The session keyring should change when someone changes their
session.  It is a deliberate choice for a session to span namespaces.

Similarly upon entering a user namespace the user does not immediately
change.

So I expect the user mechanism by which the session and user keyrings
are changed should suffice when entering a user namespace.

Also keep in mind that users are fundamentally global.  A given
kuid always maps to the same user_struct.

> However, the active user_struct is *not* replaced by create_user_ns().  Should
> it be?

No.  The user has not changed, so changing the user_struct does not make
sense.

> I'm not sure whether there's a need to use the user_struct inherited from
> before the unsharing - certainly setresuid(), for example, doesn't seem to
> keep the values.  Would it be possible to create a new user_struct with the
> same kuid_t as the old one, but in the context of the new user_struct in case
> it gets mapped?

That would not make much sense.

Are you being asked for different user keys for the same user in
different user namespaces?

Or do you possibly have the impression that users in different user
namespaces are disjoint? (Users in different user namespaces are not
disjoint, they are just the users outside of the user namespace that
they are mapped to).

I have seen a mistaken notion that it is safe to map users in
different containers to the same user outside of the container
and there won't be problems.  Are people with that notion asking
you to fix the code to conform to their mistaken notion of reality?

Now if it does make sense to change the user key to be fundamentally per
user per user namespace we can do that, but we need to be very clear
about what we are trying to accomplish and why.

> I've attached the patch showing my current changes for reference.

Thank you.

The current design if for only the keys owned by users in the current
user namespace to be accessible.  Something that key_task_permission
appears to enforce quite well.

There is a legitimate criticism that the CRIU folks can bring that we
have a global space of key_serial_t/key inode numbers in the kernel.  It
might make sense to move key_serial_tree into struct user_namespace
to fix that.  In general I don't think it matters because the number
is a random number that is essentially unpredicatable.

Elsewhere it has been mentioned there is an implementation bug or
two that needs to be corrected, but that is separate from the bulk
of the implementation that I am not seeing any issues jump out at me
when I look at it.

Eric

p.s.  Apologies for the delay it has taken a bit for my state on
the key subsystem to page back in.

p.p.s. In case I was not clear.  I think your patch below is very much
the wrong approach.

> David
> ---
> diff --git a/include/linux/key.h b/include/linux/key.h
> index 722914798f37..8d785279f7b4 100644
> --- a/include/linux/key.h
> +++ b/include/linux/key.h
> @@ -142,6 +142,7 @@ struct key {
>  		struct rb_node	serial_node;
>  	};
>  	struct rw_semaphore	sem;		/* change vs change sem */
> +	struct user_namespace	*ns;		/* Owning namespace */
>  	struct key_user		*user;		/* owner of this key */
>  	void			*security;	/* security data for this key */
>  	union {
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index 68f594212759..43f5c47de3a3 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -42,6 +42,12 @@ static void set_cred_user_ns(struct cred *cred, struct user_namespace *user_ns)
>  	cred->cap_ambient = CAP_EMPTY_SET;
>  	cred->cap_bset = CAP_FULL_SET;
>  #ifdef CONFIG_KEYS
> +	key_put(cred->session_keyring);
> +	cred->session_keyring = NULL;
> +	key_put(cred->process_keyring);
> +	cred->process_keyring = NULL;
> +	key_put(cred->thread_keyring);
> +	cred->thread_keyring = NULL;
>  	key_put(cred->request_key_auth);
>  	cred->request_key_auth = NULL;
>  #endif
> diff --git a/security/keys/gc.c b/security/keys/gc.c
> index addf060399e0..0f0c589bf49d 100644
> --- a/security/keys/gc.c
> +++ b/security/keys/gc.c
> @@ -12,6 +12,7 @@
>  #include <linux/module.h>
>  #include <linux/slab.h>
>  #include <linux/security.h>
> +#include <linux/user_namespace.h>
>  #include <keys/keyring-type.h>
>  #include "internal.h"
>  
> @@ -155,6 +156,7 @@ static noinline void key_gc_unused_keys(struct list_head *keys)
>  			atomic_dec(&key->user->nikeys);
>  
>  		key_user_put(key->user);
> +		put_user_ns(key->ns);
>  
>  		kfree(key->description);
>  
> diff --git a/security/keys/key.c b/security/keys/key.c
> index 346fbf201c22..09df168907fd 100644
> --- a/security/keys/key.c
> +++ b/security/keys/key.c
> @@ -15,6 +15,7 @@
>  #include <linux/sched.h>
>  #include <linux/slab.h>
>  #include <linux/security.h>
> +#include <linux/user_namespace.h>
>  #include <linux/workqueue.h>
>  #include <linux/random.h>
>  #include <linux/err.h>
> @@ -289,6 +290,7 @@ struct key *key_alloc(struct key_type *type, const char *desc,
>  	init_rwsem(&key->sem);
>  	lockdep_set_class(&key->sem, &type->lock_class);
>  	key->index_key.type = type;
> +	key->ns = get_user_ns(current_user_ns());
>  	key->user = user;
>  	key->quotalen = quotalen;
>  	key->datalen = type->def_datalen;
> diff --git a/security/keys/keyring.c b/security/keys/keyring.c
> index c91e4e0cea08..70a399bd572c 100644
> --- a/security/keys/keyring.c
> +++ b/security/keys/keyring.c
> @@ -1016,6 +1016,9 @@ struct key *find_keyring_by_name(const char *name, bool skip_perm_check)
>  				    &keyring_name_hash[bucket],
>  				    name_link
>  				    ) {
> +			if (keyring->ns != current_user_ns())
> +				continue;
> +
>  			if (!kuid_has_mapping(current_user_ns(), keyring->user->uid))
>  				continue;
>  
> diff --git a/security/keys/permission.c b/security/keys/permission.c
> index 732cc0beffdf..3c1bd572d9da 100644
> --- a/security/keys/permission.c
> +++ b/security/keys/permission.c
> @@ -24,8 +24,9 @@
>   *
>   * The caller must hold either a ref on cred or must hold the RCU readlock.
>   *
> - * Returns 0 if successful, -EACCES if access is denied based on the
> - * permissions bits or the LSM check.
> + * Returns 0 if successful, -ENOKEY if the key is outside of the caller's user
> + * namespace, -EACCES if access is denied based on the permissions bits or the
> + * LSM check.
>   */
>  int key_task_permission(const key_ref_t key_ref, const struct cred *cred,
>  			unsigned perm)
> @@ -36,6 +37,9 @@ int key_task_permission(const key_ref_t key_ref, const struct cred *cred,
>  
>  	key = key_ref_to_ptr(key_ref);
>  
> +	if (key->ns != current_user_ns())
> +		return -ENOKEY;
> +
>  	/* use the second 8-bits of permissions for keys the caller owns */
>  	if (uid_eq(key->uid, cred->fsuid)) {
>  		kperm = key->perm >> 16;
> diff --git a/security/keys/process_keys.c b/security/keys/process_keys.c
> index 40a885239782..52866e90d51a 100644
> --- a/security/keys/process_keys.c
> +++ b/security/keys/process_keys.c
> @@ -691,6 +694,11 @@ try_again:
>  		break;
>  	}
>  
> +	/* We don't see keys that are outside the caller's user namespace */
> +	ret = -ENOKEY;
> +	if (key->ns != current_user_ns())
> +		goto invalid_key;
> +
>  	/* unlink does not use the nominated key in any way, so can skip all
>  	 * the permission checks as it is only concerned with the keyring */
>  	if (lflags & KEY_LOOKUP_FOR_UNLINK) {
--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric W. Biederman Oct. 26, 2016, 4:45 a.m. UTC | #17
David Howells <dhowells@redhat.com> writes:

> James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>
>> > There's another possibility here - since user_namespaces are 
>> > hierarchical, does it make sense to let a process see keys that are 
>> > in an ancestral namespace?
>> 
>> I think that should be the decision of the owner.  If you're creating a
>> userns to de-privilege the next user, likely you don't want this, but
>> if you're creating a userns to enhance it, then you do.
>
> Maybe the simplest is to put a 'stop here' flag on a user_namespace.  Then
> when we look to see if a key is in the caller's namespace, we go up the tree
> until we hit the flag.  If you don't find the key's ns within the caller's
> permitted subtree, you don't get to see the key.

Let me just say we already have all of that (in a much nicer format) by
limiting the set of keys we can access to the set of users visible in
the user namespace.

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
José Bollo Oct. 26, 2016, 7:18 a.m. UTC | #18
Le mardi 25 octobre 2016 à 17:53 +0100, David Howells a écrit :
> David Howells <dhowells@redhat.com> wrote:
> 
> > 
> >  (2) If a process's user_namespace doesn't match that recorded in a
> > key then
> >      it gets ENOKEY if it tries to refer to it or access it and
> > can't see it
> >      in /proc/keys.
> 
> There's another possibility here - since user_namespaces are
> hierarchical,
> does it make sense to let a process see keys that are in an ancestral
> namespace?
>
> David
> --

Hello all,

I used this approach for PTags. Let me explain how it works for PTags
in few words.

 - each ptags is in fact a couple (user-namespace, ptag-name, flags)
 - when looking to tags in a given namespace:
   o either it is found with the given current name-space
   o or the ancestor-hierarchy of name-spaces is looked to get the
     containing ancestor
 - in both case when a tag is found, the flag indicates if it is alive
   or was removed in the namespace

It works well (from what I sew 8)

Technical details:
 - assumption is done that ancestors wait their children to die
 - shadow-ghost pointers are used to weakly handle user namespaces

Best regards

José Bollo

> To unsubscribe from this list: send the line "unsubscribe linux-
> security-module" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Howells Oct. 26, 2016, 7:37 a.m. UTC | #19
Eric W. Biederman <ebiederm@xmission.com> wrote:

> David Howells <dhowells@redhat.com> writes:
> 
> > James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> >
> >> > There's another possibility here - since user_namespaces are 
> >> > hierarchical, does it make sense to let a process see keys that are 
> >> > in an ancestral namespace?
> >> 
> >> I think that should be the decision of the owner.  If you're creating a
> >> userns to de-privilege the next user, likely you don't want this, but
> >> if you're creating a userns to enhance it, then you do.
> >
> > Maybe the simplest is to put a 'stop here' flag on a user_namespace.  Then
> > when we look to see if a key is in the caller's namespace, we go up the tree
> > until we hit the flag.  If you don't find the key's ns within the caller's
> > permitted subtree, you don't get to see the key.
> 
> Let me just say we already have all of that (in a much nicer format) by
> limiting the set of keys we can access to the set of users visible in
> the user namespace.

Except that we don't.  Users outside of a namespace can see any key that
grants permissions to 'other' - though this is something we don't do by
default.

David
--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Howells Oct. 26, 2016, 11:43 a.m. UTC | #20
I think it might help if I lay out the current state of play:

Keys and keyrings can be accessed through a number of mechanisms:

 (1) Direct key lookup by serial number.

     The serial number set is global, so any key can potentially be looked up
     directly with the condition:

     (A) The key must grant the appropriate permission to the caller for the
     	 operation being proposed.

 (2) Direct keyring lookup by description (name).

     The keyring name set is global, so potentially any keyring could be set
     to the caller's session keyring.  However, there are two noteworthy
     conditions:

     (A) The keyring's uid must have a mapping in the current user namespace.

     (B) The keyring must grant SEARCH permission to the caller.

 (3) Process-subscribed keyrings.

     A process has a number of keyring subscriptions (the session, process and
     thread keyrings).  These are accessed by special negative serial numbers,
     macroised as KEY_SPEC_{THREAD,PROCESS,SESSION}_KEYRING.

     If a key can be reached via one of these keyrings then it is classed as
     being 'possessed' and the possession permissions grant is added to the
     user/group/other grant, with the following conditions:

     (A) The keyring being accessed must grant SEARCH to look inside it or
     	 some other appropriate permission to deal with the keyring itself.

 (4) User-subscribed keyrings.

     A user ID also has a couple of subscriptions (user and user-session
     keyrings).  These are accessed by KEY_SPEC_{USER,USER_SESSION}_KEYRING.

 (5) Group-subscribed keyrings.

     Linus wanted this for completeness, but it was never implemented.
     KEY_SPEC_GROUP_KEYRING is defined for this.

 (6) Search in directly-looked up keyring.

     A keyring can be looked up directly as per (1) and then searched inside
     for a key by description or some other identifier.  The conditions are:

     (A) The keyring must grant SEARCH permission, as must any subordinate
     	 keyrings that need to be searched.

     (B) The target key must grant SEARCH permission.

     Note that each key considered will *only* be considered possessed if the
     starting keyring is determined to be possessed.  Whilst this will give
     some false negatives, it massively reduces computation time.

 (7) Search in session-, process- and thread-keyrings.

     As (4), but the starting keyring is definitely possessed.

     Further, the user-session keyring is used in place of the session keyring
     if the caller doesn't have a session keyring, and the keys in there are
     counted as possessed.

 (8) Search in the user-keyring and user-session keyring.

     As (4).  These two keyrings are treated no differently from any other
     keyring.  pam_keyinit, however, places a link to the user-keyring in any
     session-keyring it creates, thereby making it possessed.

     [I wish I hadn't made user- and user-session keyrings as they're kind of
      difficult to get right.  It would've been better not to automatically
      fall back to the user-session keyring, I think.]

 (9) Lookup in the persistent keyring register.

     The persistent register is per-user_namespace.  You can ask for the
     persistent keyring for any UID, but:

     (A) That UID must be within your user_namespace

     (B) The UID must match your process UID or EUID - or you must have
     	 CAP_SETUID.

     Further, no matter which one you ask for:

     (C) The persistent keyring must grant you LINK permission.

     Note that persistent keyrings only grant VIEW and READ to user and grant
     everything bar SETATTR to possessed.

(10) /proc/keys.

     This shows all keys to which you have at least one permit, with the
     condition:

     (A) The keyring's uid must have a mapping in the current user namespace.

(11) Request-key upcall specials.

     If a process was created by request_key() as an upcall, then three
     additional keys become available:

       (i) A new session keyring concocted by request_key() holding the
     	   authorisation key that allows the protection mechanisms to be
     	   bypassed by keyctl_instantiate() and friends for the key being
     	   constructed.

      (ii) The currently active authorisation key.  If one is set by
     	   keyctl_assume_authority(), then this permits the upcall to search
     	   the subscribed keyrings of the original calling process as if they
     	   were its own.  This would allow it, say, to retrieve the caller's
     	   Kerberos TGT and to request a ticket with which to instantiate a
     	   key.

     (iii) The keyring passed to request_key_and_link() as the destination.
     	   This has no special permissions.  It's treated like any other
     	   non-possessed keyring unless the upcall program links it to such as
     	   its session keyring or it is accessible through the authorisation
     	   key.  Note that it's only available if an authorisation key is in
     	   use.


So:

 (*) Key serial numbers are global, though randomly generated and hopefully
     unpredictable.

 (*) Normal keys and keyrings are created by default with zero OTHER and GROUP
     permits, minimal USER permits and full POSSESSOR permits.

     Special keys may get a different balance of USER and POSSESSOR permits,
     but OTHER and GROUP are always initially zero.

 (*) You cannot access an ordinary key outside of your namespace unless:

     (A) it belongs to the founding user, you are that user within the
     	 namespace and it grants USER permits, or

     (B) it grants OTHER permits, or

     (C) it's attached to a subscribed keyring and it grants POSSESSOR
     	 permits, including SEARCH, or

     (D) you're trying to unlink it from a keyring you have WRITE access to
     	 (and then all you can do is unlink it).

 (*) You cannot find a session keyring that's owned by a UID that's outside of
     your namespace in order to join it.  Trying to do so gives a new session
     keyring of the same name.

 (*) Session-, process- and thread-keyrings are carried across clone() (as
     appropriate) or unshare(), even if CLONE_NEWUSER is set.

 (*) You cannot see in /proc/keys any keys owned by the founding user of a new
     namespace, though you can create and manipulate keys accessible through
     the carried over keyrings or accessible by ID.

     This is may need fixing - you should be able to see any key that you can
     access.

 (*) The installation of user and user-session keyrings in a user_struct
     waives the SEARCH permit that's a requirement for when
     keyctl_join_session_keyring() does a by-name search.


Thoughts:

 (*) request_key() upcalls are not namespaced.  It's not clear how this should
     be done, since the upcall might depend on network namespace or filesystem
     namespace, for example.

 (*) Possibly user and user-session keyrings could be kept in the persistent
     keyring rather than the user_struct, though there's a problem with the
     persistent keyring potentially timing out.

David
--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Howells Oct. 26, 2016, 2:34 p.m. UTC | #21
Andy Lutomirski <luto@amacapital.net> wrote:

> > | Not just questionable, completely wrong. The gist is that there is a
> > | *global* name -> key mapping for accessing keys by name, and user
> > | keyrings are stored in there under the name "_uid.%u", where %u
> > | refers to the *namespaced* UID. (See install_user_keyrings().)
> > | The result is that, if e.g. the user with UID 1000 has no running
> > | processes, a local attacker can enter a new user namespace, map UID
> > | 1000 in the namespace to some KUID he controls, do
> > | setresuid(1000, 1000, 1000), and now he owns user 1000's keyring.

Hmmm...  Having checked over the code, it might not be that simple.  Thanks to
something Eric Biederman added into find_keyring_by_name(), unless a keyring's
UID is a member of the current user namespace, you can't find that keyring by
name.

However, I'm not sure that that stops someone trying to find it by colliding
kuids since keys can't pin the user_struct without causing a loop.

Further, if the user_struct referring to a user (or user-session) keyring gets
deallocated, it drops its link to that keyring - though that's no guarantee
that the user keyrings will be deallocated themselves (there may be links from
elsewhere).

I'm wondering if I should move the user keyrings out of the user_struct so
that keys can pin the user_struct (and I could then merge the key_user struct
into it).

The user-keyring and user-session keyring could be kept in the user's
persistent keyring or in their own tree in the user namespace.  The former
would allow their easy reuse within a certain amount of time and the latter
would allow them to be easily revoked when the user_namespace struct is
destroyed.

I think Eric's patch to move the keyring list into the user_namespace struct
rather than having it be global is also a good step.

David
--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jann Horn Oct. 26, 2016, 2:38 p.m. UTC | #22
On Wed, Oct 26, 2016 at 03:34:50PM +0100, David Howells wrote:
> Andy Lutomirski <luto@amacapital.net> wrote:
> 
> > > | Not just questionable, completely wrong. The gist is that there is a
> > > | *global* name -> key mapping for accessing keys by name, and user
> > > | keyrings are stored in there under the name "_uid.%u", where %u
> > > | refers to the *namespaced* UID. (See install_user_keyrings().)
> > > | The result is that, if e.g. the user with UID 1000 has no running
> > > | processes, a local attacker can enter a new user namespace, map UID
> > > | 1000 in the namespace to some KUID he controls, do
> > > | setresuid(1000, 1000, 1000), and now he owns user 1000's keyring.
> 
> Hmmm...  Having checked over the code, it might not be that simple.  Thanks to
> something Eric Biederman added into find_keyring_by_name(), unless a keyring's
> UID is a member of the current user namespace, you can't find that keyring by
> name.

find_keyring_by_name() checks that the UID of the keyring's owner is mapped into
the current user namespace. But that doesn't catch the scenario I described:
The keyring is created in an attacker-created namespace and looked up from the
init namespace, into which all kuids are mapped.
David Howells Oct. 26, 2016, 2:48 p.m. UTC | #23
Jann Horn <jann@thejh.net> wrote:

> find_keyring_by_name() checks that the UID of the keyring's owner is mapped into
> the current user namespace. But that doesn't catch the scenario I described:
> The keyring is created in an attacker-created namespace and looked up from the
> init namespace, into which all kuids are mapped.

Ah - gotcha.

David
--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric W. Biederman Oct. 26, 2016, 6:10 p.m. UTC | #24
David Howells <dhowells@redhat.com> writes:

> Jann Horn <jann@thejh.net> wrote:
>
>> find_keyring_by_name() checks that the UID of the keyring's owner is mapped into
>> the current user namespace. But that doesn't catch the scenario I described:
>> The keyring is created in an attacker-created namespace and looked up from the
>> init namespace, into which all kuids are mapped.
>
> Ah - gotcha.

Unless I am misreading something it actually gets worse.  You don't even
need a user namespace.  You can just call keyctl_join_session_keyring
and the named keyring of your choice will be created.

Plus there are various really weird things in their where the keyring
names of _tid, _pid, _ses, get reused over and over again.

So it looks like there are some significant things to fix.

Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Howells Oct. 26, 2016, 6:35 p.m. UTC | #25
Eric W. Biederman <ebiederm@xmission.com> wrote:

> > Jann Horn <jann@thejh.net> wrote:
> >
> >> find_keyring_by_name() checks that the UID of the keyring's owner is
> >> mapped into the current user namespace. But that doesn't catch the
> >> scenario I described: The keyring is created in an attacker-created
> >> namespace and looked up from the init namespace, into which all kuids are
> >> mapped.
> >
> > Ah - gotcha.
> 
> Unless I am misreading something it actually gets worse.  You don't even
> need a user namespace.  You can just call keyctl_join_session_keyring
> and the named keyring of your choice will be created.

It's meant to do that - assuming a keyring of that name doesn't yet exist or
isn't accessible by you.

I should perhaps prohibit the creation of a _uid.N keyring.  However, I think
that conceptually, it should be possible to join the user-session keyring as
your session - or I should make it so that the user-session keyring doesn't
exist (but I'm pretty certain that will break userspace).

> Plus there are various really weird things in their where the keyring
> names of _tid, _pid, _ses, get reused over and over again.

True, however per-thread (_tid) and per-process(_pid) keyrings are always
allocated by key_alloc() and never looked up by name when being created.

Anonymous session (_ses) keyrings are also created by key_alloc() and not
looked up when created.  It's only when a named session keyring is requested
that a look up by name is done.

I could make the per-thread, per-process and anon-session keyrings nameless by
default, or prefix them with '.' and not permit joining of a keyring whose
name begins with a '.' (you aren't allowed to use add_key() to create a such
keyrings, so that really ought to be extended to here too).

> So it looks like there are some significant things to fix.

So:

 (1) Make the keyring name list per user_namespace.

 (2) Don't let _uid.N, _tid, _pid and _ses (anonymous session) keyrings be
     joined.

 (3) Don't let keyrings whose name begins '.' be joined.

 (4) Don't keep the user and user-session keyrings in the user_struct, but
     rather make them subordinates of the user namespace.

for starters.

David
--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Howells Oct. 27, 2016, 4:11 p.m. UTC | #26
David Howells <dhowells@redhat.com> wrote:

> > Plus there are various really weird things in their where the keyring
> > names of _tid, _pid, _ses, get reused over and over again.
> 
> True, however per-thread (_tid) and per-process(_pid) keyrings are always
> allocated by key_alloc() and never looked up by name when being created.
> 
> Anonymous session (_ses) keyrings are also created by key_alloc() and not
> looked up when created.  It's only when a named session keyring is requested
> that a look up by name is done.
> 
> I could make the per-thread, per-process and anon-session keyrings nameless by
> default, or prefix them with '.' and not permit joining of a keyring whose
> name begins with a '.' (you aren't allowed to use add_key() to create a such
> keyrings, so that really ought to be extended to here too).

Note that the per-thread, per-process and anon-session keyrings are not
joinable by default as they don't come with SEARCH permission for u/g/o.

David
--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/key.h b/include/linux/key.h
index 722914798f37..8d785279f7b4 100644
--- a/include/linux/key.h
+++ b/include/linux/key.h
@@ -142,6 +142,7 @@  struct key {
 		struct rb_node	serial_node;
 	};
 	struct rw_semaphore	sem;		/* change vs change sem */
+	struct user_namespace	*ns;		/* Owning namespace */
 	struct key_user		*user;		/* owner of this key */
 	void			*security;	/* security data for this key */
 	union {
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 68f594212759..43f5c47de3a3 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -42,6 +42,12 @@  static void set_cred_user_ns(struct cred *cred, struct user_namespace *user_ns)
 	cred->cap_ambient = CAP_EMPTY_SET;
 	cred->cap_bset = CAP_FULL_SET;
 #ifdef CONFIG_KEYS
+	key_put(cred->session_keyring);
+	cred->session_keyring = NULL;
+	key_put(cred->process_keyring);
+	cred->process_keyring = NULL;
+	key_put(cred->thread_keyring);
+	cred->thread_keyring = NULL;
 	key_put(cred->request_key_auth);
 	cred->request_key_auth = NULL;
 #endif
diff --git a/security/keys/gc.c b/security/keys/gc.c
index addf060399e0..0f0c589bf49d 100644
--- a/security/keys/gc.c
+++ b/security/keys/gc.c
@@ -12,6 +12,7 @@ 
 #include <linux/module.h>
 #include <linux/slab.h>
 #include <linux/security.h>
+#include <linux/user_namespace.h>
 #include <keys/keyring-type.h>
 #include "internal.h"
 
@@ -155,6 +156,7 @@  static noinline void key_gc_unused_keys(struct list_head *keys)
 			atomic_dec(&key->user->nikeys);
 
 		key_user_put(key->user);
+		put_user_ns(key->ns);
 
 		kfree(key->description);
 
diff --git a/security/keys/key.c b/security/keys/key.c
index 346fbf201c22..09df168907fd 100644
--- a/security/keys/key.c
+++ b/security/keys/key.c
@@ -15,6 +15,7 @@ 
 #include <linux/sched.h>
 #include <linux/slab.h>
 #include <linux/security.h>
+#include <linux/user_namespace.h>
 #include <linux/workqueue.h>
 #include <linux/random.h>
 #include <linux/err.h>
@@ -289,6 +290,7 @@  struct key *key_alloc(struct key_type *type, const char *desc,
 	init_rwsem(&key->sem);
 	lockdep_set_class(&key->sem, &type->lock_class);
 	key->index_key.type = type;
+	key->ns = get_user_ns(current_user_ns());
 	key->user = user;
 	key->quotalen = quotalen;
 	key->datalen = type->def_datalen;
diff --git a/security/keys/keyring.c b/security/keys/keyring.c
index c91e4e0cea08..70a399bd572c 100644
--- a/security/keys/keyring.c
+++ b/security/keys/keyring.c
@@ -1016,6 +1016,9 @@  struct key *find_keyring_by_name(const char *name, bool skip_perm_check)
 				    &keyring_name_hash[bucket],
 				    name_link
 				    ) {
+			if (keyring->ns != current_user_ns())
+				continue;
+
 			if (!kuid_has_mapping(current_user_ns(), keyring->user->uid))
 				continue;
 
diff --git a/security/keys/permission.c b/security/keys/permission.c
index 732cc0beffdf..3c1bd572d9da 100644
--- a/security/keys/permission.c
+++ b/security/keys/permission.c
@@ -24,8 +24,9 @@ 
  *
  * The caller must hold either a ref on cred or must hold the RCU readlock.
  *
- * Returns 0 if successful, -EACCES if access is denied based on the
- * permissions bits or the LSM check.
+ * Returns 0 if successful, -ENOKEY if the key is outside of the caller's user
+ * namespace, -EACCES if access is denied based on the permissions bits or the
+ * LSM check.
  */
 int key_task_permission(const key_ref_t key_ref, const struct cred *cred,
 			unsigned perm)
@@ -36,6 +37,9 @@  int key_task_permission(const key_ref_t key_ref, const struct cred *cred,
 
 	key = key_ref_to_ptr(key_ref);
 
+	if (key->ns != current_user_ns())
+		return -ENOKEY;
+
 	/* use the second 8-bits of permissions for keys the caller owns */
 	if (uid_eq(key->uid, cred->fsuid)) {
 		kperm = key->perm >> 16;
diff --git a/security/keys/process_keys.c b/security/keys/process_keys.c
index 40a885239782..52866e90d51a 100644
--- a/security/keys/process_keys.c
+++ b/security/keys/process_keys.c
@@ -691,6 +694,11 @@  try_again:
 		break;
 	}
 
+	/* We don't see keys that are outside the caller's user namespace */
+	ret = -ENOKEY;
+	if (key->ns != current_user_ns())
+		goto invalid_key;
+
 	/* unlink does not use the nominated key in any way, so can skip all
 	 * the permission checks as it is only concerned with the keyring */
 	if (lflags & KEY_LOOKUP_FOR_UNLINK) {