[v14,2/6] namei: LOOKUP_IN_ROOT: chroot-like path resolution

Message ID 20191010054140.8483-3-cyphar@cyphar.com (mailing list archive)
State Superseded
Series open: introduce openat2(2) syscall

Commit Message

Aleksa Sarai Oct. 10, 2019, 5:41 a.m. UTC
/* Background. */
Container runtimes or other administrative management processes will
often interact with root filesystems while in the host mount namespace,
because the cost of doing a chroot(2) on every operation is
prohibitive (especially in Go, which cannot safely use vfork). However,
a malicious program can trick the management process into doing
operations on files outside of the root filesystem through careful
crafting of symlinks.

Most programs that need this feature have attempted to make this process
safe, by doing all of the path resolution in userspace (with symlinks
being scoped to the root of the malicious root filesystem).
Unfortunately, this method is prone to foot-guns and usually such
implementations have subtle security bugs.

Thus, what userspace needs is a way to resolve a path as though it were
in a chroot(2) -- with all absolute symlinks being resolved relative to
the dirfd root (and ".." components being stuck under the dirfd root[1]).
It is much simpler and more straight-forward to provide this
functionality in-kernel (because it can be done far more cheaply and
correctly).

More classical applications that also have this problem (which have
their own potentially buggy userspace path sanitisation code) include
web servers, archive extraction tools, network file servers, and so on.

[1]: At the moment, ".." and magic-link jumping are disallowed for the
     same reason they are disabled for LOOKUP_BENEATH -- currently it is
     not safe to allow them. Future patches may enable them unconditionally
     once we have resolved the possible races (for "..") and semantics
     (for magic-link jumping).

/* Userspace API. */
LOOKUP_IN_ROOT will be exposed to userspace through openat2(2).

There is a slight change in behaviour regarding pathnames -- if the
pathname is absolute then the dirfd is still used as the root of
resolution if LOOKUP_IN_ROOT is specified (this is to avoid obvious
foot-guns, at the cost of a minor API inconsistency).
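
As a rough illustration of how a management process might use this from
userspace, the sketch below wraps the raw syscall. It assumes the
interface as it was later merged in Linux 5.6, where the flag is spelled
RESOLVE_IN_ROOT and lives in the "resolve" field of struct open_how. The
struct mirror, the SKETCH_ constant, the fallback syscall number and the
helper name open_in_root() are local to this example and are not part of
this patch.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: openat2 is syscall 437 on most architectures (e.g. x86_64). */
#ifndef SYS_openat2
#define SYS_openat2 437
#endif

/* Local mirror of struct open_how from <linux/openat2.h> (Linux 5.6+). */
struct open_how_sketch {
	uint64_t flags;   /* O_* flags */
	uint64_t mode;    /* only meaningful with O_CREAT / O_TMPFILE */
	uint64_t resolve; /* RESOLVE_* flags */
};

/* Assumption: value of RESOLVE_IN_ROOT in the merged UAPI header. */
#define SKETCH_RESOLVE_IN_ROOT 0x10

/*
 * Hypothetical helper: open `path` as though `rootfd` were the root
 * directory. A leading '/' in `path` and absolute symlink targets are
 * resolved relative to rootfd. Returns an fd, or -1 with errno set.
 */
static int open_in_root(int rootfd, const char *path, uint64_t flags)
{
	struct open_how_sketch how;

	memset(&how, 0, sizeof(how));
	how.flags = flags;
	how.resolve = SKETCH_RESOLVE_IN_ROOT;

	return (int)syscall(SYS_openat2, rootfd, path, &how, sizeof(how));
}
```

With rootfd opened on a container's root filesystem, a call such as
open_in_root(rootfd, "/etc/passwd", O_RDONLY) would be expected to open
etc/passwd underneath rootfd rather than the host's file. On kernels
without openat2(2) the call fails with ENOSYS, so callers still need a
fallback path.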

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
---
 fs/namei.c            | 5 +++++
 include/linux/namei.h | 3 ++-
 2 files changed, 7 insertions(+), 1 deletion(-)

Comments

Linus Torvalds Oct. 10, 2019, 5:07 p.m. UTC | #1
On Wed, Oct 9, 2019 at 10:42 PM Aleksa Sarai <cyphar@cyphar.com> wrote:
>
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -2277,6 +2277,11 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
>
>         nd->m_seq = read_seqbegin(&mount_lock);
>
> +       /* LOOKUP_IN_ROOT treats absolute paths as being relative-to-dirfd. */
> +       if (flags & LOOKUP_IN_ROOT)
> +               while (*s == '/')
> +                       s++;
> +
>         /* Figure out the starting path and root (if needed). */
>         if (*s == '/') {
>                 error = nd_jump_root(nd);

Hmm. Wouldn't this make more sense all inside the if (*s == '/') test?
That way it would be where we check for "should we start at the root",
which seems to make more sense conceptually.

That test for '/' currently has a "} else if (..)", but that's
pointless since it ends with a "return" anyway. So the "else" logic is
just noise.

And if you get rid of the unnecessary else, moving the LOOKUP_IN_ROOT
inside the if-statement works fine.

So this could be something like

    --- a/fs/namei.c
    +++ b/fs/namei.c
    @@ -2194,11 +2196,19 @@ static const char *path_init(struct nameidata *nd, unsigned flags)

        nd->m_seq = read_seqbegin(&mount_lock);
        if (*s == '/') {
    -           set_root(nd);
    -           if (likely(!nd_jump_root(nd)))
    -                   return s;
    -           return ERR_PTR(-ECHILD);
    -   } else if (nd->dfd == AT_FDCWD) {
    +           /* LOOKUP_IN_ROOT treats absolute paths as being relative-to-dirfd. */
    +           if (!(flags & LOOKUP_IN_ROOT)) {
    +                   set_root(nd);
    +                   if (likely(!nd_jump_root(nd)))
    +                           return s;
    +                   return ERR_PTR(-ECHILD);
    +           }
    +
    +           /* Skip initial '/' for LOOKUP_IN_ROOT */
    +           do { s++; } while (*s == '/');
    +   }
    +
    +   if (nd->dfd == AT_FDCWD) {
                if (flags & LOOKUP_RCU) {
                        struct fs_struct *fs = current->fs;
                        unsigned seq;

instead. The patch ends up slightly bigger (due to the re-indentation)
but now it handles all the "start at root" in the same place. Doesn't
that make sense?

             Linus
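
For readers comparing the two placements discussed above, here is a
minimal standalone sketch of just the slash handling, stripped of all
nameidata state. The function names and the jump_root out-parameter are
made up for this illustration (jump_root stands in for "this lookup
would start at the real root"); only the LOOKUP_IN_ROOT value is taken
from the include/linux/namei.h hunk of this patch. It is meant to show
that both variants pick the same starting point, not to reproduce
path_init().

```c
#include <stddef.h>

/* Value from the include/linux/namei.h hunk in this patch. */
#define LOOKUP_IN_ROOT 0x200000

/*
 * Variant A (as posted): strip leading slashes up front when
 * LOOKUP_IN_ROOT is set, then test for an absolute path.
 */
static const char *start_variant_a(const char *s, unsigned flags,
				   int *jump_root)
{
	if (flags & LOOKUP_IN_ROOT)
		while (*s == '/')
			s++;

	/* Any '/' still present means an ordinary absolute lookup. */
	*jump_root = (*s == '/');
	return s;
}

/*
 * Variant B (suggested in review): handle everything inside the
 * "path starts with '/'" test.
 */
static const char *start_variant_b(const char *s, unsigned flags,
				   int *jump_root)
{
	*jump_root = 0;
	if (*s == '/') {
		if (!(flags & LOOKUP_IN_ROOT)) {
			/* Ordinary absolute lookup: start at the real root. */
			*jump_root = 1;
			return s;
		}
		/* Skip initial '/' for LOOKUP_IN_ROOT. */
		do { s++; } while (*s == '/');
	}
	return s;
}
```

In this reduced form, "///etc/passwd" with LOOKUP_IN_ROOT set yields
"etc/passwd" with no jump to the real root in both variants, and the
same input without the flag is left untouched and marked as starting at
the real root. The disagreement in the thread is therefore about
readability, not about behaviour.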
Aleksa Sarai Oct. 12, 2019, 4:08 a.m. UTC | #2
On 2019-10-10, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Wed, Oct 9, 2019 at 10:42 PM Aleksa Sarai <cyphar@cyphar.com> wrote:
> >
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -2277,6 +2277,11 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
> >
> >         nd->m_seq = read_seqbegin(&mount_lock);
> >
> > +       /* LOOKUP_IN_ROOT treats absolute paths as being relative-to-dirfd. */
> > +       if (flags & LOOKUP_IN_ROOT)
> > +               while (*s == '/')
> > +                       s++;
> > +
> >         /* Figure out the starting path and root (if needed). */
> >         if (*s == '/') {
> >                 error = nd_jump_root(nd);
> 
> Hmm. Wouldn't this make more sense all inside the if (*s == '/') test?
> That way it would be where we check for "should we start at the root",
> which seems to make more sense conceptually.

I don't really agree (though I do think that both options are pretty
ugly). Doing it before the block makes it clear that absolute paths are
just treated relative-to-dirfd -- doing it inside the block makes it
look more like "/" is a special-case for nd_jump_root(). And while that
is somewhat true, this is just a side-effect of making the code more
clean -- my earlier versions reworked the dirfd handling to always grab
nd->root first if LOOKUP_IS_SCOPED. I switched to this method based on
Al's review.

In fairness, I do agree that the lonely while loop looks ugly.

> That test for '/' currently has a "} else if (..)", but that's
> pointless since it ends with a "return" anyway. So the "else" logic is
> just noise.

This depends on the fact that LOOKUP_BENEATH always triggers -EXDEV for
nd_jump_root() -- if we ever add another "scoped lookup" flag then the
logic will have to be further reworked.

(It should be noted that the new version doesn't always end with a
"return", but you could change it to act that way given the above
assumption.)

> And if you get rid of the unnecessary else, moving the LOOKUP_IN_ROOT
> inside the if-statement works fine.
> 
> So this could be something like
> 
>     --- a/fs/namei.c
>     +++ b/fs/namei.c
>     @@ -2194,11 +2196,19 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
> 
>         nd->m_seq = read_seqbegin(&mount_lock);
>         if (*s == '/') {
>     -           set_root(nd);
>     -           if (likely(!nd_jump_root(nd)))
>     -                   return s;
>     -           return ERR_PTR(-ECHILD);
>     -   } else if (nd->dfd == AT_FDCWD) {
>     +           /* LOOKUP_IN_ROOT treats absolute paths as being relative-to-dirfd. */
>     +           if (!(flags & LOOKUP_IN_ROOT)) {
>     +                   set_root(nd);
>     +                   if (likely(!nd_jump_root(nd)))
>     +                           return s;
>     +                   return ERR_PTR(-ECHILD);
>     +           }
>     +
>     +           /* Skip initial '/' for LOOKUP_IN_ROOT */
>     +           do { s++; } while (*s == '/');
>     +   }
>     +
>     +   if (nd->dfd == AT_FDCWD) {
>                 if (flags & LOOKUP_RCU) {
>                         struct fs_struct *fs = current->fs;
>                         unsigned seq;
> 
> instead. The patch ends up slightly bigger (due to the re-indentation)
> but now it handles all the "start at root" in the same place. Doesn't
> that make sense?

It is correct (though I'd need to clean it up a bit to handle
nd_jump_root() correctly), and if you really would like me to change it
I will -- but I just don't agree that it's cleaner.
Aleksa Sarai Oct. 12, 2019, 4:15 a.m. UTC | #3
On 2019-10-12, Aleksa Sarai <cyphar@cyphar.com> wrote:
> On 2019-10-10, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > On Wed, Oct 9, 2019 at 10:42 PM Aleksa Sarai <cyphar@cyphar.com> wrote:
> > >
> > > --- a/fs/namei.c
> > > +++ b/fs/namei.c
> > > @@ -2277,6 +2277,11 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
> > >
> > >         nd->m_seq = read_seqbegin(&mount_lock);
> > >
> > > +       /* LOOKUP_IN_ROOT treats absolute paths as being relative-to-dirfd. */
> > > +       if (flags & LOOKUP_IN_ROOT)
> > > +               while (*s == '/')
> > > +                       s++;
> > > +
> > >         /* Figure out the starting path and root (if needed). */
> > >         if (*s == '/') {
> > >                 error = nd_jump_root(nd);
> > 
> > Hmm. Wouldn't this make more sense all inside the if (*s == '/') test?
> > That way it would be where we check for "should we start at the root",
> > which seems to make more sense conceptually.
> 
> I don't really agree (though I do think that both options are pretty
> ugly). Doing it before the block makes it clear that absolute paths are
> just treated relative-to-dirfd -- doing it inside the block makes it
> look more like "/" is a special-case for nd_jump_root(). And while that

Sorry, I meant "special-case for LOOKUP_IN_ROOT".

> is somewhat true, this is just a side-effect of making the code more
> clean -- my earlier versions reworked the dirfd handling to always grab
> nd->root first if LOOKUP_IS_SCOPED. I switched to this method based on
> Al's review.
> 
> In fairness, I do agree that the lonely while loop looks ugly.

And with the old way I did it (where we grabbed nd->root first) the
semantics were slightly more clear -- stripping leading "/"s doesn't
really look as "clearly obvious" as grabbing nd->root beforehand and
treating "/"s normally. But the code was also needlessly more complex.

> > That test for '/' currently has a "} else if (..)", but that's
> > pointless since it ends with a "return" anyway. So the "else" logic is
> > just noise.
> 
> This depends on the fact that LOOKUP_BENEATH always triggers -EXDEV for
> nd_jump_root() -- if we ever add another "scoped lookup" flag then the
> logic will have to be further reworked.
> 
> (It should be noted that the new version doesn't always end with a
> "return", but you could change it to act that way given the above
> assumption.)
> 
> > And if you get rid of the unnecessary else, moving the LOOKUP_IN_ROOT
> > inside the if-statement works fine.
> > 
> > So this could be something like
> > 
> >     --- a/fs/namei.c
> >     +++ b/fs/namei.c
> >     @@ -2194,11 +2196,19 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
> > 
> >         nd->m_seq = read_seqbegin(&mount_lock);
> >         if (*s == '/') {
> >     -           set_root(nd);
> >     -           if (likely(!nd_jump_root(nd)))
> >     -                   return s;
> >     -           return ERR_PTR(-ECHILD);
> >     -   } else if (nd->dfd == AT_FDCWD) {
> >     +           /* LOOKUP_IN_ROOT treats absolute paths as being relative-to-dirfd. */
> >     +           if (!(flags & LOOKUP_IN_ROOT)) {
> >     +                   set_root(nd);
> >     +                   if (likely(!nd_jump_root(nd)))
> >     +                           return s;
> >     +                   return ERR_PTR(-ECHILD);
> >     +           }
> >     +
> >     +           /* Skip initial '/' for LOOKUP_IN_ROOT */
> >     +           do { s++; } while (*s == '/');
> >     +   }
> >     +
> >     +   if (nd->dfd == AT_FDCWD) {
> >                 if (flags & LOOKUP_RCU) {
> >                         struct fs_struct *fs = current->fs;
> >                         unsigned seq;
> > 
> > instead. The patch ends up slightly bigger (due to the re-indentation)
> > but now it handles all the "start at root" in the same place. Doesn't
> > that make sense?
> 
> It is correct (though I'd need to clean it up a bit to handle
> nd_jump_root() correctly), and if you really would like me to change it
> I will -- but I just don't agree that it's cleaner.
Aleksa Sarai Oct. 24, 2019, 7:06 a.m. UTC | #4
Linus, did you still want me to make your proposed change?

Patch

diff --git a/fs/namei.c b/fs/namei.c
index 54fdbdfbeb94..9d00b138f54c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2277,6 +2277,11 @@  static const char *path_init(struct nameidata *nd, unsigned flags)
 
 	nd->m_seq = read_seqbegin(&mount_lock);
 
+	/* LOOKUP_IN_ROOT treats absolute paths as being relative-to-dirfd. */
+	if (flags & LOOKUP_IN_ROOT)
+		while (*s == '/')
+			s++;
+
 	/* Figure out the starting path and root (if needed). */
 	if (*s == '/') {
 		error = nd_jump_root(nd);
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 35a1bf074ff1..c7a010570d05 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -47,8 +47,9 @@  enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, LAST_BIND};
 #define LOOKUP_NO_MAGICLINKS	0x080000 /* No /proc/$pid/fd/ "symlink" crossing. */
 #define LOOKUP_NO_SYMLINKS	0x100000 /* No symlink crossing *at all*.
 					    Implies LOOKUP_NO_MAGICLINKS. */
+#define LOOKUP_IN_ROOT		0x200000 /* Treat dirfd as %current->fs->root. */
 /* LOOKUP_* flags which do scope-related checks based on the dirfd. */
-#define LOOKUP_IS_SCOPED LOOKUP_BENEATH
+#define LOOKUP_IS_SCOPED (LOOKUP_BENEATH | LOOKUP_IN_ROOT)
 
 extern int path_pts(struct path *path);