diff mbox

[RFC,01/42] drm_mode_create_lease_ioctl(): fix open-coded filp_clone_open()

Message ID 20180711152555.GR30522@ZenIV.linux.org.uk (mailing list archive)
State New, archived
Headers show

Commit Message

Al Viro July 11, 2018, 3:25 p.m. UTC
On Tue, Jul 10, 2018 at 07:56:51PM -0700, Linus Torvalds wrote:
> Ok, you didn't seem to have a coverletter email, so I'm just replying
> to the first one.
> 
> Apart from the couple of totally trivial things I reacted to, this
> looks very clean and nice. And now I sat in front of the computer
> while reading it, so I could follow along better.
> 
> So apart from the small stylistic nits, all Acked-by from me.

FWIW, looking at the ->f_flags handling, I'm seriously tempted to do
alloc_file_pseudo(inode, mnt, name, f_flags, ops).

Reason: right now all but two callers of alloc_file_pseudo() are followed
by setting ->f_flags and for all those callers the mode argument passed
to alloc_file_pseudo() is equal to OPEN_FMODE(->f_flags value to be).

The exceptions are __shmem_file_setup() and hugetlb_file_setup().  Both
end up with FMODE_READ | FMODE_WRITE combined with 0 (i.e. O_RDONLY) in
f_flags.  Which smells like a bug in making, at the very least.

Unless somebody has a good reason why those shouldn't have O_RDWR,
I want to add (and fold back into individual "convert to alloc_file_pseudo"
commits) the following:



Objections?  Another odd beastie is do_shmat() - there we end up with
f_mode not matching f_flags, same way as in shmem and hugetlb.  If we
could rectify that one as well, we'd be able to switch alloc_file_clone()
to flags as well.  I would obviously prefer that kind of change to happen
before these helpers go into mainline...

Comments

Al Viro July 11, 2018, 4:15 p.m. UTC | #1
On Wed, Jul 11, 2018 at 04:25:55PM +0100, Al Viro wrote:

> FWIW, looking at the ->f_flags handling, I'm seriously tempted to do
> alloc_file_pseudo(inode, mnt, name, f_flags, ops).
> 
> Reason: right now all but two callers of alloc_file_pseudo() are followed
> by setting ->f_flags and for all those callers the mode argument passed
> to alloc_file_pseudo() is equal to OPEN_FMODE(->f_flags value to be).
> 
> The exceptions are __shmem_file_setup() and hugetlb_file_setup().  Both
> end up with FMODE_READ | FMODE_WRITE combined with 0 (i.e. O_RDONLY) in
> f_flags.  Which smells like a bug in making, at the very least.
> 
> Unless somebody has a good reason why those shouldn't have O_RDWR,
> I want to add (and fold back into individual "convert to alloc_file_pseudo"
> commits) the following:

[snip]

> Objections?  Another odd beastie is do_shmat() - there we end up with
> f_mode not matching f_flags, same way as in shmem and hugetlb.  If we
> could rectify that one as well, we'd be able to switch alloc_file_clone()
> to flags as well.  I would obviously prefer that kind of change to happen
> before these helpers go into mainline...

Actually, looking at the entire thing, I'm rather tempted to go for
	alloc_empty_file(f_flags, cred)
setting both f_flags and f_flags-derived part of f_mode, making
	alloc_file_pseudo(inode, mnt, name, f_flags, ops)
	alloc_file_clone(base, f_flags, ops)
do the same automatically as they call alloc_empty_file().

do_dentry_open() would, instead of
        f->f_mode |= OPEN_FMODE(f->f_flags) | FMODE_LSEEK |
                                FMODE_PREAD | FMODE_PWRITE;
do just
        f->f_mode |= FMODE_LSEEK | FMODE_PREAD | FMODE_PWRITE;
just before calling ->open(), with comment along the lines of
"usually we want those set; let ->open() remove the wrong ones
if it wants to".
Al Viro July 12, 2018, 12:43 p.m. UTC | #2
On Wed, Jul 11, 2018 at 05:15:40PM +0100, Al Viro wrote:

> Actually, looking at the entire thing, I'm rather tempted to go for
> 	alloc_empty_file(f_flags, cred)
> setting both f_flags and f_flags-derived part of f_mode, making
> 	alloc_file_pseudo(inode, mnt, name, f_flags, ops)
> 	alloc_file_clone(base, f_flags, ops)
> do the same automatically as they call alloc_empty_file().

See vfs.git#work.open3; one commit added just before "pass creds
to get_empty_filp(), make sure dentry_open() passes the right creds",
one just after.

The first one ("alloc_file(): switch to passing O_... flags instead of
 FMODE_... mode") pulls ->f_flags assignments into preceding alloc_file(),
the second ("pass ->f_flags value to alloc_empty_file()") makes
alloc_empty_file() set ->f_flags and ->f_flags-derived part of ->f_mode,
with do_dentry_open() leaving these parts of ->f_mode alone.

The rest is pretty much identical to the previous iteration of the
series, except that alloc_file wrappers inherit the switch to passing
O_...

A question regarding the customs in such situations - are previous
Reviewed-by/Acked-by normally kept across rebases like that?
Linus Torvalds July 12, 2018, 3:05 p.m. UTC | #3
On Thu, Jul 12, 2018 at 5:43 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> A question regarding the customs in such situations - are previous
> Reviewed-by/Acked-by normally kept across rebases like that?

Yeah, unless there were big changes, keep the reviewed/acked-by lines.

Otherwise you'd never be able to handle different people giving
slightly different feedback about separate issues.

              Linus
Al Viro July 12, 2018, 3:53 p.m. UTC | #4
On Thu, Jul 12, 2018 at 08:05:26AM -0700, Linus Torvalds wrote:
> On Thu, Jul 12, 2018 at 5:43 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > A question regarding the customs in such situations - are previous
> > Reviewed-by/Acked-by normally kept across rebases like that?
> 
> Yeah, unless there were big changes, keep the reviewed/acked-by lines.
> 
> Otherwise you'd never be able to handle different people giving
> slightly different feedback about separate issues.

OK...  Miklos, I've pushed #ovl-candidate, with equivalent of the beginning
of your branch.  I'm *not* saying that I've no remaining issues
with your series - this is just how I'd prefer to resolve that group
of conflicts.

Everything past "vfs: simplify dentry_open()" could live on top of that
one, or its equivalent.

I'm going to put #work-open3 into -next, let's figure out what to do with
the conflicts; what I can promise is never-rebased status for #for-ovl
(the beginning of #work-open3 merged into #ovl-candidate).
Al Viro July 18, 2018, 2:56 a.m. UTC | #5
On Thu, Jul 12, 2018 at 04:53:37PM +0100, Al Viro wrote:
> On Thu, Jul 12, 2018 at 08:05:26AM -0700, Linus Torvalds wrote:
> > On Thu, Jul 12, 2018 at 5:43 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > >
> > > A question regarding the customs in such situations - are previous
> > > Reviewed-by/Acked-by normally kept across rebases like that?
> > 
> > Yeah, unless there were big changes, keep the reviewed/acked-by lines.
> > 
> > Otherwise you'd never be able to handle different people giving
> > slightly different feedback about separate issues.
> 
> OK...  Miklos, I've pushed #ovl-candidate, with equivalent of the beginning
> of your branch.  I'm *not* saying that I've no remaining issues
> with your series - this is just how I'd prefer to resolve that group
> of conflicts.
> 
> Everything past "vfs: simplify dentry_open()" could live on top of that
> one, or its equivalent.
> 
> I'm going to put #work-open3 into -next, let's figure out what to do with
> the conflicts; what I can promise is never-rebased status for #for-ovl
> (the beginning of #work-open3 merged into #ovl-candidate).

... and now it even builds.  Said that, I would really like to hear something
from you - I can duplicate the entire overlayfs-next and merge it into
my #for-next and ask Steven to use that instead of your tree, but I very
much dislike going over your head like that.

I realize that you'd been away for a while and probably are digging yourself
from under the piles of mail, but it's getting late in the cycle and I want
to get #for-next into reasonably sane shape.  Please, look through that
thing and respond.
Stephen Rothwell July 18, 2018, 3:29 a.m. UTC | #6
Hi Al,

On Wed, 18 Jul 2018 03:56:37 +0100 Al Viro <viro@ZenIV.linux.org.uk> wrote:
>
> ... and now it even builds.  Said that, I would really like to hear something
> from you - I can duplicate the entire overlayfs-next and merge it into
> my #for-next and ask Steven to use that instead of your tree, but I very
> much dislike going over your head like that.
> 
> I realize that you'd been away for a while and probably are digging yourself
> from under the piles of mail, but it's getting late in the cycle and I want
> to get #for-next into reasonably sane shape.  Please, look through that
> thing and respond.

Almost everything has been removed from the overlayfs tree in
linux-next today.  The only commit there currently is:

67810693077a ovl: fix wrong use of impure dir cache in ovl_iterate()
Miklos Szeredi July 18, 2018, 7:25 a.m. UTC | #7
On Wed, Jul 18, 2018 at 5:29 AM, Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> Hi Al,
>
> On Wed, 18 Jul 2018 03:56:37 +0100 Al Viro <viro@ZenIV.linux.org.uk> wrote:
>>
>> ... and now it even builds.  Said that, I would really like to hear something
>> from you - I can duplicate the entire overlayfs-next and merge it into
>> my #for-next and ask Steven to use that instead of your tree, but I very
>> much dislike going over your head like that.
>>
>> I realize that you'd been away for a while and probably are digging yourself
>> from under the piles of mail, but it's getting late in the cycle and I want
>> to get #for-next into reasonably sane shape.  Please, look through that
>> thing and respond.
>
> Almost everything has been removed from the overlayfs tree in
> linux-next today.  The only commit there currently is:
>
> 67810693077a ovl: fix wrong use of impure dir cache in ovl_iterate()

Al, thank you very much for taking care of this.   I've already begone
to go through those and will finish up the merge, hopefully today.

Thanks,
Miklos
Miklos Szeredi July 18, 2018, 12:10 p.m. UTC | #8
On Wed, Jul 18, 2018 at 9:25 AM, Miklos Szeredi <mszeredi@redhat.com> wrote:
> On Wed, Jul 18, 2018 at 5:29 AM, Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>> Hi Al,
>>
>> On Wed, 18 Jul 2018 03:56:37 +0100 Al Viro <viro@ZenIV.linux.org.uk> wrote:
>>>
>>> ... and now it even builds.  Said that, I would really like to hear something
>>> from you - I can duplicate the entire overlayfs-next and merge it into
>>> my #for-next and ask Steven to use that instead of your tree, but I very
>>> much dislike going over your head like that.
>>>
>>> I realize that you'd been away for a while and probably are digging yourself
>>> from under the piles of mail, but it's getting late in the cycle and I want
>>> to get #for-next into reasonably sane shape.  Please, look through that
>>> thing and respond.

In "ovl: stack file ops" this:

    AV: make it use open_with_fake_path(), don't mess with override_creds

Maybe it's the way to go, but looks broken as is; e.g. NFS does call
current_creds() from its open method to get the credentials to work
with.

Okay, so ->open() is a file op, and file ops should use file->f_cred,
but how are we going to enforce this?

Thanks,
Miklos
Al Viro July 18, 2018, 12:43 p.m. UTC | #9
On Wed, Jul 18, 2018 at 02:10:32PM +0200, Miklos Szeredi wrote:
> On Wed, Jul 18, 2018 at 9:25 AM, Miklos Szeredi <mszeredi@redhat.com> wrote:
> > On Wed, Jul 18, 2018 at 5:29 AM, Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> >> Hi Al,
> >>
> >> On Wed, 18 Jul 2018 03:56:37 +0100 Al Viro <viro@ZenIV.linux.org.uk> wrote:
> >>>
> >>> ... and now it even builds.  Said that, I would really like to hear something
> >>> from you - I can duplicate the entire overlayfs-next and merge it into
> >>> my #for-next and ask Steven to use that instead of your tree, but I very
> >>> much dislike going over your head like that.
> >>>
> >>> I realize that you'd been away for a while and probably are digging yourself
> >>> from under the piles of mail, but it's getting late in the cycle and I want
> >>> to get #for-next into reasonably sane shape.  Please, look through that
> >>> thing and respond.
> 
> In "ovl: stack file ops" this:
> 
>     AV: make it use open_with_fake_path(), don't mess with override_creds
> 
> Maybe it's the way to go, but looks broken as is; e.g. NFS does call
> current_creds() from its open method to get the credentials to work
> with.

It *is* broken.  For now leave override_creds() as in your variant, but
we really want to deal with that crap eventually.

> Okay, so ->open() is a file op, and file ops should use file->f_cred,
> but how are we going to enforce this?

I'd say we cut down on the use of current_cred() when deep in call chain...
Al Viro July 18, 2018, 1:46 p.m. UTC | #10
On Wed, Jul 18, 2018 at 01:43:40PM +0100, Al Viro wrote:

> It *is* broken.  For now leave override_creds() as in your variant, but
> we really want to deal with that crap eventually.
> 
> > Okay, so ->open() is a file op, and file ops should use file->f_cred,
> > but how are we going to enforce this?
> 
> I'd say we cut down on the use of current_cred() when deep in call chain...

Actually, how about this:

#define call_with_creds(__cred, expr) ({		\
	__typeof__(expr) ____res;			\
	const struct cred *____old = current->cred;	\
	const struct cred *____new = (__cred);		\
	rcu_assign_pointer(current->cred, ____new);	\
	____res = expr;					\
	BUG_ON(current->cred != ____new);		\
	rcu_assign_pointer(current->cred, ____old);	\
	____res;					\
})

and replacing
                error = open(inode, f);
with
		error = call_with_cred(f->f_cred, open(inode, f));

possibly with similar at other methods callsites?  Unlike override_creds()
and revert_creds() it's cheap - no validation of creds, no cacheline
ping-pong...

Folks?
Miklos Szeredi July 18, 2018, 3:46 p.m. UTC | #11
On Wed, Jul 18, 2018 at 2:43 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Wed, Jul 18, 2018 at 02:10:32PM +0200, Miklos Szeredi wrote:
>> On Wed, Jul 18, 2018 at 9:25 AM, Miklos Szeredi <mszeredi@redhat.com> wrote:
>> > On Wed, Jul 18, 2018 at 5:29 AM, Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>> >> Hi Al,
>> >>
>> >> On Wed, 18 Jul 2018 03:56:37 +0100 Al Viro <viro@ZenIV.linux.org.uk> wrote:
>> >>>
>> >>> ... and now it even builds.  Said that, I would really like to hear something
>> >>> from you - I can duplicate the entire overlayfs-next and merge it into
>> >>> my #for-next and ask Steven to use that instead of your tree, but I very
>> >>> much dislike going over your head like that.
>> >>>
>> >>> I realize that you'd been away for a while and probably are digging yourself
>> >>> from under the piles of mail, but it's getting late in the cycle and I want
>> >>> to get #for-next into reasonably sane shape.  Please, look through that
>> >>> thing and respond.

Pushed updated series based on your vfs.git#for-ovl branch to the
overlayfs-next tree.  There's the additional patch dealing with
nr_files accounting (will post for review shortly).  That one has a
trivial conflict with the mount series, otherwise merges cleanly with
viro/vfs.git#for-next.

I like the call_with_creds() idea.  I didn't realize that
override_creds()/revert_creds() can be quite heavyweight due to doing
(unnecessary in this case) refcounting.   Could use call_with_creds()
in overlayfs too, since we hold ref on ofs->creator_cred for the
lifetime of the filesystem.

Thanks,
Miklos
Al Viro July 18, 2018, 6:12 p.m. UTC | #12
On Wed, Jul 18, 2018 at 05:46:09PM +0200, Miklos Szeredi wrote:

> I like the call_with_creds() idea.  I didn't realize that
> override_creds()/revert_creds() can be quite heavyweight due to doing
> (unnecessary in this case) refcounting.   Could use call_with_creds()
> in overlayfs too, since we hold ref on ofs->creator_cred for the
> lifetime of the filesystem.

OK...

/* expr is non-void here */
#define __call_with_creds(__cred, expr) ({			\
	const struct cred *____old = current->cred;		\
	const struct cred *____new = __cred;			\
	__typeof__(expr) __res;					\
	rcu_assign_pointer(current->cred, ____new);		\
	__res = (expr);						\
	BUG_ON(current->cred != ____new);			\
	rcu_assign_pointer(current->cred, ____old);		\
	__res; })

/* expr for non-void, expr,0 for void */
#define __wrap_void(expr) __builtin_choose_expr(		\
	__builtin_types_compatible_p(__typeof__(expr), void),	\
		((expr),0), (expr))

/*
 * Evaluate an expression with current->cred temporary set to __cred.
 * NOTE: expr must not result in changed ->cred - any changes during
 * its evaluation must be undone.
 */
#define call_with_creds(__cred, expr)	\
	((__typeof__(expr))__call_with_creds(__cred, __wrap_void(expr)))

Linus, David - do you have any objections to the above?  I would like
to do some of the file method calls via that - e.g. have callers of
->write() (both __vfs_write() and do_loop_readv_writev()) use
	nr = call_with_creds(file->f_cred,
			file->f_op->write(file, iovec.iov_base,
					  iovec.iov_len, ppos));
instead of
	nr = file->f_op->write(file, iovec.iov_base,
				  iovec.iov_len, ppos));
Ditto for call of open() in do_dentry_open(), etc.
Linus Torvalds July 18, 2018, 6:19 p.m. UTC | #13
On Wed, Jul 18, 2018 at 11:13 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Linus, David - do you have any objections to the above?

I damn well do.

I explained earlier why it's wrong and fragile, and why it can just
cause the *reverse* security problem if you do it wrong. So now you
take a subtle bug, and make it even more subtle, and encourage people
to do this known-broken model of using creds at IO time.

No.

Some debugging option to just clear current->creds entirely and catch
mis-uses, sure. But saying "we have shit buggy garbage in random write
functions, so we'll just paper over it"? No.

                 Linus
Al Viro July 18, 2018, 7:46 p.m. UTC | #14
On Wed, Jul 18, 2018 at 11:19:18AM -0700, Linus Torvalds wrote:
> On Wed, Jul 18, 2018 at 11:13 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > Linus, David - do you have any objections to the above?
> 
> I damn well do.
> 
> I explained earlier why it's wrong and fragile, and why it can just
> cause the *reverse* security problem if you do it wrong. So now you
> take a subtle bug, and make it even more subtle, and encourage people
> to do this known-broken model of using creds at IO time.
> 
> No.
> 
> Some debugging option to just clear current->creds entirely and catch
> mis-uses, sure. But saying "we have shit buggy garbage in random write
> functions, so we'll just paper over it"? No.

Huh?  Nevermind ->write(), what about open()?  Here's a specific question
Miklos brought when I suggested to get rid of that override:
/*
 * These allocate and release file read/write context information.
 */
int nfs_open(struct inode *inode, struct file *filp)
{
        struct nfs_open_context *ctx;

        ctx = alloc_nfs_open_context(file_dentry(filp), filp->f_mode, filp);

struct nfs_open_context *alloc_nfs_open_context(struct dentry *dentry,
                                                fmode_t f_mode,
                                                struct file *filp)
{
        struct nfs_open_context *ctx;
        struct rpc_cred *cred = rpc_lookup_cred();

struct rpc_cred *rpc_lookup_cred(void)
{
        return rpcauth_lookupcred(&generic_auth, 0);

struct rpc_cred *
rpcauth_lookupcred(struct rpc_auth *auth, int flags)
{
        struct auth_cred acred;
        struct rpc_cred *ret;
        const struct cred *cred = current_cred();

How should we bring the cred passed to do_dentry_open() where open() has been
called to rpcauth_lookupcred() where we end up looking for rpc_cred by what
should've been the cred passed to do_dentry_open() and is, instead, current_cred()?

We can pass filp->f_cred to rpc_lookup_cred() variant that gets it as an explicit
argument and feed it down to rpcauth_lookupcred() variant that does the same.
We can basically ignore the ->f_cred here.  Or we can get current_cred() equal
to ->f_cred for the duration of open().

I'd probably prefer the first variant, but the last part of the question Miklos
asked
> Okay, so ->open() is a file op, and file ops should use file->f_cred,
> but how are we going to enforce this?
is not trivial - how do we find the places where that kind of thing happens and
what do we do in the meanwhile?  I don't see any quick answers - any suggestions
would be very welcome.  It's not just direct current_cred() callers; that stuff
gets called deep in call chains.  And lifting it all the way up means a lot of
methods that need to get an explicit struct cred * argument.  Are you OK with
going in that direction?

I'm honestly not sure - it's not an attempt to maneuver you into changing your
policy re ->write().  Do we care about ->f_cred at all and if we do, how do we
get it consistent across the filesystems?  I'd buy "it's a weird and obscure thing"
for overlayfs, but that example is on NFS...

We definitely do have bugs in that area - consider e.g.
static int ecryptfs_threadfn(void *ignored)
{
        set_freezable();
        while (1)  {
                struct ecryptfs_open_req *req;

                wait_event_freezable(
                        ecryptfs_kthread_ctl.wait,
                        (!list_empty(&ecryptfs_kthread_ctl.req_list)
                         || kthread_should_stop()));
                mutex_lock(&ecryptfs_kthread_ctl.mux);
                if (ecryptfs_kthread_ctl.flags & ECRYPTFS_KTHREAD_ZOMBIE) {
                        mutex_unlock(&ecryptfs_kthread_ctl.mux);
                        goto out;
                }
                while (!list_empty(&ecryptfs_kthread_ctl.req_list)) {
                        req = list_first_entry(&ecryptfs_kthread_ctl.req_list,
                                               struct ecryptfs_open_req,
                                               kthread_ctl_list);
                        list_del(&req->kthread_ctl_list);
                        *req->lower_file = dentry_open(&req->path,
                                (O_RDWR | O_LARGEFILE), current_cred());
                        complete(&req->done);
                }
                mutex_unlock(&ecryptfs_kthread_ctl.mux);
        }
out:
        return 0;
}

It's a kernel thread, so current_cred() looks bogus...
Linus Torvalds July 18, 2018, 7:53 p.m. UTC | #15
On Wed, Jul 18, 2018 at 12:46 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Huh?  Nevermind ->write(), what about open()?

What about open?

At open time, file->f_cred is the same as current_cred().

So yes, open uses current cred. What's the problem?

Now, if you then use a tasklet or some other thread to do the open,
then obviously that is no longer true. But then the problem is that
you're doing the open() itself in the wrong context, and that has
nothing to do with any general issue, and everything to do with "you
changed to another context without pulling all the context data with
you - you're buggy". Doing some kind of "call_with_creds()" isn't the
solultion - it's just part of the whole thing (what about user
accounting etc? If you switch to another thread to do the work, you
have way more issues than just creds).

              Linus
Al Viro July 18, 2018, 8:04 p.m. UTC | #16
On Wed, Jul 18, 2018 at 12:53:48PM -0700, Linus Torvalds wrote:
> On Wed, Jul 18, 2018 at 12:46 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > Huh?  Nevermind ->write(), what about open()?
> 
> What about open?
> 
> At open time, file->f_cred is the same as current_cred().

int cachefiles_write_page(struct fscache_storage *op, struct page *page)
{
...
        file = dentry_open(&path, O_RDWR | O_LARGEFILE, cache->cache_cred);


int ecryptfs_privileged_open(struct file **lower_file,
                             struct dentry *lower_dentry,
                             struct vfsmount *lower_mnt,
                             const struct cred *cred)
...
        (*lower_file) = dentry_open(&req.path, flags, cred);


/* Derived from fs/exec.c:flush_old_files. */
static inline void flush_unauthorized_files(const struct cred *cred,
                                            struct files_struct *files)
...
        devnull = dentry_open(&selinux_null, O_RDWR, cred);

(granted, here we don't care much, /dev/null being what it is)

In mainline:
struct file *filp_clone_open(struct file *oldfile)
{
...
        retval = vfs_open(&oldfile->f_path, file, oldfile->f_cred);
Al Viro July 18, 2018, 8:15 p.m. UTC | #17
On Wed, Jul 18, 2018 at 09:04:11PM +0100, Al Viro wrote:
> On Wed, Jul 18, 2018 at 12:53:48PM -0700, Linus Torvalds wrote:
> > On Wed, Jul 18, 2018 at 12:46 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > >
> > > Huh?  Nevermind ->write(), what about open()?
> > 
> > What about open?
> > 
> > At open time, file->f_cred is the same as current_cred().
> 
> int cachefiles_write_page(struct fscache_storage *op, struct page *page)
> {
> ...
>         file = dentry_open(&path, O_RDWR | O_LARGEFILE, cache->cache_cred);
> 
> 
> int ecryptfs_privileged_open(struct file **lower_file,
>                              struct dentry *lower_dentry,
>                              struct vfsmount *lower_mnt,
>                              const struct cred *cred)
> ...
>         (*lower_file) = dentry_open(&req.path, flags, cred);

Actually, scratch that one - in this case it is current_cred() (whether that's
the right value or not).

cachefile_write_page() case is for real, AFAICS.
Linus Torvalds July 18, 2018, 8:43 p.m. UTC | #18
On Wed, Jul 18, 2018 at 1:04 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
>
> int cachefiles_write_page(struct fscache_storage *op, struct page *page)
> {
> ...
>         file = dentry_open(&path, O_RDWR | O_LARGEFILE, cache->cache_cred);

Ugh. So on the one hand, in this case I'd actually be more ok with the
whole "call_with_creds()" model, because open() is actually *supposed*
to use creds.

At the same time, my stronger reaction is that "it's actually better
to just pass down the creds as a pointer like we do".

So I think the real problem is simply that people use the current
creds without thinking about it. Often it's just hidden in a
"capable(CAP_SYS_ADMIN)" call or something like that, where people
don't even _think_ about the whole thing.

What I really think I'd prefer is to have some simple way to "poison"
current_cred().  It could be something as simple as a per-thread
counter, and we'd have current_cred() do

        WARN_ON_ONCE(in_interrupt() || current->cred_poison);

and then read/write/open could just inc/dec the cred_poison counter
(when the debug option is set).

We wouldn't even need to catch all the odd cases. We'd only need to
inc/dec the poison counter in the normal read/write/open cases, not
even worrying about the odd special cases (like that
cachefiles_write_page() case).

Why? Because then we'd make sure the filesystems do the right thing,
and then cachefiles_write_page() etc wouldn't even have to worry,
because all the _common_ open cases have already tested that yes, it
uses the right cred pointer and doesn't try to get whatever "current"
happens to be.

So I'd really like to just fix the cases where we access the wrong
creds. And I'd be more than happy to add a few wrapper functions to
make us _see_ the cases where we do so.

What I don't like is the idea of trying to make bad creds accesses
"magically do the right thing".

See what I'm aiming for?

             Linus
Al Viro July 18, 2018, 9:22 p.m. UTC | #19
On Wed, Jul 18, 2018 at 01:43:00PM -0700, Linus Torvalds wrote:

> What I really think I'd prefer is to have some simple way to "poison"
> current_cred().  It could be something as simple as a per-thread
> counter, and we'd have current_cred() do
> 
>         WARN_ON_ONCE(in_interrupt() || current->cred_poison);

Why is counter any better than LSB of a pointer?
David Howells July 18, 2018, 9:27 p.m. UTC | #20
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> and then read/write/open could just inc/dec the cred_poison counter
> (when the debug option is set).

As I may have said, I have tried modifying the kernel to pass the cred pointer
down.  The drivers and ioctl() implementations are/were particularly nasty in
this respect.  So many of them were doing checks against the current thread,
not f_cred.

I think I need to work out some way to automate the process of adding in the
extra parameter as it's not something that I think can be trivially done with
coccinelle.

David
David Howells July 18, 2018, 9:28 p.m. UTC | #21
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> I explained earlier why it's wrong and fragile, and why it can just
> cause the *reverse* security problem if you do it wrong. So now you
> take a subtle bug, and make it even more subtle, and encourage people
> to do this known-broken model of using creds at IO time.

Are network filesystems allowed to use f_cred at I/O time to determine the
authentication/encryption parameters to commune with the server?

David
Linus Torvalds July 18, 2018, 11:06 p.m. UTC | #22
On Wed, Jul 18, 2018 at 2:22 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Why is counter any better than LSB of a pointer?

Easier nesting, but if you do it with the "surround by a macro" model
I guess you can just save/restore instead (like you did in
call_with_creds).

           Linus
Linus Torvalds July 18, 2018, 11:13 p.m. UTC | #23
On Wed, Jul 18, 2018 at 2:28 PM David Howells <dhowells@redhat.com> wrote:
>
> Are network filesystems allowed to use f_cred at I/O time to determine the
> authentication/encryption parameters to commune with the server?

Absolutely. file->f_cred is very much "what was my ID at open time".

Of course, you may well have reasons why you actually want to cache
the key itself (and hide it in private_data or similar rather than
look it up, but if looking it up by uid is ok, then file->f_cred is
ok.

And if you check permissions at IO time (again using file->f_cred),
that's ok from a kernel perspective, but it's not really
POSIX-compliant. But obviously a lot of netrwork filesystems aren't
posix-compliant anyway.

                  Linus
Linus Torvalds July 18, 2018, 11:16 p.m. UTC | #24
On Wed, Jul 18, 2018 at 2:27 PM David Howells <dhowells@redhat.com> wrote:
>
> As I may have said, I have tried modifying the kernel to pass the cred pointer
> down.

It should always be there in the 'struct file *'.

Now, we may have some broken stuff that passes only inodes down, but
they probably really should be fixed.

> The drivers and ioctl() implementations are/were particularly nasty in
> this respect.  So many of them were doing checks against the current thread,
> not f_cred.

So ioctl() may be ok, simply because at least you shouldn't be able to
fool suid programs to do ioctl's on untrusted file descriptors.

So using current_cred() is still technically very wrong, but it's
probably not a huge problem in practice.

Now, if there's some cachefs kind of "do ioctl at the behest of
somebody else", then *that* would be a problem. I'm hoping there
isn't.

               Linus
diff mbox

Patch

diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting
index 77e2d623e115..6b8f5e37f0f7 100644
--- a/Documentation/filesystems/porting
+++ b/Documentation/filesystems/porting
@@ -613,7 +613,7 @@  in your dentry operations instead.
 --
 [mandatory]
 	alloc_file() has become static now; two wrappers are to be used instead.
-	alloc_file_pseudo(inode, vfsmount, name, mode, ops) is for the cases
+	alloc_file_pseudo(inode, vfsmount, name, flags, ops) is for the cases
 	when dentry needs to be created; that's the majority of old alloc_file()
 	users.  Calling conventions: on success a reference to new struct file
 	is returned and callers reference to inode is subsumed by that.  On
diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c
index e0b9c00aecde..8fd5ec4d6042 100644
--- a/drivers/misc/cxl/api.c
+++ b/drivers/misc/cxl/api.c
@@ -90,11 +90,10 @@  static struct file *cxl_getfile(const char *name,
 	}
 
 	file = alloc_file_pseudo(inode, cxl_vfs_mount, name,
-				 OPEN_FMODE(flags), fops);
+				 flags & (O_ACCMODE | O_NONBLOCK), fops);
 	if (IS_ERR(file))
 		goto err_inode;
 
-	file->f_flags = flags & (O_ACCMODE | O_NONBLOCK);
 	file->private_data = priv;
 
 	return file;
diff --git a/drivers/scsi/cxlflash/ocxl_hw.c b/drivers/scsi/cxlflash/ocxl_hw.c
index 6d0632174ec6..a43d44e7e7dd 100644
--- a/drivers/scsi/cxlflash/ocxl_hw.c
+++ b/drivers/scsi/cxlflash/ocxl_hw.c
@@ -115,7 +115,7 @@  static struct file *ocxlflash_getfile(struct device *dev, const char *name,
 	}
 
 	file = alloc_file_pseudo(inode, ocxlflash_vfs_mount, name,
-				 OPEN_FMODE(flags), fops);
+				 flags & (O_ACCMODE | O_NONBLOCK), fops);
 	if (IS_ERR(file)) {
 		rc = PTR_ERR(file);
 		dev_err(dev, "%s: alloc_file failed rc=%d\n",
@@ -123,7 +123,6 @@  static struct file *ocxlflash_getfile(struct device *dev, const char *name,
 		goto err4;
 	}
 
-	file->f_flags = flags & (O_ACCMODE | O_NONBLOCK);
 	file->private_data = priv;
 out:
 	return file;
diff --git a/fs/aio.c b/fs/aio.c
index c5244c68f90e..c3a8bac16374 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -225,11 +225,9 @@  static struct file *aio_private_file(struct kioctx *ctx, loff_t nr_pages)
 	inode->i_size = PAGE_SIZE * nr_pages;
 
 	file = alloc_file_pseudo(inode, aio_mnt, "[aio]",
-				FMODE_READ | FMODE_WRITE, &aio_ring_fops);
+				O_RDWR, &aio_ring_fops);
 	if (IS_ERR(file))
 		iput(inode);
-	else
-		file->f_flags = O_RDWR;
 	return file;
 }
 
diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c
index 7e13edd23db1..9a3c765fc7b1 100644
--- a/fs/anon_inodes.c
+++ b/fs/anon_inodes.c
@@ -85,13 +85,11 @@  struct file *anon_inode_getfile(const char *name,
 	 */
 	ihold(anon_inode_inode);
 	file = alloc_file_pseudo(anon_inode_inode, anon_inode_mnt, name,
-				 OPEN_FMODE(flags), fops);
+				 flags & (O_ACCMODE | O_NONBLOCK), fops);
 	if (IS_ERR(file))
 		goto err;
 
 	file->f_mapping = anon_inode_inode->i_mapping;
-
-	file->f_flags = flags & (O_ACCMODE | O_NONBLOCK);
 	file->private_data = priv;
 
 	return file;
diff --git a/fs/file_table.c b/fs/file_table.c
index c5f651fd6830..af9141d8e29f 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -182,7 +182,7 @@  static struct file *alloc_file(const struct path *path, fmode_t mode,
 }
 
 struct file *alloc_file_pseudo(struct inode *inode, struct vfsmount *mnt,
-				const char *name, fmode_t mode,
+				const char *name, int flags,
 				const struct file_operations *fops)
 {
 	static const struct dentry_operations anon_ops = {
@@ -199,11 +199,12 @@  struct file *alloc_file_pseudo(struct inode *inode, struct vfsmount *mnt,
 		d_set_d_op(path.dentry, &anon_ops);
 	path.mnt = mntget(mnt);
 	d_instantiate(path.dentry, inode);
-	file = alloc_file(&path, mode, fops);
+	file = alloc_file(&path, OPEN_FMODE(flags), fops);
 	if (IS_ERR(file)) {
 		ihold(inode);
 		path_put(&path);
 	}
+	file->f_flags = flags;
 	return file;
 }
 EXPORT_SYMBOL(alloc_file_pseudo);
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 86ffe04f73d6..87605c73361b 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -1358,8 +1358,7 @@  struct file *hugetlb_file_setup(const char *name, size_t size,
 			acctflag))
 		file = ERR_PTR(-ENOMEM);
 	else
-		file = alloc_file_pseudo(inode, mnt, name,
-					FMODE_WRITE | FMODE_READ,
+		file = alloc_file_pseudo(inode, mnt, name, O_RDWR,
 					&hugetlbfs_file_operations);
 	if (!IS_ERR(file))
 		return file;
diff --git a/fs/pipe.c b/fs/pipe.c
index 94323a1d7c92..f5d30579e017 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -750,14 +750,15 @@  int create_pipe_files(struct file **res, int flags)
 	if (!inode)
 		return -ENFILE;
 
-	f = alloc_file_pseudo(inode, pipe_mnt, "", FMODE_WRITE, &pipefifo_fops);
+	f = alloc_file_pseudo(inode, pipe_mnt, "",
+				O_WRONLY | (flags & (O_NONBLOCK | O_DIRECT)),
+				&pipefifo_fops);
 	if (IS_ERR(f)) {
 		free_pipe_info(inode->i_pipe);
 		iput(inode);
 		return PTR_ERR(f);
 	}
 
-	f->f_flags = O_WRONLY | (flags & (O_NONBLOCK | O_DIRECT));
 	f->private_data = inode->i_pipe;
 
 	res[0] = alloc_file_clone(f, FMODE_READ, &pipefifo_fops);
diff --git a/include/linux/file.h b/include/linux/file.h
index 325b36ca336d..1972776e1677 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -20,7 +20,7 @@  struct dentry;
 struct inode;
 struct path;
 extern struct file *alloc_file_pseudo(struct inode *, struct vfsmount *,
-	const char *, fmode_t, const struct file_operations *);
+	const char *, int, const struct file_operations *);
 extern struct file *alloc_file_clone(struct file *, fmode_t,
 	const struct file_operations *);
 
diff --git a/mm/shmem.c b/mm/shmem.c
index fd21df189f32..d7e984141be5 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3922,8 +3922,7 @@  static struct file *__shmem_file_setup(struct vfsmount *mnt, const char *name, l
 	clear_nlink(inode);	/* It is unlinked */
 	res = ERR_PTR(ramfs_nommu_expand_for_mapping(inode, size));
 	if (!IS_ERR(res))
-		res = alloc_file_pseudo(inode, mnt, name,
-				FMODE_WRITE | FMODE_READ,
+		res = alloc_file_pseudo(inode, mnt, name, O_RDWR,
 				&shmem_file_operations);
 	if (IS_ERR(res))
 		iput(inode);
diff --git a/net/socket.c b/net/socket.c
index 81cf9906cae5..4cf3568caf9f 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -397,14 +397,14 @@  struct file *sock_alloc_file(struct socket *sock, int flags, const char *dname)
 		dname = sock->sk ? sock->sk->sk_prot_creator->name : "";
 
 	file = alloc_file_pseudo(SOCK_INODE(sock), sock_mnt, dname,
-				FMODE_READ | FMODE_WRITE, &socket_file_ops);
+				O_RDWR | (flags & O_NONBLOCK),
+				&socket_file_ops);
 	if (IS_ERR(file)) {
 		sock_release(sock);
 		return file;
 	}
 
 	sock->file = file;
-	file->f_flags = O_RDWR | (flags & O_NONBLOCK);
 	file->private_data = sock;
 	return file;
 }