Message ID | 20180711152555.GR30522@ZenIV.linux.org.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Jul 11, 2018 at 04:25:55PM +0100, Al Viro wrote: > FWIW, looking at the ->f_flags handling, I'm seriously tempted to do > alloc_file_pseudo(inode, mnt, name, f_flags, ops). > > Reason: right now all but two callers of alloc_file_pseudo() are followed > by setting ->f_flags and for all those callers the mode argument passed > to alloc_file_pseudo() is equal to OPEN_FMODE(->f_flags value to be). > > The exceptions are __shmem_file_setup() and hugetlb_file_setup(). Both > end up with FMODE_READ | FMODE_WRITE combined with 0 (i.e. O_RDONLY) in > f_flags. Which smells like a bug in making, at the very least. > > Unless somebody has a good reason why those shouldn't have O_RDWR, > I want to add (and fold back into individual "convert to alloc_file_pseudo" > commits) the following: [snip] > Objections? Another odd beastie is do_shmat() - there we end up with > f_mode not matching f_flags, same way as in shmem and hugetlb. If we > could rectify that one as well, we'd be able to switch alloc_file_clone() > to flags as well. I would obviously prefer that kind of change to happen > before these helpers go into mainline... Actually, looking at the entire thing, I'm rather tempted to go for alloc_empty_file(f_flags, cred) setting both f_flags and f_flags-derived part of f_mode, making alloc_file_pseudo(inode, mnt, name, f_flags, ops) alloc_file_clone(base, f_flags, ops) do the same automatically as they call alloc_empty_file(). do_dentry_open() would, instead of f->f_mode |= OPEN_FMODE(f->f_flags) | FMODE_LSEEK | FMODE_PREAD | FMODE_PWRITE; do just f->f_mode |= FMODE_LSEEK | FMODE_PREAD | FMODE_PWRITE; just before calling ->open(), with comment along the lines of "usually we want those set; let ->open() remove the wrong ones if it wants to".
On Wed, Jul 11, 2018 at 05:15:40PM +0100, Al Viro wrote: > Actually, looking at the entire thing, I'm rather tempted to go for > alloc_empty_file(f_flags, cred) > setting both f_flags and f_flags-derived part of f_mode, making > alloc_file_pseudo(inode, mnt, name, f_flags, ops) > alloc_file_clone(base, f_flags, ops) > do the same automatically as they call alloc_empty_file(). See vfs.git#work.open3; one commit added just before "pass creds to get_empty_filp(), make sure dentry_open() passes the right creds", one just after. The first one ("alloc_file(): switch to passing O_... flags instead of FMODE_... mode") pulls ->f_flags assignments into preceding alloc_file(), the second ("pass ->f_flags value to alloc_empty_file()") makes alloc_empty_file() set ->f_flags and ->f_flags-derived part of ->f_mode, with do_dentry_open() leaving these parts of ->f_mode alone. The rest is pretty much identical to the previous iteration of the series, except that alloc_file wrappers inherit the switch to passing O_... A question regarding the customs in such situations - are previous Reviewed-by/Acked-by normally kept across rebases like that?
On Thu, Jul 12, 2018 at 5:43 AM Al Viro <viro@zeniv.linux.org.uk> wrote: > > A question regarding the customs in such situations - are previous > Reviewed-by/Acked-by normally kept across rebases like that? Yeah, unless there were big changes, keep the reviewed/acked-by lines. Otherwise you'd never be able to handle different people giving slightly different feedback about separate issues. Linus
On Thu, Jul 12, 2018 at 08:05:26AM -0700, Linus Torvalds wrote: > On Thu, Jul 12, 2018 at 5:43 AM Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > A question regarding the customs in such situations - are previous > > Reviewed-by/Acked-by normally kept across rebases like that? > > Yeah, unless there were big changes, keep the reviewed/acked-by lines. > > Otherwise you'd never be able to handle different people giving > slightly different feedback about separate issues. OK... Miklos, I've pushed #ovl-candidate, with equivalent of the beginning of your branch. I'm *not* saying that I've no remaining issues with your series - this is just how I'd prefer to resolve that group of conflicts. Everything past "vfs: simplify dentry_open()" could live on top of that one, or its equivalent. I'm going to put #work-open3 into -next, let's figure out what to do with the conflicts; what I can promise is never-rebased status for #for-ovl (the beginning of #work-open3 merged into #ovl-candidate).
On Thu, Jul 12, 2018 at 04:53:37PM +0100, Al Viro wrote: > On Thu, Jul 12, 2018 at 08:05:26AM -0700, Linus Torvalds wrote: > > On Thu, Jul 12, 2018 at 5:43 AM Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > > > A question regarding the customs in such situations - are previous > > > Reviewed-by/Acked-by normally kept across rebases like that? > > > > Yeah, unless there were big changes, keep the reviewed/acked-by lines. > > > > Otherwise you'd never be able to handle different people giving > > slightly different feedback about separate issues. > > OK... Miklos, I've pushed #ovl-candidate, with equivalent of the beginning > of your branch. I'm *not* saying that I've no remaining issues > with your series - this is just how I'd prefer to resolve that group > of conflicts. > > Everything past "vfs: simplify dentry_open()" could live on top of that > one, or its equivalent. > > I'm going to put #work-open3 into -next, let's figure out what to do with > the conflicts; what I can promise is never-rebased status for #for-ovl > (the beginning of #work-open3 merged into #ovl-candidate). ... and now it even builds. Said that, I would really like to hear something from you - I can duplicate the entire overlayfs-next and merge it into my #for-next and ask Steven to use that instead of your tree, but I very much dislike going over your head like that. I realize that you'd been away for a while and probably are digging yourself from under the piles of mail, but it's getting late in the cycle and I want to get #for-next into reasonably sane shape. Please, look through that thing and respond.
Hi Al, On Wed, 18 Jul 2018 03:56:37 +0100 Al Viro <viro@ZenIV.linux.org.uk> wrote: > > ... and now it even builds. Said that, I would really like to hear something > from you - I can duplicate the entire overlayfs-next and merge it into > my #for-next and ask Steven to use that instead of your tree, but I very > much dislike going over your head like that. > > I realize that you'd been away for a while and probably are digging yourself > from under the piles of mail, but it's getting late in the cycle and I want > to get #for-next into reasonably sane shape. Please, look through that > thing and respond. Almost everything has been removed from the overlayfs tree in linux-next today. The only commit there currently is: 67810693077a ovl: fix wrong use of impure dir cache in ovl_iterate()
On Wed, Jul 18, 2018 at 5:29 AM, Stephen Rothwell <sfr@canb.auug.org.au> wrote: > Hi Al, > > On Wed, 18 Jul 2018 03:56:37 +0100 Al Viro <viro@ZenIV.linux.org.uk> wrote: >> >> ... and now it even builds. Said that, I would really like to hear something >> from you - I can duplicate the entire overlayfs-next and merge it into >> my #for-next and ask Steven to use that instead of your tree, but I very >> much dislike going over your head like that. >> >> I realize that you'd been away for a while and probably are digging yourself >> from under the piles of mail, but it's getting late in the cycle and I want >> to get #for-next into reasonably sane shape. Please, look through that >> thing and respond. > > Almost everything has been removed from the overlayfs tree in > linux-next today. The only commit there currently is: > > 67810693077a ovl: fix wrong use of impure dir cache in ovl_iterate() Al, thank you very much for taking care of this. I've already begone to go through those and will finish up the merge, hopefully today. Thanks, Miklos
On Wed, Jul 18, 2018 at 9:25 AM, Miklos Szeredi <mszeredi@redhat.com> wrote: > On Wed, Jul 18, 2018 at 5:29 AM, Stephen Rothwell <sfr@canb.auug.org.au> wrote: >> Hi Al, >> >> On Wed, 18 Jul 2018 03:56:37 +0100 Al Viro <viro@ZenIV.linux.org.uk> wrote: >>> >>> ... and now it even builds. Said that, I would really like to hear something >>> from you - I can duplicate the entire overlayfs-next and merge it into >>> my #for-next and ask Steven to use that instead of your tree, but I very >>> much dislike going over your head like that. >>> >>> I realize that you'd been away for a while and probably are digging yourself >>> from under the piles of mail, but it's getting late in the cycle and I want >>> to get #for-next into reasonably sane shape. Please, look through that >>> thing and respond. In "ovl: stack file ops" this: AV: make it use open_with_fake_path(), don't mess with override_creds Maybe it's the way to go, but looks broken as is; e.g. NFS does call current_creds() from its open method to get the credentials to work with. Okay, so ->open() is a file op, and file ops should use file->f_cred, but how are we going to enforce this? Thanks, Miklos
On Wed, Jul 18, 2018 at 02:10:32PM +0200, Miklos Szeredi wrote: > On Wed, Jul 18, 2018 at 9:25 AM, Miklos Szeredi <mszeredi@redhat.com> wrote: > > On Wed, Jul 18, 2018 at 5:29 AM, Stephen Rothwell <sfr@canb.auug.org.au> wrote: > >> Hi Al, > >> > >> On Wed, 18 Jul 2018 03:56:37 +0100 Al Viro <viro@ZenIV.linux.org.uk> wrote: > >>> > >>> ... and now it even builds. Said that, I would really like to hear something > >>> from you - I can duplicate the entire overlayfs-next and merge it into > >>> my #for-next and ask Steven to use that instead of your tree, but I very > >>> much dislike going over your head like that. > >>> > >>> I realize that you'd been away for a while and probably are digging yourself > >>> from under the piles of mail, but it's getting late in the cycle and I want > >>> to get #for-next into reasonably sane shape. Please, look through that > >>> thing and respond. > > In "ovl: stack file ops" this: > > AV: make it use open_with_fake_path(), don't mess with override_creds > > Maybe it's the way to go, but looks broken as is; e.g. NFS does call > current_creds() from its open method to get the credentials to work > with. It *is* broken. For now leave override_creds() as in your variant, but we really want to deal with that crap eventually. > Okay, so ->open() is a file op, and file ops should use file->f_cred, > but how are we going to enforce this? I'd say we cut down on the use of current_cred() when deep in call chain...
On Wed, Jul 18, 2018 at 01:43:40PM +0100, Al Viro wrote: > It *is* broken. For now leave override_creds() as in your variant, but > we really want to deal with that crap eventually. > > > Okay, so ->open() is a file op, and file ops should use file->f_cred, > > but how are we going to enforce this? > > I'd say we cut down on the use of current_cred() when deep in call chain... Actually, how about this: #define call_with_creds(__cred, expr) ({ \ __typeof__(expr) ____res; \ const struct cred *____old = current->cred; \ const struct cred *____new = (__cred); \ rcu_assign_pointer(current->cred, ____new); \ ____res = expr; \ BUG_ON(current->cred != ____new); \ rcu_assign_pointer(current->cred, ____old); \ ____res; \ }) and replacing error = open(inode, f); with error = call_with_cred(f->f_cred, open(inode, f)); possibly with similar at other methods callsites? Unlike override_creds() and revert_creds() it's cheap - no validation of creds, no cacheline ping-pong... Folks?
On Wed, Jul 18, 2018 at 2:43 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > On Wed, Jul 18, 2018 at 02:10:32PM +0200, Miklos Szeredi wrote: >> On Wed, Jul 18, 2018 at 9:25 AM, Miklos Szeredi <mszeredi@redhat.com> wrote: >> > On Wed, Jul 18, 2018 at 5:29 AM, Stephen Rothwell <sfr@canb.auug.org.au> wrote: >> >> Hi Al, >> >> >> >> On Wed, 18 Jul 2018 03:56:37 +0100 Al Viro <viro@ZenIV.linux.org.uk> wrote: >> >>> >> >>> ... and now it even builds. Said that, I would really like to hear something >> >>> from you - I can duplicate the entire overlayfs-next and merge it into >> >>> my #for-next and ask Steven to use that instead of your tree, but I very >> >>> much dislike going over your head like that. >> >>> >> >>> I realize that you'd been away for a while and probably are digging yourself >> >>> from under the piles of mail, but it's getting late in the cycle and I want >> >>> to get #for-next into reasonably sane shape. Please, look through that >> >>> thing and respond. Pushed updated series based on your vfs.git#for-ovl branch to the overlayfs-next tree. There's the additional patch dealing with nr_files accounting (will post for review shortly). That one has a trivial conflict with the mount series, otherwise merges cleanly with viro/vfs.git#for-next. I like the call_with_creds() idea. I didn't realize that override_creds()/revert_creds() can be quite heavyweight due to doing (unnecessary in this case) refcounting. Could use call_with_creds() in overlayfs too, since we hold ref on ofs->creator_cred for the lifetime of the filesystem. Thanks, Miklos
On Wed, Jul 18, 2018 at 05:46:09PM +0200, Miklos Szeredi wrote: > I like the call_with_creds() idea. I didn't realize that > override_creds()/revert_creds() can be quite heavyweight due to doing > (unnecessary in this case) refcounting. Could use call_with_creds() > in overlayfs too, since we hold ref on ofs->creator_cred for the > lifetime of the filesystem. OK... /* expr is non-void here */ #define __call_with_creds(__cred, expr) ({ \ const struct cred *____old = current->cred; \ const struct cred *____new = __cred; \ __typeof__(expr) __res; \ rcu_assign_pointer(current->cred, ____new); \ __res = (expr); \ BUG_ON(current->cred != ____new); \ rcu_assign_pointer(current->cred, ____old); \ __res; }) /* expr for non-void, expr,0 for void */ #define __wrap_void(expr) __builtin_choose_expr( \ __builtin_types_compatible_p(__typeof__(expr), void), \ ((expr),0), (expr)) /* * Evaluate an expression with current->cred temporary set to __cred. * NOTE: expr must not result in changed ->cred - any changes during * its evaluation must be undone. */ #define call_with_creds(__cred, expr) \ ((__typeof__(expr))__call_with_creds(__cred, __wrap_void(expr))) Linus, David - do you have any objections to the above? I would like to do some of the file method calls via that - e.g. have callers of ->write() (both __vfs_write() and do_loop_readv_writev()) use nr = call_with_creds(file->f_cred, file->f_op->write(file, iovec.iov_base, iovec.iov_len, ppos)); instead of nr = file->f_op->write(file, iovec.iov_base, iovec.iov_len, ppos)); Ditto for call of open() in do_dentry_open(), etc.
On Wed, Jul 18, 2018 at 11:13 AM Al Viro <viro@zeniv.linux.org.uk> wrote: > > Linus, David - do you have any objections to the above? I damn well do. I explained earlier why it's wrong and fragile, and why it can just cause the *reverse* security problem if you do it wrong. So now you take a subtle bug, and make it even more subtle, and encourage people to do this known-broken model of using creds at IO time. No. Some debugging option to just clear current->creds entirely and catch mis-uses, sure. But saying "we have shit buggy garbage in random write functions, so we'll just paper over it"? No. Linus
On Wed, Jul 18, 2018 at 11:19:18AM -0700, Linus Torvalds wrote: > On Wed, Jul 18, 2018 at 11:13 AM Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > Linus, David - do you have any objections to the above? > > I damn well do. > > I explained earlier why it's wrong and fragile, and why it can just > cause the *reverse* security problem if you do it wrong. So now you > take a subtle bug, and make it even more subtle, and encourage people > to do this known-broken model of using creds at IO time. > > No. > > Some debugging option to just clear current->creds entirely and catch > mis-uses, sure. But saying "we have shit buggy garbage in random write > functions, so we'll just paper over it"? No. Huh? Nevermind ->write(), what about open()? Here's a specific question Miklos brought when I suggested to get rid of that override: /* * These allocate and release file read/write context information. */ int nfs_open(struct inode *inode, struct file *filp) { struct nfs_open_context *ctx; ctx = alloc_nfs_open_context(file_dentry(filp), filp->f_mode, filp); struct nfs_open_context *alloc_nfs_open_context(struct dentry *dentry, fmode_t f_mode, struct file *filp) { struct nfs_open_context *ctx; struct rpc_cred *cred = rpc_lookup_cred(); struct rpc_cred *rpc_lookup_cred(void) { return rpcauth_lookupcred(&generic_auth, 0); struct rpc_cred * rpcauth_lookupcred(struct rpc_auth *auth, int flags) { struct auth_cred acred; struct rpc_cred *ret; const struct cred *cred = current_cred(); How should we bring the cred passed to do_dentry_open() where open() has been called to rpcauth_lookupcred() where we end up looking for rpc_cred by what should've been the cred passed to do_dentry_open() and is, instead, current_cred()? We can pass filp->f_cred to rpc_lookup_cred() variant that gets it as an explicit argument and feed it down to rpcauth_lookupcred() variant that does the same. We can basically ignore the ->f_cred here. Or we can get current_cred() equal to ->f_cred for the duration of open(). I'd probably prefer the first variant, but the last part of the question Miklos asked > Okay, so ->open() is a file op, and file ops should use file->f_cred, > but how are we going to enforce this? is not trivial - how do we find the places where that kind of thing happens and what do we do in the meanwhile? I don't see any quick answers - any suggestions would be very welcome. It's not just direct current_cred() callers; that stuff gets called deep in call chains. And lifting it all the way up means a lot of methods that need to get an explicit struct cred * argument. Are you OK with going in that direction? I'm honestly not sure - it's not an attempt to maneuver you into changing your policy re ->write(). Do we care about ->f_cred at all and if we do, how do we get it consistent across the filesystems? I'd buy "it's a weird and obscure thing" for overlayfs, but that example is on NFS... We definitely do have bugs in that area - consider e.g. static int ecryptfs_threadfn(void *ignored) { set_freezable(); while (1) { struct ecryptfs_open_req *req; wait_event_freezable( ecryptfs_kthread_ctl.wait, (!list_empty(&ecryptfs_kthread_ctl.req_list) || kthread_should_stop())); mutex_lock(&ecryptfs_kthread_ctl.mux); if (ecryptfs_kthread_ctl.flags & ECRYPTFS_KTHREAD_ZOMBIE) { mutex_unlock(&ecryptfs_kthread_ctl.mux); goto out; } while (!list_empty(&ecryptfs_kthread_ctl.req_list)) { req = list_first_entry(&ecryptfs_kthread_ctl.req_list, struct ecryptfs_open_req, kthread_ctl_list); list_del(&req->kthread_ctl_list); *req->lower_file = dentry_open(&req->path, (O_RDWR | O_LARGEFILE), current_cred()); complete(&req->done); } mutex_unlock(&ecryptfs_kthread_ctl.mux); } out: return 0; } It's a kernel thread, so current_cred() looks bogus...
On Wed, Jul 18, 2018 at 12:46 PM Al Viro <viro@zeniv.linux.org.uk> wrote: > > Huh? Nevermind ->write(), what about open()? What about open? At open time, file->f_cred is the same as current_cred(). So yes, open uses current cred. What's the problem? Now, if you then use a tasklet or some other thread to do the open, then obviously that is no longer true. But then the problem is that you're doing the open() itself in the wrong context, and that has nothing to do with any general issue, and everything to do with "you changed to another context without pulling all the context data with you - you're buggy". Doing some kind of "call_with_creds()" isn't the solultion - it's just part of the whole thing (what about user accounting etc? If you switch to another thread to do the work, you have way more issues than just creds). Linus
On Wed, Jul 18, 2018 at 12:53:48PM -0700, Linus Torvalds wrote: > On Wed, Jul 18, 2018 at 12:46 PM Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > Huh? Nevermind ->write(), what about open()? > > What about open? > > At open time, file->f_cred is the same as current_cred(). int cachefiles_write_page(struct fscache_storage *op, struct page *page) { ... file = dentry_open(&path, O_RDWR | O_LARGEFILE, cache->cache_cred); int ecryptfs_privileged_open(struct file **lower_file, struct dentry *lower_dentry, struct vfsmount *lower_mnt, const struct cred *cred) ... (*lower_file) = dentry_open(&req.path, flags, cred); /* Derived from fs/exec.c:flush_old_files. */ static inline void flush_unauthorized_files(const struct cred *cred, struct files_struct *files) ... devnull = dentry_open(&selinux_null, O_RDWR, cred); (granted, here we don't care much, /dev/null being what it is) In mainline: struct file *filp_clone_open(struct file *oldfile) { ... retval = vfs_open(&oldfile->f_path, file, oldfile->f_cred);
On Wed, Jul 18, 2018 at 09:04:11PM +0100, Al Viro wrote: > On Wed, Jul 18, 2018 at 12:53:48PM -0700, Linus Torvalds wrote: > > On Wed, Jul 18, 2018 at 12:46 PM Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > > > Huh? Nevermind ->write(), what about open()? > > > > What about open? > > > > At open time, file->f_cred is the same as current_cred(). > > int cachefiles_write_page(struct fscache_storage *op, struct page *page) > { > ... > file = dentry_open(&path, O_RDWR | O_LARGEFILE, cache->cache_cred); > > > int ecryptfs_privileged_open(struct file **lower_file, > struct dentry *lower_dentry, > struct vfsmount *lower_mnt, > const struct cred *cred) > ... > (*lower_file) = dentry_open(&req.path, flags, cred); Actually, scratch that one - in this case it is current_cred() (whether that's the right value or not). cachefile_write_page() case is for real, AFAICS.
On Wed, Jul 18, 2018 at 1:04 PM Al Viro <viro@zeniv.linux.org.uk> wrote: > > > int cachefiles_write_page(struct fscache_storage *op, struct page *page) > { > ... > file = dentry_open(&path, O_RDWR | O_LARGEFILE, cache->cache_cred); Ugh. So on the one hand, in this case I'd actually be more ok with the whole "call_with_creds()" model, because open() is actually *supposed* to use creds. At the same time, my stronger reaction is that "it's actually better to just pass down the creds as a pointer like we do". So I think the real problem is simply that people use the current creds without thinking about it. Often it's just hidden in a "capable(CAP_SYS_ADMIN)" call or something like that, where people don't even _think_ about the whole thing. What I really think I'd prefer is to have some simple way to "poison" current_cred(). It could be something as simple as a per-thread counter, and we'd have current_cred() do WARN_ON_ONCE(in_interrupt() || current->cred_poison); and then read/write/open could just inc/dec the cred_poison counter (when the debug option is set). We wouldn't even need to catch all the odd cases. We'd only need to inc/dec the poison counter in the normal read/write/open cases, not even worrying about the odd special cases (like that cachefiles_write_page() case). Why? Because then we'd make sure the filesystems do the right thing, and then cachefiles_write_page() etc wouldn't even have to worry, because all the _common_ open cases have already tested that yes, it uses the right cred pointer and doesn't try to get whatever "current" happens to be. So I'd really like to just fix the cases where we access the wrong creds. And I'd be more than happy to add a few wrapper functions to make us _see_ the cases where we do so. What I don't like is the idea of trying to make bad creds accesses "magically do the right thing". See what I'm aiming for? Linus
On Wed, Jul 18, 2018 at 01:43:00PM -0700, Linus Torvalds wrote: > What I really think I'd prefer is to have some simple way to "poison" > current_cred(). It could be something as simple as a per-thread > counter, and we'd have current_cred() do > > WARN_ON_ONCE(in_interrupt() || current->cred_poison); Why is counter any better than LSB of a pointer?
Linus Torvalds <torvalds@linux-foundation.org> wrote: > and then read/write/open could just inc/dec the cred_poison counter > (when the debug option is set). As I may have said, I have tried modifying the kernel to pass the cred pointer down. The drivers and ioctl() implementations are/were particularly nasty in this respect. So many of them were doing checks against the current thread, not f_cred. I think I need to work out some way to automate the process of adding in the extra parameter as it's not something that I think can be trivially done with coccinelle. David
Linus Torvalds <torvalds@linux-foundation.org> wrote: > I explained earlier why it's wrong and fragile, and why it can just > cause the *reverse* security problem if you do it wrong. So now you > take a subtle bug, and make it even more subtle, and encourage people > to do this known-broken model of using creds at IO time. Are network filesystems allowed to use f_cred at I/O time to determine the authentication/encryption parameters to commune with the server? David
On Wed, Jul 18, 2018 at 2:22 PM Al Viro <viro@zeniv.linux.org.uk> wrote: > > Why is counter any better than LSB of a pointer? Easier nesting, but if you do it with the "surround by a macro" model I guess you can just save/restore instead (like you did in call_with_creds). Linus
On Wed, Jul 18, 2018 at 2:28 PM David Howells <dhowells@redhat.com> wrote: > > Are network filesystems allowed to use f_cred at I/O time to determine the > authentication/encryption parameters to commune with the server? Absolutely. file->f_cred is very much "what was my ID at open time". Of course, you may well have reasons why you actually want to cache the key itself (and hide it in private_data or similar rather than look it up, but if looking it up by uid is ok, then file->f_cred is ok. And if you check permissions at IO time (again using file->f_cred), that's ok from a kernel perspective, but it's not really POSIX-compliant. But obviously a lot of netrwork filesystems aren't posix-compliant anyway. Linus
On Wed, Jul 18, 2018 at 2:27 PM David Howells <dhowells@redhat.com> wrote: > > As I may have said, I have tried modifying the kernel to pass the cred pointer > down. It should always be there in the 'struct file *'. Now, we may have some broken stuff that passes only inodes down, but they probably really should be fixed. > The drivers and ioctl() implementations are/were particularly nasty in > this respect. So many of them were doing checks against the current thread, > not f_cred. So ioctl() may be ok, simply because at least you shouldn't be able to fool suid programs to do ioctl's on untrusted file descriptors. So using current_cred() is still technically very wrong, but it's probably not a huge problem in practice. Now, if there's some cachefs kind of "do ioctl at the behest of somebody else", then *that* would be a problem. I'm hoping there isn't. Linus
diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting index 77e2d623e115..6b8f5e37f0f7 100644 --- a/Documentation/filesystems/porting +++ b/Documentation/filesystems/porting @@ -613,7 +613,7 @@ in your dentry operations instead. -- [mandatory] alloc_file() has become static now; two wrappers are to be used instead. - alloc_file_pseudo(inode, vfsmount, name, mode, ops) is for the cases + alloc_file_pseudo(inode, vfsmount, name, flags, ops) is for the cases when dentry needs to be created; that's the majority of old alloc_file() users. Calling conventions: on success a reference to new struct file is returned and callers reference to inode is subsumed by that. On diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c index e0b9c00aecde..8fd5ec4d6042 100644 --- a/drivers/misc/cxl/api.c +++ b/drivers/misc/cxl/api.c @@ -90,11 +90,10 @@ static struct file *cxl_getfile(const char *name, } file = alloc_file_pseudo(inode, cxl_vfs_mount, name, - OPEN_FMODE(flags), fops); + flags & (O_ACCMODE | O_NONBLOCK), fops); if (IS_ERR(file)) goto err_inode; - file->f_flags = flags & (O_ACCMODE | O_NONBLOCK); file->private_data = priv; return file; diff --git a/drivers/scsi/cxlflash/ocxl_hw.c b/drivers/scsi/cxlflash/ocxl_hw.c index 6d0632174ec6..a43d44e7e7dd 100644 --- a/drivers/scsi/cxlflash/ocxl_hw.c +++ b/drivers/scsi/cxlflash/ocxl_hw.c @@ -115,7 +115,7 @@ static struct file *ocxlflash_getfile(struct device *dev, const char *name, } file = alloc_file_pseudo(inode, ocxlflash_vfs_mount, name, - OPEN_FMODE(flags), fops); + flags & (O_ACCMODE | O_NONBLOCK), fops); if (IS_ERR(file)) { rc = PTR_ERR(file); dev_err(dev, "%s: alloc_file failed rc=%d\n", @@ -123,7 +123,6 @@ static struct file *ocxlflash_getfile(struct device *dev, const char *name, goto err4; } - file->f_flags = flags & (O_ACCMODE | O_NONBLOCK); file->private_data = priv; out: return file; diff --git a/fs/aio.c b/fs/aio.c index c5244c68f90e..c3a8bac16374 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -225,11 +225,9 @@ static struct file *aio_private_file(struct kioctx *ctx, loff_t nr_pages) inode->i_size = PAGE_SIZE * nr_pages; file = alloc_file_pseudo(inode, aio_mnt, "[aio]", - FMODE_READ | FMODE_WRITE, &aio_ring_fops); + O_RDWR, &aio_ring_fops); if (IS_ERR(file)) iput(inode); - else - file->f_flags = O_RDWR; return file; } diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c index 7e13edd23db1..9a3c765fc7b1 100644 --- a/fs/anon_inodes.c +++ b/fs/anon_inodes.c @@ -85,13 +85,11 @@ struct file *anon_inode_getfile(const char *name, */ ihold(anon_inode_inode); file = alloc_file_pseudo(anon_inode_inode, anon_inode_mnt, name, - OPEN_FMODE(flags), fops); + flags & (O_ACCMODE | O_NONBLOCK), fops); if (IS_ERR(file)) goto err; file->f_mapping = anon_inode_inode->i_mapping; - - file->f_flags = flags & (O_ACCMODE | O_NONBLOCK); file->private_data = priv; return file; diff --git a/fs/file_table.c b/fs/file_table.c index c5f651fd6830..af9141d8e29f 100644 --- a/fs/file_table.c +++ b/fs/file_table.c @@ -182,7 +182,7 @@ static struct file *alloc_file(const struct path *path, fmode_t mode, } struct file *alloc_file_pseudo(struct inode *inode, struct vfsmount *mnt, - const char *name, fmode_t mode, + const char *name, int flags, const struct file_operations *fops) { static const struct dentry_operations anon_ops = { @@ -199,11 +199,12 @@ struct file *alloc_file_pseudo(struct inode *inode, struct vfsmount *mnt, d_set_d_op(path.dentry, &anon_ops); path.mnt = mntget(mnt); d_instantiate(path.dentry, inode); - file = alloc_file(&path, mode, fops); + file = alloc_file(&path, OPEN_FMODE(flags), fops); if (IS_ERR(file)) { ihold(inode); path_put(&path); } + file->f_flags = flags; return file; } EXPORT_SYMBOL(alloc_file_pseudo); diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 86ffe04f73d6..87605c73361b 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -1358,8 +1358,7 @@ struct file *hugetlb_file_setup(const char *name, size_t size, acctflag)) file = ERR_PTR(-ENOMEM); else - file = alloc_file_pseudo(inode, mnt, name, - FMODE_WRITE | FMODE_READ, + file = alloc_file_pseudo(inode, mnt, name, O_RDWR, &hugetlbfs_file_operations); if (!IS_ERR(file)) return file; diff --git a/fs/pipe.c b/fs/pipe.c index 94323a1d7c92..f5d30579e017 100644 --- a/fs/pipe.c +++ b/fs/pipe.c @@ -750,14 +750,15 @@ int create_pipe_files(struct file **res, int flags) if (!inode) return -ENFILE; - f = alloc_file_pseudo(inode, pipe_mnt, "", FMODE_WRITE, &pipefifo_fops); + f = alloc_file_pseudo(inode, pipe_mnt, "", + O_WRONLY | (flags & (O_NONBLOCK | O_DIRECT)), + &pipefifo_fops); if (IS_ERR(f)) { free_pipe_info(inode->i_pipe); iput(inode); return PTR_ERR(f); } - f->f_flags = O_WRONLY | (flags & (O_NONBLOCK | O_DIRECT)); f->private_data = inode->i_pipe; res[0] = alloc_file_clone(f, FMODE_READ, &pipefifo_fops); diff --git a/include/linux/file.h b/include/linux/file.h index 325b36ca336d..1972776e1677 100644 --- a/include/linux/file.h +++ b/include/linux/file.h @@ -20,7 +20,7 @@ struct dentry; struct inode; struct path; extern struct file *alloc_file_pseudo(struct inode *, struct vfsmount *, - const char *, fmode_t, const struct file_operations *); + const char *, int, const struct file_operations *); extern struct file *alloc_file_clone(struct file *, fmode_t, const struct file_operations *); diff --git a/mm/shmem.c b/mm/shmem.c index fd21df189f32..d7e984141be5 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -3922,8 +3922,7 @@ static struct file *__shmem_file_setup(struct vfsmount *mnt, const char *name, l clear_nlink(inode); /* It is unlinked */ res = ERR_PTR(ramfs_nommu_expand_for_mapping(inode, size)); if (!IS_ERR(res)) - res = alloc_file_pseudo(inode, mnt, name, - FMODE_WRITE | FMODE_READ, + res = alloc_file_pseudo(inode, mnt, name, O_RDWR, &shmem_file_operations); if (IS_ERR(res)) iput(inode); diff --git a/net/socket.c b/net/socket.c index 81cf9906cae5..4cf3568caf9f 100644 --- a/net/socket.c +++ b/net/socket.c @@ -397,14 +397,14 @@ struct file *sock_alloc_file(struct socket *sock, int flags, const char *dname) dname = sock->sk ? sock->sk->sk_prot_creator->name : ""; file = alloc_file_pseudo(SOCK_INODE(sock), sock_mnt, dname, - FMODE_READ | FMODE_WRITE, &socket_file_ops); + O_RDWR | (flags & O_NONBLOCK), + &socket_file_ops); if (IS_ERR(file)) { sock_release(sock); return file; } sock->file = file; - file->f_flags = O_RDWR | (flags & O_NONBLOCK); file->private_data = sock; return file; }