Message ID | 20201120231441.29911-15-ebiederm@xmission.com (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
Series | [v2,01/24] exec: Move unshare_files to fix posix file locking during exec | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Not a local patch |
On Fri, Nov 20, 2020 at 05:14:32PM -0600, Eric W. Biederman wrote: > When discussing[1] exec and posix file locks it was realized that none > of the callers of get_files_struct fundamentally needed to call > get_files_struct, and that by switching them to helper functions > instead it will both simplify their code and remove unnecessary > increments of files_struct.count. Those unnecessary increments can > result in exec unnecessarily unsharing files_struct which breaking > posix locks, and it can result in fget_light having to fallback to > fget reducing system performance. > > Using task_lookup_next_fd_rcu simplifies proc_readfd_common, by moving > the checking for the maximum file descritor into the generic code, and > by remvoing the need for capturing and releasing a reference on > files_struct. > > As task_lookup_fd_rcu may update the fd ctx->pos has been changed > to be the fd +2 after task_lookup_fd_rcu returns. > + for (fd = ctx->pos - 2;; fd++) { > struct file *f; > struct fd_data data; > char name[10 + 1]; > unsigned int len; > > - f = files_lookup_fd_rcu(files, fd); > + f = task_lookup_next_fd_rcu(p, &fd); Ugh... That makes for a massive cacheline pingpong on task_lock - instead of grabbing/dropping task_lock() once in the beginning, we do that for every damn descriptor. I really don't like this one. If anything, I would rather have a helper that would collect a bunch of pairs (fd,mode) into an array and have lookups batched into it. With the loop in that sucker grabbing a reasonable amount into a local array, then doing proc_fill_cache() for each collected.
On Tue, Dec 08, 2020 at 04:38:09PM -0600, Eric W. Biederman wrote: > Is there any reason we don't simply rcu free the files_struct? > That would remove the need for the task_lock entirely. Umm... Potentially interesting part here is the interaction with close_files(); currently that can't overlap with any of those 3rd-party accesses to descriptor table, but with your changes here it's very much possible. OTOH, it's not like close_files() did much beyond the effects of already possible close(2) done by one of the threads sharing that sucker. It's _probably_ safe (at least for proc_readfd_common()), but I'll need to look at the other places doing that kind of access. Especially the BPF foulness... Oh, and in any case, the trick with repurposing ->rcu of embedded fdtable deserves a comment. It's not hard to explain, so...
On Tue, Dec 08, 2020 at 10:24:57PM -0600, Eric W. Biederman wrote: > Al Viro <viro@zeniv.linux.org.uk> writes: > > > On Tue, Dec 08, 2020 at 04:38:09PM -0600, Eric W. Biederman wrote: > > > >> Is there any reason we don't simply rcu free the files_struct? > >> That would remove the need for the task_lock entirely. > > > > Umm... Potentially interesting part here is the interaction with > > close_files(); currently that can't overlap with any of those > > 3rd-party accesses to descriptor table, but with your changes > > here it's very much possible. > > Good point. > > I was worried there might have been a concern about the overhead > introduced by always rcu freeing files table. > > > OTOH, it's not like close_files() did much beyond the effects of already > > possible close(2) done by one of the threads sharing that sucker. > > It's _probably_ safe (at least for proc_readfd_common()), but I'll need > > to look at the other places doing that kind of access. Especially the > > BPF foulness... Still digging, unfortunately ;-/ > > Oh, and in any case, the trick with repurposing ->rcu of embedded > > fdtable deserves a comment. It's not hard to explain, so... > > Agreed. Something like fdtable.rcu isn't used so use it so by reusing > it we keep from wasting memory in files_struct to have a dedicated > rcu_head. I'd probably go for something along the lines of "we can avoid adding a separate rcu_head into files_struct, since rcu_head in struct fdtable is only used for separately allocated instances, allowing us to repurpose files_struct->fdtab.rcu for RCU-delayed freeing of files_struct"...
diff --git a/fs/proc/fd.c b/fs/proc/fd.c index c1a984f3c4df..72c1525b4b3e 100644 --- a/fs/proc/fd.c +++ b/fs/proc/fd.c @@ -217,7 +217,6 @@ static int proc_readfd_common(struct file *file, struct dir_context *ctx, instantiate_t instantiate) { struct task_struct *p = get_proc_task(file_inode(file)); - struct files_struct *files; unsigned int fd; if (!p) @@ -225,22 +224,18 @@ static int proc_readfd_common(struct file *file, struct dir_context *ctx, if (!dir_emit_dots(file, ctx)) goto out; - files = get_files_struct(p); - if (!files) - goto out; rcu_read_lock(); - for (fd = ctx->pos - 2; - fd < files_fdtable(files)->max_fds; - fd++, ctx->pos++) { + for (fd = ctx->pos - 2;; fd++) { struct file *f; struct fd_data data; char name[10 + 1]; unsigned int len; - f = files_lookup_fd_rcu(files, fd); + f = task_lookup_next_fd_rcu(p, &fd); + ctx->pos = fd + 2LL; if (!f) - continue; + break; data.mode = f->f_mode; rcu_read_unlock(); data.fd = fd; @@ -249,13 +244,11 @@ static int proc_readfd_common(struct file *file, struct dir_context *ctx, if (!proc_fill_cache(file, ctx, name, len, instantiate, p, &data)) - goto out_fd_loop; + goto out; cond_resched(); rcu_read_lock(); } rcu_read_unlock(); -out_fd_loop: - put_files_struct(files); out: put_task_struct(p); return 0;