diff mbox series

[v2,15/24] proc/fd: In proc_readfd_common use task_lookup_next_fd_rcu

Message ID 20201120231441.29911-15-ebiederm@xmission.com (mailing list archive)
State Not Applicable
Headers show
Series [v2,01/24] exec: Move unshare_files to fix posix file locking during exec | expand

Checks

Context Check Description
netdev/tree_selection success Not a local patch

Commit Message

Eric W. Biederman Nov. 20, 2020, 11:14 p.m. UTC
When discussing[1] exec and posix file locks it was realized that none
of the callers of get_files_struct fundamentally needed to call
get_files_struct, and that by switching them to helper functions
instead it will both simplify their code and remove unnecessary
increments of files_struct.count.  Those unnecessary increments can
result in exec unnecessarily unsharing files_struct which breaking
posix locks, and it can result in fget_light having to fallback to
fget reducing system performance.

Using task_lookup_next_fd_rcu simplifies proc_readfd_common, by moving
the checking for the maximum file descritor into the generic code, and
by remvoing the need for capturing and releasing a reference on
files_struct.

As task_lookup_fd_rcu may update the fd ctx->pos has been changed
to be the fd +2 after task_lookup_fd_rcu returns.

[1] https://lkml.kernel.org/r/20180915160423.GA31461@redhat.com
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Andy Lavr <andy.lavr@gmail.com>
v1: https://lkml.kernel.org/r/20200817220425.9389-10-ebiederm@xmission.com
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 fs/proc/fd.c | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

Comments

Al Viro Dec. 7, 2020, 11:29 p.m. UTC | #1
On Fri, Nov 20, 2020 at 05:14:32PM -0600, Eric W. Biederman wrote:
> When discussing[1] exec and posix file locks it was realized that none
> of the callers of get_files_struct fundamentally needed to call
> get_files_struct, and that by switching them to helper functions
> instead it will both simplify their code and remove unnecessary
> increments of files_struct.count.  Those unnecessary increments can
> result in exec unnecessarily unsharing files_struct which breaking
> posix locks, and it can result in fget_light having to fallback to
> fget reducing system performance.
> 
> Using task_lookup_next_fd_rcu simplifies proc_readfd_common, by moving
> the checking for the maximum file descritor into the generic code, and
> by remvoing the need for capturing and releasing a reference on
> files_struct.
> 
> As task_lookup_fd_rcu may update the fd ctx->pos has been changed
> to be the fd +2 after task_lookup_fd_rcu returns.


> +	for (fd = ctx->pos - 2;; fd++) {
>  		struct file *f;
>  		struct fd_data data;
>  		char name[10 + 1];
>  		unsigned int len;
>  
> -		f = files_lookup_fd_rcu(files, fd);
> +		f = task_lookup_next_fd_rcu(p, &fd);

Ugh...  That makes for a massive cacheline pingpong on task_lock -
instead of grabbing/dropping task_lock() once in the beginning, we do
that for every damn descriptor.

I really don't like this one.  If anything, I would rather have
a helper that would collect a bunch of pairs (fd,mode) into an
array and have lookups batched into it.  With the loop in that
sucker grabbing a reasonable amount into a local array, then
doing proc_fill_cache() for each collected.
Al Viro Dec. 9, 2020, 4:07 a.m. UTC | #2
On Tue, Dec 08, 2020 at 04:38:09PM -0600, Eric W. Biederman wrote:

> Is there any reason we don't simply rcu free the files_struct?
> That would remove the need for the task_lock entirely.

Umm...  Potentially interesting part here is the interaction with
close_files(); currently that can't overlap with any of those
3rd-party accesses to descriptor table, but with your changes
here it's very much possible.

OTOH, it's not like close_files() did much beyond the effects of already
possible close(2) done by one of the threads sharing that sucker.
It's _probably_ safe (at least for proc_readfd_common()), but I'll need
to look at the other places doing that kind of access.  Especially the
BPF foulness...

Oh, and in any case, the trick with repurposing ->rcu of embedded
fdtable deserves a comment.  It's not hard to explain, so...
Al Viro Dec. 9, 2020, 2:23 p.m. UTC | #3
On Tue, Dec 08, 2020 at 10:24:57PM -0600, Eric W. Biederman wrote:
> Al Viro <viro@zeniv.linux.org.uk> writes:
> 
> > On Tue, Dec 08, 2020 at 04:38:09PM -0600, Eric W. Biederman wrote:
> >
> >> Is there any reason we don't simply rcu free the files_struct?
> >> That would remove the need for the task_lock entirely.
> >
> > Umm...  Potentially interesting part here is the interaction with
> > close_files(); currently that can't overlap with any of those
> > 3rd-party accesses to descriptor table, but with your changes
> > here it's very much possible.
> 
> Good point.
> 
> I was worried there might have been a concern about the overhead
> introduced by always rcu freeing files table.
> 
> > OTOH, it's not like close_files() did much beyond the effects of already
> > possible close(2) done by one of the threads sharing that sucker.
> > It's _probably_ safe (at least for proc_readfd_common()), but I'll need
> > to look at the other places doing that kind of access.  Especially the
> > BPF foulness...

Still digging, unfortunately ;-/

> > Oh, and in any case, the trick with repurposing ->rcu of embedded
> > fdtable deserves a comment.  It's not hard to explain, so...
> 
> Agreed.  Something like fdtable.rcu isn't used so use it so by reusing
> it we keep from wasting memory in files_struct to have a dedicated
> rcu_head.

I'd probably go for something along the lines of "we can avoid adding
a separate rcu_head into files_struct, since rcu_head in struct fdtable
is only used for separately allocated instances, allowing us to repurpose
files_struct->fdtab.rcu for RCU-delayed freeing of files_struct"...
diff mbox series

Patch

diff --git a/fs/proc/fd.c b/fs/proc/fd.c
index c1a984f3c4df..72c1525b4b3e 100644
--- a/fs/proc/fd.c
+++ b/fs/proc/fd.c
@@ -217,7 +217,6 @@  static int proc_readfd_common(struct file *file, struct dir_context *ctx,
 			      instantiate_t instantiate)
 {
 	struct task_struct *p = get_proc_task(file_inode(file));
-	struct files_struct *files;
 	unsigned int fd;
 
 	if (!p)
@@ -225,22 +224,18 @@  static int proc_readfd_common(struct file *file, struct dir_context *ctx,
 
 	if (!dir_emit_dots(file, ctx))
 		goto out;
-	files = get_files_struct(p);
-	if (!files)
-		goto out;
 
 	rcu_read_lock();
-	for (fd = ctx->pos - 2;
-	     fd < files_fdtable(files)->max_fds;
-	     fd++, ctx->pos++) {
+	for (fd = ctx->pos - 2;; fd++) {
 		struct file *f;
 		struct fd_data data;
 		char name[10 + 1];
 		unsigned int len;
 
-		f = files_lookup_fd_rcu(files, fd);
+		f = task_lookup_next_fd_rcu(p, &fd);
+		ctx->pos = fd + 2LL;
 		if (!f)
-			continue;
+			break;
 		data.mode = f->f_mode;
 		rcu_read_unlock();
 		data.fd = fd;
@@ -249,13 +244,11 @@  static int proc_readfd_common(struct file *file, struct dir_context *ctx,
 		if (!proc_fill_cache(file, ctx,
 				     name, len, instantiate, p,
 				     &data))
-			goto out_fd_loop;
+			goto out;
 		cond_resched();
 		rcu_read_lock();
 	}
 	rcu_read_unlock();
-out_fd_loop:
-	put_files_struct(files);
 out:
 	put_task_struct(p);
 	return 0;