diff mbox series

[09/17] file: Implement fnext_task

Message ID 20200817220425.9389-9-ebiederm@xmission.com (mailing list archive)
State New, archived
Headers show
Series [01/17] exec: Move unshare_files to fix posix file locking during exec | expand

Commit Message

Eric W. Biederman Aug. 17, 2020, 10:04 p.m. UTC
As a companion to fget_task and fcheck_task implement fnext_task that
will return the struct file for the first file descriptor show number
is equal or greater than the fd argument value, or NULL if there is
no such struct file.

This allows file descriptors of foreign processes to be iterated through
safely, without needed to increment the count on files_struct.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/file.c               | 21 +++++++++++++++++++++
 include/linux/fdtable.h |  1 +
 2 files changed, 22 insertions(+)

Comments

Linus Torvalds Aug. 17, 2020, 11:54 p.m. UTC | #1
I like the series, but I have to say that the extension of the
horrible "fcheck*()" interfaces raises my hackles..

That interface is very very questionable to begin with, and it's easy
to get wrong.

I don't see you getting it wrong - you do seem to hold the RCU read
lock in the places I checked, but it worries me.

I think my worry could be at least partially mitigated with more
comments and docs:

On Mon, Aug 17, 2020 at 3:10 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> +struct file *fnext_task(struct task_struct *task, unsigned int *ret_fd)

Please please *please* make it clear that because this does *not*
increment any reference counts, you have to be very very careful using
the "struct file" pointer this returns.

I really dislike the naming. The old "fcheck()" naming came from the
fact that at least one original user just mainly checked if the result
was NULL or not. And then others had to be careful to either hold the
file_lock spinlock, or at least the RCU read lock so that the result
didn't go away.

Here, you have "fnext_task()", and it doesn't even have that "check"
warning mark, or any comment about how dangerous it can be to use..

So the rule is that the "struct file" pointer this returns can only be
read under RCU, but the 'fd' it returns has a longer lifetime.

I think your uses are ok, and I think this is an improvement in
general, I just want a bigger "WARNING! Here be dragons!" sign (and
naming could be nicer).

            Linus
Linus Torvalds Aug. 18, 2020, 1:17 a.m. UTC | #2
On Mon, Aug 17, 2020 at 6:06 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> I struggle with the fcheck name as I have not seen or at least not
> registed on the the user that just checks to see if the result is NULL.
> So the name fcheck never made a bit of sense to me.

Yeah, that name is not great. I just don't want to make things even worse.

> I will see if I can come up with some good descriptive comments around
> these functions.  Along with describing what these things are doing I am
> thinking maybe I should put "_rcu" in their names and have a debug check
> that verifies "_rcu" is held.

Yeah, something along the lines of "rcu_lookup_fd_task(tsk,fd)" would
be a *lot* more descriptive than fcheck_task().

And I think "fnext_task()" could be "rcu_lookup_next_fd_task(tsk,fd)".

Yes, those are much longer names, but it's not like you end up typing
them all that often, and I think being descriptive would be worth it.

And "fcheck()" and "fcheck_files()" would be good to rename too along
the same lines.

Something like "rcu_lookup_fd()" and "rcu_lookup_fd_files()" respectively?

I'm obviously trying to go for a "rcu_lookup_fd*()" kind of pattern,
and I'm not married to _that_ particular pattern but I think it would
be better than what we have now.

                       Linus
Christian Brauner Aug. 18, 2020, 11:05 a.m. UTC | #3
On Mon, Aug 17, 2020 at 06:17:35PM -0700, Linus Torvalds wrote:
> On Mon, Aug 17, 2020 at 6:06 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> >
> > I struggle with the fcheck name as I have not seen or at least not
> > registed on the the user that just checks to see if the result is NULL.
> > So the name fcheck never made a bit of sense to me.
> 
> Yeah, that name is not great. I just don't want to make things even worse.
> 
> > I will see if I can come up with some good descriptive comments around
> > these functions.  Along with describing what these things are doing I am
> > thinking maybe I should put "_rcu" in their names and have a debug check
> > that verifies "_rcu" is held.
> 
> Yeah, something along the lines of "rcu_lookup_fd_task(tsk,fd)" would
> be a *lot* more descriptive than fcheck_task().
> 
> And I think "fnext_task()" could be "rcu_lookup_next_fd_task(tsk,fd)".
> 
> Yes, those are much longer names, but it's not like you end up typing
> them all that often, and I think being descriptive would be worth it.
> 
> And "fcheck()" and "fcheck_files()" would be good to rename too along
> the same lines.
> 
> Something like "rcu_lookup_fd()" and "rcu_lookup_fd_files()" respectively?
> 
> I'm obviously trying to go for a "rcu_lookup_fd*()" kind of pattern,
> and I'm not married to _that_ particular pattern but I think it would
> be better than what we have now.

In fs/inode.c and a few other places we have the *_rcu suffix pattern
already so maybe:

fcheck() -> fd_file_rcu() or lookup_fd_rcu()
fcheck_files() -> fd_files_rcu() or lookup_fd_files_rcu()
fnext_task() -> fd_file_from_task_rcu() or lookup_next_fd_from_task_rcu()

rather than as prefix or sm.

Christian
Alexei Starovoitov Aug. 19, 2020, 3:54 p.m. UTC | #4
On Wed, Aug 19, 2020 at 6:25 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> The bug in the existing code is that bpf_iter does get_file instead
> of get_file_rcu.  Does anyone have any sense of how to add debugging
> to get_file to notice when it is being called in the wrong context?

That bug is already fixed in bpf tree.
See commit cf28f3bbfca0 ("bpf: Use get_file_rcu() instead of
get_file() for task_file iterator")
Linus Torvalds Aug. 19, 2020, 6:32 p.m. UTC | #5
On Wed, Aug 19, 2020 at 6:25 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> So I sat down and played with it and here is what I wound up with is:
>
> __fcheck_files -> files_lookup_fd_raw
> fcheck_files   -> files_lookup_fd_locked
> fcheck_files   -> files_lookup_fd_rcu
> fcheck         -> lookup_fd_rcu
> ...
> fcheck_task    -> task_lookup_fd_fcu
> fnext_task     -> task_lookup_next_fd_rcu

This certainly looks fine to me. No confusion about what it does. So Ack.

                   Linus
Cyrill Gorcunov Aug. 20, 2020, 9:50 p.m. UTC | #6
On Mon, Aug 17, 2020 at 05:04:17PM -0500, Eric W. Biederman wrote:
> As a companion to fget_task and fcheck_task implement fnext_task that
> will return the struct file for the first file descriptor show number
> is equal or greater than the fd argument value, or NULL if there is
> no such struct file.
> 
> This allows file descriptors of foreign processes to be iterated through
> safely, without needed to increment the count on files_struct.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  fs/file.c               | 21 +++++++++++++++++++++
>  include/linux/fdtable.h |  1 +
>  2 files changed, 22 insertions(+)
> 
> diff --git a/fs/file.c b/fs/file.c
> index 8d4b385055e9..88f9f78869f8 100644
> --- a/fs/file.c
> +++ b/fs/file.c
> @@ -876,6 +876,27 @@ struct file *fcheck_task(struct task_struct *task, unsigned int fd)
>  	return file;
>  }
>  
> +struct file *fnext_task(struct task_struct *task, unsigned int *ret_fd)
> +{
> +	/* Must be called with rcu_read_lock held */
> +	struct files_struct *files;
> +	unsigned int fd = *ret_fd;
> +	struct file *file = NULL;
> +
> +	task_lock(task);
> +	files = task->files;
> +	if (files) {
> +		for (; fd < files_fdtable(files)->max_fds; fd++) {
> +			file = fcheck_files(files, fd);
> +			if (file)
> +				break;
> +		}
> +	}
> +	task_unlock(task);
> +	*ret_fd = fd;
> +	return file;
> +}

Eric, if only I'm not missing something obvious you could
escape @fd/@ret_fd operations in case if task->files = NULL,
iow

	if (files) {
		unsigned int fd = *ret_fd;
		for (; fd < files_fdtable(files)->max_fds; fd++) {
			file = fcheck_files(files, fd);
			if (file)
				break;
		}
		*ret_fd = fd;
	}
Alexei Starovoitov Aug. 21, 2020, 4:17 p.m. UTC | #7
On Fri, Aug 21, 2020 at 8:26 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>
> > On Wed, Aug 19, 2020 at 6:25 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
> >>
> >> The bug in the existing code is that bpf_iter does get_file instead
> >> of get_file_rcu.  Does anyone have any sense of how to add debugging
> >> to get_file to notice when it is being called in the wrong context?
> >
> > That bug is already fixed in bpf tree.
> > See commit cf28f3bbfca0 ("bpf: Use get_file_rcu() instead of
> > get_file() for task_file iterator")
>
> I wished you had based that change on -rc1 instead of some random
> looking place in David's Millers net tree.

random?
It's a well documented process. Please see:
Documentation/bpf/bpf_devel_QA.rst

> I am glad to see that our existing debug checks can catch that
> kind of problem when the code is exercised enough.

They did not. Please see the commit log of the fix.
It was a NULL pointer dereference.

> I am going to pull this change into my tree on top of -rc1 so we won't
> have unnecessary conflicts.  Hopefully this will show up in -rc2 so the
> final version of this patchset can use an easily describable base.

Please do not cherry pick fixes from other trees. You need to wait
until the bpf tree gets merged into net tree and net into Linus's tree.
It's only a couple days away. Hopefully it's there by -rc2,
but I cannot speak for Dave's schedule.
We'll send bpf tree pull-req to Dave today.
diff mbox series

Patch

diff --git a/fs/file.c b/fs/file.c
index 8d4b385055e9..88f9f78869f8 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -876,6 +876,27 @@  struct file *fcheck_task(struct task_struct *task, unsigned int fd)
 	return file;
 }
 
+struct file *fnext_task(struct task_struct *task, unsigned int *ret_fd)
+{
+	/* Must be called with rcu_read_lock held */
+	struct files_struct *files;
+	unsigned int fd = *ret_fd;
+	struct file *file = NULL;
+
+	task_lock(task);
+	files = task->files;
+	if (files) {
+		for (; fd < files_fdtable(files)->max_fds; fd++) {
+			file = fcheck_files(files, fd);
+			if (file)
+				break;
+		}
+	}
+	task_unlock(task);
+	*ret_fd = fd;
+	return file;
+}
+
 /*
  * Lightweight file lookup - no refcnt increment if fd table isn't shared.
  *
diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h
index def9debd2ce2..a3a054084f49 100644
--- a/include/linux/fdtable.h
+++ b/include/linux/fdtable.h
@@ -104,6 +104,7 @@  static inline struct file *fcheck_files(struct files_struct *files, unsigned int
  */
 #define fcheck(fd)	fcheck_files(current->files, fd)
 struct file *fcheck_task(struct task_struct *task, unsigned int fd);
+struct file *fnext_task(struct task_struct *task, unsigned int *fd);
 
 struct task_struct;