diff mbox

[v6,1/5] fuse: Remove the buggy retranslation of pids in fuse_dev_do_read

Message ID 20180221202908.17258-1-ebiederm@xmission.com (mailing list archive)
State New, archived
Headers show

Commit Message

Eric W. Biederman Feb. 21, 2018, 8:29 p.m. UTC
At the point of fuse_dev_do_read the user space process that initiated the
action on the fuse filesystem may no longer exist.  The process have been
killed or may have fired an asynchronous request and exited.

If the initial process has exited the code "pid_vnr(find_pid_ns(in->h.pid,
fc->pid_ns)" will either return a pid of 0, or in the unlikely event that
the pid has been reallocated it can return practically any pid.  Any pid is
possible as the pid allocator allocates pid numbers in different pid
namespaces independently.

The only way to make translation in fuse_dev_do_read reliable is to call
get_pid in fuse_req_init_context, and pid_vnr followed by put_pid in
fuse_dev_do_read.  That reference counting in other contexts has been shown
to bounce cache lines between processors and in general be slow.  So that is
not desirable.

The only known user of running the fuse server in a different pid namespace
from the filesystem does not care what the pids are in the fuse messages
so removing this code should not matter.

Getting the translation to a server running outside of the pid namespace
of a container can still be achieved by playing setns games at mount time.
It is also possible to add an option to pass a pid namespace into the fuse
filesystem at mount time.

Fixes: 5d6d3a301c4e ("fuse: allow server to run in different pid_ns")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/fuse/dev.c | 6 ------
 1 file changed, 6 deletions(-)

Comments

Miklos Szeredi Feb. 22, 2018, 10:13 a.m. UTC | #1
On Wed, Feb 21, 2018 at 9:29 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> At the point of fuse_dev_do_read the user space process that initiated the
> action on the fuse filesystem may no longer exist.  The process have been
> killed or may have fired an asynchronous request and exited.
>
> If the initial process has exited the code "pid_vnr(find_pid_ns(in->h.pid,
> fc->pid_ns)" will either return a pid of 0, or in the unlikely event that
> the pid has been reallocated it can return practically any pid.  Any pid is
> possible as the pid allocator allocates pid numbers in different pid
> namespaces independently.
>
> The only way to make translation in fuse_dev_do_read reliable is to call
> get_pid in fuse_req_init_context, and pid_vnr followed by put_pid in
> fuse_dev_do_read.  That reference counting in other contexts has been shown
> to bounce cache lines between processors and in general be slow.  So that is
> not desirable.
>
> The only known user of running the fuse server in a different pid namespace
> from the filesystem does not care what the pids are in the fuse messages
> so removing this code should not matter.

Shouldn't we at least zero out the pid in that case?

Thanks,
Miklos


>
> Getting the translation to a server running outside of the pid namespace
> of a container can still be achieved by playing setns games at mount time.
> It is also possible to add an option to pass a pid namespace into the fuse
> filesystem at mount time.
>
> Fixes: 5d6d3a301c4e ("fuse: allow server to run in different pid_ns")
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  fs/fuse/dev.c | 6 ------
>  1 file changed, 6 deletions(-)
>
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index 5d06384c2cae..0fb58f364fa6 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -1260,12 +1260,6 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
>         in = &req->in;
>         reqsize = in->h.len;
>
> -       if (task_active_pid_ns(current) != fc->pid_ns) {
> -               rcu_read_lock();
> -               in->h.pid = pid_vnr(find_pid_ns(in->h.pid, fc->pid_ns));
> -               rcu_read_unlock();
> -       }
> -
>         /* If request is too large, reply with an error and restart the read */
>         if (nbytes < reqsize) {
>                 req->out.h.error = -EIO;
> --
> 2.14.1
>
Eric W. Biederman Feb. 22, 2018, 7:04 p.m. UTC | #2
Miklos Szeredi <mszeredi@redhat.com> writes:

> On Wed, Feb 21, 2018 at 9:29 PM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>> At the point of fuse_dev_do_read the user space process that initiated the
>> action on the fuse filesystem may no longer exist.  The process have been
>> killed or may have fired an asynchronous request and exited.
>>
>> If the initial process has exited the code "pid_vnr(find_pid_ns(in->h.pid,
>> fc->pid_ns)" will either return a pid of 0, or in the unlikely event that
>> the pid has been reallocated it can return practically any pid.  Any pid is
>> possible as the pid allocator allocates pid numbers in different pid
>> namespaces independently.
>>
>> The only way to make translation in fuse_dev_do_read reliable is to call
>> get_pid in fuse_req_init_context, and pid_vnr followed by put_pid in
>> fuse_dev_do_read.  That reference counting in other contexts has been shown
>> to bounce cache lines between processors and in general be slow.  So that is
>> not desirable.
>>
>> The only known user of running the fuse server in a different pid namespace
>> from the filesystem does not care what the pids are in the fuse messages
>> so removing this code should not matter.
>
> Shouldn't we at least zero out the pid in that case?

This is an explicit case of passing a file descriptor between pid
namespaces.  So I think there are plenty of buyer be ware signs out.
So I don't know if there are any real world advantages of zeroing the
pid.

I can see a case for using the pid namespace of the opener of /dev/fuse
instead of the pid namespace of the mounter of the fuse filesystem.
Although in practice I would be surprised if they were different.

I am very leary about caring during a read operation.  Caring about the
current processes during read/write tends to break caching, is error prone
as the need for this patch demonstrates, and is generally likely to be
slower than not caring.

So yes we can zero the pid.   I don't think it is wise to zero the pid
unless we zero the pid in fuse_req_init_context.

Eric
diff mbox

Patch

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 5d06384c2cae..0fb58f364fa6 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1260,12 +1260,6 @@  static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
 	in = &req->in;
 	reqsize = in->h.len;
 
-	if (task_active_pid_ns(current) != fc->pid_ns) {
-		rcu_read_lock();
-		in->h.pid = pid_vnr(find_pid_ns(in->h.pid, fc->pid_ns));
-		rcu_read_unlock();
-	}
-
 	/* If request is too large, reply with an error and restart the read */
 	if (nbytes < reqsize) {
 		req->out.h.error = -EIO;