Message ID | 20121008105437.18668.99905.stgit@localhost.localdomain (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Cc'ing Eric since I seem to recall he suggested doing it this way? Seems OK to me, but maybe that swap_root should be in common code? (Or maybe we could use set_fs_root()?) I'm assuming it's up to Trond to take this.--b. On Mon, Oct 08, 2012 at 02:56:32PM +0400, Stanislav Kinsbursky wrote: > Today, there is a problem in connecting of local SUNRPC thansports. These > transports uses UNIX sockets and connection itself is done by rpciod > workqueue. > But all local transports are connecting in rpciod context. I.e. UNIX sockets > lookup is done in context of process file system root. > This works nice until we will try to mount NFS from process with other root - > for example in container. This container can have it's own (nested) root and > rcpbind process, listening on it's own unix sockets. But NFS mount attempt in > this container will register new service (Lockd for example) in global rpcbind > - not containers's one. > This patch solves the problem by switching rpciod kernel thread's file system > root to the right one (stored on transport) while connecting of local > transports. > > Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> > --- > net/sunrpc/xprtsock.c | 52 +++++++++++++++++++++++++++++++++++++++++++++++-- > 1 files changed, 50 insertions(+), 2 deletions(-) > > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c > index aaaadfb..ecbced1 100644 > --- a/net/sunrpc/xprtsock.c > +++ b/net/sunrpc/xprtsock.c > @@ -37,6 +37,7 @@ > #include <linux/sunrpc/svcsock.h> > #include <linux/sunrpc/xprtsock.h> > #include <linux/file.h> > +#include <linux/fs_struct.h> > #ifdef CONFIG_SUNRPC_BACKCHANNEL > #include <linux/sunrpc/bc_xprt.h> > #endif > @@ -255,6 +256,11 @@ struct sock_xprt { > void (*old_state_change)(struct sock *); > void (*old_write_space)(struct sock *); > void (*old_error_report)(struct sock *); > + > + /* > + * Saved transport creator root. Required for local transports only. > + */ > + struct path root; > }; > > /* > @@ -1876,6 +1882,30 @@ static int xs_local_finish_connecting(struct rpc_xprt *xprt, > return kernel_connect(sock, xs_addr(xprt), xprt->addrlen, 0); > } > > +/* > + * __xs_local_swap_root - swap current root to tranport's root helper. > + * @new_root - path to set to current fs->root. > + * @old_root - optinal paoinet to save current root to > + * > + * This routine is requieed to connecting unix sockets to absolute path in > + * proper root environment. > + * Note: no path_get() will be called. I.e. caller have to hold reference to > + * new_root. > + */ > +static void __xs_local_swap_root(struct path *new_root, struct path *old_root) > +{ > + struct fs_struct *fs = current->fs; > + > + spin_lock(&fs->lock); > + write_seqcount_begin(&fs->seq); > + if (old_root) > + *old_root = fs->root; > + fs->root = *new_root; > + write_seqcount_end(&fs->seq); > + spin_unlock(&fs->lock); > +} > + > + > /** > * xs_local_setup_socket - create AF_LOCAL socket, connect to a local endpoint > * @xprt: RPC transport to connect > @@ -1891,6 +1921,7 @@ static void xs_local_setup_socket(struct work_struct *work) > struct rpc_xprt *xprt = &transport->xprt; > struct socket *sock; > int status = -EIO; > + struct path root; > > current->flags |= PF_FSTRANS; > > @@ -1907,7 +1938,10 @@ static void xs_local_setup_socket(struct work_struct *work) > dprintk("RPC: worker connecting xprt %p via AF_LOCAL to %s\n", > xprt, xprt->address_strings[RPC_DISPLAY_ADDR]); > > + __xs_local_swap_root(&transport->root, &root); > status = xs_local_finish_connecting(xprt, sock); > + __xs_local_swap_root(&root, NULL); > + > switch (status) { > case 0: > dprintk("RPC: xprt %p connected to %s\n", > @@ -2256,6 +2290,18 @@ static void xs_connect(struct rpc_task *task) > } > } > > +static void xs_local_destroy(struct rpc_xprt *xprt) > +{ > + struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt); > + struct path root = transport->root; > + > + dprintk("RPC: xs_local_destroy xprt %p\n", xprt); > + > + xs_destroy(xprt); > + > + path_put(&root); > +} > + > /** > * xs_local_print_stats - display AF_LOCAL socket-specifc stats > * @xprt: rpc_xprt struct containing statistics > @@ -2475,7 +2521,7 @@ static struct rpc_xprt_ops xs_local_ops = { > .send_request = xs_local_send_request, > .set_retrans_timeout = xprt_set_retrans_timeout_def, > .close = xs_close, > - .destroy = xs_destroy, > + .destroy = xs_local_destroy, > .print_stats = xs_local_print_stats, > }; > > @@ -2654,8 +2700,10 @@ static struct rpc_xprt *xs_setup_local(struct xprt_create *args) > dprintk("RPC: set up xprt to %s via AF_LOCAL\n", > xprt->address_strings[RPC_DISPLAY_ADDR]); > > - if (try_module_get(THIS_MODULE)) > + if (try_module_get(THIS_MODULE)) { > + get_fs_root(current->fs, &transport->root); > return xprt; > + } > ret = ERR_PTR(-EINVAL); > out_err: > xprt_free(xprt); > -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2012-10-09 at 15:35 -0400, J. Bruce Fields wrote: > Cc'ing Eric since I seem to recall he suggested doing it this way? > > Seems OK to me, but maybe that swap_root should be in common code? (Or > maybe we could use set_fs_root()?) > > I'm assuming it's up to Trond to take this.--b. I'm reluctant to do that at this time since the original proposal was precisely that of export set_fs_root() and using it around the AF_LOCAL socket bind. That proposal was NACKed by Al Viro. If Al is OK with the idea of us creating a private version of set_fs_root, then I'd like to see an official Acked-by: that we can append to this commit. Cheers Trond > On Mon, Oct 08, 2012 at 02:56:32PM +0400, Stanislav Kinsbursky wrote: > > Today, there is a problem in connecting of local SUNRPC thansports. These > > transports uses UNIX sockets and connection itself is done by rpciod > > workqueue. > > But all local transports are connecting in rpciod context. I.e. UNIX sockets > > lookup is done in context of process file system root. > > This works nice until we will try to mount NFS from process with other root - > > for example in container. This container can have it's own (nested) root and > > rcpbind process, listening on it's own unix sockets. But NFS mount attempt in > > this container will register new service (Lockd for example) in global rpcbind > > - not containers's one. > > This patch solves the problem by switching rpciod kernel thread's file system > > root to the right one (stored on transport) while connecting of local > > transports. > > > > Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> > > --- > > net/sunrpc/xprtsock.c | 52 +++++++++++++++++++++++++++++++++++++++++++++++-- > > 1 files changed, 50 insertions(+), 2 deletions(-) > > > > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c > > index aaaadfb..ecbced1 100644 > > --- a/net/sunrpc/xprtsock.c > > +++ b/net/sunrpc/xprtsock.c > > @@ -37,6 +37,7 @@ > > #include <linux/sunrpc/svcsock.h> > > #include <linux/sunrpc/xprtsock.h> > > #include <linux/file.h> > > +#include <linux/fs_struct.h> > > #ifdef CONFIG_SUNRPC_BACKCHANNEL > > #include <linux/sunrpc/bc_xprt.h> > > #endif > > @@ -255,6 +256,11 @@ struct sock_xprt { > > void (*old_state_change)(struct sock *); > > void (*old_write_space)(struct sock *); > > void (*old_error_report)(struct sock *); > > + > > + /* > > + * Saved transport creator root. Required for local transports only. > > + */ > > + struct path root; > > }; > > > > /* > > @@ -1876,6 +1882,30 @@ static int xs_local_finish_connecting(struct rpc_xprt *xprt, > > return kernel_connect(sock, xs_addr(xprt), xprt->addrlen, 0); > > } > > > > +/* > > + * __xs_local_swap_root - swap current root to tranport's root helper. > > + * @new_root - path to set to current fs->root. > > + * @old_root - optinal paoinet to save current root to > > + * > > + * This routine is requieed to connecting unix sockets to absolute path in > > + * proper root environment. > > + * Note: no path_get() will be called. I.e. caller have to hold reference to > > + * new_root. > > + */ > > +static void __xs_local_swap_root(struct path *new_root, struct path *old_root) > > +{ > > + struct fs_struct *fs = current->fs; > > + > > + spin_lock(&fs->lock); > > + write_seqcount_begin(&fs->seq); > > + if (old_root) > > + *old_root = fs->root; > > + fs->root = *new_root; > > + write_seqcount_end(&fs->seq); > > + spin_unlock(&fs->lock); > > +} > > + > > + > > /** > > * xs_local_setup_socket - create AF_LOCAL socket, connect to a local endpoint > > * @xprt: RPC transport to connect > > @@ -1891,6 +1921,7 @@ static void xs_local_setup_socket(struct work_struct *work) > > struct rpc_xprt *xprt = &transport->xprt; > > struct socket *sock; > > int status = -EIO; > > + struct path root; > > > > current->flags |= PF_FSTRANS; > > > > @@ -1907,7 +1938,10 @@ static void xs_local_setup_socket(struct work_struct *work) > > dprintk("RPC: worker connecting xprt %p via AF_LOCAL to %s\n", > > xprt, xprt->address_strings[RPC_DISPLAY_ADDR]); > > > > + __xs_local_swap_root(&transport->root, &root); > > status = xs_local_finish_connecting(xprt, sock); > > + __xs_local_swap_root(&root, NULL); > > + > > switch (status) { > > case 0: > > dprintk("RPC: xprt %p connected to %s\n", > > @@ -2256,6 +2290,18 @@ static void xs_connect(struct rpc_task *task) > > } > > } > > > > +static void xs_local_destroy(struct rpc_xprt *xprt) > > +{ > > + struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt); > > + struct path root = transport->root; > > + > > + dprintk("RPC: xs_local_destroy xprt %p\n", xprt); > > + > > + xs_destroy(xprt); > > + > > + path_put(&root); > > +} > > + > > /** > > * xs_local_print_stats - display AF_LOCAL socket-specifc stats > > * @xprt: rpc_xprt struct containing statistics > > @@ -2475,7 +2521,7 @@ static struct rpc_xprt_ops xs_local_ops = { > > .send_request = xs_local_send_request, > > .set_retrans_timeout = xprt_set_retrans_timeout_def, > > .close = xs_close, > > - .destroy = xs_destroy, > > + .destroy = xs_local_destroy, > > .print_stats = xs_local_print_stats, > > }; > > > > @@ -2654,8 +2700,10 @@ static struct rpc_xprt *xs_setup_local(struct xprt_create *args) > > dprintk("RPC: set up xprt to %s via AF_LOCAL\n", > > xprt->address_strings[RPC_DISPLAY_ADDR]); > > > > - if (try_module_get(THIS_MODULE)) > > + if (try_module_get(THIS_MODULE)) { > > + get_fs_root(current->fs, &transport->root); > > return xprt; > > + } > > ret = ERR_PTR(-EINVAL); > > out_err: > > xprt_free(xprt); > > -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com
"Myklebust, Trond" <Trond.Myklebust@netapp.com> writes: > On Tue, 2012-10-09 at 15:35 -0400, J. Bruce Fields wrote: >> Cc'ing Eric since I seem to recall he suggested doing it this way? Yes. On second look setting fs->root won't work. We need to change fs. The problem is that by default all kernel threads share fs so changing fs->root will have non-local consequences. I very much believe we want if at all possible to perform a local modification. Changing fs isn't all that different from what devtmpfs is doing. >> Seems OK to me, but maybe that swap_root should be in common code? (Or >> maybe we could use set_fs_root()?) >> >> I'm assuming it's up to Trond to take this.--b. > > I'm reluctant to do that at this time since the original proposal was > precisely that of export set_fs_root() and using it around the AF_LOCAL > socket bind. That proposal was NACKed by Al Viro. > If Al is OK with the idea of us creating a private version of > set_fs_root, then I'd like to see an official Acked-by: that we can > append to this commit. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Oct 09, 2012 at 01:20:48PM -0700, Eric W. Biederman wrote: > "Myklebust, Trond" <Trond.Myklebust@netapp.com> writes: > > > On Tue, 2012-10-09 at 15:35 -0400, J. Bruce Fields wrote: > >> Cc'ing Eric since I seem to recall he suggested doing it this way? > > Yes. On second look setting fs->root won't work. We need to change fs. > The problem is that by default all kernel threads share fs so changing > fs->root will have non-local consequences. Oh, huh. And we can't "unshare" it somehow? Or, previously you suggested: - introduce sockaddr_fd that can be applied to AF_UNIX sockets, and teach unix_bind and unix_connect how to deal with a second type of sockaddr, AT_FD: struct sockaddr_fd { short fd_family; short pad; int fd; } - introduce sockaddr_unix_at that takes a directory file descriptor as well as a unix path, and teach unix_bind and unix_connect to deal with a second sockaddr type, AF_UNIX_AT: struct sockaddr_unix_at { short family; short pad; int dfd; char path[102]; } Any other options? > I very much believe we want if at all possible to perform a local > modification. > > Changing fs isn't all that different from what devtmpfs is doing. Sorry, I don't know much about devtmpfs, are you suggesting it as a model? What exactly should we look at? --b. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
"J. Bruce Fields" <bfields@fieldses.org> writes: > On Tue, Oct 09, 2012 at 01:20:48PM -0700, Eric W. Biederman wrote: >> "Myklebust, Trond" <Trond.Myklebust@netapp.com> writes: >> >> > On Tue, 2012-10-09 at 15:35 -0400, J. Bruce Fields wrote: >> >> Cc'ing Eric since I seem to recall he suggested doing it this way? >> >> Yes. On second look setting fs->root won't work. We need to change fs. >> The problem is that by default all kernel threads share fs so changing >> fs->root will have non-local consequences. > > Oh, huh. And we can't "unshare" it somehow? I don't fully understand how nfs uses kernel threads and work queues. My general understanding is work queues reuse their kernel threads between different users. So it is mostly a don't pollute your environment thing. If there was a dedicated kernel thread for each environment this would be trivial. What I was suggesting here is changing task->fs instead of task->fs.root. That should just require task_lock(). > Or, previously you suggested: > > - introduce sockaddr_fd that can be applied to AF_UNIX sockets, > and teach unix_bind and unix_connect how to deal with a second > type of sockaddr, AT_FD: > struct sockaddr_fd { short fd_family; short pad; int fd; } > > - introduce sockaddr_unix_at that takes a directory file > descriptor as well as a unix path, and teach unix_bind and > unix_connect to deal with a second sockaddr type, AF_UNIX_AT: > struct sockaddr_unix_at { short family; short pad; int dfd; char path[102]; } > > Any other options? I am still half hoping we don't have to change the userspace API/ABI. There is sanity checking on that path that no one seems interested in to solve this problem. This is a weird issue as we are dealing with both the vfs and the networking stack. Fundamentally we need to change task->fs.root or we need to capitialize on the openat functionality in the kernel, so that we don't create mountains of special cases to support this. I think swapping task->fs instead of task->fs.root is effecitely the same complexity. >> I very much believe we want if at all possible to perform a local >> modification. >> >> Changing fs isn't all that different from what devtmpfs is doing. > > Sorry, I don't know much about devtmpfs, are you suggesting it as a > model? What exactly should we look at? Roughly all I meant was that devtmpsfsd is a kernel thread that runs with an unshared fs struct. Although I admit devtmpfsd is for all practical purposes a userspace daemon that just happens to run in kernel space. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Oct 09, 2012 at 03:47:42PM -0700, Eric W. Biederman wrote: > "J. Bruce Fields" <bfields@fieldses.org> writes: > > > On Tue, Oct 09, 2012 at 01:20:48PM -0700, Eric W. Biederman wrote: > >> "Myklebust, Trond" <Trond.Myklebust@netapp.com> writes: > >> > >> > On Tue, 2012-10-09 at 15:35 -0400, J. Bruce Fields wrote: > >> >> Cc'ing Eric since I seem to recall he suggested doing it this way? > >> > >> Yes. On second look setting fs->root won't work. We need to change fs. > >> The problem is that by default all kernel threads share fs so changing > >> fs->root will have non-local consequences. > > > > Oh, huh. And we can't "unshare" it somehow? > > I don't fully understand how nfs uses kernel threads and work queues. > My general understanding is work queues reuse their kernel threads > between different users. So it is mostly a don't pollute your > environment thing. If there was a dedicated kernel thread for each > environment this would be trivial. > > What I was suggesting here is changing task->fs instead of > task->fs.root. That should just require task_lock(). Oh, OK, got it--if that works, great. > > Sorry, I don't know much about devtmpfs, are you suggesting it as a > > model? What exactly should we look at? > > Roughly all I meant was that devtmpsfsd is a kernel thread that runs > with an unshared fs struct. Although I admit devtmpfsd is for all > practical purposes a userspace daemon that just happens to run in kernel > space. Thanks for the explanation. --b. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
ebiederm@xmission.com (Eric W. Biederman) writes: > "J. Bruce Fields" <bfields@fieldses.org> writes: > >> On Tue, Oct 09, 2012 at 01:20:48PM -0700, Eric W. Biederman wrote: >>> "Myklebust, Trond" <Trond.Myklebust@netapp.com> writes: >>> >>> > On Tue, 2012-10-09 at 15:35 -0400, J. Bruce Fields wrote: >>> >> Cc'ing Eric since I seem to recall he suggested doing it this way? >>> >>> Yes. On second look setting fs->root won't work. We need to change fs. >>> The problem is that by default all kernel threads share fs so changing >>> fs->root will have non-local consequences. >> >> Oh, huh. And we can't "unshare" it somehow? > > I don't fully understand how nfs uses kernel threads and work queues. > My general understanding is work queues reuse their kernel threads > between different users. So it is mostly a don't pollute your > environment thing. If there was a dedicated kernel thread for each > environment this would be trivial. > > What I was suggesting here is changing task->fs instead of > task->fs.root. That should just require task_lock(). > >> Or, previously you suggested: >> >> - introduce sockaddr_fd that can be applied to AF_UNIX sockets, >> and teach unix_bind and unix_connect how to deal with a second >> type of sockaddr, AT_FD: >> struct sockaddr_fd { short fd_family; short pad; int fd; } >> >> - introduce sockaddr_unix_at that takes a directory file >> descriptor as well as a unix path, and teach unix_bind and >> unix_connect to deal with a second sockaddr type, AF_UNIX_AT: >> struct sockaddr_unix_at { short family; short pad; int dfd; char path[102]; } >> >> Any other options? > > I am still half hoping we don't have to change the userspace API/ABI. > There is sanity checking on that path that no one seems interested in to > solve this problem. There is a good option if we are up to userspace ABI extensions. Implement open(2) on unix domain sockets. Where open(2) would essentially equal connect(2) on unix domain sockets. With an open(2) implementation we could use file_open_path and the implementation should be pretty straight forward and maintainable. So implementing open(2) looks like a good alternative implementation route. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
10.10.2012 02:47, Eric W. Biederman ?????: > "J. Bruce Fields" <bfields@fieldses.org> writes: > >> On Tue, Oct 09, 2012 at 01:20:48PM -0700, Eric W. Biederman wrote: >>> "Myklebust, Trond" <Trond.Myklebust@netapp.com> writes: >>> >>>> On Tue, 2012-10-09 at 15:35 -0400, J. Bruce Fields wrote: >>>>> Cc'ing Eric since I seem to recall he suggested doing it this way? >>> Yes. On second look setting fs->root won't work. We need to change fs. >>> The problem is that by default all kernel threads share fs so changing >>> fs->root will have non-local consequences. >> Oh, huh. And we can't "unshare" it somehow? > I don't fully understand how nfs uses kernel threads and work queues. > My general understanding is work queues reuse their kernel threads > between different users. So it is mostly a don't pollute your > environment thing. If there was a dedicated kernel thread for each > environment this would be trivial. One kernel thread per environment is exactly what we are trying to avoid if possible. And the reason why we don't want to do this is that it's really looks like overkill. > What I was suggesting here is changing task->fs instead of > task->fs.root. That should just require task_lock(). > >> Or, previously you suggested: >> >> - introduce sockaddr_fd that can be applied to AF_UNIX sockets, >> and teach unix_bind and unix_connect how to deal with a second >> type of sockaddr, AT_FD: >> struct sockaddr_fd { short fd_family; short pad; int fd; } >> >> - introduce sockaddr_unix_at that takes a directory file >> descriptor as well as a unix path, and teach unix_bind and >> unix_connect to deal with a second sockaddr type, AF_UNIX_AT: >> struct sockaddr_unix_at { short family; short pad; int dfd; char path[102]; } >> >> Any other options? > I am still half hoping we don't have to change the userspace API/ABI. > There is sanity checking on that path that no one seems interested in to > solve this problem. > > This is a weird issue as we are dealing with both the vfs and the > networking stack. Fundamentally we need to change task->fs.root or > we need to capitialize on the openat functionality in the kernel, so > that we don't create mountains of special cases to support this. > > I think swapping task->fs instead of task->fs.root is effecitely the > same complexity. Thanks for the idea. And for mentioning, that kernel threads shares fs struct. I missed it. > >>> I very much believe we want if at all possible to perform a local >>> modification. >>> >>> Changing fs isn't all that different from what devtmpfs is doing. >> Sorry, I don't know much about devtmpfs, are you suggesting it as a >> model? What exactly should we look at? > Roughly all I meant was that devtmpsfsd is a kernel thread that runs > with an unshared fs struct. Although I admit devtmpfsd is for all > practical purposes a userspace daemon that just happens to run in kernel > space. > > Eric > > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
10.10.2012 06:00, Eric W. Biederman ?????: > ebiederm@xmission.com (Eric W. Biederman) writes: > >> "J. Bruce Fields" <bfields@fieldses.org> writes: >> >>> On Tue, Oct 09, 2012 at 01:20:48PM -0700, Eric W. Biederman wrote: >>>> "Myklebust, Trond" <Trond.Myklebust@netapp.com> writes: >>>> >>>>> On Tue, 2012-10-09 at 15:35 -0400, J. Bruce Fields wrote: >>>>>> Cc'ing Eric since I seem to recall he suggested doing it this way? >>>> Yes. On second look setting fs->root won't work. We need to change fs. >>>> The problem is that by default all kernel threads share fs so changing >>>> fs->root will have non-local consequences. >>> Oh, huh. And we can't "unshare" it somehow? >> I don't fully understand how nfs uses kernel threads and work queues. >> My general understanding is work queues reuse their kernel threads >> between different users. So it is mostly a don't pollute your >> environment thing. If there was a dedicated kernel thread for each >> environment this would be trivial. >> >> What I was suggesting here is changing task->fs instead of >> task->fs.root. That should just require task_lock(). >> >>> Or, previously you suggested: >>> >>> - introduce sockaddr_fd that can be applied to AF_UNIX sockets, >>> and teach unix_bind and unix_connect how to deal with a second >>> type of sockaddr, AT_FD: >>> struct sockaddr_fd { short fd_family; short pad; int fd; } >>> >>> - introduce sockaddr_unix_at that takes a directory file >>> descriptor as well as a unix path, and teach unix_bind and >>> unix_connect to deal with a second sockaddr type, AF_UNIX_AT: >>> struct sockaddr_unix_at { short family; short pad; int dfd; char path[102]; } >>> >>> Any other options? >> I am still half hoping we don't have to change the userspace API/ABI. >> There is sanity checking on that path that no one seems interested in to >> solve this problem. > There is a good option if we are up to userspace ABI extensions. > > Implement open(2) on unix domain sockets. Where open(2) would > essentially equal connect(2) on unix domain sockets. > > With an open(2) implementation we could use file_open_path and the > implementation should be pretty straight forward and maintainable. > So implementing open(2) looks like a good alternative implementation > route. This requires patching of vfs layer as well. I don't want to say, that the idea is not good. But it requires much more time to implement and test. And this patch addresses the problem, which exist already and would be great to fix it as soon as possible. So, probably, implementing open for unix sockets is the next and more generic step. Thanks. > Eric > > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
10.10.2012 05:23, J. Bruce Fields ?????: > On Tue, Oct 09, 2012 at 03:47:42PM -0700, Eric W. Biederman wrote: >> "J. Bruce Fields" <bfields@fieldses.org> writes: >> >>> On Tue, Oct 09, 2012 at 01:20:48PM -0700, Eric W. Biederman wrote: >>>> "Myklebust, Trond" <Trond.Myklebust@netapp.com> writes: >>>> >>>>> On Tue, 2012-10-09 at 15:35 -0400, J. Bruce Fields wrote: >>>>>> Cc'ing Eric since I seem to recall he suggested doing it this way? >>>> >>>> Yes. On second look setting fs->root won't work. We need to change fs. >>>> The problem is that by default all kernel threads share fs so changing >>>> fs->root will have non-local consequences. >>> >>> Oh, huh. And we can't "unshare" it somehow? >> >> I don't fully understand how nfs uses kernel threads and work queues. >> My general understanding is work queues reuse their kernel threads >> between different users. So it is mostly a don't pollute your >> environment thing. If there was a dedicated kernel thread for each >> environment this would be trivial. >> >> What I was suggesting here is changing task->fs instead of >> task->fs.root. That should just require task_lock(). > > Oh, OK, got it--if that works, great. > The main problem with swapping fs struct is actually the same as in root swapping. I.e. routines for copy fs_struct are not exported. It could be done on place, but I don't think, that Al Viro would support such implementation. Trond? >>> Sorry, I don't know much about devtmpfs, are you suggesting it as a >>> model? What exactly should we look at? >> >> Roughly all I meant was that devtmpsfsd is a kernel thread that runs >> with an unshared fs struct. Although I admit devtmpfsd is for all >> practical purposes a userspace daemon that just happens to run in kernel >> space. > > Thanks for the explanation. > > --b. >
On Wed, Oct 10, 2012 at 02:32:28PM +0400, Stanislav Kinsbursky wrote: > 10.10.2012 05:23, J. Bruce Fields ?????: > >On Tue, Oct 09, 2012 at 03:47:42PM -0700, Eric W. Biederman wrote: > >>"J. Bruce Fields" <bfields@fieldses.org> writes: > >> > >>>On Tue, Oct 09, 2012 at 01:20:48PM -0700, Eric W. Biederman wrote: > >>>>"Myklebust, Trond" <Trond.Myklebust@netapp.com> writes: > >>>> > >>>>>On Tue, 2012-10-09 at 15:35 -0400, J. Bruce Fields wrote: > >>>>>>Cc'ing Eric since I seem to recall he suggested doing it this way? > >>>> > >>>>Yes. On second look setting fs->root won't work. We need to change fs. > >>>>The problem is that by default all kernel threads share fs so changing > >>>>fs->root will have non-local consequences. > >>> > >>>Oh, huh. And we can't "unshare" it somehow? > >> > >>I don't fully understand how nfs uses kernel threads and work queues. > >>My general understanding is work queues reuse their kernel threads > >>between different users. So it is mostly a don't pollute your > >>environment thing. If there was a dedicated kernel thread for each > >>environment this would be trivial. > >> > >>What I was suggesting here is changing task->fs instead of > >>task->fs.root. That should just require task_lock(). > > > >Oh, OK, got it--if that works, great. > > > > The main problem with swapping fs struct is actually the same as in > root swapping. I.e. routines for copy fs_struct are not exported. > It could be done on place, but I don't think, that Al Viro would > support such implementation. > Trond? It seems like we got stalled here.... Could you go ahead and try a patch, and see what people think? --b. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
09.10.2012 23:35, J. Bruce Fields ?????: > Cc'ing Eric since I seem to recall he suggested doing it this way? > > Seems OK to me, but maybe that swap_root should be in common code? (Or > maybe we could use set_fs_root()?) > This patch is not good since, as Eric mentioned, all kernel threads share same fs struct. We can swap whole fs struct. Or we can unshare fs struct (unshare_fs_struct() is exported) and swap root in this case. But this approach is to close to set_fs_root() logic, which is not exported and seems there are some valid reasons for it. > I'm assuming it's up to Trond to take this.--b. > > On Mon, Oct 08, 2012 at 02:56:32PM +0400, Stanislav Kinsbursky wrote: >> Today, there is a problem in connecting of local SUNRPC thansports. These >> transports uses UNIX sockets and connection itself is done by rpciod >> workqueue. >> But all local transports are connecting in rpciod context. I.e. UNIX sockets >> lookup is done in context of process file system root. >> This works nice until we will try to mount NFS from process with other root - >> for example in container. This container can have it's own (nested) root and >> rcpbind process, listening on it's own unix sockets. But NFS mount attempt in >> this container will register new service (Lockd for example) in global rpcbind >> - not containers's one. >> This patch solves the problem by switching rpciod kernel thread's file system >> root to the right one (stored on transport) while connecting of local >> transports. >> >> Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> >> --- >> net/sunrpc/xprtsock.c | 52 +++++++++++++++++++++++++++++++++++++++++++++++-- >> 1 files changed, 50 insertions(+), 2 deletions(-) >> >> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c >> index aaaadfb..ecbced1 100644 >> --- a/net/sunrpc/xprtsock.c >> +++ b/net/sunrpc/xprtsock.c >> @@ -37,6 +37,7 @@ >> #include <linux/sunrpc/svcsock.h> >> #include <linux/sunrpc/xprtsock.h> >> #include <linux/file.h> >> +#include <linux/fs_struct.h> >> #ifdef CONFIG_SUNRPC_BACKCHANNEL >> #include <linux/sunrpc/bc_xprt.h> >> #endif >> @@ -255,6 +256,11 @@ struct sock_xprt { >> void (*old_state_change)(struct sock *); >> void (*old_write_space)(struct sock *); >> void (*old_error_report)(struct sock *); >> + >> + /* >> + * Saved transport creator root. Required for local transports only. >> + */ >> + struct path root; >> }; >> >> /* >> @@ -1876,6 +1882,30 @@ static int xs_local_finish_connecting(struct rpc_xprt *xprt, >> return kernel_connect(sock, xs_addr(xprt), xprt->addrlen, 0); >> } >> >> +/* >> + * __xs_local_swap_root - swap current root to tranport's root helper. >> + * @new_root - path to set to current fs->root. >> + * @old_root - optinal paoinet to save current root to >> + * >> + * This routine is requieed to connecting unix sockets to absolute path in >> + * proper root environment. >> + * Note: no path_get() will be called. I.e. caller have to hold reference to >> + * new_root. >> + */ >> +static void __xs_local_swap_root(struct path *new_root, struct path *old_root) >> +{ >> + struct fs_struct *fs = current->fs; >> + >> + spin_lock(&fs->lock); >> + write_seqcount_begin(&fs->seq); >> + if (old_root) >> + *old_root = fs->root; >> + fs->root = *new_root; >> + write_seqcount_end(&fs->seq); >> + spin_unlock(&fs->lock); >> +} >> + >> + >> /** >> * xs_local_setup_socket - create AF_LOCAL socket, connect to a local endpoint >> * @xprt: RPC transport to connect >> @@ -1891,6 +1921,7 @@ static void xs_local_setup_socket(struct work_struct *work) >> struct rpc_xprt *xprt = &transport->xprt; >> struct socket *sock; >> int status = -EIO; >> + struct path root; >> >> current->flags |= PF_FSTRANS; >> >> @@ -1907,7 +1938,10 @@ static void xs_local_setup_socket(struct work_struct *work) >> dprintk("RPC: worker connecting xprt %p via AF_LOCAL to %s\n", >> xprt, xprt->address_strings[RPC_DISPLAY_ADDR]); >> >> + __xs_local_swap_root(&transport->root, &root); >> status = xs_local_finish_connecting(xprt, sock); >> + __xs_local_swap_root(&root, NULL); >> + >> switch (status) { >> case 0: >> dprintk("RPC: xprt %p connected to %s\n", >> @@ -2256,6 +2290,18 @@ static void xs_connect(struct rpc_task *task) >> } >> } >> >> +static void xs_local_destroy(struct rpc_xprt *xprt) >> +{ >> + struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt); >> + struct path root = transport->root; >> + >> + dprintk("RPC: xs_local_destroy xprt %p\n", xprt); >> + >> + xs_destroy(xprt); >> + >> + path_put(&root); >> +} >> + >> /** >> * xs_local_print_stats - display AF_LOCAL socket-specifc stats >> * @xprt: rpc_xprt struct containing statistics >> @@ -2475,7 +2521,7 @@ static struct rpc_xprt_ops xs_local_ops = { >> .send_request = xs_local_send_request, >> .set_retrans_timeout = xprt_set_retrans_timeout_def, >> .close = xs_close, >> - .destroy = xs_destroy, >> + .destroy = xs_local_destroy, >> .print_stats = xs_local_print_stats, >> }; >> >> @@ -2654,8 +2700,10 @@ static struct rpc_xprt *xs_setup_local(struct xprt_create *args) >> dprintk("RPC: set up xprt to %s via AF_LOCAL\n", >> xprt->address_strings[RPC_DISPLAY_ADDR]); >> >> - if (try_module_get(THIS_MODULE)) >> + if (try_module_get(THIS_MODULE)) { >> + get_fs_root(current->fs, &transport->root); >> return xprt; >> + } >> ret = ERR_PTR(-EINVAL); >> out_err: >> xprt_free(xprt); >>
On Tue, Nov 06, 2012 at 02:14:50PM +0400, Stanislav Kinsbursky wrote: > 09.10.2012 23:35, J. Bruce Fields ?????: > >Cc'ing Eric since I seem to recall he suggested doing it this way? > > > >Seems OK to me, but maybe that swap_root should be in common code? (Or > >maybe we could use set_fs_root()?) > > > > This patch is not good since, as Eric mentioned, all kernel threads > share same fs struct. > We can swap whole fs struct. Or we can unshare fs struct > (unshare_fs_struct() is exported) and swap root in this case. > But this approach is to close to set_fs_root() logic, which is not > exported and seems there are some valid reasons for it. What are those reasons? Googling found one previous thread: http://thread.gmane.org/gmane.linux.kernel/1259986/focus=47687 There Trond requests an ACK from Al or Cristoph for the export, but I don't see either an ACK or any objection. --b. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
06.11.2012 16:06, J. Bruce Fields ?????: > On Tue, Nov 06, 2012 at 02:14:50PM +0400, Stanislav Kinsbursky wrote: >> 09.10.2012 23:35, J. Bruce Fields ?????: >>> Cc'ing Eric since I seem to recall he suggested doing it this way? >>> >>> Seems OK to me, but maybe that swap_root should be in common code? (Or >>> maybe we could use set_fs_root()?) >>> >> >> This patch is not good since, as Eric mentioned, all kernel threads >> share same fs struct. >> We can swap whole fs struct. Or we can unshare fs struct >> (unshare_fs_struct() is exported) and swap root in this case. >> But this approach is to close to set_fs_root() logic, which is not >> exported and seems there are some valid reasons for it. > > What are those reasons? > I don't know them. Trond told, that Al doesn't like the idea of set_fs_root() exporting. > Googling found one previous thread: > > http://thread.gmane.org/gmane.linux.kernel/1259986/focus=47687 > > There Trond requests an ACK from Al or Cristoph for the export, but I > don't see either an ACK or any objection. > Cristoph told me on LSF something line "No ... way", when I asked him about set_fs_root() exporting. But I had no opportunity to ask why. > --b. >
On Tue, Nov 06, 2012 at 07:06:42AM -0500, J. Bruce Fields wrote: > On Tue, Nov 06, 2012 at 02:14:50PM +0400, Stanislav Kinsbursky wrote: > > 09.10.2012 23:35, J. Bruce Fields ??????????: > > >Cc'ing Eric since I seem to recall he suggested doing it this way? > > > > > >Seems OK to me, but maybe that swap_root should be in common code? (Or > > >maybe we could use set_fs_root()?) > > > > > > > This patch is not good since, as Eric mentioned, all kernel threads > > share same fs struct. > > We can swap whole fs struct. Or we can unshare fs struct > > (unshare_fs_struct() is exported) and swap root in this case. > > But this approach is to close to set_fs_root() logic, which is not > > exported and seems there are some valid reasons for it. > > What are those reasons? > > Googling found one previous thread: > > http://thread.gmane.org/gmane.linux.kernel/1259986/focus=47687 > > There Trond requests an ACK from Al or Cristoph for the export, but I > don't see either an ACK or any objection. I really don't think messing with current->fs for workqueue worker threads is a good idea, as the worker threads are shared by different workqueues and thus this can easily cause havoc for entirely unrelated subsystems. To do this properly you'll need to avoid current->fs references in the sunrpc code. And just in case it wasn't clear: the hack in this iteration is even worse than the original. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Nov 06, 2012 at 04:11:11PM +0400, Stanislav Kinsbursky wrote: > 06.11.2012 16:06, J. Bruce Fields ?????: > >On Tue, Nov 06, 2012 at 02:14:50PM +0400, Stanislav Kinsbursky wrote: > >>09.10.2012 23:35, J. Bruce Fields ?????: > >>>Cc'ing Eric since I seem to recall he suggested doing it this way? > >>> > >>>Seems OK to me, but maybe that swap_root should be in common code? (Or > >>>maybe we could use set_fs_root()?) > >>> > >> > >>This patch is not good since, as Eric mentioned, all kernel threads > >>share same fs struct. > >>We can swap whole fs struct. Or we can unshare fs struct > >>(unshare_fs_struct() is exported) and swap root in this case. > >>But this approach is to close to set_fs_root() logic, which is not > >>exported and seems there are some valid reasons for it. > > > >What are those reasons? > > > > I don't know them. > Trond told, that Al doesn't like the idea of set_fs_root() exporting. > > >Googling found one previous thread: > > > > http://thread.gmane.org/gmane.linux.kernel/1259986/focus=47687 > > > >There Trond requests an ACK from Al or Cristoph for the export, but I > >don't see either an ACK or any objection. > > > > Cristoph told me on LSF something line "No ... way", when I asked > him about set_fs_root() exporting. > But I had no opportunity to ask why. So, Al or Christoph: NFS calls up to rpcbind from the kernel using AF_LOCAL, so it looks up a path when it connects. We'd like to export set_fs_root() so we can temporarily put our task in the right namespace around the connect call. Is that reasonable? I just want to get a reason on record so we stop going in circles here. --b. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Nov 06, 2012 at 07:40:35AM -0500, Christoph Hellwig wrote: > On Tue, Nov 06, 2012 at 07:06:42AM -0500, J. Bruce Fields wrote: > > On Tue, Nov 06, 2012 at 02:14:50PM +0400, Stanislav Kinsbursky wrote: > > > 09.10.2012 23:35, J. Bruce Fields ??????????: > > > >Cc'ing Eric since I seem to recall he suggested doing it this way? > > > > > > > >Seems OK to me, but maybe that swap_root should be in common code? (Or > > > >maybe we could use set_fs_root()?) > > > > > > > > > > This patch is not good since, as Eric mentioned, all kernel threads > > > share same fs struct. > > > We can swap whole fs struct. Or we can unshare fs struct > > > (unshare_fs_struct() is exported) and swap root in this case. > > > But this approach is to close to set_fs_root() logic, which is not > > > exported and seems there are some valid reasons for it. > > > > What are those reasons? > > > > Googling found one previous thread: > > > > http://thread.gmane.org/gmane.linux.kernel/1259986/focus=47687 > > > > There Trond requests an ACK from Al or Cristoph for the export, but I > > don't see either an ACK or any objection. > > I really don't think messing with current->fs for workqueue worker > threads is a good idea, as the worker threads are shared by different > workqueues and thus this can easily cause havoc for entirely unrelated > subsystems. So you're worried that a bug in the nfs code could modify the root and then not restore it? --b. > > To do this properly you'll need to avoid current->fs references in the > sunrpc code. > > And just in case it wasn't clear: the hack in this iteration is even > worse than the original. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Nov 06, 2012 at 08:07:06AM -0500, J. Bruce Fields wrote: > So you're worried that a bug in the nfs code could modify the root and > then not restore it? At least the link you pointed to earlier never sets it back. Instead of messing with it I'd rather have the sunrpc code use vfs_path_lookup and not care about current->fs->root at all. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Nov 06, 2012 at 08:10:18AM -0500, Christoph Hellwig wrote: > On Tue, Nov 06, 2012 at 08:07:06AM -0500, J. Bruce Fields wrote: > > So you're worried that a bug in the nfs code could modify the root and > > then not restore it? > > At least the link you pointed to earlier never sets it back. This? http://thread.gmane.org/gmane.linux.kernel/1259986/focus=47687 + get_fs_root(current->fs, &root); + set_fs_root(current->fs, &transport->root); + status = xs_local_finish_connecting(xprt, sock); + + set_fs_root(current->fs, &root); + path_put(&root); > Instead > of messing with it I'd rather have the sunrpc code use vfs_path_lookup > and not care about current->fs->root at all. The annoyance is that the lookup happens somewhere lower down in the networking code (net/unix/af_unix.c:unix_find_other, I think). So we'd need some new (internal) API. We'd likely be the only user of that new API. --b. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Nov 06, 2012 at 08:36:05AM -0500, J. Bruce Fields wrote: > On Tue, Nov 06, 2012 at 08:10:18AM -0500, Christoph Hellwig wrote: > > On Tue, Nov 06, 2012 at 08:07:06AM -0500, J. Bruce Fields wrote: > > > So you're worried that a bug in the nfs code could modify the root and > > > then not restore it? > > > > At least the link you pointed to earlier never sets it back. > > This? http://thread.gmane.org/gmane.linux.kernel/1259986/focus=47687 > > + get_fs_root(current->fs, &root); > + set_fs_root(current->fs, &transport->root); > + > status = xs_local_finish_connecting(xprt, sock); > + > + set_fs_root(current->fs, &root); > + path_put(&root); > > > Instead > > of messing with it I'd rather have the sunrpc code use vfs_path_lookup > > and not care about current->fs->root at all. > > The annoyance is that the lookup happens somewhere lower down in the > networking code (net/unix/af_unix.c:unix_find_other, I think). So we'd > need some new (internal) API. We'd likely be the only user of that new > API. So, if the only drawback is really just the risk of introducing a bug that leaves the fs_root changed--the above seems simple enough for that not to be a great risk, right? Is there any other hazard to doing this that people can think of? --b. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
07.11.2012 22:33, J. Bruce Fields ?????: > On Tue, Nov 06, 2012 at 08:36:05AM -0500, J. Bruce Fields wrote: >> On Tue, Nov 06, 2012 at 08:10:18AM -0500, Christoph Hellwig wrote: >>> On Tue, Nov 06, 2012 at 08:07:06AM -0500, J. Bruce Fields wrote: >>>> So you're worried that a bug in the nfs code could modify the root and >>>> then not restore it? >>> >>> At least the link you pointed to earlier never sets it back. >> >> This? http://thread.gmane.org/gmane.linux.kernel/1259986/focus=47687 >> >> + get_fs_root(current->fs, &root); >> + set_fs_root(current->fs, &transport->root); >> + >> status = xs_local_finish_connecting(xprt, sock); >> + >> + set_fs_root(current->fs, &root); >> + path_put(&root); >> >>> Instead >>> of messing with it I'd rather have the sunrpc code use vfs_path_lookup >>> and not care about current->fs->root at all. >> >> The annoyance is that the lookup happens somewhere lower down in the >> networking code (net/unix/af_unix.c:unix_find_other, I think). So we'd >> need some new (internal) API. We'd likely be the only user of that new >> API. > > So, if the only drawback is really just the risk of introducing a bug > that leaves the fs_root changed--the above seems simple enough for that > not to be a great risk, right? > If we unshare rpciod fs struct (which is exported already), then we won't affect other kthreads by root swapping. But would be great to hear Trond's opinion about this approach. Trond, could you tell us your feeling about all this? > Is there any other hazard to doing this that people can think of? > > --b. >
On Mon, Nov 12, 2012 at 12:37:54PM +0400, Stanislav Kinsbursky wrote: > 07.11.2012 22:33, J. Bruce Fields ?????: > >On Tue, Nov 06, 2012 at 08:36:05AM -0500, J. Bruce Fields wrote: > >>On Tue, Nov 06, 2012 at 08:10:18AM -0500, Christoph Hellwig wrote: > >>>On Tue, Nov 06, 2012 at 08:07:06AM -0500, J. Bruce Fields wrote: > >>>>So you're worried that a bug in the nfs code could modify the root and > >>>>then not restore it? > >>> > >>>At least the link you pointed to earlier never sets it back. > >> > >>This? http://thread.gmane.org/gmane.linux.kernel/1259986/focus=47687 > >> > >> + get_fs_root(current->fs, &root); > >> + set_fs_root(current->fs, &transport->root); > >> + > >> status = xs_local_finish_connecting(xprt, sock); > >> + > >> + set_fs_root(current->fs, &root); > >> + path_put(&root); > >> > >>>Instead > >>>of messing with it I'd rather have the sunrpc code use vfs_path_lookup > >>>and not care about current->fs->root at all. > >> > >>The annoyance is that the lookup happens somewhere lower down in the > >>networking code (net/unix/af_unix.c:unix_find_other, I think). So we'd > >>need some new (internal) API. We'd likely be the only user of that new > >>API. > > > >So, if the only drawback is really just the risk of introducing a bug > >that leaves the fs_root changed--the above seems simple enough for that > >not to be a great risk, right? > > > > If we unshare rpciod fs struct (which is exported already), then we I'm not sure what you mean by that. Do workqueues actually have their own dedicated set of associated tasks? I thought all workqueues shared a common pool of tasks these days. > won't affect other kthreads by root swapping. > But would be great to hear Trond's opinion about this approach. > > Trond, could you tell us your feeling about all this? I think it's often easier to get people to comment on an actual patch, and this one would be quite short, so try that.... --b. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2012-11-14 at 16:01 -0500, J. Bruce Fields wrote: > On Mon, Nov 12, 2012 at 12:37:54PM +0400, Stanislav Kinsbursky wrote: > > 07.11.2012 22:33, J. Bruce Fields ?????: > > >On Tue, Nov 06, 2012 at 08:36:05AM -0500, J. Bruce Fields wrote: > > >>On Tue, Nov 06, 2012 at 08:10:18AM -0500, Christoph Hellwig wrote: > > >>>On Tue, Nov 06, 2012 at 08:07:06AM -0500, J. Bruce Fields wrote: > > >>>>So you're worried that a bug in the nfs code could modify the root and > > >>>>then not restore it? > > >>> > > >>>At least the link you pointed to earlier never sets it back. > > >> > > >>This? http://thread.gmane.org/gmane.linux.kernel/1259986/focus=47687 > > >> > > >> + get_fs_root(current->fs, &root); > > >> + set_fs_root(current->fs, &transport->root); > > >> + > > >> status = xs_local_finish_connecting(xprt, sock); > > >> + > > >> + set_fs_root(current->fs, &root); > > >> + path_put(&root); > > >> > > >>>Instead > > >>>of messing with it I'd rather have the sunrpc code use vfs_path_lookup > > >>>and not care about current->fs->root at all. > > >> > > >>The annoyance is that the lookup happens somewhere lower down in the > > >>networking code (net/unix/af_unix.c:unix_find_other, I think). So we'd > > >>need some new (internal) API. We'd likely be the only user of that new > > >>API. > > > > > >So, if the only drawback is really just the risk of introducing a bug > > >that leaves the fs_root changed--the above seems simple enough for that > > >not to be a great risk, right? > > > > > > > If we unshare rpciod fs struct (which is exported already), then we > > I'm not sure what you mean by that. Do workqueues actually have their > own dedicated set of associated tasks? I thought all workqueues shared > a common pool of tasks these days. > > > won't affect other kthreads by root swapping. > > But would be great to hear Trond's opinion about this approach. > > > > Trond, could you tell us your feeling about all this? > > I think it's often easier to get people to comment on an actual patch, > and this one would be quite short, so try that.... unshare() would break expectations for other users of workqueue threads unless you "reshare()" afterwards. Either way that's going to be seriously ugly. OK, let's look at this again. Do we ever use AF_LOCAL connections for anything other than synchronous rpc calls to the local rpcbind daemon in order to register/unregister new services? If not, then let's just move the AF_LOCAL connection back into the process context and out of rpciod. That implies that the process needs to be privileged, but it needs privileges in order to start RPC daemons anyway. -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com
On Wed, Nov 14, 2012 at 09:36:33PM +0000, Myklebust, Trond wrote: > On Wed, 2012-11-14 at 16:01 -0500, J. Bruce Fields wrote: > > On Mon, Nov 12, 2012 at 12:37:54PM +0400, Stanislav Kinsbursky wrote: > > > 07.11.2012 22:33, J. Bruce Fields ?????: > > > >On Tue, Nov 06, 2012 at 08:36:05AM -0500, J. Bruce Fields wrote: > > > >>On Tue, Nov 06, 2012 at 08:10:18AM -0500, Christoph Hellwig wrote: > > > >>>On Tue, Nov 06, 2012 at 08:07:06AM -0500, J. Bruce Fields wrote: > > > >>>>So you're worried that a bug in the nfs code could modify the root and > > > >>>>then not restore it? > > > >>> > > > >>>At least the link you pointed to earlier never sets it back. > > > >> > > > >>This? http://thread.gmane.org/gmane.linux.kernel/1259986/focus=47687 > > > >> > > > >> + get_fs_root(current->fs, &root); > > > >> + set_fs_root(current->fs, &transport->root); > > > >> + > > > >> status = xs_local_finish_connecting(xprt, sock); > > > >> + > > > >> + set_fs_root(current->fs, &root); > > > >> + path_put(&root); > > > >> > > > >>>Instead > > > >>>of messing with it I'd rather have the sunrpc code use vfs_path_lookup > > > >>>and not care about current->fs->root at all. > > > >> > > > >>The annoyance is that the lookup happens somewhere lower down in the > > > >>networking code (net/unix/af_unix.c:unix_find_other, I think). So we'd > > > >>need some new (internal) API. We'd likely be the only user of that new > > > >>API. > > > > > > > >So, if the only drawback is really just the risk of introducing a bug > > > >that leaves the fs_root changed--the above seems simple enough for that > > > >not to be a great risk, right? > > > > > > > > > > If we unshare rpciod fs struct (which is exported already), then we > > > > I'm not sure what you mean by that. Do workqueues actually have their > > own dedicated set of associated tasks? I thought all workqueues shared > > a common pool of tasks these days. > > > > > won't affect other kthreads by root swapping. > > > But would be great to hear Trond's opinion about this approach. > > > > > > Trond, could you tell us your feeling about all this? > > > > I think it's often easier to get people to comment on an actual patch, > > and this one would be quite short, so try that.... > > unshare() would break expectations for other users of workqueue threads > unless you "reshare()" afterwards. Either way that's going to be > seriously ugly. > > OK, let's look at this again. Do we ever use AF_LOCAL connections for > anything other than synchronous rpc calls to the local rpcbind daemon in > order to register/unregister new services? Simo's patches use them for upcalls to svcgssd. Those will always be done from server threads. > If not, then let's just move > the AF_LOCAL connection back into the process context and out of rpciod. Remind me how this helps? --b. > > That implies that the process needs to be privileged, but it needs > privileges in order to start RPC daemons anyway. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
T24gV2VkLCAyMDEyLTExLTE0IGF0IDE2OjQyIC0wNTAwLCBKLiBCcnVjZSBGaWVsZHMgd3JvdGU6 DQo+IE9uIFdlZCwgTm92IDE0LCAyMDEyIGF0IDA5OjM2OjMzUE0gKzAwMDAsIE15a2xlYnVzdCwg VHJvbmQgd3JvdGU6DQo+ID4gT24gV2VkLCAyMDEyLTExLTE0IGF0IDE2OjAxIC0wNTAwLCBKLiBC cnVjZSBGaWVsZHMgd3JvdGU6DQo+ID4gPiBPbiBNb24sIE5vdiAxMiwgMjAxMiBhdCAxMjozNzo1 NFBNICswNDAwLCBTdGFuaXNsYXYgS2luc2J1cnNreSB3cm90ZToNCj4gPiA+ID4gMDcuMTEuMjAx MiAyMjozMywgSi4gQnJ1Y2UgRmllbGRzINC/0LjRiNC10YI6DQo+ID4gPiA+ID5PbiBUdWUsIE5v diAwNiwgMjAxMiBhdCAwODozNjowNUFNIC0wNTAwLCBKLiBCcnVjZSBGaWVsZHMgd3JvdGU6DQo+ ID4gPiA+ID4+T24gVHVlLCBOb3YgMDYsIDIwMTIgYXQgMDg6MTA6MThBTSAtMDUwMCwgQ2hyaXN0 b3BoIEhlbGx3aWcgd3JvdGU6DQo+ID4gPiA+ID4+Pk9uIFR1ZSwgTm92IDA2LCAyMDEyIGF0IDA4 OjA3OjA2QU0gLTA1MDAsIEouIEJydWNlIEZpZWxkcyB3cm90ZToNCj4gPiA+ID4gPj4+PlNvIHlv dSdyZSB3b3JyaWVkIHRoYXQgYSBidWcgaW4gdGhlIG5mcyBjb2RlIGNvdWxkIG1vZGlmeSB0aGUg cm9vdCBhbmQNCj4gPiA+ID4gPj4+PnRoZW4gbm90IHJlc3RvcmUgaXQ/DQo+ID4gPiA+ID4+Pg0K PiA+ID4gPiA+Pj5BdCBsZWFzdCB0aGUgbGluayB5b3UgcG9pbnRlZCB0byBlYXJsaWVyIG5ldmVy IHNldHMgaXQgYmFjay4NCj4gPiA+ID4gPj4NCj4gPiA+ID4gPj5UaGlzPyBodHRwOi8vdGhyZWFk LmdtYW5lLm9yZy9nbWFuZS5saW51eC5rZXJuZWwvMTI1OTk4Ni9mb2N1cz00NzY4Nw0KPiA+ID4g PiA+Pg0KPiA+ID4gPiA+PgkrCWdldF9mc19yb290KGN1cnJlbnQtPmZzLCAmcm9vdCk7DQo+ID4g PiA+ID4+CSsJc2V0X2ZzX3Jvb3QoY3VycmVudC0+ZnMsICZ0cmFuc3BvcnQtPnJvb3QpOw0KPiA+ ID4gPiA+PgkrDQo+ID4gPiA+ID4+CSAJc3RhdHVzID0geHNfbG9jYWxfZmluaXNoX2Nvbm5lY3Rp bmcoeHBydCwgc29jayk7DQo+ID4gPiA+ID4+CSsNCj4gPiA+ID4gPj4JKwlzZXRfZnNfcm9vdChj dXJyZW50LT5mcywgJnJvb3QpOw0KPiA+ID4gPiA+PgkrCXBhdGhfcHV0KCZyb290KTsNCj4gPiA+ ID4gPj4NCj4gPiA+ID4gPj4+SW5zdGVhZA0KPiA+ID4gPiA+Pj5vZiBtZXNzaW5nIHdpdGggaXQg SSdkIHJhdGhlciBoYXZlIHRoZSBzdW5ycGMgY29kZSB1c2UgdmZzX3BhdGhfbG9va3VwDQo+ID4g PiA+ID4+PmFuZCBub3QgY2FyZSBhYm91dCBjdXJyZW50LT5mcy0+cm9vdCBhdCBhbGwuDQo+ID4g PiA+ID4+DQo+ID4gPiA+ID4+VGhlIGFubm95YW5jZSBpcyB0aGF0IHRoZSBsb29rdXAgaGFwcGVu cyBzb21ld2hlcmUgbG93ZXIgZG93biBpbiB0aGUNCj4gPiA+ID4gPj5uZXR3b3JraW5nIGNvZGUg KG5ldC91bml4L2FmX3VuaXguYzp1bml4X2ZpbmRfb3RoZXIsIEkgdGhpbmspLiAgU28gd2UnZA0K PiA+ID4gPiA+Pm5lZWQgc29tZSBuZXcgKGludGVybmFsKSBBUEkuICBXZSdkIGxpa2VseSBiZSB0 aGUgb25seSB1c2VyIG9mIHRoYXQgbmV3DQo+ID4gPiA+ID4+QVBJLg0KPiA+ID4gPiA+DQo+ID4g PiA+ID5TbywgaWYgdGhlIG9ubHkgZHJhd2JhY2sgaXMgcmVhbGx5IGp1c3QgdGhlIHJpc2sgb2Yg aW50cm9kdWNpbmcgYSBidWcNCj4gPiA+ID4gPnRoYXQgbGVhdmVzIHRoZSBmc19yb290IGNoYW5n ZWQtLXRoZSBhYm92ZSBzZWVtcyBzaW1wbGUgZW5vdWdoIGZvciB0aGF0DQo+ID4gPiA+ID5ub3Qg dG8gYmUgYSBncmVhdCByaXNrLCByaWdodD8NCj4gPiA+ID4gPg0KPiA+ID4gPiANCj4gPiA+ID4g SWYgd2UgdW5zaGFyZSBycGNpb2QgZnMgc3RydWN0ICh3aGljaCBpcyBleHBvcnRlZCBhbHJlYWR5 KSwgdGhlbiB3ZQ0KPiA+ID4gDQo+ID4gPiBJJ20gbm90IHN1cmUgd2hhdCB5b3UgbWVhbiBieSB0 aGF0LiAgRG8gd29ya3F1ZXVlcyBhY3R1YWxseSBoYXZlIHRoZWlyDQo+ID4gPiBvd24gZGVkaWNh dGVkIHNldCBvZiBhc3NvY2lhdGVkIHRhc2tzPyAgSSB0aG91Z2h0IGFsbCB3b3JrcXVldWVzIHNo YXJlZA0KPiA+ID4gYSBjb21tb24gcG9vbCBvZiB0YXNrcyB0aGVzZSBkYXlzLg0KPiA+ID4gDQo+ ID4gPiA+IHdvbid0IGFmZmVjdCBvdGhlciBrdGhyZWFkcyBieSByb290IHN3YXBwaW5nLg0KPiA+ ID4gPiBCdXQgd291bGQgYmUgZ3JlYXQgdG8gaGVhciBUcm9uZCdzIG9waW5pb24gYWJvdXQgdGhp cyBhcHByb2FjaC4NCj4gPiA+ID4gDQo+ID4gPiA+IFRyb25kLCBjb3VsZCB5b3UgdGVsbCB1cyB5 b3VyIGZlZWxpbmcgYWJvdXQgYWxsIHRoaXM/DQo+ID4gPiANCj4gPiA+IEkgdGhpbmsgaXQncyBv ZnRlbiBlYXNpZXIgdG8gZ2V0IHBlb3BsZSB0byBjb21tZW50IG9uIGFuIGFjdHVhbCBwYXRjaCwN Cj4gPiA+IGFuZCB0aGlzIG9uZSB3b3VsZCBiZSBxdWl0ZSBzaG9ydCwgc28gdHJ5IHRoYXQuLi4u DQo+ID4gDQo+ID4gdW5zaGFyZSgpIHdvdWxkIGJyZWFrIGV4cGVjdGF0aW9ucyBmb3Igb3RoZXIg dXNlcnMgb2Ygd29ya3F1ZXVlIHRocmVhZHMNCj4gPiB1bmxlc3MgeW91ICJyZXNoYXJlKCkiIGFm dGVyd2FyZHMuIEVpdGhlciB3YXkgdGhhdCdzIGdvaW5nIHRvIGJlDQo+ID4gc2VyaW91c2x5IHVn bHkuDQo+ID4gDQo+ID4gT0ssIGxldCdzIGxvb2sgYXQgdGhpcyBhZ2Fpbi4gRG8gd2UgZXZlciB1 c2UgQUZfTE9DQUwgY29ubmVjdGlvbnMgZm9yDQo+ID4gYW55dGhpbmcgb3RoZXIgdGhhbiBzeW5j aHJvbm91cyBycGMgY2FsbHMgdG8gdGhlIGxvY2FsIHJwY2JpbmQgZGFlbW9uIGluDQo+ID4gb3Jk ZXIgdG8gcmVnaXN0ZXIvdW5yZWdpc3RlciBuZXcgc2VydmljZXM/DQo+IA0KPiBTaW1vJ3MgcGF0 Y2hlcyB1c2UgdGhlbSBmb3IgdXBjYWxscyB0byBzdmNnc3NkLiAgVGhvc2Ugd2lsbCBhbHdheXMg YmUNCj4gZG9uZSBmcm9tIHNlcnZlciB0aHJlYWRzLg0KDQpBbnkgcmVhc29uIHdoeSB5b3UgY2Fu J3Qgc2V0IHRoYXQgdXAgd2hlbiB5b3Ugc3RhcnQgbmZzZD8NCg0KPiA+IElmIG5vdCwgdGhlbiBs ZXQncyBqdXN0IG1vdmUNCj4gPiB0aGUgQUZfTE9DQUwgY29ubmVjdGlvbiBiYWNrIGludG8gdGhl IHByb2Nlc3MgY29udGV4dCBhbmQgb3V0IG9mIHJwY2lvZC4NCj4gDQo+IFJlbWluZCBtZSBob3cg dGhpcyBoZWxwcz8NCg0KcnBjaW9kIHNoYXJlcyB0aGUgJ2luaXQnIHByb2Nlc3MgbmV0IG5hbWVz cGFjZSBhbmQgY2hyb290IHByb3BlcnRpZXMuDQpJZiwgaG93ZXZlciB5b3UgY2FsbCBiaW5kKCkg ZnJvbSB0aGUgKGNvbnRhaW5lcmlzZWQpIHByb2Nlc3MgdGhhdCB3YXMNCnVzZWQgdG8gc3RhcnQg bmZzZCwgdGhlbiB5b3Ugd2lsbCBiZSB1c2luZyBmaWxlc3lzdGVtIHJvb3QgKGFuZCBuZXQNCm5h bWVzcGFjZSkgb2YgdGhhdCBjb250YWluZXIuDQoNCj4gLS1iLg0KPiANCj4gPiANCj4gPiBUaGF0 IGltcGxpZXMgdGhhdCB0aGUgcHJvY2VzcyBuZWVkcyB0byBiZSBwcml2aWxlZ2VkLCBidXQgaXQg bmVlZHMNCj4gPiBwcml2aWxlZ2VzIGluIG9yZGVyIHRvIHN0YXJ0IFJQQyBkYWVtb25zIGFueXdh eS4NCg0KLS0gDQpUcm9uZCBNeWtsZWJ1c3QNCkxpbnV4IE5GUyBjbGllbnQgbWFpbnRhaW5lcg0K DQpOZXRBcHANClRyb25kLk15a2xlYnVzdEBuZXRhcHAuY29tDQp3d3cubmV0YXBwLmNvbQ0K -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Nov 14, 2012 at 09:51:33PM +0000, Myklebust, Trond wrote: > On Wed, 2012-11-14 at 16:42 -0500, J. Bruce Fields wrote: > > Simo's patches use them for upcalls to svcgssd. Those will always be > > done from server threads. > > Any reason why you can't set that up when you start nfsd? Oh, right, I was thinking of the upcalls themselves--right, the connect we should be able to do on server start, I agree. > > > > If not, then let's just move > > > the AF_LOCAL connection back into the process context and out of rpciod. > > > > Remind me how this helps? > > rpciod shares the 'init' process net namespace and chroot properties. > If, however you call bind() from the (containerised) process that was > used to start nfsd, then you will be using filesystem root (and net > namespace) of that container. Got it. --b. > > > --b. > > > > > > > > That implies that the process needs to be privileged, but it needs > > > privileges in order to start RPC daemons anyway. > > -- > Trond Myklebust > Linux NFS client maintainer > > NetApp > Trond.Myklebust@netapp.com > www.netapp.com -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
"J. Bruce Fields" <bfields@fieldses.org> writes: > On Wed, Nov 14, 2012 at 09:51:33PM +0000, Myklebust, Trond wrote: >> On Wed, 2012-11-14 at 16:42 -0500, J. Bruce Fields wrote: >> > Simo's patches use them for upcalls to svcgssd. Those will always be >> > done from server threads. >> >> Any reason why you can't set that up when you start nfsd? > > Oh, right, I was thinking of the upcalls themselves--right, the connect > we should be able to do on server start, I agree. > >> >> > > If not, then let's just move >> > > the AF_LOCAL connection back into the process context and out of rpciod. >> > >> > Remind me how this helps? >> >> rpciod shares the 'init' process net namespace and chroot properties. >> If, however you call bind() from the (containerised) process that was >> used to start nfsd, then you will be using filesystem root (and net >> namespace) of that container. > > Got it. If you can move the connect and bind into the server start that does sound like a very good and maintainable solution. I suspect it might even be a smidge better for error handling. Is there ever a reason to reconnect one of these sockets? Eric -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
15.11.2012 01:01, J. Bruce Fields ?????: > On Mon, Nov 12, 2012 at 12:37:54PM +0400, Stanislav Kinsbursky wrote: >> 07.11.2012 22:33, J. Bruce Fields ?????: >>> On Tue, Nov 06, 2012 at 08:36:05AM -0500, J. Bruce Fields wrote: >>>> On Tue, Nov 06, 2012 at 08:10:18AM -0500, Christoph Hellwig wrote: >>>>> On Tue, Nov 06, 2012 at 08:07:06AM -0500, J. Bruce Fields wrote: >>>>>> So you're worried that a bug in the nfs code could modify the root and >>>>>> then not restore it? >>>>> >>>>> At least the link you pointed to earlier never sets it back. >>>> >>>> This? http://thread.gmane.org/gmane.linux.kernel/1259986/focus=47687 >>>> >>>> + get_fs_root(current->fs, &root); >>>> + set_fs_root(current->fs, &transport->root); >>>> + >>>> status = xs_local_finish_connecting(xprt, sock); >>>> + >>>> + set_fs_root(current->fs, &root); >>>> + path_put(&root); >>>> >>>>> Instead >>>>> of messing with it I'd rather have the sunrpc code use vfs_path_lookup >>>>> and not care about current->fs->root at all. >>>> >>>> The annoyance is that the lookup happens somewhere lower down in the >>>> networking code (net/unix/af_unix.c:unix_find_other, I think). So we'd >>>> need some new (internal) API. We'd likely be the only user of that new >>>> API. >>> >>> So, if the only drawback is really just the risk of introducing a bug >>> that leaves the fs_root changed--the above seems simple enough for that >>> not to be a great risk, right? >>> >> >> If we unshare rpciod fs struct (which is exported already), then we > > I'm not sure what you mean by that. Do workqueues actually have their > own dedicated set of associated tasks? I thought all workqueues shared > a common pool of tasks these days. > Any kernel thread is cloned in kthreadd context with CLONE_FS flag. I.e. all of them shares same fs struct, and changing fs->root in one of kthreads will affect all others. That's why either fs struct have to be unshared to swap fs->root or fs->cwd, or the whole fs struct have to swapped.
T24gV2VkLCAyMDEyLTExLTE0IGF0IDIyOjE0IC0wODAwLCBFcmljIFcuIEJpZWRlcm1hbiB3cm90 ZToNCj4gIkouIEJydWNlIEZpZWxkcyIgPGJmaWVsZHNAZmllbGRzZXMub3JnPiB3cml0ZXM6DQo+ IA0KPiA+IE9uIFdlZCwgTm92IDE0LCAyMDEyIGF0IDA5OjUxOjMzUE0gKzAwMDAsIE15a2xlYnVz dCwgVHJvbmQgd3JvdGU6DQo+ID4+IE9uIFdlZCwgMjAxMi0xMS0xNCBhdCAxNjo0MiAtMDUwMCwg Si4gQnJ1Y2UgRmllbGRzIHdyb3RlOg0KPiA+PiA+IFNpbW8ncyBwYXRjaGVzIHVzZSB0aGVtIGZv ciB1cGNhbGxzIHRvIHN2Y2dzc2QuICBUaG9zZSB3aWxsIGFsd2F5cyBiZQ0KPiA+PiA+IGRvbmUg ZnJvbSBzZXJ2ZXIgdGhyZWFkcy4NCj4gPj4gDQo+ID4+IEFueSByZWFzb24gd2h5IHlvdSBjYW4n dCBzZXQgdGhhdCB1cCB3aGVuIHlvdSBzdGFydCBuZnNkPw0KPiA+DQo+ID4gT2gsIHJpZ2h0LCBJ IHdhcyB0aGlua2luZyBvZiB0aGUgdXBjYWxscyB0aGVtc2VsdmVzLS1yaWdodCwgdGhlIGNvbm5l Y3QNCj4gPiB3ZSBzaG91bGQgYmUgYWJsZSB0byBkbyBvbiBzZXJ2ZXIgc3RhcnQsIEkgYWdyZWUu DQo+ID4NCj4gPj4gDQo+ID4+ID4gPiBJZiBub3QsIHRoZW4gbGV0J3MganVzdCBtb3ZlDQo+ID4+ ID4gPiB0aGUgQUZfTE9DQUwgY29ubmVjdGlvbiBiYWNrIGludG8gdGhlIHByb2Nlc3MgY29udGV4 dCBhbmQgb3V0IG9mIHJwY2lvZC4NCj4gPj4gPiANCj4gPj4gPiBSZW1pbmQgbWUgaG93IHRoaXMg aGVscHM/DQo+ID4+IA0KPiA+PiBycGNpb2Qgc2hhcmVzIHRoZSAnaW5pdCcgcHJvY2VzcyBuZXQg bmFtZXNwYWNlIGFuZCBjaHJvb3QgcHJvcGVydGllcy4NCj4gPj4gSWYsIGhvd2V2ZXIgeW91IGNh bGwgYmluZCgpIGZyb20gdGhlIChjb250YWluZXJpc2VkKSBwcm9jZXNzIHRoYXQgd2FzDQo+ID4+ IHVzZWQgdG8gc3RhcnQgbmZzZCwgdGhlbiB5b3Ugd2lsbCBiZSB1c2luZyBmaWxlc3lzdGVtIHJv b3QgKGFuZCBuZXQNCj4gPj4gbmFtZXNwYWNlKSBvZiB0aGF0IGNvbnRhaW5lci4NCj4gPg0KPiA+ IEdvdCBpdC4NCj4gDQo+IElmIHlvdSBjYW4gbW92ZSB0aGUgY29ubmVjdCBhbmQgYmluZCBpbnRv IHRoZSBzZXJ2ZXIgc3RhcnQgdGhhdCBkb2VzDQo+IHNvdW5kIGxpa2UgYSB2ZXJ5IGdvb2QgYW5k IG1haW50YWluYWJsZSBzb2x1dGlvbi4gIEkgc3VzcGVjdCBpdCBtaWdodA0KPiBldmVuIGJlIGEg c21pZGdlIGJldHRlciBmb3IgZXJyb3IgaGFuZGxpbmcuDQo+IA0KPiBJcyB0aGVyZSBldmVyIGEg cmVhc29uIHRvIHJlY29ubmVjdCBvbmUgb2YgdGhlc2Ugc29ja2V0cz8NCg0KTm90IGZvciB0aGUg cnBjYmluZCBjYXNlLCBob3dldmVyIHlvdSBjYW4gZWFzaWx5IGdldCBpbnRvIGEgc2l0dWF0aW9u DQp3aGVyZSB0aGUgdXNlciByZXN0YXJ0cyB0aGUgZ3NzIGRhZW1vbi4gVGhlIGdvb2QgbmV3cyBp cyB0aGF0IHRoZSBnc3MNCnVwY2FsbCBjb2RlIHRoYXQgdXNlcyBBRl9MT0NBTCBoYXNuJ3QgYmVl biBtZXJnZWQgdXBzdHJlYW0geWV0LCBzbyB0aGF0DQpwYXJ0aWN1bGFyIGludGVyZmFjZSBpcyBu b3QgeWV0IGxvY2tlZCBpbiBzdG9uZS4NCg0KLS0gDQpUcm9uZCBNeWtsZWJ1c3QNCkxpbnV4IE5G UyBjbGllbnQgbWFpbnRhaW5lcg0KDQpOZXRBcHANClRyb25kLk15a2xlYnVzdEBuZXRhcHAuY29t DQp3d3cubmV0YXBwLmNvbQ0K -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Nov 15, 2012 at 01:34:16PM +0000, Myklebust, Trond wrote: > On Wed, 2012-11-14 at 22:14 -0800, Eric W. Biederman wrote: > > "J. Bruce Fields" <bfields@fieldses.org> writes: > > > > > On Wed, Nov 14, 2012 at 09:51:33PM +0000, Myklebust, Trond wrote: > > >> On Wed, 2012-11-14 at 16:42 -0500, J. Bruce Fields wrote: > > >> > Simo's patches use them for upcalls to svcgssd. Those will always be > > >> > done from server threads. > > >> > > >> Any reason why you can't set that up when you start nfsd? > > > > > > Oh, right, I was thinking of the upcalls themselves--right, the connect > > > we should be able to do on server start, I agree. > > > > > >> > > >> > > If not, then let's just move > > >> > > the AF_LOCAL connection back into the process context and out of rpciod. > > >> > > > >> > Remind me how this helps? > > >> > > >> rpciod shares the 'init' process net namespace and chroot properties. > > >> If, however you call bind() from the (containerised) process that was > > >> used to start nfsd, then you will be using filesystem root (and net > > >> namespace) of that container. > > > > > > Got it. > > > > If you can move the connect and bind into the server start that does > > sound like a very good and maintainable solution. I suspect it might > > even be a smidge better for error handling. > > > > Is there ever a reason to reconnect one of these sockets? > > Not for the rpcbind case, however you can easily get into a situation > where the user restarts the gss daemon. The good news is that the gss > upcall code that uses AF_LOCAL hasn't been merged upstream yet, so that > particular interface is not yet locked in stone. I think we do want to be able to allow the daemon to be restarted. I guess we can have it call into the rpc server code when it starts up and the server could do the connect then? We need some way for userspace to tell the server that the new upcall is supported anyway. --b. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c index aaaadfb..ecbced1 100644 --- a/net/sunrpc/xprtsock.c +++ b/net/sunrpc/xprtsock.c @@ -37,6 +37,7 @@ #include <linux/sunrpc/svcsock.h> #include <linux/sunrpc/xprtsock.h> #include <linux/file.h> +#include <linux/fs_struct.h> #ifdef CONFIG_SUNRPC_BACKCHANNEL #include <linux/sunrpc/bc_xprt.h> #endif @@ -255,6 +256,11 @@ struct sock_xprt { void (*old_state_change)(struct sock *); void (*old_write_space)(struct sock *); void (*old_error_report)(struct sock *); + + /* + * Saved transport creator root. Required for local transports only. + */ + struct path root; }; /* @@ -1876,6 +1882,30 @@ static int xs_local_finish_connecting(struct rpc_xprt *xprt, return kernel_connect(sock, xs_addr(xprt), xprt->addrlen, 0); } +/* + * __xs_local_swap_root - swap current root to tranport's root helper. + * @new_root - path to set to current fs->root. + * @old_root - optinal paoinet to save current root to + * + * This routine is requieed to connecting unix sockets to absolute path in + * proper root environment. + * Note: no path_get() will be called. I.e. caller have to hold reference to + * new_root. + */ +static void __xs_local_swap_root(struct path *new_root, struct path *old_root) +{ + struct fs_struct *fs = current->fs; + + spin_lock(&fs->lock); + write_seqcount_begin(&fs->seq); + if (old_root) + *old_root = fs->root; + fs->root = *new_root; + write_seqcount_end(&fs->seq); + spin_unlock(&fs->lock); +} + + /** * xs_local_setup_socket - create AF_LOCAL socket, connect to a local endpoint * @xprt: RPC transport to connect @@ -1891,6 +1921,7 @@ static void xs_local_setup_socket(struct work_struct *work) struct rpc_xprt *xprt = &transport->xprt; struct socket *sock; int status = -EIO; + struct path root; current->flags |= PF_FSTRANS; @@ -1907,7 +1938,10 @@ static void xs_local_setup_socket(struct work_struct *work) dprintk("RPC: worker connecting xprt %p via AF_LOCAL to %s\n", xprt, xprt->address_strings[RPC_DISPLAY_ADDR]); + __xs_local_swap_root(&transport->root, &root); status = xs_local_finish_connecting(xprt, sock); + __xs_local_swap_root(&root, NULL); + switch (status) { case 0: dprintk("RPC: xprt %p connected to %s\n", @@ -2256,6 +2290,18 @@ static void xs_connect(struct rpc_task *task) } } +static void xs_local_destroy(struct rpc_xprt *xprt) +{ + struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt); + struct path root = transport->root; + + dprintk("RPC: xs_local_destroy xprt %p\n", xprt); + + xs_destroy(xprt); + + path_put(&root); +} + /** * xs_local_print_stats - display AF_LOCAL socket-specifc stats * @xprt: rpc_xprt struct containing statistics @@ -2475,7 +2521,7 @@ static struct rpc_xprt_ops xs_local_ops = { .send_request = xs_local_send_request, .set_retrans_timeout = xprt_set_retrans_timeout_def, .close = xs_close, - .destroy = xs_destroy, + .destroy = xs_local_destroy, .print_stats = xs_local_print_stats, }; @@ -2654,8 +2700,10 @@ static struct rpc_xprt *xs_setup_local(struct xprt_create *args) dprintk("RPC: set up xprt to %s via AF_LOCAL\n", xprt->address_strings[RPC_DISPLAY_ADDR]); - if (try_module_get(THIS_MODULE)) + if (try_module_get(THIS_MODULE)) { + get_fs_root(current->fs, &transport->root); return xprt; + } ret = ERR_PTR(-EINVAL); out_err: xprt_free(xprt);
Today, there is a problem in connecting of local SUNRPC thansports. These transports uses UNIX sockets and connection itself is done by rpciod workqueue. But all local transports are connecting in rpciod context. I.e. UNIX sockets lookup is done in context of process file system root. This works nice until we will try to mount NFS from process with other root - for example in container. This container can have it's own (nested) root and rcpbind process, listening on it's own unix sockets. But NFS mount attempt in this container will register new service (Lockd for example) in global rpcbind - not containers's one. This patch solves the problem by switching rpciod kernel thread's file system root to the right one (stored on transport) while connecting of local transports. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> --- net/sunrpc/xprtsock.c | 52 +++++++++++++++++++++++++++++++++++++++++++++++-- 1 files changed, 50 insertions(+), 2 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html