mbox series

[0/3] auth_gss: netns refcount leaks when use-gss-proxy==1

Message ID 1556692945-3996-1-git-send-email-wenbinzeng@tencent.com (mailing list archive)
Headers show
Series auth_gss: netns refcount leaks when use-gss-proxy==1 | expand

Message

Wenbin Zeng May 1, 2019, 6:42 a.m. UTC
This patch series fixes an auth_gss bug that results in netns refcount leaks when use-gss-proxy is set to 1.

The problem was found in privileged docker containers with gssproxy service enabled and /proc/net/rpc/use-gss-proxy set to 1, the corresponding struct net->count ends up at 2 after container gets killed, the consequence is that the struct net cannot be freed.

It turns out that write_gssp() called gssp_rpc_create() to create a rpc client, this increases net->count by 2; rpcsec_gss_exit_net() is supposed to decrease net->count but it never gets called because its call-path is:
	net->count==0 -> cleanup_net -> ops_exit_list -> rpcsec_gss_exit_net
Before rpcsec_gss_exit_net() gets called, net->count cannot reach 0, this is a deadlock situation.

To fix the problem, we must break the deadlock, rpcsec_gss_exit_net() should move out of the put() path and find another chance to get called, I think nsfs_evict() is a good place to go, when netns inode gets evicted we call rpcsec_gss_exit_net() to free the rpc client, this requires a new callback i.e. evict to be added in struct proc_ns_operations, and add netns_evict() as one of netns_operations as well.

Wenbin Zeng (3):
  nsfs: add evict callback into struct proc_ns_operations
  netns: add netns_evict into netns_operations
  auth_gss: fix deadlock that blocks rpcsec_gss_exit_net when
    use-gss-proxy==1

 fs/nsfs.c                      |  2 ++
 include/linux/proc_ns.h        |  1 +
 include/net/net_namespace.h    |  1 +
 net/core/net_namespace.c       | 12 ++++++++++++
 net/sunrpc/auth_gss/auth_gss.c |  9 ++++++---
 5 files changed, 22 insertions(+), 3 deletions(-)

Comments

J. Bruce Fields May 9, 2019, 8:52 p.m. UTC | #1
Thanks for figuring this out!

I guess I'll take these patches (with the one fix in your response to
Al) through the nfsd tree, unless someone tells me otherwise.  (The
original bug was introduced through nfsd.)

How serious are the consequences of the leak?  I'm wondering if it's
worth a stable cc or not.

--b.

On Wed, May 01, 2019 at 02:42:22PM +0800, Wenbin Zeng wrote:
> This patch series fixes an auth_gss bug that results in netns refcount leaks when use-gss-proxy is set to 1.
> 
> The problem was found in privileged docker containers with gssproxy service enabled and /proc/net/rpc/use-gss-proxy set to 1, the corresponding struct net->count ends up at 2 after container gets killed, the consequence is that the struct net cannot be freed.
> 
> It turns out that write_gssp() called gssp_rpc_create() to create a rpc client, this increases net->count by 2; rpcsec_gss_exit_net() is supposed to decrease net->count but it never gets called because its call-path is:
> 	net->count==0 -> cleanup_net -> ops_exit_list -> rpcsec_gss_exit_net
> Before rpcsec_gss_exit_net() gets called, net->count cannot reach 0, this is a deadlock situation.
> 
> To fix the problem, we must break the deadlock, rpcsec_gss_exit_net() should move out of the put() path and find another chance to get called, I think nsfs_evict() is a good place to go, when netns inode gets evicted we call rpcsec_gss_exit_net() to free the rpc client, this requires a new callback i.e. evict to be added in struct proc_ns_operations, and add netns_evict() as one of netns_operations as well.
> 
> Wenbin Zeng (3):
>   nsfs: add evict callback into struct proc_ns_operations
>   netns: add netns_evict into netns_operations
>   auth_gss: fix deadlock that blocks rpcsec_gss_exit_net when
>     use-gss-proxy==1
> 
>  fs/nsfs.c                      |  2 ++
>  include/linux/proc_ns.h        |  1 +
>  include/net/net_namespace.h    |  1 +
>  net/core/net_namespace.c       | 12 ++++++++++++
>  net/sunrpc/auth_gss/auth_gss.c |  9 ++++++---
>  5 files changed, 22 insertions(+), 3 deletions(-)
> 
> -- 
> 1.8.3.1
Wenbin Zeng May 10, 2019, 5:09 a.m. UTC | #2
On Thu, May 09, 2019 at 04:52:18PM -0400, J. Bruce Fields wrote:
> Thanks for figuring this out!
> 
> I guess I'll take these patches (with the one fix in your response to
> Al) through the nfsd tree, unless someone tells me otherwise.  (The
> original bug was introduced through nfsd.)

Thank you, Bruce.
I am submitting v2 with that fix right away.

> 
> How serious are the consequences of the leak?  I'm wondering if it's
> worth a stable cc or not.

Though the leak only happens with _privileged_ docker containers that have
gssproxy service enabled and use-gss-proxy set to 1, the consequences
can be ugly, the killed/stopped containers not only leave struct net
unfreed, also possibly leave behind veth devices linked to the netns, in
environments that containers are frequently killed/stopped, it is quite
ugly.

> 
> --b.
> 
> On Wed, May 01, 2019 at 02:42:22PM +0800, Wenbin Zeng wrote:
> > This patch series fixes an auth_gss bug that results in netns refcount leaks when use-gss-proxy is set to 1.
> > 
> > The problem was found in privileged docker containers with gssproxy service enabled and /proc/net/rpc/use-gss-proxy set to 1, the corresponding struct net->count ends up at 2 after container gets killed, the consequence is that the struct net cannot be freed.
> > 
> > It turns out that write_gssp() called gssp_rpc_create() to create a rpc client, this increases net->count by 2; rpcsec_gss_exit_net() is supposed to decrease net->count but it never gets called because its call-path is:
> > 	net->count==0 -> cleanup_net -> ops_exit_list -> rpcsec_gss_exit_net
> > Before rpcsec_gss_exit_net() gets called, net->count cannot reach 0, this is a deadlock situation.
> > 
> > To fix the problem, we must break the deadlock, rpcsec_gss_exit_net() should move out of the put() path and find another chance to get called, I think nsfs_evict() is a good place to go, when netns inode gets evicted we call rpcsec_gss_exit_net() to free the rpc client, this requires a new callback i.e. evict to be added in struct proc_ns_operations, and add netns_evict() as one of netns_operations as well.
> > 
> > Wenbin Zeng (3):
> >   nsfs: add evict callback into struct proc_ns_operations
> >   netns: add netns_evict into netns_operations
> >   auth_gss: fix deadlock that blocks rpcsec_gss_exit_net when
> >     use-gss-proxy==1
> > 
> >  fs/nsfs.c                      |  2 ++
> >  include/linux/proc_ns.h        |  1 +
> >  include/net/net_namespace.h    |  1 +
> >  net/core/net_namespace.c       | 12 ++++++++++++
> >  net/sunrpc/auth_gss/auth_gss.c |  9 ++++++---
> >  5 files changed, 22 insertions(+), 3 deletions(-)
> > 
> > -- 
> > 1.8.3.1