
[v2] SUNRPC: serialize gss upcalls for the same uid

Message ID: 20221209003032.3211581-1-h-shimamoto@nec.com

Commit Message

SHIMAMOTO HIROSHI(島本 裕志) Dec. 9, 2022, 12:30 a.m. UTC
From: minoura <minoura@valinux.co.jp>

Commit 9130b8dbc6ac ("SUNRPC: allow for upcalls for the same uid
but different gss service") introduced the `auth` argument to
__gss_find_upcall(), but gss_pipe_downcall() passes NULL for it,
since the auth (and hence auth->service) is not yet determined at
that point.

When multiple upcalls with the same uid but different services are
in flight, __gss_find_upcall(), which returns the first match found
in the pipe->in_downcall list, may therefore fail to find the gss_msg
that corresponds to the downcall being processed, for two reasons:

- the order of the msgs in pipe->in_downcall and those in pipe->pipe
  (that is, the order of the upcalls sent to rpc.gssd) might differ,
  because pipe->lock is dropped between adding a message to each
  list (see the sketch below).
- rpc.gssd uses threads to write responses, so the order of the
  responses cannot be guaranteed either.
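
To illustrate the first point, the two list insertions happen in two
separate pipe->lock critical sections (roughly paraphrased from
gss_add_msg() and rpc_queue_upcall(); not verbatim), so a concurrent
upcall can slip in between them:

	/* in gss_add_msg(), called from gss_setup_upcall() */
	spin_lock(&pipe->lock);
	list_add(&gss_msg->list, &pipe->in_downcall);
	spin_unlock(&pipe->lock);

	/* window: another upcall may run both insertions here, so the
	 * two lists can end up in different orders */

	/* in rpc_queue_upcall(), called next from gss_setup_upcall() */
	spin_lock(&pipe->lock);
	list_add_tail(&msg->list, &pipe->pipe);
	spin_unlock(&pipe->lock);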

We observed mount.nfs processes hung in D state when multiple
mount.nfs instances were executed in parallel.  The call trace below
is from CentOS 7.9 kernel-3.10.0-1160.24.1.el7.x86_64, but we
observed the same hang with elrepo kernel-ml-6.0.7-1.el7.

PID: 71258  TASK: ffff91ebd4be0000  CPU: 36  COMMAND: "mount.nfs"
 #0 [ffff9203ca3234f8] __schedule at ffffffffa3b8899f
 #1 [ffff9203ca323580] schedule at ffffffffa3b88eb9
 #2 [ffff9203ca323590] gss_cred_init at ffffffffc0355818 [auth_rpcgss]
 #3 [ffff9203ca323658] rpcauth_lookup_credcache at ffffffffc0421ebc [sunrpc]
 #4 [ffff9203ca3236d8] gss_lookup_cred at ffffffffc0353633 [auth_rpcgss]
 #5 [ffff9203ca3236e8] rpcauth_lookupcred at ffffffffc0421581 [sunrpc]
 #6 [ffff9203ca323740] rpcauth_refreshcred at ffffffffc04223d3 [sunrpc]
 #7 [ffff9203ca3237a0] call_refresh at ffffffffc04103dc [sunrpc]
 #8 [ffff9203ca3237b8] __rpc_execute at ffffffffc041e1c9 [sunrpc]
 #9 [ffff9203ca323820] rpc_execute at ffffffffc0420a48 [sunrpc]

The scenario is like this.  Let's say there are two upcalls for
services A and B, ordered A -> B in pipe->in_downcall and B -> A in
pipe->pipe.

rpc.gssd reads the pipe, gets the upcall msg for service B from
pipe->pipe, and writes the response.  In gss_pipe_downcall, however,
the msg for service A is picked, because only the uid is used to
find the msg and A precedes B in pipe->in_downcall.  The process
waiting for the msg for service A is then woken up.

The actual scheduling of that process might happen after rpc.gssd
has processed the next msg; in rpc_pipe_generic_upcall, msg->errno
(for A) is cleared in the meantime.  Once scheduled, the process
sees gss_msg->ctx == NULL and gss_msg->msg.errno == 0, so it cannot
break out of the wait loop in gss_create_upcall and is never woken
up after that.
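
For reference, the wait loop in gss_create_upcall() has roughly this
shape (paraphrased, not verbatim):

	for (;;) {
		prepare_to_wait(&gss_msg->waitqueue, &wait, TASK_KILLABLE);
		spin_lock(&pipe->lock);
		/* with ctx still NULL and errno reset to 0, neither
		 * condition can become true again */
		if (gss_msg->ctx != NULL || gss_msg->msg.errno < 0)
			break;
		spin_unlock(&pipe->lock);
		if (fatal_signal_pending(current)) {
			err = -ERESTARTSYS;
			goto out_intr;
		}
		schedule();
	}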

This patch introduces a wait-and-retry in gss_add_msg() to serialize
requests that have the same uid but a different service.

Fixes: 9130b8dbc6ac ("SUNRPC: allow for upcalls for the same uid but different gss service")
Cc: minoura makoto <minoura@valinux.co.jp>
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@nec.com>
Signed-off-by: minoura makoto <minoura@valinux.co.jp>
Tested-by: minoura makoto <minoura@valinux.co.jp>
---
v2: use gss_release_msg instead of refcount_dec in fatal_signal_pending case
---
 net/sunrpc/auth_gss/auth_gss.c | 31 +++++++++++++++++++++++++------
 1 file changed, 25 insertions(+), 6 deletions(-)

Comments

Trond Myklebust Dec. 10, 2022, 7:04 p.m. UTC | #1
> On Dec 8, 2022, at 19:30, Hiroshi Shimamoto <h-shimamoto@nec.com> wrote:
> 
> [...]
> 
> The scenario is like this.  Let's say there are two upcalls for
> services A and B, ordered A -> B in pipe->in_downcall and B -> A in
> pipe->pipe.
> 
> rpc.gssd reads the pipe, gets the upcall msg for service B from
> pipe->pipe, and writes the response.  In gss_pipe_downcall, however,
> the msg for service A is picked, because only the uid is used to
> find the msg and A precedes B in pipe->in_downcall.  The process
> waiting for the msg for service A is then woken up.

Wait a minute… The ‘service’ here is one of krb5, krb5i, or krb5p. What is being pushed down from user space is a RPCSEC_GSS context that can be used for any one of those services. So the ordering of A and B is not supposed to matter. Any one of those requests can take the context and make use of it.

However once the context has been used with one of the krb5, krb5i or krb5p services then it cannot be used with any of the others. This is why commit 9130b8dbc6ac that you referenced above separates the services in gss_add_msg().

> 
> [...]
> 
> This patch introduces a wait-and-retry in gss_add_msg() to serialize
> requests that have the same uid but a different service.

As long as rpc.gssd returns one context downcall for each upcall (or it breaks the connection in order to force a retransmission) then we shouldn’t have to serialise anything.

However what we could do to fix the race you appear to be describing is to check if the upcall has completed yet before we accept the message as a candidate for the downcall. That could be just a simple check for (msg->copied != 0 && list_empty(&msg->list)). Maybe add a helper for that in include/linux/sunrpc/rpc_pipe_fs.h?
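
Something like this minimal sketch, say (the helper name is only a
suggestion):

	static inline bool
	rpc_msg_is_inflight(const struct rpc_pipe_msg *msg)
	{
		/* gssd has read the message (msg->copied != 0) and it is
		 * off the pipe->pipe queue (list_empty(&msg->list)) */
		return (msg->copied != 0 && list_empty(&msg->list));
	}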
minoura makoto Dec. 11, 2022, 1:26 p.m. UTC | #2
Thanks for your comment.

> > On Dec 8, 2022, at 19:30, Hiroshi Shimamoto <h-shimamoto@nec.com> wrote:
> > [...]

> Wait a minute… The ‘service’ here is one of krb5, krb5i, or
> krb5p. What is being pushed down from user space is a RPCSEC_GSS
> context that can be used for any one of those services. So the
> ordering of A and B is not supposed to matter. Any one of those
> requests can take the context and make use of it.

> However once the context has been used with one of the krb5, krb5i or
> krb5p services then it cannot be used with any of the others. This is
> why commit 9130b8dbc6ac that you referenced above separates the
> services in gss_add_msg().

Right.  (BTW I am wondering why we do not have to compare
->service in gss_match() when looking up the cred cache...)

> > [...]

> As long as rpc.gssd returns one context downcall for each upcall (or
> it breaks the connection in order to force a retransmission) then we
> shouldn’t have to serialise anything.

> However what we could do to fix the race you appear to be describing
> is to check if the upcall has completed yet before we accept the
> message as a candidate for the downcall. That could be just a simple
> check for (msg->copied != 0 && list_empty(&msg->list)). Maybe add a
> helper for that in include/linux/sunrpc/rpc_pipe_fs.h?

This makes perfect sense.  The race occurs because the downcall
(mistakenly) picks a msg that has not been sent to gssd yet, and
such a msg can be distinguished by that condition.  Will revise
the patch.  Thanks!
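
For instance, the downcall-side lookup could skip msgs that gssd has
not read yet, along these lines (a sketch, assuming a helper like
the one suggested above):

	list_for_each_entry(pos, &pipe->in_downcall, list) {
		if (!uid_eq(pos->uid, uid))
			continue;
		if (!rpc_msg_is_inflight(&pos->msg))
			continue;	/* not sent to gssd yet, skip */
		refcount_inc(&pos->count);
		return pos;
	}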

SHIMAMOTO HIROSHI(島本 裕志) Dec. 11, 2022, 11:46 p.m. UTC | #3
> > On Dec 8, 2022, at 19:30, Hiroshi Shimamoto <h-shimamoto@nec.com> wrote:
> > [...]
> 
> Wait a minute… The ‘service’ here is one of krb5, krb5i, or krb5p. What is being pushed down from user
> space is a RPCSEC_GSS context that can be used for any one of those services. So the ordering of A and B
> is not supposed to matter. Any one of those requests can take the context and make use of it.
> 
> However once the context has been used with one of the krb5, krb5i or krb5p services then it cannot be used
> with any of the others. This is why commit 9130b8dbc6ac that you referenced above separates the services
> in gss_add_msg().

Thanks for describing the behavior; now I see that the ordering
itself had not caused a problem.  I had thought the ordering was the
root cause to be fixed, but I now understand that fixing the race
that wrongly clears errno is enough.

Thanks,
Hiroshi

SHIMAMOTO HIROSHI(島本 裕志) Dec. 12, 2022, 3:33 a.m. UTC | #4
> > On Dec 8, 2022, at 19:30, Hiroshi Shimamoto <h-shimamoto@nec.com> wrote:
> > [...]
> 
> Wait a minute… The ‘service’ here is one of krb5, krb5i, or krb5p. What is being pushed down from user
> space is a RPCSEC_GSS context that can be used for any one of those services. So the ordering of A and B
> is not supposed to matter. Any one of those requests can take the context and make use of it.
> 
> However once the context has been used with one of the krb5, krb5i or krb5p services then it cannot be used
> with any of the others. This is why commit 9130b8dbc6ac that you referenced above separates the services
> in gss_add_msg().

One question: what about simultaneous upcalls for AUTH_GSS and
AUTH_UNIX?  I'm not sure such a case exists, but an error could be
taken for the successful case, no?

Thanks,
Hiroshi
minoura makoto Dec. 12, 2022, 6 a.m. UTC | #5
> One question: what about simultaneous upcalls for AUTH_GSS and
> AUTH_UNIX?  I'm not sure such a case exists, but an error could be
> taken for the successful case, no?

This is the auth_gss module, and we always come here to
obtain a GSS security context.  In the sec=sys case we might
come here with RPC_AUTH_GSS_KRB5I (see nfs4_alloc_client()
in nfs4client.c), though.  In such a case gssd might return an
EACCES error.
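
For context, the fallback looks roughly like this in the nfs4 client
setup code (paraphrased; the exact location varies across kernel
versions):

	/* try krb5i for state management first; fall back to
	 * AUTH_UNIX if no GSS context can be established */
	error = nfs_create_rpc_client(clp, cl_init, RPC_AUTH_GSS_KRB5I);
	if (error == -EINVAL)
		error = nfs_create_rpc_client(clp, cl_init, RPC_AUTH_UNIX);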

Patch

diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
index 7bb247c51e2f..8f9b6ac7990b 100644
--- a/net/sunrpc/auth_gss/auth_gss.c
+++ b/net/sunrpc/auth_gss/auth_gss.c
@@ -296,14 +296,12 @@  gss_release_msg(struct gss_upcall_msg *gss_msg)
 }
 
 static struct gss_upcall_msg *
-__gss_find_upcall(struct rpc_pipe *pipe, kuid_t uid, const struct gss_auth *auth)
+__gss_find_upcall(struct rpc_pipe *pipe, kuid_t uid)
 {
 	struct gss_upcall_msg *pos;
 	list_for_each_entry(pos, &pipe->in_downcall, list) {
 		if (!uid_eq(pos->uid, uid))
 			continue;
-		if (auth && pos->auth->service != auth->service)
-			continue;
 		refcount_inc(&pos->count);
 		return pos;
 	}
@@ -319,14 +317,31 @@  gss_add_msg(struct gss_upcall_msg *gss_msg)
 {
 	struct rpc_pipe *pipe = gss_msg->pipe;
 	struct gss_upcall_msg *old;
+	DEFINE_WAIT(wait);
 
 	spin_lock(&pipe->lock);
-	old = __gss_find_upcall(pipe, gss_msg->uid, gss_msg->auth);
+retry:
+	old = __gss_find_upcall(pipe, gss_msg->uid);
 	if (old == NULL) {
 		refcount_inc(&gss_msg->count);
 		list_add(&gss_msg->list, &pipe->in_downcall);
-	} else
+	} else {
+		if (old->auth->service != gss_msg->auth->service) {
+			prepare_to_wait(&old->waitqueue, &wait, TASK_KILLABLE);
+			spin_unlock(&pipe->lock);
+			if (fatal_signal_pending(current)) {
+				finish_wait(&old->waitqueue, &wait);
+				gss_release_msg(old);
+				return ERR_PTR(-ERESTARTSYS);
+			}
+			schedule();
+			finish_wait(&old->waitqueue, &wait);
+			gss_release_msg(old);
+			spin_lock(&pipe->lock);
+			goto retry;
+		}
 		gss_msg = old;
+	}
 	spin_unlock(&pipe->lock);
 	return gss_msg;
 }
@@ -554,6 +569,10 @@  gss_setup_upcall(struct gss_auth *gss_auth, struct rpc_cred *cred)
 	if (IS_ERR(gss_new))
 		return gss_new;
 	gss_msg = gss_add_msg(gss_new);
+	if (IS_ERR(gss_msg)) {
+		gss_release_msg(gss_new);
+		return gss_msg;
+	}
 	if (gss_msg == gss_new) {
 		int res;
 		refcount_inc(&gss_msg->count);
@@ -732,7 +751,7 @@  gss_pipe_downcall(struct file *filp, const char __user *src, size_t mlen)
 	err = -ENOENT;
 	/* Find a matching upcall */
 	spin_lock(&pipe->lock);
-	gss_msg = __gss_find_upcall(pipe, uid, NULL);
+	gss_msg = __gss_find_upcall(pipe, uid);
 	if (gss_msg == NULL) {
 		spin_unlock(&pipe->lock);
 		goto err_put_ctx;