
NFS: Fix infinite loop in gss_create_upcall()

Message ID 4DA60AB9.1050104@netapp.com (mailing list archive)
State New, archived

Commit Message

Bryan Schumaker April 13, 2011, 8:42 p.m. UTC
On 04/12/2011 02:52 PM, Jiri Slaby wrote:
> On 04/12/2011 08:43 PM, Bryan Schumaker wrote:
>> On 04/12/2011 02:34 PM, Jiri Slaby wrote:
>>> On 04/12/2011 08:31 PM, Trond Myklebust wrote:
>>>>> Yes, it fixes the problem. But it waits 15s before it times out. This is
>>>>> unacceptable for automounted NFS dirs.
>>>>
>>>> I'm still confused as to why you are hitting it at all. In the normal
>>>> autonegotiation case, the client should be trying to use AUTH_SYS first
>>>> and then trying rpcsec_gss if and only if that fails.
>>>>
>>>> Are you really exporting a filesystem using AUTH_NULL as the only
>>>> supported flavour?
>>>
>>> I don't know; I connect to an NFS server that is not maintained by me.
>>> It looks like that. How can I find out?
>>
>> If you're not using gss for anything, you could try rmmod-ing rpcsec_gss_krb5 (and other rpcsec_gss_* modules).
> 
> I don't have NFS in modules. It's all built-in. And this one is
> unconditionally selected because of CONFIG_NFS_V4.

Does this patch help?

- Bryan

We should attempt an AUTH_NULL-style mount before
trying GSS flavors. This should prevent a hang if
the GSS modules are loaded but the userspace daemon
(rpc.gssd) isn't running.




--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Jiri Slaby April 14, 2011, 8:37 p.m. UTC | #1
On 04/13/2011 10:42 PM, Bryan Schumaker wrote:
> Does this patch help?

Nope, it makes things even worse:
# mount -oro,intr XXX:/yyy /mnt/c/
<15s delay here>
mount.nfs: access denied by server while mounting XXX:/yyy

So in nfs4_proc_get_root I do:
  printk("%s: %d %u\n", __func__, i, flav_array[i]);
  status = nfs4_lookup_root_sec(server, fhandle, info, flav_array[i]);
  printk("%s: res=%d\n", __func__, status);
and get:
[   18.159818] nfs4_proc_get_root: 0 1
[   18.214872] nfs4_proc_get_root: res=-1
[   18.214875] nfs4_proc_get_root: 1 0
[   18.254636] nfs4_proc_get_root: res=-1
[   18.254639] nfs4_proc_get_root: 2 390003
[   33.252174] RPC: AUTH_GSS upcall timed out.
[   33.252177] Please check user daemon is running.
[   33.252192] nfs4_proc_get_root: res=-13

If I revert that back and do the same:
[   28.275569] nfs4_proc_get_root: 0 1
[   28.296545] nfs4_proc_get_root: res=-1
[   28.296548] nfs4_proc_get_root: 1 390003
[   43.296107] RPC: AUTH_GSS upcall timed out.
[   43.296108] Please check user daemon is running.
[   43.296121] nfs4_proc_get_root: res=-13
[   43.296122] nfs4_proc_get_root: 2 0
[   43.318201] nfs4_proc_get_root: res=-1
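The debug lines print the loop index, the flavor number and the return status as raw integers. As a reading aid, here is a small C sketch that decodes them. It assumes the Linux sunrpc numbering (AUTH_NULL = 0, AUTH_UNIX = 1, and 390003/390004/390005 for the krb5/krb5i/krb5p RPCSEC_GSS pseudoflavors) and Linux errno values; the helpers flavor_name(), status_name() and decode_line() are hypothetical and not part of the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: name the flavor numbers seen in the debug log.
 * Values follow the Linux sunrpc definitions (assumption of this sketch). */
static const char *flavor_name(unsigned int flavor)
{
	switch (flavor) {
	case 0:      return "AUTH_NULL";
	case 1:      return "AUTH_UNIX";
	case 390003: return "RPCSEC_GSS/krb5";
	case 390004: return "RPCSEC_GSS/krb5i";
	case 390005: return "RPCSEC_GSS/krb5p";
	default:     return "unknown";
	}
}

/* Hypothetical helper: map a kernel status such as -13 to an errno name.
 * Only the codes that appear in this thread are handled. */
static const char *status_name(int status)
{
	switch (-status) {
	case 0:      return "OK";
	case EPERM:  return "EPERM";
	case EACCES: return "EACCES";
	default:     return "other";
	}
}

/* Decode one "nfs4_proc_get_root: ..." printk line into out.
 * Returns 0 on success, -1 if the line is not one of those printks. */
int decode_line(const char *line, char *out, size_t outlen)
{
	static const char tag[] = "nfs4_proc_get_root: ";
	const char *p;
	int idx, status;
	unsigned int flavor;

	assert(line != NULL && out != NULL && outlen > 0);
	p = strstr(line, tag);
	if (p == NULL)
		return -1;
	p += sizeof(tag) - 1;	/* skip the tag, not its NUL */

	if (sscanf(p, "res=%d", &status) == 1)
		snprintf(out, outlen, "result %s", status_name(status));
	else if (sscanf(p, "%d %u", &idx, &flavor) == 2)
		snprintf(out, outlen, "try #%d %s", idx, flavor_name(flavor));
	else
		return -1;
	return 0;
}
```

For example, "nfs4_proc_get_root: 2 390003" decodes to a krb5 attempt and "res=-13" to EACCES, matching the AUTH_GSS upcall timeout printed between them.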

I.e. all methods fail. And what matters is the last retval. From NULL it
is EPERM, from GSS it is EACCES. For EPERM, mount(8) falls back to
nfs3; for EACCES it dies a terrible death.
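The point about the last return value can be modelled in a few lines of user-space C. This is a sketch, not the kernel loop: it assumes nfs4_proc_get_root() moves on to the next flavor after any failure, stops at the first success, and returns the status of the last attempt, which is what the debug output above suggests. fake_lookup_root_sec() is a hypothetical stand-in that replays the results from the log (-EPERM for AUTH_UNIX and AUTH_NULL, -EACCES for krb5).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Flavor numbers as printed in the log (hypothetical macro names). */
#define FLAV_NULL 0u
#define FLAV_UNIX 1u
#define FLAV_KRB5 390003u

/* Hypothetical stand-in for nfs4_lookup_root_sec(): replays the results
 * from the log above, where the server rejects every flavor. */
static int fake_lookup_root_sec(unsigned int flavor)
{
	return flavor == FLAV_KRB5 ? -EACCES : -EPERM;
}

/* Model of the probing loop (assumed behaviour): try each flavor in order,
 * stop on success, otherwise report the status of the last one tried. */
int probe_flavors(const unsigned int *flavors, size_t n)
{
	int status = -EIO;
	size_t i;

	assert(flavors != NULL && n > 0);
	for (i = 0; i < n; i++) {
		status = fake_lookup_root_sec(flavors[i]);
		if (status == 0)
			break;
	}
	return status;
}
```

With the original order {AUTH_UNIX, krb5, AUTH_NULL} the model returns -EPERM; with the patched order {AUTH_UNIX, AUTH_NULL, krb5} it returns -EACCES, which is why the patch turns the NFSv3 fallback into a hard failure.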

linux-b984:~ # strace -fe mount -s 1000 mount -oro,intr XXX:/yyy /mnt/c/
Process 2396 attached
Process 2395 suspended
[pid  2396] mount("XXX:/yyy", "/mnt/c", "nfs", MS_RDONLY,
"intr,vers=4,addr=10.20.3.2,clientaddr=10.0.2.15") = -1 EPERM (Operation
not permitted)
[pid  2396] mount("XXX:/yyy", "/mnt/c", "nfs", MS_RDONLY,
"intr,addr=10.20.3.2,vers=3,proto=tcp,mountvers=3,mountproto=udp,mountport=709")
= 0
Process 2395 resumed
Process 2396 detached
--- SIGCHLD (Child exited) @ 0 (0) ---
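The strace output shows the fallback: the vers=4 mount(2) call fails with EPERM, and mount.nfs retries with vers=3, which succeeds. The sketch below models only the decision described in this thread (EPERM falls back to NFSv3, EACCES aborts the mount). It is not the nfs-utils implementation; the enum and after_v4_attempt() are hypothetical, and treating every other error as fatal is a simplification made for this sketch.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical next step after the NFSv4 mount(2) attempt. */
enum mount_next { MOUNT_DONE, MOUNT_TRY_V3, MOUNT_FAIL };

/* Decide what to do after the vers=4 mount(2) call returned err
 * (0 or a positive errno). Only the errors seen in this thread are
 * modelled; anything else is treated as fatal in this sketch. */
enum mount_next after_v4_attempt(int err)
{
	assert(err >= 0);
	switch (err) {
	case 0:
		return MOUNT_DONE;
	case EPERM:
		return MOUNT_TRY_V3;	/* strace: EPERM, then vers=3 succeeds */
	case EACCES:
		return MOUNT_FAIL;	/* "access denied by server" */
	default:
		return MOUNT_FAIL;
	}
}
```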

thanks,
Trond Myklebust April 14, 2011, 9:21 p.m. UTC | #2
On Thu, 2011-04-14 at 22:37 +0200, Jiri Slaby wrote:
> I.e. all methods fail. And what matters is the last retval. From NULL it
> is EPERM, from GSS it is EACCES. For EPERM, mount(8) falls back to
> nfs3; for EACCES it dies a terrible death.

OK. That's good information. Thanks for testing!

I'm still curious as to why that NFS server is refusing all NFSv4 mounts
with NFS4ERR_WRONGSEC. Unless NFSv4 really is configured only to export
the root filesystem with RPCSEC_GSS, that definitely sounds like a
bug...

Cheers
  Trond
Jiri Slaby April 14, 2011, 9:30 p.m. UTC | #3
On 04/14/2011 11:21 PM, Trond Myklebust wrote:
> OK. That's good information. Thanks for testing!
> 
> I'm still curious as to why that NFS server is refusing all NFSv4 mounts
> with NFS4ERR_WRONGSEC. Unless NFSv4 really is configured only to export
> the root filesystem with RPCSEC_GSS, that definitely sounds like a
> bug...

With gssd running, if that helps:
[  229.806528] nfs4_proc_get_root: 0 1
[  229.828491] nfs4_proc_get_root: res=-1
[  229.828494] nfs4_proc_get_root: 1 390003
[  229.896994] nfs4_proc_get_root: res=-13
[  229.896997] nfs4_proc_get_root: 2 0
[  229.920344] nfs4_proc_get_root: res=-1

thanks,

Patch

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 9bf41ea..4e3c16b 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -2218,8 +2218,8 @@  static int nfs4_proc_get_root(struct nfs_server *server, struct nfs_fh *fhandle,
 	rpc_authflavor_t flav_array[NFS_MAX_SECFLAVORS + 2];
 
 	flav_array[0] = RPC_AUTH_UNIX;
-	len = gss_mech_list_pseudoflavors(&flav_array[1]);
-	flav_array[1+len] = RPC_AUTH_NULL;
+	flav_array[1] = RPC_AUTH_NULL;
+	len = gss_mech_list_pseudoflavors(&flav_array[2]);
 	len += 2;
 
 	for (i = 0; i < len; i++) {
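The hunk only changes the order in which flav_array is filled. The following user-space sketch rebuilds both layouts so the resulting order and length can be checked. It assumes a hypothetical stub in place of gss_mech_list_pseudoflavors() that reports a single krb5 pseudoflavor (as in the debug output in this thread) and returns the number of entries written; SKETCH_MAX_SECFLAVORS is an assumed bound standing in for NFS_MAX_SECFLAVORS.

```c
#include <assert.h>

/* Assumed bound standing in for NFS_MAX_SECFLAVORS. */
#define SKETCH_MAX_SECFLAVORS 12
#define FLAV_NULL 0u
#define FLAV_UNIX 1u
#define FLAV_KRB5 390003u

/* Hypothetical stub for gss_mech_list_pseudoflavors(): writes the GSS
 * pseudoflavors into array and returns how many it wrote. */
static int stub_gss_pseudoflavors(unsigned int *array)
{
	array[0] = FLAV_KRB5;
	return 1;
}

/* Layout before the patch: AUTH_UNIX, GSS..., AUTH_NULL. */
int build_old(unsigned int *flav)
{
	int len;

	flav[0] = FLAV_UNIX;
	len = stub_gss_pseudoflavors(&flav[1]);
	flav[1 + len] = FLAV_NULL;
	len += 2;
	assert(len <= SKETCH_MAX_SECFLAVORS + 2);
	return len;
}

/* Layout after the patch: AUTH_UNIX, AUTH_NULL, GSS... */
int build_new(unsigned int *flav)
{
	int len;

	flav[0] = FLAV_UNIX;
	flav[1] = FLAV_NULL;
	len = stub_gss_pseudoflavors(&flav[2]);
	len += 2;
	assert(len <= SKETCH_MAX_SECFLAVORS + 2);
	return len;
}
```

Both versions produce three entries; only the position of AUTH_NULL relative to the GSS pseudoflavors differs, so the patched loop reaches AUTH_NULL before the first GSS upcall.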