diff mbox

Adding the nfs4_secure_mounts bool

Message ID 20131113112346.3f5f3bd0@notabene.brown (mailing list archive)
State New, archived
Headers show

Commit Message

NeilBrown Nov. 13, 2013, 12:23 a.m. UTC
On Tue, 12 Nov 2013 11:16:34 -0500 "J. Bruce Fields" <bfields@fieldses.org>
wrote:

> On Tue, Nov 12, 2013 at 05:29:46AM +0000, Myklebust, Trond wrote:
> > 
> > On Nov 12, 2013, at 0:11, NeilBrown <neilb@suse.de> wrote:
> > 
> > > On Mon, 11 Nov 2013 15:33:14 -0500 Chuck Lever <chuck.lever@oracle.com> wrote:
> > > 
> > >> 
> > >> On Nov 11, 2013, at 1:59 PM, Steve Dickson <SteveD@redhat.com> wrote:
> > >> 
> > >>> On 11/11/13 13:30, Chuck Lever wrote:
> > >>>> 
> > >>>> On Nov 11, 2013, at 1:06 PM, Steve Dickson <SteveD@redhat.com> wrote:
> > >>>> 
> > >>>>> 
> > >>>>> 
> > >>>>> On 09/11/13 18:12, Myklebust, Trond wrote:
> > >>>>>> One alternative to the above scheme, which I believe that I’ve 
> > >>>>>> suggested before, is to have a permanent entry in rpc_pipefs 
> > >>>>>> that rpc.gssd can open and that the kernel can use to detect 
> > >>>>>> that it is running. If we make it /var/lib/nfs/rpc_pipefs/gssd/clnt00/gssd, 
> > >>>>>> then AFAICS we don’t need to change nfs-utils at all, since all newer 
> > >>>>>> versions of rpc.gssd will try to open for read anything of the form 
> > >>>>>> /var/lib/nfs/rpc_pipefs/*/clntXX/gssd...
> > >>>>> 
> > >>>>> After further review I am going going have to disagree with you on this.
> > >>>>> Since all the context is cached on the initial mount the kernel
> > >>>>> should be using the call_usermodehelper() to call up to rpc.gssd 
> > >>>>> to get the context, which means we could put this upcall noise 
> > >>>>> to bed... forever! :-)
> > >>>> 
> > >>>> Ask Al Viro for his comments on whether the kernel should start 
> > >>>> gssd (either a daemon or a script).  Hint: wear your kevlar underpants.
> > >>> I was thinking gssd would become a the gssd-cmd command... Al does not
> > >>> like the call_usermodehelper() interface?
> > >> 
> > >> He doesn't have a problem with call_usermodehelper() in general.  However, the kernel cannot guarantee security if it has to run a fixed command line.  Go ask him to explain.
> > >> 
> > >> 
> > >>> 
> > >>>> 
> > >>>> Have you tried Trond's approach yet?
> > >>> Looking into it... But nothing is trivial in that code... 
> > >>> 
> > >>>> 
> > >>>>> I realize this is not going happen overnight, so I would still
> > >>>>> like to propose my  nfs4_secure_mounts bool patch as bridge
> > >>>>> to the new call_usermodehelper()  since its the cleanest 
> > >>>>> solution so far... 
> > >>>>> 
> > >>>>> Thoughts?
> > >>>> 
> > >>>> We have workarounds already that work on every kernel since 3.8.
> > >>>> 
> > >>> The one that logs 5 to 20 lines (depending on thins are setup or not)
> > >>> per mount? That does work in some environments but no all. ;-)
> > >> 
> > >> When does running rpc.gssd not work?
> > > 
> > > Oohh ooh.. Pick me.  Pick me!! I can answer that one.
> > > 
> > > Running rpc.gssd does not work if you are mounting a filesystem using the IP
> > > address of the server and that IP address doesn't have a matching hostname
> > > anywhere that can be found:
> > > 
> > > In a newly creating minimal kvm install without rpc.gssd running,
> > >   mount 10.0.2.2:/home /mnt
> > > 
> > > sleeps for 15 seconds then succeeds.
> > > If I start rpc.gssd, then the same command takes forever.
> > > 
> > > strace of rpc.gssd shows that it complains about not being able to resolve
> > > the host name and "ERROR: failed to read service info".  Then it keeps the
> > > pipes open but never sends any message on them, so the kernel just keeps on
> > > waiting.
> > > 
> > > If I change "fail_keep_client" to "fail_destroy_client", then it closes the
> > > pipe and we get the 15 second timeout back.
> > > If I change  NI_NAMEREQD to 0, then the mount completes instantly.  (of course
> > > that make serious compromise security so it was just for testing).
> > > (Adding an entry to /etc/hosts also gives instant success).
> > > 
> > > I'm hoping that someone who understands this code will suggest something
> > > clever so I don't have to dig through all of it ;-)
> > 
> > rpc.gssd is supposed to do a downcall with a zero-length window and an error message in any situation where it cannot establish a GSS context. Normally, I’d expect an EACCES for the above scenario.
> > 
> > IOW: that’s a blatant rpc.gssd bug. One that will also affect you when you're doing NFSv3 and add ‘sec=krb5’ to the mount options.
> 
> Also why is gssd trying to do a DNS lookup in this case?  This sounds
> similar to what f9f5450f8f94 "Avoid DNS reverse resolution for server
> names (take 3)" was trying to fix?

It is quite possible that I misunderstand something.  But this is my
understanding.

1/ "mount" allows you to use either an IP address or a host name to mount a
   filesystem.
2/ gss requires a hostname to identify the server and find it's key (IP not
   sufficient).
3/ If you use a host name to mount a filesystem, then that exact same host
   name should be used by gssd to identify the server and its key.
   The above mentioned patch was trying to enforce this.  The idea was to
   collect the name given to the 'mount', see if it looked like an IP address
   or a Server name.  If the later, just use it.  If the former, do a reverse
   lookup because an IP address is no use by itself for gss.
   Previously it would always do a reverse DNS lookup from the IP address
   that was determined from the server-name-or-IP-address.
   Unfortunately this patch was broken - got the test backwards.
   A follow-up patch fixed the test: c93e8d8eeafec3e32

4/ So the above patch was not intended to address the case of mount-by-IP
   address at all - and this is the case that is causing me problems.


But back to my problem:  Following Trond's suggestion I've come up with the
following patch.  Does it look right?

The "fd = -1" is just to stop us trying to close a non-open fd in an error
path.

The change from testing ->servicename to ->prog stops us from repeating the
failed DNS lookup on every request, not that the failure isn't fatal.

The last stanza makes sure we always reply to an upcall, with EINVAL if
nothing else seems appropriate.

The patch seems to work for my particular case but a more general review
would be appreciated.

Thanks,
NeilBrown

Comments

NeilBrown Nov. 13, 2013, 1:13 a.m. UTC | #1
On Wed, 13 Nov 2013 00:30:53 +0000 "Myklebust, Trond"
<Trond.Myklebust@netapp.com> wrote:

> On Wed, 2013-11-13 at 11:23 +1100, NeilBrown wrote:
> > But back to my problem:  Following Trond's suggestion I've come up with the
> > following patch.  Does it look right?
> > 
> > The "fd = -1" is just to stop us trying to close a non-open fd in an error
> > path.
> > 
> > The change from testing ->servicename to ->prog stops us from repeating the
> > failed DNS lookup on every request, not that the failure isn't fatal.
> > 
> > The last stanza makes sure we always reply to an upcall, with EINVAL if
> > nothing else seems appropriate.
> 
> Wouldn't EACCES be more appropriate as a default?
> 

Maybe.  And that is what you suggested before and I mis-remembered - sorry.

However EACCES is "Permission denied" which doesn't quite seem right to me.
It isn't really "you aren't allowed to do that", but "your question doesn't
make sense".

However I'm not fussed.  If you prefer EACCES, then I'll make it EACCES.

Thanks,
NeilBrown
J. Bruce Fields Nov. 13, 2013, 3:46 a.m. UTC | #2
On Wed, Nov 13, 2013 at 11:23:46AM +1100, NeilBrown wrote:
> On Tue, 12 Nov 2013 11:16:34 -0500 "J. Bruce Fields" <bfields@fieldses.org>
> wrote:
> 
> > On Tue, Nov 12, 2013 at 05:29:46AM +0000, Myklebust, Trond wrote:
> > > 
> > > On Nov 12, 2013, at 0:11, NeilBrown <neilb@suse.de> wrote:
> > > 
> > > > On Mon, 11 Nov 2013 15:33:14 -0500 Chuck Lever <chuck.lever@oracle.com> wrote:
> > > > 
> > > >> 
> > > >> On Nov 11, 2013, at 1:59 PM, Steve Dickson <SteveD@redhat.com> wrote:
> > > >> 
> > > >>> On 11/11/13 13:30, Chuck Lever wrote:
> > > >>>> 
> > > >>>> On Nov 11, 2013, at 1:06 PM, Steve Dickson <SteveD@redhat.com> wrote:
> > > >>>> 
> > > >>>>> 
> > > >>>>> 
> > > >>>>> On 09/11/13 18:12, Myklebust, Trond wrote:
> > > >>>>>> One alternative to the above scheme, which I believe that I’ve 
> > > >>>>>> suggested before, is to have a permanent entry in rpc_pipefs 
> > > >>>>>> that rpc.gssd can open and that the kernel can use to detect 
> > > >>>>>> that it is running. If we make it /var/lib/nfs/rpc_pipefs/gssd/clnt00/gssd, 
> > > >>>>>> then AFAICS we don’t need to change nfs-utils at all, since all newer 
> > > >>>>>> versions of rpc.gssd will try to open for read anything of the form 
> > > >>>>>> /var/lib/nfs/rpc_pipefs/*/clntXX/gssd...
> > > >>>>> 
> > > >>>>> After further review I am going going have to disagree with you on this.
> > > >>>>> Since all the context is cached on the initial mount the kernel
> > > >>>>> should be using the call_usermodehelper() to call up to rpc.gssd 
> > > >>>>> to get the context, which means we could put this upcall noise 
> > > >>>>> to bed... forever! :-)
> > > >>>> 
> > > >>>> Ask Al Viro for his comments on whether the kernel should start 
> > > >>>> gssd (either a daemon or a script).  Hint: wear your kevlar underpants.
> > > >>> I was thinking gssd would become a the gssd-cmd command... Al does not
> > > >>> like the call_usermodehelper() interface?
> > > >> 
> > > >> He doesn't have a problem with call_usermodehelper() in general.  However, the kernel cannot guarantee security if it has to run a fixed command line.  Go ask him to explain.
> > > >> 
> > > >> 
> > > >>> 
> > > >>>> 
> > > >>>> Have you tried Trond's approach yet?
> > > >>> Looking into it... But nothing is trivial in that code... 
> > > >>> 
> > > >>>> 
> > > >>>>> I realize this is not going happen overnight, so I would still
> > > >>>>> like to propose my  nfs4_secure_mounts bool patch as bridge
> > > >>>>> to the new call_usermodehelper()  since its the cleanest 
> > > >>>>> solution so far... 
> > > >>>>> 
> > > >>>>> Thoughts?
> > > >>>> 
> > > >>>> We have workarounds already that work on every kernel since 3.8.
> > > >>>> 
> > > >>> The one that logs 5 to 20 lines (depending on thins are setup or not)
> > > >>> per mount? That does work in some environments but no all. ;-)
> > > >> 
> > > >> When does running rpc.gssd not work?
> > > > 
> > > > Oohh ooh.. Pick me.  Pick me!! I can answer that one.
> > > > 
> > > > Running rpc.gssd does not work if you are mounting a filesystem using the IP
> > > > address of the server and that IP address doesn't have a matching hostname
> > > > anywhere that can be found:
> > > > 
> > > > In a newly creating minimal kvm install without rpc.gssd running,
> > > >   mount 10.0.2.2:/home /mnt
> > > > 
> > > > sleeps for 15 seconds then succeeds.
> > > > If I start rpc.gssd, then the same command takes forever.
> > > > 
> > > > strace of rpc.gssd shows that it complains about not being able to resolve
> > > > the host name and "ERROR: failed to read service info".  Then it keeps the
> > > > pipes open but never sends any message on them, so the kernel just keeps on
> > > > waiting.
> > > > 
> > > > If I change "fail_keep_client" to "fail_destroy_client", then it closes the
> > > > pipe and we get the 15 second timeout back.
> > > > If I change  NI_NAMEREQD to 0, then the mount completes instantly.  (of course
> > > > that make serious compromise security so it was just for testing).
> > > > (Adding an entry to /etc/hosts also gives instant success).
> > > > 
> > > > I'm hoping that someone who understands this code will suggest something
> > > > clever so I don't have to dig through all of it ;-)
> > > 
> > > rpc.gssd is supposed to do a downcall with a zero-length window and an error message in any situation where it cannot establish a GSS context. Normally, I’d expect an EACCES for the above scenario.
> > > 
> > > IOW: that’s a blatant rpc.gssd bug. One that will also affect you when you're doing NFSv3 and add ‘sec=krb5’ to the mount options.
> > 
> > Also why is gssd trying to do a DNS lookup in this case?  This sounds
> > similar to what f9f5450f8f94 "Avoid DNS reverse resolution for server
> > names (take 3)" was trying to fix?
> 
> It is quite possible that I misunderstand something.  But this is my
> understanding.
> 
> 1/ "mount" allows you to use either an IP address or a host name to mount a
>    filesystem.
> 2/ gss requires a hostname to identify the server and find it's key (IP not
>    sufficient).
> 3/ If you use a host name to mount a filesystem, then that exact same host
>    name should be used by gssd to identify the server and its key.
>    The above mentioned patch was trying to enforce this.  The idea was to
>    collect the name given to the 'mount', see if it looked like an IP address
>    or a Server name.  If the later, just use it.  If the former, do a reverse
>    lookup because an IP address is no use by itself for gss.
>    Previously it would always do a reverse DNS lookup from the IP address
>    that was determined from the server-name-or-IP-address.
>    Unfortunately this patch was broken - got the test backwards.
>    A follow-up patch fixed the test: c93e8d8eeafec3e32
> 
> 4/ So the above patch was not intended to address the case of mount-by-IP
>    address at all - and this is the case that is causing me problems.

OK, but it still seems dumb to even attempt the reverse lookup: the
lookup probably isn't secure, and the mount commandline should have a
name that we can match to a krb5 principal without needing any other
lookups.

So I'd think reasonable behavior in this case would be to just try the
IP address on the chance there's actually an nfs/x.y.z.w@REALM
principal.  (Or just fail outright if kerberos doesn't allow principals
that look like that.)

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
NeilBrown Nov. 14, 2013, 1:05 a.m. UTC | #3
On Wed, 13 Nov 2013 01:26:56 +0000 "Myklebust, Trond"
<Trond.Myklebust@netapp.com> wrote:

> On Wed, 2013-11-13 at 12:13 +1100, NeilBrown wrote:
> > On Wed, 13 Nov 2013 00:30:53 +0000 "Myklebust, Trond"
> > <Trond.Myklebust@netapp.com> wrote:
> > 
> > > On Wed, 2013-11-13 at 11:23 +1100, NeilBrown wrote:
> > > > But back to my problem:  Following Trond's suggestion I've come up with the
> > > > following patch.  Does it look right?
> > > > 
> > > > The "fd = -1" is just to stop us trying to close a non-open fd in an error
> > > > path.
> > > > 
> > > > The change from testing ->servicename to ->prog stops us from repeating the
> > > > failed DNS lookup on every request, not that the failure isn't fatal.
> > > > 
> > > > The last stanza makes sure we always reply to an upcall, with EINVAL if
> > > > nothing else seems appropriate.
> > > 
> > > Wouldn't EACCES be more appropriate as a default?
> > > 
> > 
> > Maybe.  And that is what you suggested before and I mis-remembered - sorry.
> > 
> > However EACCES is "Permission denied" which doesn't quite seem right to me.
> > It isn't really "you aren't allowed to do that", but "your question doesn't
> > make sense".
> > 
> > However I'm not fussed.  If you prefer EACCES, then I'll make it EACCES.
> 
> If you look at gss_pipe_downcall(), then you'll note that it treats
> EINVAL as a temporary error, and converts it to EAGAIN. That again
> causes call_refreshresult to retry the upcall 2 more times before
> failing with EACCES anyway...
> 

Yes, I see now, thanks.

I also see a 'BUG()' in there if the error code returned from user-space
isn't in the known list.  I suspect that should at most be a WARN, and
probably removed altogether.

Thanks,
NeilBrown
NeilBrown Nov. 14, 2013, 1:10 a.m. UTC | #4
On Wed, 13 Nov 2013 04:15:26 +0000 "Myklebust, Trond"
<Trond.Myklebust@netapp.com> wrote:

> On Tue, 2013-11-12 at 22:46 -0500, J. Bruce Fields wrote:
> 
> > OK, but it still seems dumb to even attempt the reverse lookup: the
> > lookup probably isn't secure, and the mount commandline should have a
> > name that we can match to a krb5 principal without needing any other
> > lookups.
> > 
> > So I'd think reasonable behavior in this case would be to just try the
> > IP address on the chance there's actually an nfs/x.y.z.w@REALM
> > principal.  (Or just fail outright if kerberos doesn't allow principals
> > that look like that.)
> 
> Looking through the krb5.conf manpage etc it looks as if a lot of this
> functionality should be covered by the krb protocol itself without us
> needing to do explicit reverse lookups in rpc.gssd. I'm thinking of the
> 'canonicalize' and 'rdns' options, for instance. Am I wrong?
> 

I suspect there is a good chance that you are correct, though my man page
only mentions "rdns", not "canonicalize" so there may be some version
dependency to think about.

However I think fixing this is a separate (though related) issue to fixing my
current problem and would probably require more code examination and testing
than I feel inclined to at the moment.  So I'll leave this side of the
question alone and just fix the bit that is clearly broken.

Thanks,
NeilBrown
diff mbox

Patch

diff --git a/utils/gssd/gssd_proc.c b/utils/gssd/gssd_proc.c
index b48d1637cd36..00b4bc779b7c 100644
--- a/utils/gssd/gssd_proc.c
+++ b/utils/gssd/gssd_proc.c
@@ -256,6 +256,7 @@  read_service_info(char *info_file_name, char **servicename, char **servername,
 	if ((nbytes = read(fd, buf, INFOBUFLEN)) == -1)
 		goto fail;
 	close(fd);
+	fd = -1;
 	buf[nbytes] = '\0';
 
 	numfields = sscanf(buf,"RPC server: %127s\n"
@@ -403,11 +404,10 @@  process_clnt_dir_files(struct clnt_info * clp)
 		return -1;
 	snprintf(info_file_name, sizeof(info_file_name), "%s/info",
 			clp->dirname);
-	if ((clp->servicename == NULL) &&
-	     read_service_info(info_file_name, &clp->servicename,
-				&clp->servername, &clp->prog, &clp->vers,
-				&clp->protocol, (struct sockaddr *) &clp->addr))
-		return -1;
+	if (clp->prog == 0)
+		read_service_info(info_file_name, &clp->servicename,
+				  &clp->servername, &clp->prog, &clp->vers,
+				  &clp->protocol, (struct sockaddr *) &clp->addr);
 	return 0;
 }
 
@@ -1320,11 +1320,14 @@  handle_gssd_upcall(struct clnt_info *clp)
 		}
 	}
 
-	if (strcmp(mech, "krb5") == 0)
+	if (strcmp(mech, "krb5") == 0 && clp->servername)
 		process_krb5_upcall(clp, uid, clp->gssd_fd, target, service);
-	else
-		printerr(0, "WARNING: handle_gssd_upcall: "
-			    "received unknown gss mech '%s'\n", mech);
+	else {
+		if (clp->servername)
+			printerr(0, "WARNING: handle_gssd_upcall: "
+				 "received unknown gss mech '%s'\n", mech);
+		do_error_downcall(clp->gssd_fd, uid, -EINVAL);
+	}
 
 out:
 	free(lbuf);