diff mbox

nfs: fix false positives in nfs40_walk_client_list()

Message ID 20161128140252.GA28629@fieldses.org (mailing list archive)
State New, archived
Headers show

Commit Message

J. Bruce Fields Nov. 28, 2016, 2:02 p.m. UTC
From: "J. Bruce Fields" <bfields@redhat.com>

It's possible that two different servers can return the same (clientid,
verifier) pair purely by coincidence.  Both are 64-bit values, but
depending on the server implementation, they can be highly predictable
and collisions may be quite likely, especially when there are lots of
servers.

So, check for this case.  If the clientid and verifier both match, then
we actually know they *can't* be the same server, since a new
SETCLIENTID to an already-known server should have changed the verifier.

This helps fix a bug that could cause the client to mount a filesystem
from the wrong server.

Reviewed-by: Jeff Layton <jlayton@redhat.com>
Tested-by: Yongcheng Yang <yoyang@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
 fs/nfs/nfs4client.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

Comments

Schumaker, Anna Nov. 28, 2016, 6:46 p.m. UTC | #1
Hi Bruce,

On 11/28/2016 09:02 AM, J. Bruce Fields wrote:
> From: "J. Bruce Fields" <bfields@redhat.com>
> 
> It's possible that two different servers can return the same (clientid,
> verifier) pair purely by coincidence.  Both are 64-bit values, but
> depending on the server implementation, they can be highly predictable
> and collisions may be quite likely, especially when there are lots of
> servers.

How often have you been seeing this?

> 
> So, check for this case.  If the clientid and verifier both match, then
> we actually know they *can't* be the same server, since a new
> SETCLIENTID to an already-known server should have changed the verifier.
> 
> This helps fix a bug that could cause the client to mount a filesystem
> from the wrong server.
> 
> Reviewed-by: Jeff Layton <jlayton@redhat.com>
> Tested-by: Yongcheng Yang <yoyang@redhat.com>
> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> ---
>  fs/nfs/nfs4client.c | 22 +++++++++++++++++++++-
>  1 file changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
> index 074ac7131459..5e2747644432 100644
> --- a/fs/nfs/nfs4client.c
> +++ b/fs/nfs/nfs4client.c
> @@ -464,6 +464,11 @@ static bool nfs4_match_client_owner_id(const struct nfs_client *clp1,
>  	return strcmp(clp1->cl_owner_id, clp2->cl_owner_id) == 0;
>  }
>  
> +static bool nfs4_same_verifier(nfs4_verifier *v1, nfs4_verifier *v2)
> +{
> +	return 0 == memcmp(v1->data, v2->data, sizeof(v1->data));

Nit:  can you change the order to "memcmp() == 0"?

Thanks,
Anna

> +}
> +
>  /**
>   * nfs40_walk_client_list - Find server that recognizes a client ID
>   *
> @@ -521,7 +526,21 @@ int nfs40_walk_client_list(struct nfs_client *new,
>  
>  		if (!nfs4_match_client_owner_id(pos, new))
>  			continue;
> -
> +		/*
> +		 * We just sent a new SETCLIENTID, which should have
> +		 * caused the server to return a new cl_confirm.  So if
> +		 * cl_confirm is the same, then this is a different
> +		 * server that just returned the same cl_confirm by
> +		 * coincidence:
> +		 */
> +		if ((new != pos) && nfs4_same_verifier(&pos->cl_confirm,
> +						       &new->cl_confirm))
> +			continue;
> +		/*
> +		 * But if the cl_confirm's are different, then the only
> +		 * way that a SETCLIENTID_CONFIRM to pos can succeed is
> +		 * if new and pos point to the same server:
> +		 */
>  		atomic_inc(&pos->cl_count);
>  		spin_unlock(&nn->nfs_client_lock);
>  
> @@ -534,6 +553,7 @@ int nfs40_walk_client_list(struct nfs_client *new,
>  			break;
>  		case 0:
>  			nfs4_swap_callback_idents(pos, new);
> +			pos->cl_confirm = new->cl_confirm;
>  
>  			prev = NULL;
>  			*result = pos;
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
J. Bruce Fields Nov. 28, 2016, 7:23 p.m. UTC | #2
On Mon, Nov 28, 2016 at 01:46:00PM -0500, Anna Schumaker wrote:
> Hi Bruce,
> 
> On 11/28/2016 09:02 AM, J. Bruce Fields wrote:
> > From: "J. Bruce Fields" <bfields@redhat.com>
> > 
> > It's possible that two different servers can return the same (clientid,
> > verifier) pair purely by coincidence.  Both are 64-bit values, but
> > depending on the server implementation, they can be highly predictable
> > and collisions may be quite likely, especially when there are lots of
> > servers.
> 
> How often have you been seeing this?

This was a RHEL customer bug originally, so somebody actually did run
across this in production, but I don't know how common that is.

On a recent client you'll only hit this if you mount with "migration",
so that's probably the main thing that prevents a lot of people from
running into it.

Given the "migration" mount option: if you have a client that mounts
multiple linux knfsd servers, you'll hit the bug whenever the mounts
occur within the same second, and the servers started within the same
second.  Both occurrences are fairly likely for example if you reboot
all your servers at once for maintenance.

But the bug should be much harder to hit against servers that have
already applied ebd7c72c63 "nfsd: randomize SETCLIENTID reply to help
distinguish servers".

Looking at SETCLIENTID replies at the recent bakeathon, I'm fairly
certain this could happen with other servers too.

The original bug is really in RFC 7931, which recommended the client
use a trunking-detection algorithm that made incorrect assumptions about
server behavior; see

	https://www.rfc-editor.org/errata_search.php?rfc=7931

The Linux client uses a much simpler algorithm than what's described
there, but it has the same fix.

> > +static bool nfs4_same_verifier(nfs4_verifier *v1, nfs4_verifier *v2)
> > +{
> > +	return 0 == memcmp(v1->data, v2->data, sizeof(v1->data));
> 
> Nit:  can you change the order to "memcmp() == 0"?

Sure, will you fix it up or should I resend?

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index 074ac7131459..5e2747644432 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -464,6 +464,11 @@  static bool nfs4_match_client_owner_id(const struct nfs_client *clp1,
 	return strcmp(clp1->cl_owner_id, clp2->cl_owner_id) == 0;
 }
 
+static bool nfs4_same_verifier(nfs4_verifier *v1, nfs4_verifier *v2)
+{
+	return 0 == memcmp(v1->data, v2->data, sizeof(v1->data));
+}
+
 /**
  * nfs40_walk_client_list - Find server that recognizes a client ID
  *
@@ -521,7 +526,21 @@  int nfs40_walk_client_list(struct nfs_client *new,
 
 		if (!nfs4_match_client_owner_id(pos, new))
 			continue;
-
+		/*
+		 * We just sent a new SETCLIENTID, which should have
+		 * caused the server to return a new cl_confirm.  So if
+		 * cl_confirm is the same, then this is a different
+		 * server that just returned the same cl_confirm by
+		 * coincidence:
+		 */
+		if ((new != pos) && nfs4_same_verifier(&pos->cl_confirm,
+						       &new->cl_confirm))
+			continue;
+		/*
+		 * But if the cl_confirm's are different, then the only
+		 * way that a SETCLIENTID_CONFIRM to pos can succeed is
+		 * if new and pos point to the same server:
+		 */
 		atomic_inc(&pos->cl_count);
 		spin_unlock(&nn->nfs_client_lock);
 
@@ -534,6 +553,7 @@  int nfs40_walk_client_list(struct nfs_client *new,
 			break;
 		case 0:
 			nfs4_swap_callback_idents(pos, new);
+			pos->cl_confirm = new->cl_confirm;
 
 			prev = NULL;
 			*result = pos;