mbox series

[4.14,0/2] Two nfsd4 fixes for 4.14 longterm

Message ID 20190205100141.25304-1-buczek@molgen.mpg.de (mailing list archive)
Headers show
Series Two nfsd4 fixes for 4.14 longterm | expand

Message

Donald Buczek Feb. 5, 2019, 10:01 a.m. UTC
Please include the upstream patches

    *  commit 085def3ade52 ("nfsd4: fix cached replies to solo SEQUENCE
compounds")
    *  commit 53da6a53e1d4 ("nfsd4: catch some false session retries")

into longterm kernel 4.14.

Because these patches went upstream in 4.15, they are already
included in longterm 4.19 and stable.

A nfsd4 server failure, where the server sends unrelated replies
to client requests containing only SEQUENCE operations has been
experienced with kernel version 4.14.87 [1]. Because the client
retries endlessly, this completely breaks the nfs mount.

A backport of these patches has been suggested by the patch
author J. Bruce Fields. [1]

Only limited testing has been done (running a single server with
4.14.96 and these two patches for a week and generating some high
load on it). It can not be proven by testing, that the patches
really fix the experienced problem. However, as the patches are
part of upstream and stable kernels for over a year
they should be a save pick.

Backport note: Only line numbers updated.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=202435

J. Bruce Fields (2):
  nfsd4: fix cached replies to solo SEQUENCE compounds
  nfsd4: catch some false session retries

 fs/nfsd/nfs4state.c | 57 ++++++++++++++++++++++++++++++++++++++++-----
 fs/nfsd/state.h     |  2 ++
 fs/nfsd/xdr4.h      | 13 +++++++++--
 3 files changed, 64 insertions(+), 8 deletions(-)

Comments

Salvatore Bonaccorso Feb. 5, 2019, 11:59 a.m. UTC | #1
Hi Donald,

On Tue, Feb 05, 2019 at 11:01:39AM +0100, Donald Buczek wrote:
> Please include the upstream patches
> 
>     *  commit 085def3ade52 ("nfsd4: fix cached replies to solo SEQUENCE
> compounds")
>     *  commit 53da6a53e1d4 ("nfsd4: catch some false session retries")
> 
> into longterm kernel 4.14.
> 
> Because these patches went upstream in 4.15, they are already
> included in longterm 4.19 and stable.
> 
> A nfsd4 server failure, where the server sends unrelated replies
> to client requests containing only SEQUENCE operations has been
> experienced with kernel version 4.14.87 [1]. Because the client
> retries endlessly, this completely breaks the nfs mount.
> 
> A backport of these patches has been suggested by the patch
> author J. Bruce Fields. [1]
> 
> Only limited testing has been done (running a single server with
> 4.14.96 and these two patches for a week and generating some high
> load on it). It can not be proven by testing, that the patches
> really fix the experienced problem. However, as the patches are
> part of upstream and stable kernels for over a year
> they should be a save pick.
> 
> Backport note: Only line numbers updated.
> 
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=202435
> 
> J. Bruce Fields (2):
>   nfsd4: fix cached replies to solo SEQUENCE compounds
>   nfsd4: catch some false session retries

Are you planning to submit the same as well for 4.9 LTS? The two
commits apply on top of 4.9.154 with line number updated.

Regards,
Salvatore
Donald Buczek Feb. 5, 2019, 2:59 p.m. UTC | #2
On 02/05/19 12:59, Salvatore Bonaccorso wrote:

> Are you planning to submit the same as well for 4.9 LTS? The two
> commits apply on top of 4.9.154 with line number updated.

No, I'm not, because I didn't do any testing with 4.9.

Additionally, I'm unsure about the right procedure for trivial backports
to multiple trees: Individual patch sets, which apply perfectly, a single
patch sets and Greg resolves that for the other trees or maybe no patch
set at all and just a "please cherry-pick .... from upstream" mail.

Best
   Donald
J. Bruce Fields Feb. 5, 2019, 9:31 p.m. UTC | #3
ACK to both, thanks.--b.

On Tue, Feb 05, 2019 at 11:01:39AM +0100, Donald Buczek wrote:
> Please include the upstream patches
> 
>     *  commit 085def3ade52 ("nfsd4: fix cached replies to solo SEQUENCE
> compounds")
>     *  commit 53da6a53e1d4 ("nfsd4: catch some false session retries")
> 
> into longterm kernel 4.14.
> 
> Because these patches went upstream in 4.15, they are already
> included in longterm 4.19 and stable.
> 
> A nfsd4 server failure, where the server sends unrelated replies
> to client requests containing only SEQUENCE operations has been
> experienced with kernel version 4.14.87 [1]. Because the client
> retries endlessly, this completely breaks the nfs mount.
> 
> A backport of these patches has been suggested by the patch
> author J. Bruce Fields. [1]
> 
> Only limited testing has been done (running a single server with
> 4.14.96 and these two patches for a week and generating some high
> load on it). It can not be proven by testing, that the patches
> really fix the experienced problem. However, as the patches are
> part of upstream and stable kernels for over a year
> they should be a save pick.
> 
> Backport note: Only line numbers updated.
> 
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=202435
> 
> J. Bruce Fields (2):
>   nfsd4: fix cached replies to solo SEQUENCE compounds
>   nfsd4: catch some false session retries
> 
>  fs/nfsd/nfs4state.c | 57 ++++++++++++++++++++++++++++++++++++++++-----
>  fs/nfsd/state.h     |  2 ++
>  fs/nfsd/xdr4.h      | 13 +++++++++--
>  3 files changed, 64 insertions(+), 8 deletions(-)
> 
> -- 
> 2.20.0
Greg KH Feb. 11, 2019, 1:34 p.m. UTC | #4
On Tue, Feb 05, 2019 at 03:59:04PM +0100, Donald Buczek wrote:
> On 02/05/19 12:59, Salvatore Bonaccorso wrote:
> 
> > Are you planning to submit the same as well for 4.9 LTS? The two
> > commits apply on top of 4.9.154 with line number updated.
> 
> No, I'm not, because I didn't do any testing with 4.9.
> 
> Additionally, I'm unsure about the right procedure for trivial backports
> to multiple trees: Individual patch sets, which apply perfectly, a single
> patch sets and Greg resolves that for the other trees or maybe no patch
> set at all and just a "please cherry-pick .... from upstream" mail.

The first patch in this series applies to 4.9.y, but the second does
not.

I'll be glad to take a backported, and tested, series, if someone still
cares about NFS for 4.9.y.  But unless you really care about that tree,
I would not worry about it.

thanks,

greg k-h
Greg KH Feb. 11, 2019, 1:34 p.m. UTC | #5
On Tue, Feb 05, 2019 at 11:01:39AM +0100, Donald Buczek wrote:
> Please include the upstream patches
> 
>     *  commit 085def3ade52 ("nfsd4: fix cached replies to solo SEQUENCE
> compounds")
>     *  commit 53da6a53e1d4 ("nfsd4: catch some false session retries")
> 
> into longterm kernel 4.14.
> 
> Because these patches went upstream in 4.15, they are already
> included in longterm 4.19 and stable.

Both now queued up, thanks.

greg k-h
Salvatore Bonaccorso Feb. 11, 2019, 3:59 p.m. UTC | #6
On Mon, Feb 11, 2019 at 02:34:41PM +0100, Greg KH wrote:
> On Tue, Feb 05, 2019 at 03:59:04PM +0100, Donald Buczek wrote:
> > On 02/05/19 12:59, Salvatore Bonaccorso wrote:
> > 
> > > Are you planning to submit the same as well for 4.9 LTS? The two
> > > commits apply on top of 4.9.154 with line number updated.
> > 
> > No, I'm not, because I didn't do any testing with 4.9.
> > 
> > Additionally, I'm unsure about the right procedure for trivial backports
> > to multiple trees: Individual patch sets, which apply perfectly, a single
> > patch sets and Greg resolves that for the other trees or maybe no patch
> > set at all and just a "please cherry-pick .... from upstream" mail.
> 
> The first patch in this series applies to 4.9.y, but the second does
> not.
> 
> I'll be glad to take a backported, and tested, series, if someone still
> cares about NFS for 4.9.y.  But unless you really care about that tree,
> I would not worry about it.

Hmm, both apply on top of 4.9.155, still but with line numbers
adjusted (I'm attaching the respective rebased patches). Actually my
question on the respective backports was originally triggered due to
https://bugs-devel.debian.org/898060

The problem actually is on the 'tested' part, as Donald did no
testing/reproducing with 4.9 itself.

Regards,
Salvatore
From d54d2762cfa2b01fee19e4641f3f242bebd0aded Mon Sep 17 00:00:00 2001
From: "J. Bruce Fields" <bfields@redhat.com>
Date: Wed, 18 Oct 2017 16:17:18 -0400
Subject: [PATCH 1/2] nfsd4: fix cached replies to solo SEQUENCE compounds

commit 085def3ade52f2ffe3e31f42e98c27dcc222dd37 upstream.

Currently our handling of 4.1+ requests without "cachethis" set is
confusing and not quite correct.

Suppose a client sends a compound consisting of only a single SEQUENCE
op, and it matches the seqid in a session slot (so it's a retry), but
the previous request with that seqid did not have "cachethis" set.

The obvious thing to do might be to return NFS4ERR_RETRY_UNCACHED_REP,
but the protocol only allows that to be returned on the op following the
SEQUENCE, and there is no such op in this case.

The protocol permits us to cache replies even if the client didn't ask
us to.  And it's easy to do so in the case of solo SEQUENCE compounds.

So, when we get a solo SEQUENCE, we can either return the previously
cached reply or NFSERR_SEQ_FALSE_RETRY if we notice it differs in some
way from the original call.

Currently, we're returning a corrupt reply in the case a solo SEQUENCE
matches a previous compound with more ops.  This actually matters
because the Linux client recently started doing this as a way to recover
from lost replies to idempotent operations in the case the process doing
the original reply was killed: in that case it's difficult to keep the
original arguments around to do a real retry, and the client no longer
cares what the result is anyway, but it would like to make sure that the
slot's sequence id has been incremented, and the solo SEQUENCE assures
that: if the server never got the original reply, it will increment the
sequence id.  If it did get the original reply, it won't increment, and
nothing else that about the reply really matters much.  But we can at
least attempt to return valid xdr!

Tested-by: Olga Kornievskaia <aglo@umich.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
 fs/nfsd/nfs4state.c | 20 +++++++++++++++-----
 fs/nfsd/state.h     |  1 +
 fs/nfsd/xdr4.h      | 13 +++++++++++--
 3 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 12d780718b48..88168ce0e882 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -2344,14 +2344,16 @@ nfsd4_store_cache_entry(struct nfsd4_compoundres *resp)
 
 	dprintk("--> %s slot %p\n", __func__, slot);
 
+	slot->sl_flags |= NFSD4_SLOT_INITIALIZED;
 	slot->sl_opcnt = resp->opcnt;
 	slot->sl_status = resp->cstate.status;
 
-	slot->sl_flags |= NFSD4_SLOT_INITIALIZED;
-	if (nfsd4_not_cached(resp)) {
-		slot->sl_datalen = 0;
+	if (!nfsd4_cache_this(resp)) {
+		slot->sl_flags &= ~NFSD4_SLOT_CACHED;
 		return;
 	}
+	slot->sl_flags |= NFSD4_SLOT_CACHED;
+
 	base = resp->cstate.data_offset;
 	slot->sl_datalen = buf->len - base;
 	if (read_bytes_from_xdr_buf(buf, base, slot->sl_data, slot->sl_datalen))
@@ -2378,8 +2380,16 @@ nfsd4_enc_sequence_replay(struct nfsd4_compoundargs *args,
 	op = &args->ops[resp->opcnt - 1];
 	nfsd4_encode_operation(resp, op);
 
-	/* Return nfserr_retry_uncached_rep in next operation. */
-	if (args->opcnt > 1 && !(slot->sl_flags & NFSD4_SLOT_CACHETHIS)) {
+	if (slot->sl_flags & NFSD4_SLOT_CACHED)
+		return op->status;
+	if (args->opcnt == 1) {
+		/*
+		 * The original operation wasn't a solo sequence--we
+		 * always cache those--so this retry must not match the
+		 * original:
+		 */
+		op->status = nfserr_seq_false_retry;
+	} else {
 		op = &args->ops[resp->opcnt++];
 		op->status = nfserr_retry_uncached_rep;
 		nfsd4_encode_operation(resp, op);
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 005c911b34ac..2488b7df1b35 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -174,6 +174,7 @@ struct nfsd4_slot {
 #define NFSD4_SLOT_INUSE	(1 << 0)
 #define NFSD4_SLOT_CACHETHIS	(1 << 1)
 #define NFSD4_SLOT_INITIALIZED	(1 << 2)
+#define NFSD4_SLOT_CACHED	(1 << 3)
 	u8	sl_flags;
 	char	sl_data[];
 };
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index 8fda4abdf3b1..448e74e32344 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -645,9 +645,18 @@ static inline bool nfsd4_is_solo_sequence(struct nfsd4_compoundres *resp)
 	return resp->opcnt == 1 && args->ops[0].opnum == OP_SEQUENCE;
 }
 
-static inline bool nfsd4_not_cached(struct nfsd4_compoundres *resp)
+/*
+ * The session reply cache only needs to cache replies that the client
+ * actually asked us to.  But it's almost free for us to cache compounds
+ * consisting of only a SEQUENCE op, so we may as well cache those too.
+ * Also, the protocol doesn't give us a convenient response in the case
+ * of a replay of a solo SEQUENCE op that wasn't cached
+ * (RETRY_UNCACHED_REP can only be returned in the second op of a
+ * compound).
+ */
+static inline bool nfsd4_cache_this(struct nfsd4_compoundres *resp)
 {
-	return !(resp->cstate.slot->sl_flags & NFSD4_SLOT_CACHETHIS)
+	return (resp->cstate.slot->sl_flags & NFSD4_SLOT_CACHETHIS)
 		|| nfsd4_is_solo_sequence(resp);
 }
Salvatore Bonaccorso Feb. 13, 2019, 6:49 a.m. UTC | #7
On Mon, Feb 11, 2019 at 04:59:26PM +0100, Salvatore Bonaccorso wrote:
> On Mon, Feb 11, 2019 at 02:34:41PM +0100, Greg KH wrote:
> > On Tue, Feb 05, 2019 at 03:59:04PM +0100, Donald Buczek wrote:
> > > On 02/05/19 12:59, Salvatore Bonaccorso wrote:
> > > 
> > > > Are you planning to submit the same as well for 4.9 LTS? The two
> > > > commits apply on top of 4.9.154 with line number updated.
> > > 
> > > No, I'm not, because I didn't do any testing with 4.9.
> > > 
> > > Additionally, I'm unsure about the right procedure for trivial backports
> > > to multiple trees: Individual patch sets, which apply perfectly, a single
> > > patch sets and Greg resolves that for the other trees or maybe no patch
> > > set at all and just a "please cherry-pick .... from upstream" mail.
> > 
> > The first patch in this series applies to 4.9.y, but the second does
> > not.
> > 
> > I'll be glad to take a backported, and tested, series, if someone still
> > cares about NFS for 4.9.y.  But unless you really care about that tree,
> > I would not worry about it.
> 
> Hmm, both apply on top of 4.9.155, still but with line numbers
> adjusted (I'm attaching the respective rebased patches). Actually my
> question on the respective backports was originally triggered due to
> https://bugs-devel.debian.org/898060
> 
> The problem actually is on the 'tested' part, as Donald did no
> testing/reproducing with 4.9 itself.

Attached rebased to apply on top of 4.9.156.

Regards,
Salvatore
From 435d215dcdf1aba9529cc402f68c913e43676445 Mon Sep 17 00:00:00 2001
From: "J. Bruce Fields" <bfields@redhat.com>
Date: Wed, 18 Oct 2017 16:17:18 -0400
Subject: nfsd4: fix cached replies to solo SEQUENCE compounds

[ Upstream commit 085def3ade52f2ffe3e31f42e98c27dcc222dd37 ]

Currently our handling of 4.1+ requests without "cachethis" set is
confusing and not quite correct.

Suppose a client sends a compound consisting of only a single SEQUENCE
op, and it matches the seqid in a session slot (so it's a retry), but
the previous request with that seqid did not have "cachethis" set.

The obvious thing to do might be to return NFS4ERR_RETRY_UNCACHED_REP,
but the protocol only allows that to be returned on the op following the
SEQUENCE, and there is no such op in this case.

The protocol permits us to cache replies even if the client didn't ask
us to.  And it's easy to do so in the case of solo SEQUENCE compounds.

So, when we get a solo SEQUENCE, we can either return the previously
cached reply or NFSERR_SEQ_FALSE_RETRY if we notice it differs in some
way from the original call.

Currently, we're returning a corrupt reply in the case a solo SEQUENCE
matches a previous compound with more ops.  This actually matters
because the Linux client recently started doing this as a way to recover
from lost replies to idempotent operations in the case the process doing
the original reply was killed: in that case it's difficult to keep the
original arguments around to do a real retry, and the client no longer
cares what the result is anyway, but it would like to make sure that the
slot's sequence id has been incremented, and the solo SEQUENCE assures
that: if the server never got the original reply, it will increment the
sequence id.  If it did get the original reply, it won't increment, and
nothing else that about the reply really matters much.  But we can at
least attempt to return valid xdr!

Tested-by: Olga Kornievskaia <aglo@umich.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
 fs/nfsd/nfs4state.c | 20 +++++++++++++++-----
 fs/nfsd/state.h     |  1 +
 fs/nfsd/xdr4.h      | 13 +++++++++++--
 3 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 12d780718b48..88168ce0e882 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -2344,14 +2344,16 @@ nfsd4_store_cache_entry(struct nfsd4_compoundres *resp)
 
 	dprintk("--> %s slot %p\n", __func__, slot);
 
+	slot->sl_flags |= NFSD4_SLOT_INITIALIZED;
 	slot->sl_opcnt = resp->opcnt;
 	slot->sl_status = resp->cstate.status;
 
-	slot->sl_flags |= NFSD4_SLOT_INITIALIZED;
-	if (nfsd4_not_cached(resp)) {
-		slot->sl_datalen = 0;
+	if (!nfsd4_cache_this(resp)) {
+		slot->sl_flags &= ~NFSD4_SLOT_CACHED;
 		return;
 	}
+	slot->sl_flags |= NFSD4_SLOT_CACHED;
+
 	base = resp->cstate.data_offset;
 	slot->sl_datalen = buf->len - base;
 	if (read_bytes_from_xdr_buf(buf, base, slot->sl_data, slot->sl_datalen))
@@ -2378,8 +2380,16 @@ nfsd4_enc_sequence_replay(struct nfsd4_compoundargs *args,
 	op = &args->ops[resp->opcnt - 1];
 	nfsd4_encode_operation(resp, op);
 
-	/* Return nfserr_retry_uncached_rep in next operation. */
-	if (args->opcnt > 1 && !(slot->sl_flags & NFSD4_SLOT_CACHETHIS)) {
+	if (slot->sl_flags & NFSD4_SLOT_CACHED)
+		return op->status;
+	if (args->opcnt == 1) {
+		/*
+		 * The original operation wasn't a solo sequence--we
+		 * always cache those--so this retry must not match the
+		 * original:
+		 */
+		op->status = nfserr_seq_false_retry;
+	} else {
 		op = &args->ops[resp->opcnt++];
 		op->status = nfserr_retry_uncached_rep;
 		nfsd4_encode_operation(resp, op);
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 005c911b34ac..2488b7df1b35 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -174,6 +174,7 @@ struct nfsd4_slot {
 #define NFSD4_SLOT_INUSE	(1 << 0)
 #define NFSD4_SLOT_CACHETHIS	(1 << 1)
 #define NFSD4_SLOT_INITIALIZED	(1 << 2)
+#define NFSD4_SLOT_CACHED	(1 << 3)
 	u8	sl_flags;
 	char	sl_data[];
 };
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index 8fda4abdf3b1..448e74e32344 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -645,9 +645,18 @@ static inline bool nfsd4_is_solo_sequence(struct nfsd4_compoundres *resp)
 	return resp->opcnt == 1 && args->ops[0].opnum == OP_SEQUENCE;
 }
 
-static inline bool nfsd4_not_cached(struct nfsd4_compoundres *resp)
+/*
+ * The session reply cache only needs to cache replies that the client
+ * actually asked us to.  But it's almost free for us to cache compounds
+ * consisting of only a SEQUENCE op, so we may as well cache those too.
+ * Also, the protocol doesn't give us a convenient response in the case
+ * of a replay of a solo SEQUENCE op that wasn't cached
+ * (RETRY_UNCACHED_REP can only be returned in the second op of a
+ * compound).
+ */
+static inline bool nfsd4_cache_this(struct nfsd4_compoundres *resp)
 {
-	return !(resp->cstate.slot->sl_flags & NFSD4_SLOT_CACHETHIS)
+	return (resp->cstate.slot->sl_flags & NFSD4_SLOT_CACHETHIS)
 		|| nfsd4_is_solo_sequence(resp);
 }
Greg KH Feb. 13, 2019, 2:15 p.m. UTC | #8
On Wed, Feb 13, 2019 at 07:49:56AM +0100, Salvatore Bonaccorso wrote:
> On Mon, Feb 11, 2019 at 04:59:26PM +0100, Salvatore Bonaccorso wrote:
> > On Mon, Feb 11, 2019 at 02:34:41PM +0100, Greg KH wrote:
> > > On Tue, Feb 05, 2019 at 03:59:04PM +0100, Donald Buczek wrote:
> > > > On 02/05/19 12:59, Salvatore Bonaccorso wrote:
> > > > 
> > > > > Are you planning to submit the same as well for 4.9 LTS? The two
> > > > > commits apply on top of 4.9.154 with line number updated.
> > > > 
> > > > No, I'm not, because I didn't do any testing with 4.9.
> > > > 
> > > > Additionally, I'm unsure about the right procedure for trivial backports
> > > > to multiple trees: Individual patch sets, which apply perfectly, a single
> > > > patch sets and Greg resolves that for the other trees or maybe no patch
> > > > set at all and just a "please cherry-pick .... from upstream" mail.
> > > 
> > > The first patch in this series applies to 4.9.y, but the second does
> > > not.
> > > 
> > > I'll be glad to take a backported, and tested, series, if someone still
> > > cares about NFS for 4.9.y.  But unless you really care about that tree,
> > > I would not worry about it.
> > 
> > Hmm, both apply on top of 4.9.155, still but with line numbers
> > adjusted (I'm attaching the respective rebased patches). Actually my
> > question on the respective backports was originally triggered due to
> > https://bugs-devel.debian.org/898060
> > 
> > The problem actually is on the 'tested' part, as Donald did no
> > testing/reproducing with 4.9 itself.
> 
> Attached rebased to apply on top of 4.9.156.

Thanks, both now queued up.

greg k-h