[3/3] NFSv4.1: Detect and retry after OPEN and CLOSE/DOWNGRADE race
diff mbox

Message ID fb76d86caca02c277eea7cb1d469c209b59d06ab.1508248965.git.bcodding@redhat.com
State New
Headers show

Commit Message

Benjamin Coddington Oct. 17, 2017, 2:46 p.m. UTC
If the client issues two simultaneous OPEN calls, and the response to the
first OPEN call (sequence id 1) is delayed before updating the nfs4_state,
then the client's nfs4_state can transition through a complete lifecycle of
OPEN (state sequence id 2), and CLOSE (state sequence id 3).  When the
first OPEN is finally processed, the nfs4_state is incorrectly transitioned
back to NFS_OPEN_STATE for the first OPEN (sequence id 1).  Subsequent calls
to LOCK or CLOSE will receive NFS4ERR_BAD_STATEID, and trigger state
recovery.

Fix this by passing back the result of need_update_open_stateid() to the
open() path, with the result to try again if the OPEN's stateid should not
be updated.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
---
 fs/nfs/nfs4proc.c | 24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

Comments

Trond Myklebust Oct. 17, 2017, 3:49 p.m. UTC | #1
T24gVHVlLCAyMDE3LTEwLTE3IGF0IDEwOjQ2IC0wNDAwLCBCZW5qYW1pbiBDb2RkaW5ndG9uIHdy
b3RlOg0KPiBJZiB0aGUgY2xpZW50IGlzc3VlcyB0d28gc2ltdWx0YW5lb3VzIE9QRU4gY2FsbHMs
IGFuZCB0aGUgcmVzcG9uc2UgdG8NCj4gdGhlDQo+IGZpcnN0IE9QRU4gY2FsbCAoc2VxdWVuY2Ug
aWQgMSkgaXMgZGVsYXllZCBiZWZvcmUgdXBkYXRpbmcgdGhlDQo+IG5mczRfc3RhdGUsDQo+IHRo
ZW4gdGhlIGNsaWVudCdzIG5mczRfc3RhdGUgY2FuIHRyYW5zaXRpb24gdGhyb3VnaCBhIGNvbXBs
ZXRlDQo+IGxpZmVjeWNsZSBvZg0KPiBPUEVOIChzdGF0ZSBzZXF1ZW5jZSBpZCAyKSwgYW5kIENM
T1NFIChzdGF0ZSBzZXF1ZW5jZSBpZCAzKS4gIFdoZW4NCj4gdGhlDQo+IGZpcnN0IE9QRU4gaXMg
ZmluYWxseSBwcm9jZXNzZWQsIHRoZSBuZnM0X3N0YXRlIGlzIGluY29ycmVjdGx5DQo+IHRyYW5z
aXRpb25lZA0KPiBiYWNrIHRvIE5GU19PUEVOX1NUQVRFIGZvciB0aGUgZmlyc3QgT1BFTiAoc2Vx
dWVuY2UgaWQNCj4gMSkuICBTdWJzZXF1ZW50IGNhbGxzDQo+IHRvIExPQ0sgb3IgQ0xPU0Ugd2ls
bCByZWNlaXZlIE5GUzRFUlJfQkFEX1NUQVRFSUQsIGFuZCB0cmlnZ2VyIHN0YXRlDQo+IHJlY292
ZXJ5Lg0KPiANCj4gRml4IHRoaXMgYnkgcGFzc2luZyBiYWNrIHRoZSByZXN1bHQgb2YgbmVlZF91
cGRhdGVfb3Blbl9zdGF0ZWlkKCkgdG8NCj4gdGhlDQo+IG9wZW4oKSBwYXRoLCB3aXRoIHRoZSBy
ZXN1bHQgdG8gdHJ5IGFnYWluIGlmIHRoZSBPUEVOJ3Mgc3RhdGVpZA0KPiBzaG91bGQgbm90DQo+
IGJlIHVwZGF0ZWQuDQo+IA0KDQpXaHkgYXJlIHdlIHdvcnJpZWQgYWJvdXQgdGhlIHNwZWNpYWwg
Y2FzZSB3aGVyZSB0aGUgY2xpZW50IGFjdHVhbGx5DQpmaW5kcyB0aGUgY2xvc2VkIHN0YXRlaWQg
aW4gaXRzIGNhY2hlPw0KDQpJbiB0aGUgbW9yZSBnZW5lcmFsIGNhc2Ugb2YgeW91ciByYWNlLCB0
aGUgc3RhdGVpZCBtaWdodCBub3QgYmUgZm91bmQNCmF0IGFsbCBiZWNhdXNlIHRoZSBDTE9TRSBj
b21wbGV0ZXMgYW5kIGlzIHByb2Nlc3NlZCBvbiB0aGUgY2xpZW50DQpiZWZvcmUgaXQgY2FuIHBy
b2Nlc3MgdGhlIHJlcGx5IGZyb20gdGhlIGRlbGF5ZWQgT1BFTi4gSWYgc28sIHdlIHJlYWxseQ0K
aGF2ZSBubyB3YXkgdG8gZGV0ZWN0IHRoYXQgdGhlIGZpbGUgaGFzIGFjdHVhbGx5IGJlZW4gY2xv
c2VkIGJ5IHRoZQ0Kc2VydmVyIHVudGlsIHdlIHNlZSB0aGUgTkZTNEVSUl9CQURfU1RBVEVJRC4N
Cg0KTm90ZSBhbHNvIHRoYXQgc2VjdGlvbiAxOC4yLjQgc2F5cyB0aGF0ICJ0aGUgc2VydmVyIFNI
T1VMRCByZXR1cm4gdGhlDQppbnZhbGlkIHNwZWNpYWwgc3RhdGVpZCAodGhlICJvdGhlciIgdmFs
dWUgaXMgemVybyBhbmQgdGhlICJzZXFpZCINCmZpZWxkIGlzIE5GUzRfVUlOVDMyX01BWCwgc2Vl
IFNlY3Rpb24gOC4yLjMpIiB3aGljaCBmdXJ0aGVyIGVuc3VyZXMNCnRoYXQgd2Ugc2hvdWxkIG5v
dCBiZSBhYmxlIHRvIG1hdGNoIHRoZSBPUEVOIHN0YXRlaWQgb25jZSB0aGUgQ0xPU0UgaXMNCnBy
b2Nlc3NlZCBvbiB0aGUgY2xpZW50Lg0KDQotLSANClRyb25kIE15a2xlYnVzdA0KTGludXggTkZT
IGNsaWVudCBtYWludGFpbmVyLCBQcmltYXJ5RGF0YQ0KdHJvbmQubXlrbGVidXN0QHByaW1hcnlk
YXRhLmNvbQ0K

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Benjamin Coddington Oct. 17, 2017, 5:33 p.m. UTC | #2
On 17 Oct 2017, at 11:49, Trond Myklebust wrote:

> On Tue, 2017-10-17 at 10:46 -0400, Benjamin Coddington wrote:
>> If the client issues two simultaneous OPEN calls, and the response to
>> the
>> first OPEN call (sequence id 1) is delayed before updating the
>> nfs4_state,
>> then the client's nfs4_state can transition through a complete
>> lifecycle of
>> OPEN (state sequence id 2), and CLOSE (state sequence id 3).  When
>> the
>> first OPEN is finally processed, the nfs4_state is incorrectly
>> transitioned
>> back to NFS_OPEN_STATE for the first OPEN (sequence id
>> 1).  Subsequent calls
>> to LOCK or CLOSE will receive NFS4ERR_BAD_STATEID, and trigger state
>> recovery.
>>
>> Fix this by passing back the result of need_update_open_stateid() to
>> the
>> open() path, with the result to try again if the OPEN's stateid
>> should not
>> be updated.
>>
>
> Why are we worried about the special case where the client actually
> finds the closed stateid in its cache?

Because I am hitting that case very frequently in generic/089, and I hate
how unnecessary state recovery slows everything down.  I'm also seeing a
problem where generic/089 never completes, and I was trying to get this
behavior out of the way.

> In the more general case of your race, the stateid might not be found
> at all because the CLOSE completes and is processed on the client
> before it can process the reply from the delayed OPEN. If so, we really
> have no way to detect that the file has actually been closed by the
> server until we see the NFS4ERR_BAD_STATEID.

I mentioned this case in the cover letter.  It's possible that the client
could retain a record of a closed stateid in order to retry an OPEN in that
case.  Another approach may be to detect 'holes' in the state id sequence
and not call CLOSE until each id is processed.  I think there's an existing
problem now where this race (without the CLOSE) can incorrectly update the
state flags with the result of the delayed OPEN's mode.

> Note also that section 18.2.4 says that "the server SHOULD return the
> invalid special stateid (the "other" value is zero and the "seqid"
> field is NFS4_UINT32_MAX, see Section 8.2.3)" which further ensures
> that we should not be able to match the OPEN stateid once the CLOSE is
> processed on the client.

Ah, right, so need_update_open_stateid() should handle that case.  But that
doesn't mean there's no way to match the OPEN statid after a CLOSE.  We
could still keep a record of the closed state with a flag instead of an
incremented sequence id.

So perhaps keeping track of holes in the sequence would be a better
approach, but then the OPEN response handling needs to call CLOSE in case
the OPEN fails.. that seems more complicated.

Ben
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Trond Myklebust Oct. 17, 2017, 5:42 p.m. UTC | #3
On Tue, 2017-10-17 at 13:33 -0400, Benjamin Coddington wrote:
> On 17 Oct 2017, at 11:49, Trond Myklebust wrote:

> 

> > On Tue, 2017-10-17 at 10:46 -0400, Benjamin Coddington wrote:

> > > If the client issues two simultaneous OPEN calls, and the

> > > response to

> > > the

> > > first OPEN call (sequence id 1) is delayed before updating the

> > > nfs4_state,

> > > then the client's nfs4_state can transition through a complete

> > > lifecycle of

> > > OPEN (state sequence id 2), and CLOSE (state sequence id

> > > 3).  When

> > > the

> > > first OPEN is finally processed, the nfs4_state is incorrectly

> > > transitioned

> > > back to NFS_OPEN_STATE for the first OPEN (sequence id

> > > 1).  Subsequent calls

> > > to LOCK or CLOSE will receive NFS4ERR_BAD_STATEID, and trigger

> > > state

> > > recovery.

> > > 

> > > Fix this by passing back the result of need_update_open_stateid()

> > > to

> > > the

> > > open() path, with the result to try again if the OPEN's stateid

> > > should not

> > > be updated.

> > > 

> > 

> > Why are we worried about the special case where the client actually

> > finds the closed stateid in its cache?

> 

> Because I am hitting that case very frequently in generic/089, and I

> hate

> how unnecessary state recovery slows everything down.  I'm also 


Why is it being hit. Is the client processing stuff out of order?

> seeing a

> problem where generic/089 never completes, and I was trying to get

> this

> behavior out of the way.

> 

> > In the more general case of your race, the stateid might not be

> > found

> > at all because the CLOSE completes and is processed on the client

> > before it can process the reply from the delayed OPEN. If so, we

> > really

> > have no way to detect that the file has actually been closed by the

> > server until we see the NFS4ERR_BAD_STATEID.

> 

> I mentioned this case in the cover letter.  It's possible that the

> client

> could retain a record of a closed stateid in order to retry an OPEN

> in that


That would require us to retain all stateids until there are no more
pending OPEN calls.

> case.  Another approach may be to detect 'holes' in the state id

> sequence

> and not call CLOSE until each id is processed.  I think there's an

> existing


We can't know we have a hole until we know the starting value of the
seqid, which is undefined according to RFC 5661 section 3.3.12.

> problem now where this race (without the CLOSE) can incorrectly

> update the

> state flags with the result of the delayed OPEN's mode.

> 

> > Note also that section 18.2.4 says that "the server SHOULD return

> > the

> > invalid special stateid (the "other" value is zero and the "seqid"

> > field is NFS4_UINT32_MAX, see Section 8.2.3)" which further ensures

> > that we should not be able to match the OPEN stateid once the CLOSE

> > is

> > processed on the client.

> 

> Ah, right, so need_update_open_stateid() should handle that

> case.  But that

> doesn't mean there's no way to match the OPEN statid after a

> CLOSE.  We

> could still keep a record of the closed state with a flag instead of

> an

> incremented sequence id.

> 

> So perhaps keeping track of holes in the sequence would be a better

> approach, but then the OPEN response handling needs to call CLOSE in

> case

> the OPEN fails.. that seems more complicated.

> 

> Ben

> 

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com
Benjamin Coddington Oct. 17, 2017, 5:52 p.m. UTC | #4
On 17 Oct 2017, at 13:42, Trond Myklebust wrote:

> On Tue, 2017-10-17 at 13:33 -0400, Benjamin Coddington wrote:
>> On 17 Oct 2017, at 11:49, Trond Myklebust wrote:
>>
>>> On Tue, 2017-10-17 at 10:46 -0400, Benjamin Coddington wrote:
>>>> If the client issues two simultaneous OPEN calls, and the
>>>> response to
>>>> the
>>>> first OPEN call (sequence id 1) is delayed before updating the
>>>> nfs4_state,
>>>> then the client's nfs4_state can transition through a complete
>>>> lifecycle of
>>>> OPEN (state sequence id 2), and CLOSE (state sequence id
>>>> 3).  When
>>>> the
>>>> first OPEN is finally processed, the nfs4_state is incorrectly
>>>> transitioned
>>>> back to NFS_OPEN_STATE for the first OPEN (sequence id
>>>> 1).  Subsequent calls
>>>> to LOCK or CLOSE will receive NFS4ERR_BAD_STATEID, and trigger
>>>> state
>>>> recovery.
>>>>
>>>> Fix this by passing back the result of need_update_open_stateid()
>>>> to
>>>> the
>>>> open() path, with the result to try again if the OPEN's stateid
>>>> should not
>>>> be updated.
>>>>
>>>
>>> Why are we worried about the special case where the client actually
>>> finds the closed stateid in its cache?
>>
>> Because I am hitting that case very frequently in generic/089, and I
>> hate
>> how unnecessary state recovery slows everything down.  I'm also
>
> Why is it being hit. Is the client processing stuff out of order?

Yes.

>>> In the more general case of your race, the stateid might not be
>>> found
>>> at all because the CLOSE completes and is processed on the client
>>> before it can process the reply from the delayed OPEN. If so, we
>>> really
>>> have no way to detect that the file has actually been closed by the
>>> server until we see the NFS4ERR_BAD_STATEID.
>>
>> I mentioned this case in the cover letter.  It's possible that the
>> client
>> could retain a record of a closed stateid in order to retry an OPEN
>> in that
>
> That would require us to retain all stateids until there are no more
> pending OPEN calls.
>
>> case.  Another approach may be to detect 'holes' in the state id
>> sequence
>> and not call CLOSE until each id is processed.  I think there's an
>> existing
>
> We can't know we have a hole until we know the starting value of the
> seqid, which is undefined according to RFC 5661 section 3.3.12.

Ah, yuck.  I read 8.2.2:

   When such a set of locks is first created, the server returns a
   stateid with seqid value of one.

.. and went from there.  Is this a conflict in the spec?

Ben
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Trond Myklebust Oct. 17, 2017, 6:26 p.m. UTC | #5
On Tue, 2017-10-17 at 13:52 -0400, Benjamin Coddington wrote:
> On 17 Oct 2017, at 13:42, Trond Myklebust wrote:

> 

> > On Tue, 2017-10-17 at 13:33 -0400, Benjamin Coddington wrote:

> > > On 17 Oct 2017, at 11:49, Trond Myklebust wrote:

> > > 

> > > > On Tue, 2017-10-17 at 10:46 -0400, Benjamin Coddington wrote:

> > > > > If the client issues two simultaneous OPEN calls, and the

> > > > > response to

> > > > > the

> > > > > first OPEN call (sequence id 1) is delayed before updating

> > > > > the

> > > > > nfs4_state,

> > > > > then the client's nfs4_state can transition through a

> > > > > complete

> > > > > lifecycle of

> > > > > OPEN (state sequence id 2), and CLOSE (state sequence id

> > > > > 3).  When

> > > > > the

> > > > > first OPEN is finally processed, the nfs4_state is

> > > > > incorrectly

> > > > > transitioned

> > > > > back to NFS_OPEN_STATE for the first OPEN (sequence id

> > > > > 1).  Subsequent calls

> > > > > to LOCK or CLOSE will receive NFS4ERR_BAD_STATEID, and

> > > > > trigger

> > > > > state

> > > > > recovery.

> > > > > 

> > > > > Fix this by passing back the result of

> > > > > need_update_open_stateid()

> > > > > to

> > > > > the

> > > > > open() path, with the result to try again if the OPEN's

> > > > > stateid

> > > > > should not

> > > > > be updated.

> > > > > 

> > > > 

> > > > Why are we worried about the special case where the client

> > > > actually

> > > > finds the closed stateid in its cache?

> > > 

> > > Because I am hitting that case very frequently in generic/089,

> > > and I

> > > hate

> > > how unnecessary state recovery slows everything down.  I'm also

> > 

> > Why is it being hit. Is the client processing stuff out of order?

> 

> Yes.

> 

> > > > In the more general case of your race, the stateid might not be

> > > > found

> > > > at all because the CLOSE completes and is processed on the

> > > > client

> > > > before it can process the reply from the delayed OPEN. If so,

> > > > we

> > > > really

> > > > have no way to detect that the file has actually been closed by

> > > > the

> > > > server until we see the NFS4ERR_BAD_STATEID.

> > > 

> > > I mentioned this case in the cover letter.  It's possible that

> > > the

> > > client

> > > could retain a record of a closed stateid in order to retry an

> > > OPEN

> > > in that

> > 

> > That would require us to retain all stateids until there are no

> > more

> > pending OPEN calls.

> > 

> > > case.  Another approach may be to detect 'holes' in the state id

> > > sequence

> > > and not call CLOSE until each id is processed.  I think there's

> > > an

> > > existing

> > 

> > We can't know we have a hole until we know the starting value of

> > the

> > seqid, which is undefined according to RFC 5661 section 3.3.12.

> 

> Ah, yuck.  I read 8.2.2:

> 

>    When such a set of locks is first created, the server returns a

>    stateid with seqid value of one.

> 

> .. and went from there.  Is this a conflict in the spec?

> 


Hmm... I note that the equivalent section 2.2.16 in RFC7530 has been
changed to remove the bit about the starting value of seqid being
undefined.

So, if the spec allows us to rely on the seqid always being initialised
to 1, then we might at least be able to detect that we have to replay
the open.
One way to do so might be to keep a count of the number of outstanding
seqids, and then force OPEN_DOWNGRADE/CLOSE to wait until that number
hits 0 (or until a state recovery is started).

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com
Benjamin Coddington Oct. 17, 2017, 8:29 p.m. UTC | #6
On 17 Oct 2017, at 14:26, Trond Myklebust wrote:
> On Tue, 2017-10-17 at 13:52 -0400, Benjamin Coddington wrote:
>> Ah, yuck.  I read 8.2.2:
>>
>>    When such a set of locks is first created, the server returns a
>>    stateid with seqid value of one.
>>
>> .. and went from there.  Is this a conflict in the spec?
>>
>
> Hmm... I note that the equivalent section 2.2.16 in RFC7530 has been
> changed to remove the bit about the starting value of seqid being
> undefined.
>
> So, if the spec allows us to rely on the seqid always being initialised
> to 1, then we might at least be able to detect that we have to replay
> the open.
> One way to do so might be to keep a count of the number of outstanding
> seqids, and then force OPEN_DOWNGRADE/CLOSE to wait until that number
> hits 0 (or until a state recovery is started).

I can take that approach for another spin, but due to other priorities, I'll
likely not come back to this until next week.  Thanks for your comments.

Ben
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch
diff mbox

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 40e431ea1bdc..632136bfd3b3 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1352,9 +1352,9 @@  static void nfs_test_and_clear_all_open_stateid(struct nfs4_state *state)
 static bool nfs_need_update_open_stateid(struct nfs4_state *state,
 		const nfs4_stateid *stateid, nfs4_stateid *freeme)
 {
-	if (test_and_set_bit(NFS_OPEN_STATE, &state->flags) == 0)
-		return true;
 	if (!nfs4_stateid_match_other(stateid, &state->open_stateid)) {
+		if (test_and_set_bit(NFS_OPEN_STATE, &state->flags) == 0)
+			return true;
 		nfs4_stateid_copy(freeme, &state->open_stateid);
 		nfs_test_and_clear_all_open_stateid(state);
 		return true;
@@ -1468,7 +1468,8 @@  static int update_open_stateid(struct nfs4_state *state,
 		ret = 1;
 	}
 
-	if (open_stateid) {
+	if (open_stateid &&
+		nfs_need_update_open_stateid(state, open_stateid, &freeme)) {
 		switch (fmode) {
 			case FMODE_READ:
 				set_bit(NFS_O_RDONLY_STATE, &state->flags);
@@ -1479,11 +1480,10 @@  static int update_open_stateid(struct nfs4_state *state,
 			case FMODE_READ|FMODE_WRITE:
 				set_bit(NFS_O_RDWR_STATE, &state->flags);
 		}
-		if (nfs_need_update_open_stateid(state, open_stateid, &freeme)) {
-			if (test_bit(NFS_DELEGATED_STATE, &state->flags) == 0)
-				nfs4_stateid_copy(&state->stateid, open_stateid);
-			nfs4_stateid_copy(&state->open_stateid, open_stateid);
-		}
+
+		if (test_bit(NFS_DELEGATED_STATE, &state->flags) == 0)
+			nfs4_stateid_copy(&state->stateid, open_stateid);
+		nfs4_stateid_copy(&state->open_stateid, open_stateid);
 		ret = 1;
 	}
 
@@ -1675,8 +1675,12 @@  _nfs4_opendata_to_nfs4_state(struct nfs4_opendata *data)
 		goto err_put_inode;
 	if (data->o_res.delegation_type != 0)
 		nfs4_opendata_check_deleg(data, state);
-	update_open_stateid(state, &data->o_res.stateid, NULL,
-			data->o_arg.fmode);
+	ret = -EAGAIN;
+	if (!update_open_stateid(state, &data->o_res.stateid, NULL,
+			data->o_arg.fmode)) {
+		nfs4_put_open_state(state);
+		goto err_put_inode;
+	}
 	iput(inode);
 out:
 	nfs_release_seqid(data->o_arg.seqid);