Message ID | fb76d86caca02c277eea7cb1d469c209b59d06ab.1508248965.git.bcodding@redhat.com (mailing list archive)
State      | New, archived
On Tue, 2017-10-17 at 10:46 -0400, Benjamin Coddington wrote:
> If the client issues two simultaneous OPEN calls, and the response to the
> first OPEN call (sequence id 1) is delayed before updating the nfs4_state,
> then the client's nfs4_state can transition through a complete lifecycle of
> OPEN (state sequence id 2), and CLOSE (state sequence id 3).  When the
> first OPEN is finally processed, the nfs4_state is incorrectly transitioned
> back to NFS_OPEN_STATE for the first OPEN (sequence id 1).  Subsequent
> calls to LOCK or CLOSE will receive NFS4ERR_BAD_STATEID, and trigger state
> recovery.
>
> Fix this by passing back the result of need_update_open_stateid() to the
> open() path, with the result to try again if the OPEN's stateid should not
> be updated.

Why are we worried about the special case where the client actually
finds the closed stateid in its cache?

In the more general case of your race, the stateid might not be found
at all because the CLOSE completes and is processed on the client
before it can process the reply from the delayed OPEN.  If so, we really
have no way to detect that the file has actually been closed by the
server until we see the NFS4ERR_BAD_STATEID.

Note also that section 18.2.4 says that "the server SHOULD return the
invalid special stateid (the "other" value is zero and the "seqid"
field is NFS4_UINT32_MAX, see Section 8.2.3)" which further ensures
that we should not be able to match the OPEN stateid once the CLOSE is
processed on the client.

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On 17 Oct 2017, at 11:49, Trond Myklebust wrote:

> On Tue, 2017-10-17 at 10:46 -0400, Benjamin Coddington wrote:
>> If the client issues two simultaneous OPEN calls, and the response to the
>> first OPEN call (sequence id 1) is delayed before updating the nfs4_state,
>> then the client's nfs4_state can transition through a complete lifecycle of
>> OPEN (state sequence id 2), and CLOSE (state sequence id 3).  When the
>> first OPEN is finally processed, the nfs4_state is incorrectly transitioned
>> back to NFS_OPEN_STATE for the first OPEN (sequence id 1).  Subsequent
>> calls to LOCK or CLOSE will receive NFS4ERR_BAD_STATEID, and trigger state
>> recovery.
>>
>> Fix this by passing back the result of need_update_open_stateid() to the
>> open() path, with the result to try again if the OPEN's stateid should not
>> be updated.
>
> Why are we worried about the special case where the client actually
> finds the closed stateid in its cache?

Because I am hitting that case very frequently in generic/089, and I hate
how unnecessary state recovery slows everything down.  I'm also seeing a
problem where generic/089 never completes, and I was trying to get this
behavior out of the way.

> In the more general case of your race, the stateid might not be found
> at all because the CLOSE completes and is processed on the client
> before it can process the reply from the delayed OPEN.  If so, we really
> have no way to detect that the file has actually been closed by the
> server until we see the NFS4ERR_BAD_STATEID.

I mentioned this case in the cover letter.  It's possible that the client
could retain a record of a closed stateid in order to retry an OPEN in
that case.  Another approach may be to detect 'holes' in the state id
sequence and not call CLOSE until each id is processed.  I think there's
an existing problem now where this race (without the CLOSE) can
incorrectly update the state flags with the result of the delayed OPEN's
mode.

> Note also that section 18.2.4 says that "the server SHOULD return the
> invalid special stateid (the "other" value is zero and the "seqid"
> field is NFS4_UINT32_MAX, see Section 8.2.3)" which further ensures
> that we should not be able to match the OPEN stateid once the CLOSE is
> processed on the client.

Ah, right, so need_update_open_stateid() should handle that case.  But
that doesn't mean there's no way to match the OPEN stateid after a CLOSE.
We could still keep a record of the closed state with a flag instead of
an incremented sequence id.

So perhaps keeping track of holes in the sequence would be a better
approach, but then the OPEN response handling needs to call CLOSE in case
the OPEN fails... that seems more complicated.

Ben
On Tue, 2017-10-17 at 13:33 -0400, Benjamin Coddington wrote:
> On 17 Oct 2017, at 11:49, Trond Myklebust wrote:
>
>> On Tue, 2017-10-17 at 10:46 -0400, Benjamin Coddington wrote:
>>> If the client issues two simultaneous OPEN calls, and the response to the
>>> first OPEN call (sequence id 1) is delayed before updating the nfs4_state,
>>> then the client's nfs4_state can transition through a complete lifecycle of
>>> OPEN (state sequence id 2), and CLOSE (state sequence id 3).  When the
>>> first OPEN is finally processed, the nfs4_state is incorrectly transitioned
>>> back to NFS_OPEN_STATE for the first OPEN (sequence id 1).  Subsequent
>>> calls to LOCK or CLOSE will receive NFS4ERR_BAD_STATEID, and trigger state
>>> recovery.
>>>
>>> Fix this by passing back the result of need_update_open_stateid() to the
>>> open() path, with the result to try again if the OPEN's stateid should not
>>> be updated.
>>
>> Why are we worried about the special case where the client actually
>> finds the closed stateid in its cache?
>
> Because I am hitting that case very frequently in generic/089, and I hate
> how unnecessary state recovery slows everything down.  I'm also

Why is it being hit?  Is the client processing stuff out of order?

> seeing a problem where generic/089 never completes, and I was trying to
> get this behavior out of the way.
>
>> In the more general case of your race, the stateid might not be found
>> at all because the CLOSE completes and is processed on the client
>> before it can process the reply from the delayed OPEN.  If so, we really
>> have no way to detect that the file has actually been closed by the
>> server until we see the NFS4ERR_BAD_STATEID.
>
> I mentioned this case in the cover letter.  It's possible that the client
> could retain a record of a closed stateid in order to retry an OPEN in that

That would require us to retain all stateids until there are no more
pending OPEN calls.

> case.  Another approach may be to detect 'holes' in the state id sequence
> and not call CLOSE until each id is processed.  I think there's an existing

We can't know we have a hole until we know the starting value of the
seqid, which is undefined according to RFC 5661 section 3.3.12.

> problem now where this race (without the CLOSE) can incorrectly update
> the state flags with the result of the delayed OPEN's mode.
>
>> Note also that section 18.2.4 says that "the server SHOULD return the
>> invalid special stateid (the "other" value is zero and the "seqid"
>> field is NFS4_UINT32_MAX, see Section 8.2.3)" which further ensures
>> that we should not be able to match the OPEN stateid once the CLOSE is
>> processed on the client.
>
> Ah, right, so need_update_open_stateid() should handle that case.  But
> that doesn't mean there's no way to match the OPEN stateid after a CLOSE.
> We could still keep a record of the closed state with a flag instead of
> an incremented sequence id.
>
> So perhaps keeping track of holes in the sequence would be a better
> approach, but then the OPEN response handling needs to call CLOSE in case
> the OPEN fails... that seems more complicated.
>
> Ben

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com
On 17 Oct 2017, at 13:42, Trond Myklebust wrote:

> On Tue, 2017-10-17 at 13:33 -0400, Benjamin Coddington wrote:
>> On 17 Oct 2017, at 11:49, Trond Myklebust wrote:
>>
>>> On Tue, 2017-10-17 at 10:46 -0400, Benjamin Coddington wrote:
>>>> If the client issues two simultaneous OPEN calls, and the response to
>>>> the first OPEN call (sequence id 1) is delayed before updating the
>>>> nfs4_state, then the client's nfs4_state can transition through a
>>>> complete lifecycle of OPEN (state sequence id 2), and CLOSE (state
>>>> sequence id 3).  When the first OPEN is finally processed, the
>>>> nfs4_state is incorrectly transitioned back to NFS_OPEN_STATE for the
>>>> first OPEN (sequence id 1).  Subsequent calls to LOCK or CLOSE will
>>>> receive NFS4ERR_BAD_STATEID, and trigger state recovery.
>>>>
>>>> Fix this by passing back the result of need_update_open_stateid() to
>>>> the open() path, with the result to try again if the OPEN's stateid
>>>> should not be updated.
>>>
>>> Why are we worried about the special case where the client actually
>>> finds the closed stateid in its cache?
>>
>> Because I am hitting that case very frequently in generic/089, and I
>> hate how unnecessary state recovery slows everything down.  I'm also
>
> Why is it being hit?  Is the client processing stuff out of order?

Yes.

>>> In the more general case of your race, the stateid might not be found
>>> at all because the CLOSE completes and is processed on the client
>>> before it can process the reply from the delayed OPEN.  If so, we
>>> really have no way to detect that the file has actually been closed by
>>> the server until we see the NFS4ERR_BAD_STATEID.
>>
>> I mentioned this case in the cover letter.  It's possible that the
>> client could retain a record of a closed stateid in order to retry an
>> OPEN in that
>
> That would require us to retain all stateids until there are no more
> pending OPEN calls.
>
>> case.  Another approach may be to detect 'holes' in the state id
>> sequence and not call CLOSE until each id is processed.  I think
>> there's an existing
>
> We can't know we have a hole until we know the starting value of the
> seqid, which is undefined according to RFC 5661 section 3.3.12.

Ah, yuck.  I read 8.2.2:

    When such a set of locks is first created, the server returns a
    stateid with seqid value of one.

... and went from there.  Is this a conflict in the spec?

Ben
On Tue, 2017-10-17 at 13:52 -0400, Benjamin Coddington wrote:
> On 17 Oct 2017, at 13:42, Trond Myklebust wrote:
>
>> On Tue, 2017-10-17 at 13:33 -0400, Benjamin Coddington wrote:
>>> On 17 Oct 2017, at 11:49, Trond Myklebust wrote:
>>>
>>>> On Tue, 2017-10-17 at 10:46 -0400, Benjamin Coddington wrote:
>>>>> If the client issues two simultaneous OPEN calls, and the response
>>>>> to the first OPEN call (sequence id 1) is delayed before updating
>>>>> the nfs4_state, then the client's nfs4_state can transition through
>>>>> a complete lifecycle of OPEN (state sequence id 2), and CLOSE (state
>>>>> sequence id 3).  When the first OPEN is finally processed, the
>>>>> nfs4_state is incorrectly transitioned back to NFS_OPEN_STATE for
>>>>> the first OPEN (sequence id 1).  Subsequent calls to LOCK or CLOSE
>>>>> will receive NFS4ERR_BAD_STATEID, and trigger state recovery.
>>>>>
>>>>> Fix this by passing back the result of need_update_open_stateid()
>>>>> to the open() path, with the result to try again if the OPEN's
>>>>> stateid should not be updated.
>>>>
>>>> Why are we worried about the special case where the client actually
>>>> finds the closed stateid in its cache?
>>>
>>> Because I am hitting that case very frequently in generic/089, and I
>>> hate how unnecessary state recovery slows everything down.  I'm also
>>
>> Why is it being hit?  Is the client processing stuff out of order?
>
> Yes.
>
>>>> In the more general case of your race, the stateid might not be
>>>> found at all because the CLOSE completes and is processed on the
>>>> client before it can process the reply from the delayed OPEN.  If
>>>> so, we really have no way to detect that the file has actually been
>>>> closed by the server until we see the NFS4ERR_BAD_STATEID.
>>>
>>> I mentioned this case in the cover letter.  It's possible that the
>>> client could retain a record of a closed stateid in order to retry an
>>> OPEN in that
>>
>> That would require us to retain all stateids until there are no more
>> pending OPEN calls.
>>
>>> case.  Another approach may be to detect 'holes' in the state id
>>> sequence and not call CLOSE until each id is processed.  I think
>>> there's an existing
>>
>> We can't know we have a hole until we know the starting value of the
>> seqid, which is undefined according to RFC 5661 section 3.3.12.
>
> Ah, yuck.  I read 8.2.2:
>
>     When such a set of locks is first created, the server returns a
>     stateid with seqid value of one.
>
> ... and went from there.  Is this a conflict in the spec?

Hmm... I note that the equivalent section 2.2.16 in RFC 7530 has been
changed to remove the bit about the starting value of seqid being
undefined.

So, if the spec allows us to rely on the seqid always being initialised
to 1, then we might at least be able to detect that we have to replay
the open.

One way to do so might be to keep a count of the number of outstanding
seqids, and then force OPEN_DOWNGRADE/CLOSE to wait until that number
hits 0 (or until a state recovery is started).

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com
On 17 Oct 2017, at 14:26, Trond Myklebust wrote:

> On Tue, 2017-10-17 at 13:52 -0400, Benjamin Coddington wrote:
>> Ah, yuck.  I read 8.2.2:
>>
>>     When such a set of locks is first created, the server returns a
>>     stateid with seqid value of one.
>>
>> ... and went from there.  Is this a conflict in the spec?
>
> Hmm... I note that the equivalent section 2.2.16 in RFC 7530 has been
> changed to remove the bit about the starting value of seqid being
> undefined.
>
> So, if the spec allows us to rely on the seqid always being initialised
> to 1, then we might at least be able to detect that we have to replay
> the open.
>
> One way to do so might be to keep a count of the number of outstanding
> seqids, and then force OPEN_DOWNGRADE/CLOSE to wait until that number
> hits 0 (or until a state recovery is started).

I can take that approach for another spin, but due to other priorities,
I'll likely not come back to this until next week.  Thanks for your
comments.

Ben
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 40e431ea1bdc..632136bfd3b3 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1352,9 +1352,9 @@ static void nfs_test_and_clear_all_open_stateid(struct nfs4_state *state)
 static bool nfs_need_update_open_stateid(struct nfs4_state *state,
 		const nfs4_stateid *stateid, nfs4_stateid *freeme)
 {
-	if (test_and_set_bit(NFS_OPEN_STATE, &state->flags) == 0)
-		return true;
 	if (!nfs4_stateid_match_other(stateid, &state->open_stateid)) {
+		if (test_and_set_bit(NFS_OPEN_STATE, &state->flags) == 0)
+			return true;
 		nfs4_stateid_copy(freeme, &state->open_stateid);
 		nfs_test_and_clear_all_open_stateid(state);
 		return true;
@@ -1468,7 +1468,8 @@ static int update_open_stateid(struct nfs4_state *state,
 		ret = 1;
 	}
 
-	if (open_stateid) {
+	if (open_stateid &&
+	    nfs_need_update_open_stateid(state, open_stateid, &freeme)) {
 		switch (fmode) {
 		case FMODE_READ:
 			set_bit(NFS_O_RDONLY_STATE, &state->flags);
@@ -1479,11 +1480,10 @@ static int update_open_stateid(struct nfs4_state *state,
 		case FMODE_READ|FMODE_WRITE:
 			set_bit(NFS_O_RDWR_STATE, &state->flags);
 		}
-		if (nfs_need_update_open_stateid(state, open_stateid, &freeme)) {
-			if (test_bit(NFS_DELEGATED_STATE, &state->flags) == 0)
-				nfs4_stateid_copy(&state->stateid, open_stateid);
-			nfs4_stateid_copy(&state->open_stateid, open_stateid);
-		}
+
+		if (test_bit(NFS_DELEGATED_STATE, &state->flags) == 0)
+			nfs4_stateid_copy(&state->stateid, open_stateid);
+		nfs4_stateid_copy(&state->open_stateid, open_stateid);
 		ret = 1;
 	}
 
@@ -1675,8 +1675,12 @@ _nfs4_opendata_to_nfs4_state(struct nfs4_opendata *data)
 		goto err_put_inode;
 	if (data->o_res.delegation_type != 0)
 		nfs4_opendata_check_deleg(data, state);
-	update_open_stateid(state, &data->o_res.stateid, NULL,
-			data->o_arg.fmode);
+	ret = -EAGAIN;
+	if (!update_open_stateid(state, &data->o_res.stateid, NULL,
+			data->o_arg.fmode)) {
+		nfs4_put_open_state(state);
+		goto err_put_inode;
+	}
 	iput(inode);
 out:
 	nfs_release_seqid(data->o_arg.seqid);
If the client issues two simultaneous OPEN calls, and the response to the
first OPEN call (sequence id 1) is delayed before updating the nfs4_state,
then the client's nfs4_state can transition through a complete lifecycle of
OPEN (state sequence id 2), and CLOSE (state sequence id 3).  When the
first OPEN is finally processed, the nfs4_state is incorrectly transitioned
back to NFS_OPEN_STATE for the first OPEN (sequence id 1).  Subsequent
calls to LOCK or CLOSE will receive NFS4ERR_BAD_STATEID, and trigger state
recovery.

Fix this by passing back the result of need_update_open_stateid() to the
open() path, with the result to try again if the OPEN's stateid should not
be updated.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
---
 fs/nfs/nfs4proc.c | 24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)