diff mbox

[1/1] PNFS dont retry some error when MDS=DS

Message ID 20170310213715.38258-1-kolga@netapp.com (mailing list archive)
State New, archived
Headers show

Commit Message

Olga Kornievskaia March 10, 2017, 9:37 p.m. UTC
If we sent an operation to the "DS" and got an error, the code resends
to "MDS" but when they are the same, it gets the same error and goes
into the infinite loop. Example was COMMIT getting EACCES.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 fs/nfs/filelayout/filelayout.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Comments

Trond Myklebust March 11, 2017, 3:46 p.m. UTC | #1
T24gRnJpLCAyMDE3LTAzLTEwIGF0IDE2OjM3IC0wNTAwLCBPbGdhIEtvcm5pZXZza2FpYSB3cm90
ZToNCj4gSWYgd2Ugc2VudCBhbiBvcGVyYXRpb24gdG8gdGhlICJEUyIgYW5kIGdvdCBhbiBlcnJv
ciwgdGhlIGNvZGUNCj4gcmVzZW5kcw0KPiB0byAiTURTIiBidXQgd2hlbiB0aGV5IGFyZSB0aGUg
c2FtZSwgaXQgZ2V0cyB0aGUgc2FtZSBlcnJvciBhbmQgZ29lcw0KPiBpbnRvIHRoZSBpbmZpbml0
ZSBsb29wLiBFeGFtcGxlIHdhcyBDT01NSVQgZ2V0dGluZyBFQUNDRVMuDQo+IA0KDQpUaGUgY29y
cmVjdCBiZWhhdmlvdXIgd2hlbiBnZXR0aW5nIEVBQ0NFUyBmcm9tIGEgQ09NTUlUIHRvIHRoZSBN
RFMNCndvdWxkIGJlIHRvIHJldHJ5IHRoZSBlbnRpcmUgc2VyaWVzIG9mIFdSSVRFIGNhbGxzIHdp
dGggc3RhYmxlIHdyaXRlcy4NCklmIHRoZSBzZXJ2ZXIgdGhlbiByZXR1cm5zIHdpdGggYW55dGhp
bmcgb3RoZXIgdGhhbiBGSUxFX1NZTkMsIHRoZW4NCkVJTy4NCg0KV2h5IGlzIHRoZSBzZXJ2ZXIg
ZmFpbGluZyB0aGUgQ09NTUlUIGhlcmUgaWYgdGhlIGNsaWVudCB0aGlua3MgaXQgc2VudA0KdW5z
dGFibGUgd3JpdGVzPw0KDQotLSANClRyb25kIE15a2xlYnVzdA0KTGludXggTkZTIGNsaWVudCBt
YWludGFpbmVyLCBQcmltYXJ5RGF0YQ0KdHJvbmQubXlrbGVidXN0QHByaW1hcnlkYXRhLmNvbQ0K

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Olga Kornievskaia March 12, 2017, 6:40 p.m. UTC | #2
> On Mar 11, 2017, at 10:46 AM, Trond Myklebust <trondmy@primarydata.com> wrote:
> 
> On Fri, 2017-03-10 at 16:37 -0500, Olga Kornievskaia wrote:
>> If we sent an operation to the "DS" and got an error, the code
>> resends
>> to "MDS" but when they are the same, it gets the same error and goes
>> into the infinite loop. Example was COMMIT getting EACCES.
>> 
> 
> The correct behaviour when getting EACCES from a COMMIT to the MDS
> would be to retry the entire series of WRITE calls with stable writes.
> If the server then returns with anything other than FILE_SYNC, then
> EIO.
> 
> Why is the server failing the COMMIT here if the client thinks it sent
> unstable writes?

This looping was discovered during connectathon testing against the desy server. While I believe Tigran thinks there is an error in his code and EACCES from COMMIT was unexpected, we thought that it would be best if the client didn’t go into infinite loop in this condition given that EACCES is a valid error for COMMIT. If you think this shouldn’t happen in real life, then I’m ok with not pursuing this. 

On a different but related note, in case when MDS != DS, when the writes are retried they are not retried against the MDS but again are sent to the DS, is that a bug?

> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer, PrimaryData
> trond.myklebust@primarydata.com

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Trond Myklebust March 12, 2017, 9 p.m. UTC | #3
T24gU3VuLCAyMDE3LTAzLTEyIGF0IDE0OjQwIC0wNDAwLCBPbGdhIEtvcm5pZXZza2FpYSB3cm90
ZToNCj4gPiBPbiBNYXIgMTEsIDIwMTcsIGF0IDEwOjQ2IEFNLCBUcm9uZCBNeWtsZWJ1c3QgPHRy
b25kbXlAcHJpbWFyeWRhdGEuDQo+ID4gY29tPiB3cm90ZToNCj4gPiANCj4gPiBPbiBGcmksIDIw
MTctMDMtMTAgYXQgMTY6MzcgLTA1MDAsIE9sZ2EgS29ybmlldnNrYWlhIHdyb3RlOg0KPiA+ID4g
SWYgd2Ugc2VudCBhbiBvcGVyYXRpb24gdG8gdGhlICJEUyIgYW5kIGdvdCBhbiBlcnJvciwgdGhl
IGNvZGUNCj4gPiA+IHJlc2VuZHMNCj4gPiA+IHRvICJNRFMiIGJ1dCB3aGVuIHRoZXkgYXJlIHRo
ZSBzYW1lLCBpdCBnZXRzIHRoZSBzYW1lIGVycm9yIGFuZA0KPiA+ID4gZ29lcw0KPiA+ID4gaW50
byB0aGUgaW5maW5pdGUgbG9vcC4gRXhhbXBsZSB3YXMgQ09NTUlUIGdldHRpbmcgRUFDQ0VTLg0K
PiA+ID4gDQo+ID4gDQo+ID4gVGhlIGNvcnJlY3QgYmVoYXZpb3VyIHdoZW4gZ2V0dGluZyBFQUND
RVMgZnJvbSBhIENPTU1JVCB0byB0aGUgTURTDQo+ID4gd291bGQgYmUgdG8gcmV0cnkgdGhlIGVu
dGlyZSBzZXJpZXMgb2YgV1JJVEUgY2FsbHMgd2l0aCBzdGFibGUNCj4gPiB3cml0ZXMuDQo+ID4g
SWYgdGhlIHNlcnZlciB0aGVuIHJldHVybnMgd2l0aCBhbnl0aGluZyBvdGhlciB0aGFuIEZJTEVf
U1lOQywgdGhlbg0KPiA+IEVJTy4NCj4gPiANCj4gPiBXaHkgaXMgdGhlIHNlcnZlciBmYWlsaW5n
IHRoZSBDT01NSVQgaGVyZSBpZiB0aGUgY2xpZW50IHRoaW5rcyBpdA0KPiA+IHNlbnQNCj4gPiB1
bnN0YWJsZSB3cml0ZXM/DQo+IA0KPiBUaGlzIGxvb3Bpbmcgd2FzIGRpc2NvdmVyZWQgZHVyaW5n
IGNvbm5lY3RhdGhvbiB0ZXN0aW5nIGFnYWluc3QgdGhlDQo+IGRlc3kgc2VydmVyLiBXaGlsZSBJ
IGJlbGlldmUgVGlncmFuIHRoaW5rcyB0aGVyZSBpcyBhbiBlcnJvciBpbiBoaXMNCj4gY29kZSBh
bmQgRUFDQ0VTIGZyb20gQ09NTUlUIHdhcyB1bmV4cGVjdGVkLCB3ZSB0aG91Z2h0IHRoYXQgaXQg
d291bGQNCj4gYmUgYmVzdCBpZiB0aGUgY2xpZW50IGRpZG7igJl0IGdvIGludG8gaW5maW5pdGUg
bG9vcCBpbiB0aGlzIGNvbmRpdGlvbg0KPiBnaXZlbiB0aGF0IEVBQ0NFUyBpcyBhIHZhbGlkIGVy
cm9yIGZvciBDT01NSVQuIElmIHlvdSB0aGluayB0aGlzDQo+IHNob3VsZG7igJl0IGhhcHBlbiBp
biByZWFsIGxpZmUsIHRoZW4gSeKAmW0gb2sgd2l0aCBub3QgcHVyc3VpbmcgdGhpcy7CoA0KDQpJ
dCBpcyBsZWdhbCwgYW5kIEkgY2FuIHNvcnQgb2Ygc2VlIGhvdyBpdCBtaWdodCBiZSBpbnZva2Vk
IGlmIGEgU0VUQVRUUg0KY2hhbmdlcyB0aGUgbW9kZSBiZXR3ZWVuIHRoZSBXUklURSBhbmQgdGhl
IENPTU1JVCwgYnV0IGl0IGlzIGEgY29ybmVyDQpjYXNlLg0KDQo+IE9uIGEgZGlmZmVyZW50IGJ1
dCByZWxhdGVkIG5vdGUsIGluIGNhc2Ugd2hlbiBNRFMgIT0gRFMsIHdoZW4gdGhlDQo+IHdyaXRl
cyBhcmUgcmV0cmllZCB0aGV5IGFyZSBub3QgcmV0cmllZCBhZ2FpbnN0IHRoZSBNRFMgYnV0IGFn
YWluIGFyZQ0KPiBzZW50IHRvIHRoZSBEUywgaXMgdGhhdCBhIGJ1Zz8NCg0KSSB3b3VsZCBleHBl
Y3QgdGhhdCBvbmNlIHlvdSBkZWNpZGUgdG8gZmFsbCBiYWNrIHRvIGRvaW5nIEkvTyB0aHJvdWdo
DQp0aGUgTURTLCB0aGVuIHlvdSdsbCB3YW50IGFueSByZXNlbmRzIHRvIGFsc28gZ28gdG8gdGhl
IE1EUy4NCg0KPiA+IA0KPiA+IC0twqANCj4gPiBUcm9uZCBNeWtsZWJ1c3QNCj4gPiBMaW51eCBO
RlMgY2xpZW50IG1haW50YWluZXIsIFByaW1hcnlEYXRhDQo+ID4gdHJvbmQubXlrbGVidXN0QHBy
aW1hcnlkYXRhLmNvbQ0KPiANCj4gDQotLSANClRyb25kIE15a2xlYnVzdA0KTGludXggTkZTIGNs
aWVudCBtYWludGFpbmVyLCBQcmltYXJ5RGF0YQ0KdHJvbmQubXlrbGVidXN0QHByaW1hcnlkYXRh
LmNvbQ0K

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Olga Kornievskaia March 13, 2017, 2:42 p.m. UTC | #4
On Sun, Mar 12, 2017 at 5:00 PM, Trond Myklebust
<trondmy@primarydata.com> wrote:
> On Sun, 2017-03-12 at 14:40 -0400, Olga Kornievskaia wrote:
>> > On Mar 11, 2017, at 10:46 AM, Trond Myklebust <trondmy@primarydata.
>> > com> wrote:
>> >
>> > On Fri, 2017-03-10 at 16:37 -0500, Olga Kornievskaia wrote:
>> > > If we sent an operation to the "DS" and got an error, the code
>> > > resends
>> > > to "MDS" but when they are the same, it gets the same error and
>> > > goes
>> > > into the infinite loop. Example was COMMIT getting EACCES.
>> > >
>> >
>> > The correct behaviour when getting EACCES from a COMMIT to the MDS
>> > would be to retry the entire series of WRITE calls with stable
>> > writes.
>> > If the server then returns with anything other than FILE_SYNC, then
>> > EIO.
>> >

Going back to this comment. Then in non-pnfs (4.1 or 4.0), is the
expected behavior is to re-send the writes with FILE_SYNC. This is not
what the code does now.

>> > Why is the server failing the COMMIT here if the client thinks it
>> > sent
>> > unstable writes?
>>
>> This looping was discovered during connectathon testing against the
>> desy server. While I believe Tigran thinks there is an error in his
>> code and EACCES from COMMIT was unexpected, we thought that it would
>> be best if the client didn’t go into infinite loop in this condition
>> given that EACCES is a valid error for COMMIT. If you think this
>> shouldn’t happen in real life, then I’m ok with not pursuing this.
>
> It is legal, and I can sort of see how it might be invoked if a SETATTR
> changes the mode between the WRITE and the COMMIT, but it is a corner
> case.
>
>> On a different but related note, in case when MDS != DS, when the
>> writes are retried they are not retried against the MDS but again are
>> sent to the DS, is that a bug?
>
> I would expect that once you decide to fall back to doing I/O through
> the MDS, then you'll want any resends to also go to the MDS.

I guess I'm not seeing there is a fallback happening if COMMIT fails.
I think if writes fail then fallback happens but not the COMMIT.

Again I'm not sure if this is worth pursing as you say it's a corner case.

>> >
>> > --
>> > Trond Myklebust
>> > Linux NFS client maintainer, PrimaryData
>> > trond.myklebust@primarydata.com
>>
>>
> --
> Trond Myklebust
> Linux NFS client maintainer, PrimaryData
> trond.myklebust@primarydata.com
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Olga Kornievskaia March 15, 2017, 3:08 p.m. UTC | #5
On Mon, Mar 13, 2017 at 10:42 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
> On Sun, Mar 12, 2017 at 5:00 PM, Trond Myklebust
> <trondmy@primarydata.com> wrote:
>> On Sun, 2017-03-12 at 14:40 -0400, Olga Kornievskaia wrote:
>>> > On Mar 11, 2017, at 10:46 AM, Trond Myklebust <trondmy@primarydata.
>>> > com> wrote:
>>> >
>>> > On Fri, 2017-03-10 at 16:37 -0500, Olga Kornievskaia wrote:
>>> > > If we sent an operation to the "DS" and got an error, the code
>>> > > resends
>>> > > to "MDS" but when they are the same, it gets the same error and
>>> > > goes
>>> > > into the infinite loop. Example was COMMIT getting EACCES.
>>> > >
>>> >
>>> > The correct behaviour when getting EACCES from a COMMIT to the MDS
>>> > would be to retry the entire series of WRITE calls with stable
>>> > writes.
>>> > If the server then returns with anything other than FILE_SYNC, then
>>> > EIO.
>>> >
>
> Going back to this comment. Then in non-pnfs (4.1 or 4.0), is the
> expected behavior is to re-send the writes with FILE_SYNC. This is not
> what the code does now.
>
>>> > Why is the server failing the COMMIT here if the client thinks it
>>> > sent
>>> > unstable writes?
>>>
>>> This looping was discovered during connectathon testing against the
>>> desy server. While I believe Tigran thinks there is an error in his
>>> code and EACCES from COMMIT was unexpected, we thought that it would
>>> be best if the client didn’t go into infinite loop in this condition
>>> given that EACCES is a valid error for COMMIT. If you think this
>>> shouldn’t happen in real life, then I’m ok with not pursuing this.
>>
>> It is legal, and I can sort of see how it might be invoked if a SETATTR
>> changes the mode between the WRITE and the COMMIT, but it is a corner
>> case.
>>
>>> On a different but related note, in case when MDS != DS, when the
>>> writes are retried they are not retried against the MDS but again are
>>> sent to the DS, is that a bug?
>>
>> I would expect that once you decide to fall back to doing I/O through
>> the MDS, then you'll want any resends to also go to the MDS.
>
> I guess I'm not seeing there is a fallback happening if COMMIT fails.
> I think if writes fail then fallback happens but not the COMMIT.
>
> Again I'm not sure if this is worth pursing as you say it's a corner case.

More thoughts about your comment about retrying the WRITEs. If this is
indeed a legitimate case of SETATTR changing mode, then WRITEs that
are retried will fail with EACCES as well. So why bother retrying?
Might as well just return the error to the application.

That aside, I still think my patch is needed as it fixes the infinite
loop that will exist in the MDS=DS pnfs case.

Andy suggested that instead of checking for the same rpc client (which
might fail if there is trunking going on in the future), instead to
check the flags from the exchange_id from the server.

>
>>> >
>>> > --
>>> > Trond Myklebust
>>> > Linux NFS client maintainer, PrimaryData
>>> > trond.myklebust@primarydata.com
>>>
>>>
>> --
>> Trond Myklebust
>> Linux NFS client maintainer, PrimaryData
>> trond.myklebust@primarydata.com
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
index 44347f4..772be38 100644
--- a/fs/nfs/filelayout/filelayout.c
+++ b/fs/nfs/filelayout/filelayout.c
@@ -207,6 +207,8 @@  static int filelayout_async_handle_error(struct rpc_task *task,
 		/* fall through */
 	default:
 reset:
+		if (mds_client->cl_rpcclient == clp->cl_rpcclient)
+			return task->tk_status;
 		dprintk("%s Retry through MDS. Error %d\n", __func__,
 			task->tk_status);
 		return -NFS4ERR_RESET_TO_MDS;
@@ -384,9 +386,10 @@  static int filelayout_commit_done_cb(struct rpc_task *task,
 		return -EAGAIN;
 	}
 
-	pnfs_set_layoutcommit(data->inode, data->lseg, data->lwb);
+	if (!err)
+		pnfs_set_layoutcommit(data->inode, data->lseg, data->lwb);
 
-	return 0;
+	return err;
 }
 
 static void filelayout_write_prepare(struct rpc_task *task, void *data)