Message ID | 20171204001726.5747-1-trond.myklebust@primarydata.com (mailing list archive)
---|---
State | New, archived
Hi Trond-

> On Dec 3, 2017, at 7:17 PM, Trond Myklebust <trond.myklebust@primarydata.com> wrote:
> 
> We must ensure that the call to rpc_sleep_on() in xprt_transmit() cannot
> race with the call to xprt_complete_rqst().
> 
> Reported-by: Chuck Lever <chuck.lever@oracle.com>
> Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=317
> Fixes: ce7c252a8c74 ("SUNRPC: Add a separate spinlock to protect..")
> Cc: stable@vger.kernel.org # 4.14+
> Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
> ---
> net/sunrpc/xprt.c | 26 +++++++++++++++++---------
> 1 file changed, 17 insertions(+), 9 deletions(-)
> 
> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> index 333b9d697ae5..5e4278e9ce37 100644
> --- a/net/sunrpc/xprt.c
> +++ b/net/sunrpc/xprt.c
> @@ -1024,6 +1024,7 @@ void xprt_transmit(struct rpc_task *task)
> 	} else if (!req->rq_bytes_sent)
> 		return;
> 
> +	req->rq_connect_cookie = xprt->connect_cookie;
> 	req->rq_xtime = ktime_get();
> 	status = xprt->ops->send_request(task);
> 	trace_xprt_transmit(xprt, req->rq_xid, status);
> @@ -1047,20 +1048,27 @@ void xprt_transmit(struct rpc_task *task)
> 	xprt->stat.bklog_u += xprt->backlog.qlen;
> 	xprt->stat.sending_u += xprt->sending.qlen;
> 	xprt->stat.pending_u += xprt->pending.qlen;
> +	spin_unlock_bh(&xprt->transport_lock);
> 
> -	/* Don't race with disconnect */
> -	if (!xprt_connected(xprt))
> -		task->tk_status = -ENOTCONN;
> -	else {
> +	if (rpc_reply_expected(task) && !READ_ONCE(req->rq_reply_bytes_recvd)) {
> 		/*
> -		 * Sleep on the pending queue since
> -		 * we're expecting a reply.
> +		 * Sleep on the pending queue if we're expecting a reply.
> +		 * The spinlock ensures atomicity between the test of
> +		 * req->rq_reply_bytes_recvd, and the call to rpc_sleep_on().
> 		 */
> -		if (!req->rq_reply_bytes_recvd && rpc_reply_expected(task))
> +		spin_lock(&xprt->recv_lock);
> +		if (!req->rq_reply_bytes_recvd) {
> 			rpc_sleep_on(&xprt->pending, task, xprt_timer);
> -		req->rq_connect_cookie = xprt->connect_cookie;
> +			/*
> +			 * Send an extra queue wakeup call if the
> +			 * connection was dropped in case the call to
> +			 * rpc_sleep_on() raced.
> +			 */
> +			if (!xprt_connected(xprt))
> +				xprt_wake_pending_tasks(xprt, -ENOTCONN);
> +		}
> +		spin_unlock(&xprt->recv_lock);
> 	}
> -	spin_unlock_bh(&xprt->transport_lock);
> }
> 
> static void xprt_add_backlog(struct rpc_xprt *xprt, struct rpc_task *task)
> -- 
> 2.14.3

I've run into a problem with this version of the patch (on v4.15-rc3).

With dbench on NFSv4.0 (on RDMA) I no longer see constant throughput
and a temporarily climbing latency followed by a recovery, but this
can happen on occasion:

releasing clients
  16       584   322.55 MB/sec  warmup   1 sec  latency 807.121 ms
  16       584   161.28 MB/sec  warmup   2 sec  latency 1807.175 ms
  16       584   107.52 MB/sec  warmup   3 sec  latency 2807.198 ms
  16       584    80.64 MB/sec  warmup   4 sec  latency 3807.217 ms
  16       584    64.51 MB/sec  warmup   5 sec  latency 4807.235 ms
  16       584    53.76 MB/sec  warmup   6 sec  latency 5807.251 ms
  16       584    46.08 MB/sec  warmup   7 sec  latency 6807.269 ms
  16       584    40.32 MB/sec  warmup   8 sec  latency 7807.285 ms
  16       584    35.84 MB/sec  warmup   9 sec  latency 8807.303 ms
  16       584    32.26 MB/sec  warmup  10 sec  latency 9807.318 ms
  16       584    29.32 MB/sec  warmup  11 sec  latency 10807.334 ms
  16       584    26.88 MB/sec  warmup  12 sec  latency 11807.355 ms
  16       584    24.81 MB/sec  warmup  13 sec  latency 12807.372 ms
  16       584    23.04 MB/sec  warmup  14 sec  latency 13807.387 ms
  16       584    21.50 MB/sec  warmup  15 sec  latency 14807.406 ms
  16       584    20.16 MB/sec  warmup  16 sec  latency 15807.423 ms

No recovery. The latency number keeps climbing, and throughput drops.

"kernel: nfs: server klimt-ib not responding, still trying"
appears in the client's /var/log/messages.

Without this patch applied, the NFSv4.0 behavior is the same as I
reported with NFSv3: every once in a while, one RPC completion is
lost, but when the timer fires, the client notices the reply
actually did arrive and the RPC completes normally.

I'm looking into it.

-- 
Chuck Lever
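[Illustration: the recovery described above relies on the check made
when the RPC retransmit timer fires: if reply bytes arrived in the
meantime, the request completes instead of timing out. Below is a
minimal user-space C model of that decision, with stand-in types; it
is a sketch of the logic, not the kernel code.]

#include <errno.h>
#include <stdio.h>

/* Stand-ins for the kernel structures; only the fields that the
 * timer check reads are modeled here. */
struct rpc_rqst { unsigned int rq_reply_bytes_recvd; };
struct rpc_task { int tk_status; struct rpc_rqst *tk_rqstp; };

/* Models the decision in xprt_timer(): a "lost" wakeup is harmless
 * as long as the reply has actually arrived by the time the timer
 * fires; the task picks the reply up and completes normally. */
static void timer_fires(struct rpc_task *task)
{
	if (task->tk_rqstp->rq_reply_bytes_recvd)
		task->tk_status = 0;		/* reply is here; proceed */
	else
		task->tk_status = -ETIMEDOUT;	/* genuinely no reply */
}

int main(void)
{
	struct rpc_rqst req = { .rq_reply_bytes_recvd = 120 };
	struct rpc_task task = { .tk_rqstp = &req };

	timer_fires(&task);
	printf("tk_status after timer: %d\n", task.tk_status);
	return 0;
}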
> On Dec 13, 2017, at 11:14 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
> 
> Hi Trond-
> 
>> On Dec 3, 2017, at 7:17 PM, Trond Myklebust <trond.myklebust@primarydata.com> wrote:
>> 
>> We must ensure that the call to rpc_sleep_on() in xprt_transmit() cannot
>> race with the call to xprt_complete_rqst().
>> 
>> [...]
> 
> I've run into a problem with this version of the patch (on v4.15-rc3).
> 
> [...]
> 
> I'm looking into it.

Root cause:

The transport is temporarily out of resources and returns -ENOBUFS
from ->send_request.

The FSM schedules a delay then calls call_transmit again, which
invokes xprt_prepare_transmit.

On NFSv3 mounts, xprt_prepare_transmit tries to acquire the
transport write lock.

On NFSv4 mounts, RPC_TASK_NO_RETRANS_TIMEOUT is set. This causes
xprt_prepare_transmit to see that the transport is connected and
rq_connect_cookie is the same as xprt->connect_cookie, so it
queues the task on ->pending and returns false.

The transport never sends the Call, but still waits for a Reply,
forever. This is not a bug in your patch, it is a pre-existing
issue.

There's a simple fix, which is to set

    rq_connect_cookie = xprt->connect_cookie - 1;

if RPC-over-RDMA header marshaling fails. This is a one-line fix,
appropriate for -rc and backport to stable (to at least v4.8).

A more robust fix might be to have ->send_request return -EAGAIN
instead, and let xprtrdma call xprt_write_space when the resource
exhaustion is relieved. That might be more involved, and thus is
future-facing.

I'm investigating both fixes.

-- 
Chuck Lever
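[Illustration: the deadlock condition in the root-cause analysis
above condenses to one test. This user-space model mirrors the
early-exit logic of xprt_prepare_transmit() for tasks with
RPC_TASK_NO_RETRANS_TIMEOUT; the types and locking are stubbed out,
so treat it as a sketch of the decision, not the actual function.]

#include <stdbool.h>
#include <stdio.h>

struct rpc_xprt {
	unsigned long	connect_cookie;
	bool		connected;
};

struct rpc_rqst {
	unsigned long	rq_connect_cookie;
	unsigned int	rq_reply_bytes_recvd;
};

/* If the request looks like it was already sent on the *current*
 * connection, do not retransmit; just wait on ->pending for a reply. */
static bool prepare_transmit(struct rpc_xprt *xprt, struct rpc_rqst *req)
{
	if (!req->rq_reply_bytes_recvd &&
	    xprt->connected &&
	    req->rq_connect_cookie == xprt->connect_cookie)
		return false;	/* park the task on ->pending */
	return true;		/* go ahead and (re)transmit */
}

int main(void)
{
	struct rpc_xprt xprt = { .connect_cookie = 7, .connected = true };
	/* The patch stamps rq_connect_cookie *before* ->send_request().
	 * If the send then fails with -ENOBUFS, the retry finds a
	 * matching cookie and parks forever, exactly as described: */
	struct rpc_rqst req = { .rq_connect_cookie = 7 };

	printf("retransmit allowed: %d\n", prepare_transmit(&xprt, &req));
	return 0;
}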
> On Dec 13, 2017, at 6:42 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
> 
>> On Dec 13, 2017, at 11:14 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
>> 
>> [...]
>> 
>> I'm looking into it.
> 
> Root cause:
> 
> [...]
> 
> The transport never sends the Call, but still waits for a Reply,
> forever. This is not a bug in your patch, it is a pre-existing
> issue.

Sorry for one more reply...

I think this last statement is incorrect. It is a bug in your
patch after all.

I forgot that your patch moves the line that sets

    req->rq_connect_cookie = xprt->connect_cookie;

to before the call to ->send_request. But rq_connect_cookie
must be set iff ->send_request is successful. Moving this
line seems to fix the issue:

1027         req->rq_xtime = ktime_get();
1028         status = xprt->ops->send_request(task);
1029         trace_xprt_transmit(xprt, req->rq_xid, status);
1030         if (status != 0) {
1031                 task->tk_status = status;
1032                 return;
1033         }
1034         req->rq_connect_cookie = xprt->connect_cookie;   <<<<
1035         xprt_inject_disconnect(xprt);

> There's a simple fix, which is to set
> 
>     rq_connect_cookie = xprt->connect_cookie - 1;
> 
> if RPC-over-RDMA header marshaling fails. This is a one-line fix,
> appropriate for -rc and backport to stable (to at least v4.8).

This fix does exactly the same thing.

> A more robust fix might be to have ->send_request return -EAGAIN
> instead, and let xprtrdma call xprt_write_space when the resource
> exhaustion is relieved. That might be more involved, and thus is
> future-facing.
> 
> I'm investigating both fixes.

-- 
Chuck Lever
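[Illustration: both fixes floated above converge on the same
observable state: after a failed transmit, the request's cookie must
not match the transport's current cookie. A few lines of C make the
equivalence concrete; variable names only echo the kernel fields,
and the values are arbitrary.]

#include <stdio.h>

int main(void)
{
	unsigned long connect_cookie = 7;	/* transport's current cookie */

	/* Fix 1: assign rq_connect_cookie only after a successful
	 * ->send_request(), so after a failure it keeps a stale value
	 * (here, one from an earlier connection). */
	unsigned long cookie_fix1 = 6;

	/* Fix 2: on a marshaling failure, force a mismatch outright. */
	unsigned long cookie_fix2 = connect_cookie - 1;

	/* Either way, the "already sent on this connection?" test that
	 * xprt_prepare_transmit() applies now evaluates false, so the
	 * request is retransmitted instead of parked: */
	printf("fix1 looks sent: %d, fix2 looks sent: %d\n",
	       cookie_fix1 == connect_cookie, cookie_fix2 == connect_cookie);
	return 0;
}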
On Wed, 2017-12-13 at 20:13 -0500, Chuck Lever wrote:
> > On Dec 13, 2017, at 6:42 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
> > 
> > [...]
> 
> I forgot that your patch moves the line that sets
> 
>     req->rq_connect_cookie = xprt->connect_cookie;
> 
> to before the call to ->send_request. But rq_connect_cookie
> must be set iff ->send_request is successful. Moving this
> line seems to fix the issue:
> 
> [...]

Let's rather apply the attached fix (incremental to the original patch
and intended to be applied using 'git rebase --autosquash'). The reason
for wanting to set the connect cookie before we call send_request() is
to avoid races with disconnects that happen immediately after we sent
the data (but before we get round to setting the cookie).

8<--------------------------------------------------------
From 326c12727b179d5c2fd42f374a4c0e2d56ee90c5 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust@primarydata.com>
Date: Thu, 14 Dec 2017 07:05:27 -0500
Subject: [PATCH] fixup! SUNRPC: Fix a race in the receive code path

---
 net/sunrpc/xprt.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 5e4278e9ce37..2cf4cf21a878 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1021,10 +1021,11 @@ void xprt_transmit(struct rpc_task *task)
 			/* Turn off autodisconnect */
 			del_singleshot_timer_sync(&xprt->timer);
 		}
+		if (!req->rq_bytes_sent)
+			req->rq_connect_cookie = xprt->connect_cookie;
 	} else if (!req->rq_bytes_sent)
 		return;
 
-	req->rq_connect_cookie = xprt->connect_cookie;
 	req->rq_xtime = ktime_get();
 	status = xprt->ops->send_request(task);
 	trace_xprt_transmit(xprt, req->rq_xid, status);
-- 
2.14.3

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com
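[Illustration: the race Trond wants to close is an ordering window.
If the cookie is assigned only after ->send_request() returns, a
disconnect/reconnect can slip into that window and the request then
records the wrong connection. A timeline in C; the cookie values are
arbitrary and the types are stand-ins.]

#include <stdio.h>

int main(void)
{
	unsigned long connect_cookie = 7;	/* transport: connection #7 */
	unsigned long rq_connect_cookie = 0;

	/* 1. ->send_request() succeeds on connection #7. */
	/* 2. The connection drops and is re-established; the connect
	 *    path bumps the transport cookie. */
	connect_cookie = 8;
	/* 3. Only now does the sender reach the assignment. */
	rq_connect_cookie = connect_cookie;

	/* The request claims it was sent on connection #8, although the
	 * bytes went out on #7, so a lost reply would never trigger a
	 * retransmit. Stamping the cookie before ->send_request(), as
	 * the fixup does for the !rq_bytes_sent case, closes the window. */
	printf("recorded %lu, actually sent on 7\n", rq_connect_cookie);
	return 0;
}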
> On Dec 14, 2017, at 7:16 AM, Trond Myklebust <trondmy@primarydata.com> wrote:
> 
> On Wed, 2017-12-13 at 20:13 -0500, Chuck Lever wrote:
>> [...]
>> 
>> I forgot that your patch moves the line that sets
>> 
>>     req->rq_connect_cookie = xprt->connect_cookie;
>> 
>> to before the call to ->send_request. But rq_connect_cookie
>> must be set iff ->send_request is successful.
> 
> Let's rather apply the attached fix (incremental to the original patch
> and intended to be applied using 'git rebase --autosquash'). The reason
> for wanting to set the connect cookie before we call send_request() is
> to avoid races with disconnects that happen immediately after we sent
> the data (but before we get round to setting the cookie).
> 
> [...]

Tested with a client that returns -ENOBUFS from ->send_request
with a little extra frequency.

This doesn't appear to address the problem. The transport
deadlocks as before when send_request() fails to send the RPC.

I imagine this issue could also impact write buffer exhaustion
on TCP.

-- 
Chuck Lever
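[Illustration: the reproduction technique mentioned above, "a client
that returns -ENOBUFS with a little extra frequency", amounts to a
fault-injecting wrapper around the transmit path. This is a
hypothetical user-space sketch of the idea; the actual test
presumably modified xprtrdma's ->send_request directly.]

#include <errno.h>
#include <stdio.h>

static int real_send_request(void)
{
	return 0;			/* pretend the send worked */
}

/* Hypothetical wrapper: fail every eighth transmit with -ENOBUFS to
 * exercise the retry path often enough to hit the deadlock. */
static int flaky_send_request(void)
{
	static unsigned int calls;

	if (++calls % 8 == 0)
		return -ENOBUFS;
	return real_send_request();
}

int main(void)
{
	for (int i = 1; i <= 10; i++)
		printf("send %2d -> %d\n", i, flaky_send_request());
	return 0;
}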
On Thu, 2017-12-14 at 10:49 -0500, Chuck Lever wrote:
> [...]
> 
> Tested with a client that returns -ENOBUFS from ->send_request
> with a little extra frequency.
> 
> This doesn't appear to address the problem. The transport
> deadlocks as before when send_request() fails to send the RPC.

Does the RDMA code update the connect cookie when the connection
breaks? It looks to me as if it only does that when the connection is
re-established. We really want both.

> I imagine this issue could also impact write buffer exhaustion
> on TCP.

See net/sunrpc/xprtsock.c:xs_tcp_state_change()

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com
> On Dec 14, 2017, at 2:03 PM, Trond Myklebust <trondmy@primarydata.com> wrote:
> 
> On Thu, 2017-12-14 at 10:49 -0500, Chuck Lever wrote:
>> [...]
> 
> Does the RDMA code update the connect cookie when the connection
> breaks? It looks to me as if it only does that when the connection is
> re-established. We really want both.
> 
>> I imagine this issue could also impact write buffer exhaustion
>> on TCP.
> 
> See net/sunrpc/xprtsock.c:xs_tcp_state_change()

xprtrdma manipulates the connect_cookie in its connect worker,
see rpcrdma_connect_worker. This was added by:

commit 575448bd36208f99fe0dd554a43518d798966740
Author:     Tom Talpey <talpey@netapp.com>
AuthorDate: Thu Oct 9 15:00:40 2008 -0400
Commit:     Trond Myklebust <Trond.Myklebust@netapp.com>
CommitDate: Fri Oct 10 15:10:36 2008 -0400

    RPC/RDMA: suppress retransmit on RPC/RDMA clients.

Would it be more correct to bump the cookie in rpcrdma_conn_upcall,
which is the equivalent to xs_tcp_state_change? (if so, why, so
I can compose a reasonable patch description)

It could be bumped in the RDMA_CM_EVENT_ESTABLISHED and the
RDMA_CM_EVENT_DISCONNECTED cases, for example. I'm not sure
RDMA provides a distinction between "server disconnected"
and "client disconnected" although that probably does not
matter for this purpose.

But, why would the additional cookie update help? The transport
is not disconnecting before the deadlock.

-- 
Chuck Lever
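[Illustration: what Chuck proposes would make the RDMA connection
event handler bump the cookie on both edges of the connection,
mirroring xs_tcp_state_change(). Below is a stubbed, user-space
sketch of that hypothetical change, not a tested kernel patch; the
stand-in event codes merely echo the RDMA_CM_EVENT_* names.]

#include <stdio.h>

enum cm_event {				/* stand-ins for rdma_cm events */
	EVENT_ESTABLISHED,
	EVENT_DISCONNECTED,
};

struct rpc_xprt { unsigned long connect_cookie; };

/* Hypothetical: bump the cookie when the connection comes up *and*
 * when it goes down, instead of only in the connect worker, so a
 * request stamped before a disconnect can never look "sent on the
 * current connection". */
static void conn_upcall(struct rpc_xprt *xprt, enum cm_event event)
{
	switch (event) {
	case EVENT_ESTABLISHED:
	case EVENT_DISCONNECTED:
		xprt->connect_cookie++;
		break;
	}
}

int main(void)
{
	struct rpc_xprt xprt = { .connect_cookie = 0 };

	conn_upcall(&xprt, EVENT_ESTABLISHED);
	conn_upcall(&xprt, EVENT_DISCONNECTED);
	printf("cookie after up+down: %lu\n", xprt.connect_cookie);
	return 0;
}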
On Thu, 2017-12-14 at 14:22 -0500, Chuck Lever wrote:
> [...]
> 
> Would it be more correct to bump the cookie in rpcrdma_conn_upcall,
> which is the equivalent to xs_tcp_state_change? (if so, why, so
> I can compose a reasonable patch description)
> 
> [...]
> 
> But, why would the additional cookie update help? The transport
> is not disconnecting before the deadlock.

The connection cookie's purpose is twofold:

1) It tracks whether or not a request has been transmitted on the
current connection or not.

2) It ensures that when several requests with the same connection
cookie all call xprt_conditional_disconnect(), then that results in a
single disconnection event. To do so, it assumes that xprt_autoclose()
will change the cookie if the disconnection attempt is successful.

In TCP we do so in the xs_tcp_state_change(). If the RDMA transport can
guarantee that the call to xprt->ops->close(xprt) is always successful,
then you could do so there.

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com
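[Illustration: Trond's second point, many requests but one
disconnection event, falls out of the cookie comparison in
xprt_conditional_disconnect(). Here is a user-space model with the
locking stubbed away; the real function runs under the
transport_lock and queues the actual autoclose work.]

#include <stdio.h>

struct rpc_xprt {
	unsigned long	connect_cookie;
	int		disconnect_events;
};

/* Only a request stamped with the *current* cookie may trigger a
 * disconnect; the disconnect path is assumed to change the cookie,
 * so later callers holding the old cookie become no-ops. */
static void conditional_disconnect(struct rpc_xprt *xprt,
				   unsigned long cookie)
{
	if (cookie != xprt->connect_cookie)
		return;			/* connection already torn down */
	xprt->disconnect_events++;
	xprt->connect_cookie++;		/* models autoclose changing it */
}

int main(void)
{
	struct rpc_xprt xprt = { .connect_cookie = 7 };

	/* Three requests sent on connection #7 all give up at once: */
	conditional_disconnect(&xprt, 7);
	conditional_disconnect(&xprt, 7);
	conditional_disconnect(&xprt, 7);
	printf("disconnect events: %d\n", xprt.disconnect_events);
	return 0;
}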
> On Dec 14, 2017, at 3:37 PM, Trond Myklebust <trondmy@primarydata.com> wrote:
>
> On Thu, 2017-12-14 at 14:22 -0500, Chuck Lever wrote:
>>> On Dec 14, 2017, at 2:03 PM, Trond Myklebust <trondmy@primarydata.com> wrote:
>>>
>>> Does the RDMA code update the connect cookie when the connection
>>> breaks? It looks to me as if it only does that when the connection is
>>> re-established. We really want both.
>>>
>>>> I imagine this issue could also impact write buffer exhaustion
>>>> on TCP.
>>>
>>> See net/sunrpc/xprtsock.c:xs_tcp_state_change()
>>
>> xprtrdma manipulates the connect_cookie in its connect worker,
>> see rpcrdma_connect_worker. This was added by:
>>
>> commit 575448bd36208f99fe0dd554a43518d798966740
>> Author:     Tom Talpey <talpey@netapp.com>
>> AuthorDate: Thu Oct 9 15:00:40 2008 -0400
>> Commit:     Trond Myklebust <Trond.Myklebust@netapp.com>
>> CommitDate: Fri Oct 10 15:10:36 2008 -0400
>>
>>     RPC/RDMA: suppress retransmit on RPC/RDMA clients.
>>
>> Would it be more correct to bump the cookie in rpcrdma_conn_upcall,
>> which is the equivalent of xs_tcp_state_change? (if so, why, so
>> I can compose a reasonable patch description)
>>
>> It could be bumped in the RDMA_CM_EVENT_ESTABLISHED and the
>> RDMA_CM_EVENT_DISCONNECTED cases, for example. I'm not sure
>> RDMA provides a distinction between "server disconnected"
>> and "client disconnected", although that probably does not
>> matter for this purpose.
>>
>> But why would the additional cookie update help? The transport
>> is not disconnecting before the deadlock.
>
> The connection cookie's purpose is twofold:
>
> 1) It tracks whether or not a request has been transmitted on the
> current connection.

That's broken by setting the cookie unconditionally outside
the transport_lock, isn't it?

> 2) It ensures that when several requests with the same connection
> cookie all call xprt_conditional_disconnect(), that results in a
> single disconnection event. To do so, it assumes that xprt_autoclose()
> will change the cookie if the disconnection attempt is successful.
>
> In TCP we do so in xs_tcp_state_change(). If the RDMA transport can
> guarantee that the call to xprt->ops->close(xprt) is always successful,
> then you could do so there.

I don't mind moving the cookie bump to rpcrdma_conn_upcall,
but I'm not sure I understand the locking requirements.

Currently, xprt_transmit sets the connect_cookie while holding
the transport_lock.

xprt_conditional_disconnect compares the cookie while holding
the transport_lock.

For TCP, the transport_lock is held when bumping the cookie
in the ESTABLISHED case, but _not_ in the two CLOSE cases?

xprtrdma holds the transport_lock when bumping the cookie,
which it does in its connect worker. It has to hold the lock
because it skips the value 0. xprtrdma needs to guarantee
that an RPC is never transmitted on the same connection
twice (and maybe it could use rq_connect_cookie instead of
its own cookie).

xprt_reserve_init is holding the reserve_lock but not the
transport_lock when it grabs the cookie. Maybe it should
not be initializing the rqst's cookie there?

Seems to me that xprt_transmit needs to update the rqst's
cookie while holding the transport_lock, especially if
xprtrdma needs to skip a cookie value? I'm sure I'm missing
something.

--
Chuck Lever
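[For reference, the comparison Chuck describes does happen under the transport_lock; the 4.15-era net/sunrpc/xprt.c code reads roughly as follows.]

/*
 * A request that notices its connection misbehaving calls this with
 * the cookie it sampled at transmit time. Only the first caller with
 * a current cookie actually schedules the autoclose.
 */
void xprt_conditional_disconnect(struct rpc_xprt *xprt, unsigned int cookie)
{
	/* Don't race with the test_bit() in xprt_clear_locked() */
	spin_lock_bh(&xprt->transport_lock);
	/* A stale cookie means another caller already forced a disconnect */
	if (cookie != xprt->connect_cookie)
		goto out;
	if (test_bit(XPRT_CLOSING, &xprt->state))
		goto out;
	set_bit(XPRT_CLOSE_WAIT, &xprt->state);
	/* Try to schedule an autoclose RPC call */
	if (test_and_set_bit(XPRT_LOCKED, &xprt->state) == 0)
		queue_work(xprtiod_workqueue, &xprt->task_cleanup);
	xprt_wake_pending_tasks(xprt, -EAGAIN);
out:
	spin_unlock_bh(&xprt->transport_lock);
}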
On Thu, 2017-12-14 at 15:59 -0500, Chuck Lever wrote:
> > On Dec 14, 2017, at 3:37 PM, Trond Myklebust <trondmy@primarydata.com> wrote:
> > 
> > On Thu, 2017-12-14 at 14:22 -0500, Chuck Lever wrote:
> > > > On Dec 14, 2017, at 2:03 PM, Trond Myklebust <trondmy@primarydata.com> wrote:
> > > > 
> > > > Does the RDMA code update the connect cookie when the connection
> > > > breaks? It looks to me as if it only does that when the connection is
> > > > re-established. We really want both.
> > > > 
> > > > > I imagine this issue could also impact write buffer exhaustion
> > > > > on TCP.
> > > > 
> > > > See net/sunrpc/xprtsock.c:xs_tcp_state_change()
> > > 
> > > xprtrdma manipulates the connect_cookie in its connect worker,
> > > see rpcrdma_connect_worker. This was added by:
> > > 
> > > commit 575448bd36208f99fe0dd554a43518d798966740
> > > Author:     Tom Talpey <talpey@netapp.com>
> > > AuthorDate: Thu Oct 9 15:00:40 2008 -0400
> > > Commit:     Trond Myklebust <Trond.Myklebust@netapp.com>
> > > CommitDate: Fri Oct 10 15:10:36 2008 -0400
> > > 
> > >     RPC/RDMA: suppress retransmit on RPC/RDMA clients.
> > > 
> > > Would it be more correct to bump the cookie in rpcrdma_conn_upcall,
> > > which is the equivalent of xs_tcp_state_change? (if so, why, so
> > > I can compose a reasonable patch description)
> > > 
> > > It could be bumped in the RDMA_CM_EVENT_ESTABLISHED and the
> > > RDMA_CM_EVENT_DISCONNECTED cases, for example. I'm not sure
> > > RDMA provides a distinction between "server disconnected"
> > > and "client disconnected", although that probably does not
> > > matter for this purpose.
> > > 
> > > But why would the additional cookie update help? The transport
> > > is not disconnecting before the deadlock.
> > 
> > The connection cookie's purpose is twofold:
> > 
> > 1) It tracks whether or not a request has been transmitted on the
> > current connection.
> 
> That's broken by setting the cookie unconditionally outside
> the transport_lock, isn't it?
> 
> > 2) It ensures that when several requests with the same connection
> > cookie all call xprt_conditional_disconnect(), that results in a
> > single disconnection event. To do so, it assumes that xprt_autoclose()
> > will change the cookie if the disconnection attempt is successful.
> > 
> > In TCP we do so in xs_tcp_state_change(). If the RDMA transport can
> > guarantee that the call to xprt->ops->close(xprt) is always successful,
> > then you could do so there.
> 
> I don't mind moving the cookie bump to rpcrdma_conn_upcall,
> but I'm not sure I understand the locking requirements.
> 
> Currently, xprt_transmit sets the connect_cookie while holding
> the transport_lock.
> 
> xprt_conditional_disconnect compares the cookie while holding
> the transport_lock.
> 
> For TCP, the transport_lock is held when bumping the cookie
> in the ESTABLISHED case, but _not_ in the two CLOSE cases?

That should be OK. The networking layer should provide sufficient
serialisation that we don't have to worry about collisions.

> xprtrdma holds the transport_lock when bumping the cookie,
> which it does in its connect worker. It has to hold the lock
> because it skips the value 0. xprtrdma needs to guarantee
> that an RPC is never transmitted on the same connection
> twice (and maybe it could use rq_connect_cookie instead of
> its own cookie).
> 
> xprt_reserve_init is holding the reserve_lock but not the
> transport_lock when it grabs the cookie. Maybe it should
> not be initializing the rqst's cookie there?
> 
> Seems to me that xprt_transmit needs to update the rqst's
> cookie while holding the transport_lock, especially if
> xprtrdma needs to skip a cookie value? I'm sure I'm missing
> something.

It should be OK, given that the connection is a state machine.
However, I missed something that you said earlier about
xprt_prepare_transmit().

OK. How about the following fixup patch instead of the earlier one?

8<---------------------------------------------------
From 21cdb2802d9d8b71553998e6be5aafeff0742142 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust@primarydata.com>
Date: Thu, 14 Dec 2017 07:05:27 -0500
Subject: [PATCH] fixup! SUNRPC: Fix a race in the receive code path

---
 net/sunrpc/xprt.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 5e4278e9ce37..33b74fd84051 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1001,6 +1001,7 @@ void xprt_transmit(struct rpc_task *task)
 {
 	struct rpc_rqst	*req = task->tk_rqstp;
 	struct rpc_xprt	*xprt = req->rq_xprt;
+	unsigned int connect_cookie;
 	int status, numreqs;
 
 	dprintk("RPC: %5u xprt_transmit(%u)\n", task->tk_pid, req->rq_slen);
@@ -1024,7 +1025,7 @@ void xprt_transmit(struct rpc_task *task)
 	} else if (!req->rq_bytes_sent)
 		return;
 
-	req->rq_connect_cookie = xprt->connect_cookie;
+	connect_cookie = xprt->connect_cookie;
 	req->rq_xtime = ktime_get();
 	status = xprt->ops->send_request(task);
 	trace_xprt_transmit(xprt, req->rq_xid, status);
@@ -1050,6 +1051,7 @@ void xprt_transmit(struct rpc_task *task)
 	xprt->stat.pending_u += xprt->pending.qlen;
 	spin_unlock_bh(&xprt->transport_lock);
 
+	req->rq_connect_cookie = connect_cookie;
 	if (rpc_reply_expected(task) && !READ_ONCE(req->rq_reply_bytes_recvd)) {
 		/*
 		 * Sleep on the pending queue if we're expecting a reply.
-- 
2.14.3

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com
> On Dec 14, 2017, at 4:33 PM, Trond Myklebust <trondmy@primarydata.com> wrote:
>
> On Thu, 2017-12-14 at 15:59 -0500, Chuck Lever wrote:
>>> On Dec 14, 2017, at 3:37 PM, Trond Myklebust <trondmy@primarydata.com> wrote:
>>>
>>> On Thu, 2017-12-14 at 14:22 -0500, Chuck Lever wrote:
>>>>> On Dec 14, 2017, at 2:03 PM, Trond Myklebust <trondmy@primarydata.com> wrote:
>>>>>
>>>>> Does the RDMA code update the connect cookie when the connection
>>>>> breaks? It looks to me as if it only does that when the connection is
>>>>> re-established. We really want both.
>>>>>
>>>>>> I imagine this issue could also impact write buffer exhaustion
>>>>>> on TCP.
>>>>>
>>>>> See net/sunrpc/xprtsock.c:xs_tcp_state_change()
>>>>
>>>> xprtrdma manipulates the connect_cookie in its connect worker,
>>>> see rpcrdma_connect_worker. This was added by:
>>>>
>>>> commit 575448bd36208f99fe0dd554a43518d798966740
>>>> Author:     Tom Talpey <talpey@netapp.com>
>>>> AuthorDate: Thu Oct 9 15:00:40 2008 -0400
>>>> Commit:     Trond Myklebust <Trond.Myklebust@netapp.com>
>>>> CommitDate: Fri Oct 10 15:10:36 2008 -0400
>>>>
>>>>     RPC/RDMA: suppress retransmit on RPC/RDMA clients.
>>>>
>>>> Would it be more correct to bump the cookie in rpcrdma_conn_upcall,
>>>> which is the equivalent of xs_tcp_state_change? (if so, why, so
>>>> I can compose a reasonable patch description)
>>>>
>>>> It could be bumped in the RDMA_CM_EVENT_ESTABLISHED and the
>>>> RDMA_CM_EVENT_DISCONNECTED cases, for example. I'm not sure
>>>> RDMA provides a distinction between "server disconnected"
>>>> and "client disconnected", although that probably does not
>>>> matter for this purpose.
>>>>
>>>> But why would the additional cookie update help? The transport
>>>> is not disconnecting before the deadlock.
>>>
>>> The connection cookie's purpose is twofold:
>>>
>>> 1) It tracks whether or not a request has been transmitted on the
>>> current connection.
>>
>> That's broken by setting the cookie unconditionally outside
>> the transport_lock, isn't it?
>>
>>> 2) It ensures that when several requests with the same connection
>>> cookie all call xprt_conditional_disconnect(), that results in a
>>> single disconnection event. To do so, it assumes that xprt_autoclose()
>>> will change the cookie if the disconnection attempt is successful.
>>>
>>> In TCP we do so in xs_tcp_state_change(). If the RDMA transport can
>>> guarantee that the call to xprt->ops->close(xprt) is always successful,
>>> then you could do so there.
>>
>> I don't mind moving the cookie bump to rpcrdma_conn_upcall,
>> but I'm not sure I understand the locking requirements.
>>
>> Currently, xprt_transmit sets the connect_cookie while holding
>> the transport_lock.
>>
>> xprt_conditional_disconnect compares the cookie while holding
>> the transport_lock.
>>
>> For TCP, the transport_lock is held when bumping the cookie
>> in the ESTABLISHED case, but _not_ in the two CLOSE cases?
>
> That should be OK. The networking layer should provide sufficient
> serialisation that we don't have to worry about collisions.
>
>> xprtrdma holds the transport_lock when bumping the cookie,
>> which it does in its connect worker. It has to hold the lock
>> because it skips the value 0. xprtrdma needs to guarantee
>> that an RPC is never transmitted on the same connection
>> twice (and maybe it could use rq_connect_cookie instead of
>> its own cookie).
>>
>> xprt_reserve_init is holding the reserve_lock but not the
>> transport_lock when it grabs the cookie. Maybe it should
>> not be initializing the rqst's cookie there?
>>
>> Seems to me that xprt_transmit needs to update the rqst's
>> cookie while holding the transport_lock, especially if
>> xprtrdma needs to skip a cookie value? I'm sure I'm missing
>> something.
>
> It should be OK, given that the connection is a state machine.
> However, I missed something that you said earlier about
> xprt_prepare_transmit().
>
> OK. How about the following fixup patch instead of the earlier one?
>
> 8<---------------------------------------------------
> From 21cdb2802d9d8b71553998e6be5aafeff0742142 Mon Sep 17 00:00:00 2001
> From: Trond Myklebust <trond.myklebust@primarydata.com>
> Date: Thu, 14 Dec 2017 07:05:27 -0500
> Subject: [PATCH] fixup! SUNRPC: Fix a race in the receive code path
>
> ---
> net/sunrpc/xprt.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> index 5e4278e9ce37..33b74fd84051 100644
> --- a/net/sunrpc/xprt.c
> +++ b/net/sunrpc/xprt.c
> @@ -1001,6 +1001,7 @@ void xprt_transmit(struct rpc_task *task)
> {
> 	struct rpc_rqst	*req = task->tk_rqstp;
> 	struct rpc_xprt	*xprt = req->rq_xprt;
> +	unsigned int connect_cookie;
> 	int status, numreqs;
>
> 	dprintk("RPC: %5u xprt_transmit(%u)\n", task->tk_pid, req->rq_slen);
> @@ -1024,7 +1025,7 @@ void xprt_transmit(struct rpc_task *task)
> 	} else if (!req->rq_bytes_sent)
> 		return;
>
> -	req->rq_connect_cookie = xprt->connect_cookie;
> +	connect_cookie = xprt->connect_cookie;
> 	req->rq_xtime = ktime_get();
> 	status = xprt->ops->send_request(task);
> 	trace_xprt_transmit(xprt, req->rq_xid, status);
> @@ -1050,6 +1051,7 @@ void xprt_transmit(struct rpc_task *task)
> 	xprt->stat.pending_u += xprt->pending.qlen;
> 	spin_unlock_bh(&xprt->transport_lock);
>
> +	req->rq_connect_cookie = connect_cookie;
> 	if (rpc_reply_expected(task) && !READ_ONCE(req->rq_reply_bytes_recvd)) {
> 		/*
> 		 * Sleep on the pending queue if we're expecting a reply.
> --
> 2.14.3

No problems here; this passed basic testing with NFSv4.0 on a client
with extra send_request fault injection.

I hope we can get the recv race fix (as updated here) and the
queue-work-on patch [1] into v4.15-rc.

--
Chuck Lever

[1] https://marc.info/?l=linux-nfs&m=151241427912572&w=2
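[The fault injection Chuck mentions is a local test hack, not upstream code. A hypothetical wrapper in the same spirit might look like the following; every identifier here is made up for illustration.]

#include <linux/atomic.h>

/* The transport's actual ->send_request method (name assumed). */
static int xprt_real_send_request(struct rpc_task *task);

/*
 * Hypothetical sketch: fail roughly one send in 64 with -ENOBUFS
 * to exercise the xprt_transmit() error path discussed in this
 * thread.
 */
static int xprt_faulty_send_request(struct rpc_task *task)
{
	static atomic_t nr_sends;

	if ((atomic_inc_return(&nr_sends) & 63) == 0)
		return -ENOBUFS;
	return xprt_real_send_request(task);
}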