Message ID | 20180109192340.25702-1-bart.vanassche@wdc.com (mailing list archive)
---|---
State | Accepted
On Tue, 2018-01-09 at 11:23 -0800, Bart Van Assche wrote:
> The following sequence:
> * Change queue pair state into IB_QPS_ERR.
> * Post a work request on the queue pair.
> Triggers the following race condition in the rdma_rxe driver:
> * rxe_qp_error() triggers an asynchronous call of rxe_completer(), the function
> that examines the QP send queue.
> * rxe_post_send() posts a work request on the QP send queue.

If rxe_completer() runs before rxe_post_send(), the send queue is
believed to be empty while a stale work request stays on the send queue
indefinitely. To avoid this race, schedule rxe_completer() after a work
request is queued on a qp in the error state by rxe_post_send().

I think that improves the log message, yes?
On Wed, 2018-01-10 at 16:40 -0500, Doug Ledford wrote:
> On Tue, 2018-01-09 at 11:23 -0800, Bart Van Assche wrote:
> > The following sequence:
> > * Change queue pair state into IB_QPS_ERR.
> > * Post a work request on the queue pair.
> > Triggers the following race condition in the rdma_rxe driver:
> > * rxe_qp_error() triggers an asynchronous call of rxe_completer(), the function
> > that examines the QP send queue.
> > * rxe_post_send() posts a work request on the QP send queue.
>
> If rxe_completer() runs before rxe_post_send(), the send queue is
> believed to be empty while a stale work request stays on the send queue
> indefinitely. To avoid this race, schedule rxe_completer() after a work
> request is queued on a qp in the error state by rxe_post_send().
>
> I think that improves the log message, yes?

I did some further edits. But, patch applied to for-next.
On Wed, Jan 10, 2018 at 05:01:40PM -0500, Doug Ledford wrote:
> On Wed, 2018-01-10 at 16:40 -0500, Doug Ledford wrote:
> > On Tue, 2018-01-09 at 11:23 -0800, Bart Van Assche wrote:
> > > The following sequence:
> > > * Change queue pair state into IB_QPS_ERR.
> > > * Post a work request on the queue pair.
> > > Triggers the following race condition in the rdma_rxe driver:
> > > * rxe_qp_error() triggers an asynchronous call of rxe_completer(), the function
> > > that examines the QP send queue.
> > > * rxe_post_send() posts a work request on the QP send queue.
> >
> > If rxe_completer() runs before rxe_post_send(), the send queue is
> > believed to be empty while a stale work request stays on the send queue
> > indefinitely. To avoid this race, schedule rxe_completer() after a work
> > request is queued on a qp in the error state by rxe_post_send().
> >
> > I think that improves the log message, yes?
>
> I did some further edits. But, patch applied to for-next.

The proposed patch definitely decreases the chance of races, but it is
not fixing them. There is a chance to have change in qp state
immediately after your "if ..." check.

Thanks

> --
> Doug Ledford <dledford@redhat.com>
> GPG KeyID: B826A3330E572FDD
> Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
On Tue, Jan 9, 2018 at 9:23 PM, Bart Van Assche <bart.vanassche@wdc.com> wrote:
> The following sequence:
> * Change queue pair state into IB_QPS_ERR.
> * Post a work request on the queue pair.
> Triggers the following race condition in the rdma_rxe driver:
> * rxe_qp_error() triggers an asynchronous call of rxe_completer(), the function
> that examines the QP send queue.
> * rxe_post_send() posts a work request on the QP send queue.
> Avoid that this race causes a work request to be ignored by scheduling
> an rxe_completer() call from rxe_post_send() for queues that are in the
> error state.
>
> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
> Cc: Moni Shoua <monis@mellanox.com>
> Cc: <stable@vger.kernel.org> # v4.8
> ---
>  drivers/infiniband/sw/rxe/rxe_verbs.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
> index a6fbed48db8a..8f631d64c192 100644
> --- a/drivers/infiniband/sw/rxe/rxe_verbs.c
> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
> @@ -814,6 +814,8 @@ static int rxe_post_send_kernel(struct rxe_qp *qp, struct ib_send_wr *wr,
>  			  (queue_count(qp->sq.queue) > 1);
>
>  	rxe_run_task(&qp->req.task, must_sched);
> +	if (unlikely(qp->req.state == QP_STATE_ERROR))
> +		rxe_run_task(&qp->comp.task, 1);
>
>  	return err;
>  }
> --
> 2.15.1
>

Maybe I am missing something but I think that the race is when qp is
in ERROR state and the following functions run in parallel
* rxe_drain_req_pkts (called from rxe_requester after post_send)
* rxe_drain_resp_pkts (called from rxe_completer after modify to ERROR)

Am I right?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2018-01-11 at 08:22 +0200, Leon Romanovsky wrote:
> The proposed patch definitely decreases the chance of races, but it is not fixing them.
> There is a chance to have change in qp state immediately after your "if ..." check.

Hello Leon,

Please have a look at rxe_qp_error() and you will see that the patch I posted
is a proper fix. In the scenario you described rxe_qp_error() will trigger a
run of rxe_completer().

Bart.
On Thu, 2018-01-11 at 13:27 +0200, Moni Shoua wrote:
> Maybe I am missing something but I think that the race is when qp is
> in ERROR state and the following functions run in parallel
> * rxe_drain_req_pkts (called from rxe_requester after post_send)
> * rxe_drain_resp_pkts (called from rxe_completer after modify to ERROR)
>
> Am I right?

Hello Moni,

I think that's a real race and a race that has to be fixed but not the race
that caused the missing completions in the tests I ran myself.

Best regards,

Bart.
On Thu, Jan 11, 2018 at 04:02:33PM +0000, Bart Van Assche wrote:
> On Thu, 2018-01-11 at 08:22 +0200, Leon Romanovsky wrote:
> > The proposed patch definitely decreases the chance of races, but it is not fixing them.
> > There is a chance to have change in qp state immediately after your "if ..." check.
>
> Hello Leon,
>
> Please have a look at rxe_qp_error() and you will see that the patch I posted
> is a proper fix. In the scenario you described rxe_qp_error() will trigger a
> run of rxe_completer().

Bart,

What am I missing?

CPU1						CPU2
						if (unlikely....
					<---
/* move the qp to the error state */
void rxe_qp_error(struct rxe_qp *qp)
{
        qp->req.state = QP_STATE_ERROR;
        qp->resp.state = QP_STATE_ERROR;
        qp->attr.qp_state = IB_QPS_ERR;
					--->
						rxe_run_task(&qp->req.task, must_sched);

It is more or less the same as without "if (unlikely..."

Thanks

> Bart.
On Thu, 2018-01-11 at 21:00 +0200, Leon Romanovsky wrote:
> On Thu, Jan 11, 2018 at 04:02:33PM +0000, Bart Van Assche wrote:
> > On Thu, 2018-01-11 at 08:22 +0200, Leon Romanovsky wrote:
> > > The proposed patch definitely decreases the chance of races, but it is not fixing them.
> > > There is a chance to have change in qp state immediately after your "if ..." check.
> >
> > Hello Leon,
> >
> > Please have a look at rxe_qp_error() and you will see that the patch I posted
> > is a proper fix. In the scenario you described rxe_qp_error() will trigger a
> > run of rxe_completer().
>
> Bart,
>
> What am I missing?
>
> CPU1						CPU2
> 						if (unlikely....
> 					<---
> /* move the qp to the error state */
> void rxe_qp_error(struct rxe_qp *qp)
> {
>         qp->req.state = QP_STATE_ERROR;
>         qp->resp.state = QP_STATE_ERROR;
>         qp->attr.qp_state = IB_QPS_ERR;
> 					--->
> 						rxe_run_task(&qp->req.task, must_sched);
>
> It is more or less the same as without "if (unlikely..."

Hello Leon,

In the above the part of rxe_qp_error() that I was referring to in my e-mail
is missing:

	if (qp_type(qp) == IB_QPT_RC)
		rxe_run_task(&qp->comp.task, 1);

Bart.
On Thu, Jan 11, 2018 at 07:07:06PM +0000, Bart Van Assche wrote:
> On Thu, 2018-01-11 at 21:00 +0200, Leon Romanovsky wrote:
> > On Thu, Jan 11, 2018 at 04:02:33PM +0000, Bart Van Assche wrote:
> > > On Thu, 2018-01-11 at 08:22 +0200, Leon Romanovsky wrote:
> > > > The proposed patch definitely decreases the chance of races, but it is not fixing them.
> > > > There is a chance to have change in qp state immediately after your "if ..." check.
> > >
> > > Hello Leon,
> > >
> > > Please have a look at rxe_qp_error() and you will see that the patch I posted
> > > is a proper fix. In the scenario you described rxe_qp_error() will trigger a
> > > run of rxe_completer().
> >
> > Bart,
> >
> > What am I missing?
> >
> > CPU1						CPU2
> > 						if (unlikely....
> > 					<---
> > /* move the qp to the error state */
> > void rxe_qp_error(struct rxe_qp *qp)
> > {
> >         qp->req.state = QP_STATE_ERROR;
> >         qp->resp.state = QP_STATE_ERROR;
> >         qp->attr.qp_state = IB_QPS_ERR;
> > 					--->
> > 						rxe_run_task(&qp->req.task, must_sched);
> >
> > It is more or less the same as without "if (unlikely..."
>
> Hello Leon,
>
> In the above the part of rxe_qp_error() that I was referring to in my e-mail
> is missing:
>
> 	if (qp_type(qp) == IB_QPT_RC)
> 		rxe_run_task(&qp->comp.task, 1);

But it is exactly where race exists, as long QP isn't protected, it can
switch CPUs and create race.

Thanks

> Bart.
On Fri, 2018-01-12 at 08:23 +0200, Leon Romanovsky wrote:
> On Thu, Jan 11, 2018 at 07:07:06PM +0000, Bart Van Assche wrote:
> > On Thu, 2018-01-11 at 21:00 +0200, Leon Romanovsky wrote:
> > > On Thu, Jan 11, 2018 at 04:02:33PM +0000, Bart Van Assche wrote:
> > > > On Thu, 2018-01-11 at 08:22 +0200, Leon Romanovsky wrote:
> > > > > The proposed patch definitely decreases the chance of races, but it is not fixing them.
> > > > > There is a chance to have change in qp state immediately after your "if ..." check.
> > > >
> > > > Hello Leon,
> > > >
> > > > Please have a look at rxe_qp_error() and you will see that the patch I posted
> > > > is a proper fix. In the scenario you described rxe_qp_error() will trigger a
> > > > run of rxe_completer().
> > >
> > > Bart,
> > >
> > > What am I missing?
> > >
> > > CPU1						CPU2
> > > 						if (unlikely....
> > > 					<---
> > > /* move the qp to the error state */
> > > void rxe_qp_error(struct rxe_qp *qp)
> > > {
> > >         qp->req.state = QP_STATE_ERROR;
> > >         qp->resp.state = QP_STATE_ERROR;
> > >         qp->attr.qp_state = IB_QPS_ERR;
> > > 					--->
> > > 						rxe_run_task(&qp->req.task, must_sched);
> > >
> > > It is more or less the same as without "if (unlikely..."
> >
> > Hello Leon,
> >
> > In the above the part of rxe_qp_error() that I was referring to in my e-mail
> > is missing:
> >
> > 	if (qp_type(qp) == IB_QPT_RC)
> > 		rxe_run_task(&qp->comp.task, 1);
>
> But it is exactly where race exists, as long QP isn't protected, it can
> switch CPUs and create race.

Hello Leon,

Can you clarify which race you are referring to? rxe_run_task() uses the
tasklet mechanism and tasklets are guaranteed to run on at most one CPU at a
time. See also the "Top and Bottom Halves" chapter in Linux Device Drivers,
3rd edition. See also the tasklet_schedule() implementation in
<linux/interrupt.h> and in kernel/softirq.c.

Thanks,

Bart.
On Fri, Jan 12, 2018 at 06:32:47AM +0000, Bart Van Assche wrote:
> On Fri, 2018-01-12 at 08:23 +0200, Leon Romanovsky wrote:
> > On Thu, Jan 11, 2018 at 07:07:06PM +0000, Bart Van Assche wrote:
> > > On Thu, 2018-01-11 at 21:00 +0200, Leon Romanovsky wrote:
> > > > On Thu, Jan 11, 2018 at 04:02:33PM +0000, Bart Van Assche wrote:
> > > > > On Thu, 2018-01-11 at 08:22 +0200, Leon Romanovsky wrote:
> > > > > > The proposed patch definitely decreases the chance of races, but it is not fixing them.
> > > > > > There is a chance to have change in qp state immediately after your "if ..." check.
> > > > >
> > > > > Hello Leon,
> > > > >
> > > > > Please have a look at rxe_qp_error() and you will see that the patch I posted
> > > > > is a proper fix. In the scenario you described rxe_qp_error() will trigger a
> > > > > run of rxe_completer().
> > > >
> > > > Bart,
> > > >
> > > > What am I missing?
> > > >
> > > > CPU1						CPU2
> > > > 						if (unlikely....
> > > > 					<---
> > > > /* move the qp to the error state */
> > > > void rxe_qp_error(struct rxe_qp *qp)
> > > > {
> > > >         qp->req.state = QP_STATE_ERROR;
> > > >         qp->resp.state = QP_STATE_ERROR;
> > > >         qp->attr.qp_state = IB_QPS_ERR;
> > > > 					--->
> > > > 						rxe_run_task(&qp->req.task, must_sched);
> > > >
> > > > It is more or less the same as without "if (unlikely..."
> > >
> > > Hello Leon,
> > >
> > > In the above the part of rxe_qp_error() that I was referring to in my e-mail
> > > is missing:
> > >
> > > 	if (qp_type(qp) == IB_QPT_RC)
> > > 		rxe_run_task(&qp->comp.task, 1);
> >
> > But it is exactly where race exists, as long QP isn't protected, it can
> > switch CPUs and create race.
>
> Hello Leon,
>
> Can you clarify which race you are referring to? rxe_run_task() uses the
> tasklet mechanism and tasklets are guaranteed to run on at most one CPU at a
> time. See also the "Top and Bottom Halves" chapter in Linux Device Drivers,
> 3rd edition. See also the tasklet_schedule() implementation in
> <linux/interrupt.h> and in kernel/softirq.c.

Ahh, Bart,
Sorry. I found the cause of my misunderstanding, it is "comp" in the
rxe_run_task call and not "req".

Thanks

> Thanks,
>
> Bart.
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index a6fbed48db8a..8f631d64c192 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -814,6 +814,8 @@ static int rxe_post_send_kernel(struct rxe_qp *qp, struct ib_send_wr *wr,
 			  (queue_count(qp->sq.queue) > 1);
 
 	rxe_run_task(&qp->req.task, must_sched);
+	if (unlikely(qp->req.state == QP_STATE_ERROR))
+		rxe_run_task(&qp->comp.task, 1);
 
 	return err;
 }
The following sequence:
* Change queue pair state into IB_QPS_ERR.
* Post a work request on the queue pair.
Triggers the following race condition in the rdma_rxe driver:
* rxe_qp_error() triggers an asynchronous call of rxe_completer(), the function
  that examines the QP send queue.
* rxe_post_send() posts a work request on the QP send queue.
Avoid that this race causes a work request to be ignored by scheduling
an rxe_completer() call from rxe_post_send() for queues that are in the
error state.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Moni Shoua <monis@mellanox.com>
Cc: <stable@vger.kernel.org> # v4.8
---
 drivers/infiniband/sw/rxe/rxe_verbs.c | 2 ++
 1 file changed, 2 insertions(+)