QLogic driver problem (ib_qib)

Message ID 82fd80bcd97dae02558e4122b7c6964f@alukardd.org (mailing list archive)
State Deferred

Commit Message

Alexey July 16, 2015, 3:31 p.m. UTC
Hello.

I have an issue with the ib_srp module getting stuck, apparently because the
ib_qib driver is too old. From dmesg, for srp_daemon:
[Mon Jul 13 18:16:01 2015] srp_daemon      D ffff880be35fa8f0     0 20491      1 0x00000000
[Mon Jul 13 18:16:01 2015]  ffff880be35fa8f0 0000000000000000 ffff880c097ae150 0000000000013b40
[Mon Jul 13 18:16:01 2015]  0000000000013b40 ffff880be35fa8f0 ffff8800bad5ffd8 0000000000000000
[Mon Jul 13 18:16:01 2015]  7fffffffffffffff 7fffffffffffffff 0000000000000002 ffff8818047ba870
[Mon Jul 13 18:16:01 2015] Call Trace:
[Mon Jul 13 18:16:01 2015]  [<ffffffff813e20e6>] ? console_conditional_schedule+0xf/0xf
[Mon Jul 13 18:16:01 2015]  [<ffffffff813e2102>] ? schedule_timeout+0x1c/0xec
[Mon Jul 13 18:16:01 2015]  [<ffffffff813e28d7>] ? _raw_spin_lock_irq+0xa/0x15
[Mon Jul 13 18:16:01 2015]  [<ffffffff813e0e9c>] ? __wait_for_common+0x124/0x163
[Mon Jul 13 18:16:01 2015]  [<ffffffff81069d57>] ? try_to_wake_up+0x1a8/0x1a8
[Mon Jul 13 18:16:01 2015]  [<ffffffff813e2903>] ? _raw_spin_unlock_irqrestore+0xc/0xd
[Mon Jul 13 18:16:01 2015]  [<ffffffffa04f4e9b>] ? srp_destroy_qp+0xe5/0xee [ib_srp]
[Mon Jul 13 18:16:01 2015]  [<ffffffffa04f54d4>] ? srp_free_ch_ib+0x97/0x14f [ib_srp]
[Mon Jul 13 18:16:01 2015]  [<ffffffffa04f923b>] ? srp_create_target+0xa2c/0xac5 [ib_srp]
[Mon Jul 13 18:16:01 2015]  [<ffffffff811394fc>] ? path_cleanup+0x1a/0x33
[Mon Jul 13 18:16:01 2015]  [<ffffffff8113d370>] ? path_openat+0x418/0x46d
[Mon Jul 13 18:16:01 2015]  [<ffffffff81121361>] ? __kmalloc+0xe5/0xf7
[Mon Jul 13 18:16:01 2015]  [<ffffffff81186a25>] ? kernfs_fop_write+0x60/0x128
[Mon Jul 13 18:16:01 2015]  [<ffffffff81186aa1>] ? kernfs_fop_write+0xdc/0x128
[Mon Jul 13 18:16:01 2015]  [<ffffffff811323f2>] ? vfs_write+0x9c/0x11e
[Mon Jul 13 18:16:01 2015]  [<ffffffff8113263c>] ? SyS_write+0x51/0x85
[Mon Jul 13 18:16:01 2015]  [<ffffffff813e2ce9>] ? system_call_fastpath+0x12/0x17

Bart Van Assche helped us, and you can see the resulting patch below:

With the mlx driver, however, everything works fine.


My current installation is:
linux 3.19.1
qib 1.11
ib_srp 2.0.32
InfiniBand: QLogic Corp. IBA7322 QDR InfiniBand HCA (rev 02)


I hope you can help fix the ib_qib driver so that we can remove this ugly
patch from our ib_srp source.

Regards,
Alexey Mochkin
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Marciniszyn, Mike July 16, 2015, 8:15 p.m. UTC | #1
> Bart Van Assche help us and resulting patch you can see below:
> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c
> b/drivers/infiniband/ulp/srp/ib_srp.c
> index c5af67a..470eb54 100644
> --- a/drivers/infiniband/ulp/srp/ib_srp.c
> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> @@ -59,6 +59,7 @@
>   #include "../../../../include/scsi/scsi_transport_srp.h"
>
>   #include "ib_srp.h"
> +#include <linux/delay.h>
>
>   #define DRV_NAME	"ib_srp"
>   #define PFX		DRV_NAME ": "
> @@ -621,11 +622,7 @@ static void srp_destroy_qp(struct srp_rdma_ch *ch)
>   	if (ret)
>   		goto out;
>
> -	init_completion(&ch->done);
> -	ret = ib_post_recv(ch->qp, &wr, &bad_wr);
> -	WARN_ONCE(ret, "ib_post_recv() returned %d\n", ret);
> -	if (ret == 0)
> -		wait_for_completion(&ch->done);
> +	msleep(100);
>
>   out:
>   	ib_destroy_qp(ch->qp);

Post receive can return a synchronous failure. The question would be which one and why?

Bart, any ideas?

Mike
Bart Van Assche July 16, 2015, 8:27 p.m. UTC | #2
On 07/16/2015 01:15 PM, Marciniszyn, Mike wrote:
>> Bart Van Assche help us and resulting patch you can see below:
>> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c
>> b/drivers/infiniband/ulp/srp/ib_srp.c
>> index c5af67a..470eb54 100644
>> --- a/drivers/infiniband/ulp/srp/ib_srp.c
>> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
>> @@ -59,6 +59,7 @@
>>    #include "../../../../include/scsi/scsi_transport_srp.h"
>>
>>    #include "ib_srp.h"
>> +#include <linux/delay.h>
>>
>>    #define DRV_NAME	"ib_srp"
>>    #define PFX		DRV_NAME ": "
>> @@ -621,11 +622,7 @@ static void srp_destroy_qp(struct srp_rdma_ch *ch)
>>    	if (ret)
>>    		goto out;
>>
>> -	init_completion(&ch->done);
>> -	ret = ib_post_recv(ch->qp, &wr, &bad_wr);
>> -	WARN_ONCE(ret, "ib_post_recv() returned %d\n", ret);
>> -	if (ret == 0)
>> -		wait_for_completion(&ch->done);
>> +	msleep(100);
>>
>>    out:
>>    	ib_destroy_qp(ch->qp);
>
> Post receive can return a synchronous failure.   The question would be which one and why?
>
> Bart, any ideas?

Hello Mike,

ib_post_recv() can indeed return a synchronous failure. But I think a 
synchronous failure should be handled properly by the SRP initiator 
driver: if ret < 0 then the wait_for_completion() call is skipped. What 
Alexey reported is that the wait_for_completion() call did not finish 
which means that ib_post_recv() returned 0. Or did I perhaps
misinterpret something?

Thanks,

Bart.

Patch

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index c5af67a..470eb54 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -59,6 +59,7 @@ 
  #include "../../../../include/scsi/scsi_transport_srp.h"

  #include "ib_srp.h"
+#include <linux/delay.h>

  #define DRV_NAME	"ib_srp"
  #define PFX		DRV_NAME ": "
@@ -621,11 +622,7 @@  static void srp_destroy_qp(struct srp_rdma_ch *ch)
  	if (ret)
  		goto out;

-	init_completion(&ch->done);
-	ret = ib_post_recv(ch->qp, &wr, &bad_wr);
-	WARN_ONCE(ret, "ib_post_recv() returned %d\n", ret);
-	if (ret == 0)
-		wait_for_completion(&ch->done);
+	msleep(100);

  out:
  	ib_destroy_qp(ch->qp);
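
Note on the removed hunk: the logic discussed in this thread has three possible outcomes, and a small userspace model may make them easier to see. The sketch below is not kernel code and is not taken from ib_srp or ib_qib. The names drain_model, completion_source, never_completes and completes_on_third_poll are hypothetical, and a bounded poll count stands in for a time limit. It only illustrates the reasoning: a non-zero return from the post step skips the wait, while a zero return followed by a completion that never arrives is the case where an unbounded wait would block forever, as in the srp_daemon trace.

```c
#include <stdbool.h>

/* Possible outcomes of the "post a receive, then wait for it" step. */
enum drain_result { DRAIN_SKIPPED, DRAIN_COMPLETED, DRAIN_GAVE_UP };

/*
 * Hypothetical stand-in for the HCA driver: reports whether the
 * completion for the posted receive has been delivered by poll
 * number `poll` (counting from 1).
 */
typedef bool (*completion_source)(unsigned int poll);

/* Models a driver that never delivers the completion (the reported hang). */
static bool never_completes(unsigned int poll)
{
	(void)poll;
	return false;
}

/* Models a driver that delivers the completion on the third poll. */
static bool completes_on_third_poll(unsigned int poll)
{
	return poll >= 3;
}

/*
 * post_ret models the return value of the post step (0 on success).
 * max_polls bounds the wait; the unpatched code corresponds to an
 * unbounded wait, which cannot return if the completion never comes.
 */
static enum drain_result drain_model(int post_ret, completion_source src,
				     unsigned int max_polls)
{
	if (post_ret != 0)
		return DRAIN_SKIPPED;	/* synchronous failure: wait is skipped */

	for (unsigned int poll = 1; poll <= max_polls; poll++)
		if (src(poll))
			return DRAIN_COMPLETED;

	return DRAIN_GAVE_UP;		/* bounded wait expired */
}
```

For example, drain_model(0, never_completes, 10) gives DRAIN_GAVE_UP, whereas the same situation with an unbounded wait would never return. In the kernel, a bounded wait of this kind is usually written with wait_for_completion_timeout(); whether that would be an acceptable substitute for the msleep(100) workaround in srp_destroy_qp() is an assumption that has not been tested here.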