
[02/12] scsi_transport_srp: Fix a race condition

Message ID 5541EE66.7090608@sandisk.com (mailing list archive)
State New, archived

Commit Message

Bart Van Assche April 30, 2015, 8:57 a.m. UTC
Prevent srp_terminate_io() from being invoked while srp_queuecommand()
is in progress. This patch prevents an I/O timeout from triggering the
following kernel warning:

WARNING: at drivers/infiniband/ulp/srp/ib_srp.c:1447 srp_terminate_io+0xef/0x100 [ib_srp]()
Call Trace:
 [<ffffffff814c65a2>] dump_stack+0x4e/0x68
 [<ffffffff81051f71>] warn_slowpath_common+0x81/0xa0
 [<ffffffff8105204a>] warn_slowpath_null+0x1a/0x20
 [<ffffffffa075f51f>] srp_terminate_io+0xef/0x100 [ib_srp]
 [<ffffffffa07495da>] __rport_fail_io_fast+0xba/0xc0 [scsi_transport_srp]
 [<ffffffffa0749a90>] rport_fast_io_fail_timedout+0xe0/0xf0 [scsi_transport_srp]
 [<ffffffff8106e09b>] process_one_work+0x1db/0x780
 [<ffffffff8106e75b>] worker_thread+0x11b/0x450
 [<ffffffff81073c64>] kthread+0xe4/0x100
 [<ffffffff814cf26c>] ret_from_fork+0x7c/0xb0

See also patch "scsi_transport_srp: Add transport layer error
handling" (commit ID 29c17324803c).

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: James Bottomley <JBottomley@Odin.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
Cc: <stable@vger.kernel.org> #v3.13
---
 drivers/scsi/scsi_transport_srp.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
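The ordering the patch enforces can be modeled outside the kernel. The following is a minimal single-process C sketch, not the kernel implementation. Every `model_*` name is hypothetical. The `assert()` stands in for the warning in srp_terminate_io(), which is assumed here to fire when a queuecommand() call is still active. The sketch also assumes new submissions are already being rejected, so only calls that are already in progress need to be waited for.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a SCSI host: counts queuecommand() calls in progress. */
struct model_host {
	atomic_int active_queuecommand;
	bool terminated;
};

/* Entry into the modeled queuecommand() path. */
static void model_queuecommand_begin(struct model_host *h)
{
	atomic_fetch_add(&h->active_queuecommand, 1);
}

/* Exit from the modeled queuecommand() path. */
static void model_queuecommand_end(struct model_host *h)
{
	atomic_fetch_sub(&h->active_queuecommand, 1);
}

/* Plays the role of srp_wait_for_queuecommand(): returns only once no call
 * is in progress. Spins for simplicity; assumes new calls are already rejected. */
static void model_wait_for_queuecommand(struct model_host *h)
{
	while (atomic_load(&h->active_queuecommand) > 0)
		; /* busy-wait */
}

/* Stand-in for srp_terminate_io(); the assert models the kernel warning. */
static void model_terminate_io(struct model_host *h)
{
	assert(atomic_load(&h->active_queuecommand) == 0);
	h->terminated = true;
}

/* Patched ordering: wait for in-progress submissions, then terminate I/O. */
static void model_fail_io_fast(struct model_host *h)
{
	model_wait_for_queuecommand(h);
	model_terminate_io(h);
}
```

If model_fail_io_fast() ran while another thread was between model_queuecommand_begin() and model_queuecommand_end(), it would block in the wait loop instead of tripping the assert. A real implementation would sleep or block between checks rather than spin.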

Comments

Sagi Grimberg April 30, 2015, 9:44 a.m. UTC | #1
On 4/30/2015 11:57 AM, Bart Van Assche wrote:
> diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
> index 6ce1c48..4a44337 100644
> --- a/drivers/scsi/scsi_transport_srp.c
> +++ b/drivers/scsi/scsi_transport_srp.c
> @@ -437,8 +437,10 @@ static void __rport_fail_io_fast(struct srp_rport *rport)
>
>   	/* Involve the LLD if possible to terminate all I/O on the rport. */
>   	i = to_srp_internal(shost->transportt);
> -	if (i->f->terminate_rport_io)
> +	if (i->f->terminate_rport_io) {
> +		srp_wait_for_queuecommand(shost);
>   		i->f->terminate_rport_io(rport);
> +	}

Why not just terminate the inflight IO before unblocking the target?
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bart Van Assche April 30, 2015, 10:20 a.m. UTC | #2
On 04/30/15 11:44, Sagi Grimberg wrote:
> On 4/30/2015 11:57 AM, Bart Van Assche wrote:
>> @@ -437,8 +437,10 @@ static void __rport_fail_io_fast(struct srp_rport *rport)
>>
>>       /* Involve the LLD if possible to terminate all I/O on the rport. */
>>       i = to_srp_internal(shost->transportt);
>> -    if (i->f->terminate_rport_io)
>> +    if (i->f->terminate_rport_io) {
>> +        srp_wait_for_queuecommand(shost);
>>           i->f->terminate_rport_io(rport);
>> +    }
>
> Why not just terminate the inflight IO before unblocking the target?

Sorry, but I don't think that would prevent the described race condition.
The call trace in the description of this patch illustrates that
srp_queuecommand() can still be active even after the transport state
has been changed to "offline". Hence, if terminate_rport_io() were
invoked earlier, the same race would still exist.

Bart.
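Bart's argument can be illustrated with a deterministic interleaving. In this hypothetical C sketch (all `model_*` names are invented for illustration), a queuecommand() call that passed its state check before the state became "offline" is still in progress afterwards. Rejecting new callers therefore does not exclude it, regardless of when terminate_rport_io() runs relative to the state change; only waiting for it to finish does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical rport states, loosely named after the states in the thread. */
enum model_state { MODEL_RUNNING, MODEL_OFFLINE };

struct model_rport {
	enum model_state state;
	int active_queuecommand; /* calls that passed their state check */
};

/* First half of a modeled queuecommand(): the state check at entry.
 * Returns true if the command was accepted and is now in progress. */
static bool model_qc_enter(struct model_rport *r)
{
	if (r->state != MODEL_RUNNING)
		return false;           /* rejected: new I/O is blocked */
	r->active_queuecommand++;   /* accepted: keeps running past this point */
	return true;
}

/* Second half: the modeled call finishes and returns. */
static void model_qc_exit(struct model_rport *r)
{
	assert(r->active_queuecommand > 0);
	r->active_queuecommand--;
}

/* True if terminating I/O at this moment would race with queuecommand(). */
static bool model_terminate_would_race(const struct model_rport *r)
{
	return r->active_queuecommand > 0;
}
```

Right after the state changes to MODEL_OFFLINE, new callers are rejected, yet model_terminate_would_race() keeps returning true until the earlier caller reaches model_qc_exit().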


Patch

diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
index 6ce1c48..4a44337 100644
--- a/drivers/scsi/scsi_transport_srp.c
+++ b/drivers/scsi/scsi_transport_srp.c
@@ -437,8 +437,10 @@  static void __rport_fail_io_fast(struct srp_rport *rport)
 
 	/* Involve the LLD if possible to terminate all I/O on the rport. */
 	i = to_srp_internal(shost->transportt);
-	if (i->f->terminate_rport_io)
+	if (i->f->terminate_rport_io) {
+		srp_wait_for_queuecommand(shost);
 		i->f->terminate_rport_io(rport);
+	}
 }
 
 /**
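The patched hunk relies on an optional callback reached through the transport template (`i->f->terminate_rport_io`). Its control flow can be sketched with a hypothetical ops table. `model_transport_ops`, `model_rport`, and the counters are illustrative only, and the wait helper here merely records that it ran.

```c
#include <assert.h>
#include <stddef.h>

struct model_rport;

/* Hypothetical stand-in for the LLD callback table reached via i->f. */
struct model_transport_ops {
	void (*terminate_rport_io)(struct model_rport *rport);
};

struct model_rport {
	const struct model_transport_ops *ops;
	int waits;        /* how often the wait helper ran */
	int terminations; /* how often the LLD callback ran */
};

/* Stand-in for srp_wait_for_queuecommand(); only records the call. */
static void model_wait_for_queuecommand(struct model_rport *rport)
{
	rport->waits++;
}

/* Mirrors the patched tail of __rport_fail_io_fast(). */
static void model_fail_io_fast(struct model_rport *rport)
{
	assert(rport != NULL && rport->ops != NULL);
	if (rport->ops->terminate_rport_io) {
		model_wait_for_queuecommand(rport);
		rport->ops->terminate_rport_io(rport);
	}
}

/* Example LLD callback. */
static void model_lld_terminate(struct model_rport *rport)
{
	rport->terminations++;
}
```

Because the wait sits inside the `if`, a host whose LLD does not provide terminate_rport_io skips it entirely.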