diff mbox

[09/11] IB/srp: Fix a NULL pointer dereference

Message ID 571A9472.5050202@sandisk.com (mailing list archive)
State Superseded
Headers show

Commit Message

Bart Van Assche April 22, 2016, 9:15 p.m. UTC
Avoid that running xfstests on top of the SRP initiator triggers
the call trace below. This patch has been tested by running the
following shell command on an initiator system that has access
to 3200 SRP LUNs:

/etc/init.d/multipathd start
while true; do
  /etc/init.d/srpd start
  sleep 400
  /etc/init.d/srpd stop
  for p in /sys/class/srp_remote_ports/*; do
    echo 1 >$p/delete &
  done
  wait
  dmsetup remove_all
done

BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<ffffffffa0918bb1>] srp_claim_req+0x31/0x90 [ib_srp]
Call Trace:
 <IRQ>
 [<ffffffffa091c096>] srp_process_rsp+0xa6/0x2a0 [ib_srp]
 [<ffffffffa091c5ec>] srp_handle_recv+0x16c/0x340 [ib_srp]
 [<ffffffffa091c7f9>] srp_recv_completion+0x39/0x70 [ib_srp]
 [<ffffffffa0184442>] mlx4_ib_cq_comp+0x12/0x20 [mlx4_ib]
 [<ffffffffa005e86d>] mlx4_cq_completion+0x3d/0x80 [mlx4_core]
 [<ffffffffa006002b>] mlx4_eq_int+0x53b/0xd50 [mlx4_core]
 [<ffffffffa006084f>] mlx4_msi_x_interrupt+0xf/0x20 [mlx4_core]
 [<ffffffff810b67e0>] handle_irq_event_percpu+0x40/0x110
 [<ffffffff810b68ef>] handle_irq_event+0x3f/0x70
 [<ffffffff810ba829>] handle_edge_irq+0x79/0x120
 [<ffffffff81007f2d>] handle_irq+0x5d/0x130
 [<ffffffff810071ed>] do_IRQ+0x6d/0x130
 [<ffffffff8151c104>] common_interrupt+0x84/0x84
 <EOI>

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Laurence Oberman <loberman@redhat.com>
---
 drivers/infiniband/ulp/srp/ib_srp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Sagi Grimberg April 26, 2016, 9:04 p.m. UTC | #1
> Avoid that running xfstests on top of the SRP initiator triggers
> the call trace below. This patch has been tested by running the
> following shell command on an initiator system that has access
> to 3200 SRP LUNs:

That's good to know, but the patch description needs to state where
the NULL deref originates i.e. when can req be NULL and why it is
OK to just assign to NULL and continue...

>   		scmnd = scsi_host_find_tag(target->scsi_host, rsp->tag);
>   		if (scmnd) {
>   			req = (void *)scmnd->host_scribble;
> -			scmnd = srp_claim_req(ch, req, NULL, scmnd);
> +			scmnd = req ? srp_claim_req(ch, req, NULL, scmnd) :
> +				NULL;
>   		}
>   		if (!scmnd) {
>   			shost_printk(KERN_ERR, target->scsi_host,
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Leon Romanovsky April 27, 2016, 6:20 a.m. UTC | #2
On Wed, Apr 27, 2016 at 12:04:52AM +0300, Sagi Grimberg wrote:
> >Avoid that running xfstests on top of the SRP initiator triggers
> >the call trace below. This patch has been tested by running the
> >following shell command on an initiator system that has access
> >to 3200 SRP LUNs:
> 
> That's good to know, but the patch description needs to state where
> the NULL deref originates i.e. when can req be NULL and why it is
> OK to just assign to NULL and continue...
> 
> >  		scmnd = scsi_host_find_tag(target->scsi_host, rsp->tag);
> >  		if (scmnd) {
> >  			req = (void *)scmnd->host_scribble;
> >-			scmnd = srp_claim_req(ch, req, NULL, scmnd);
> >+			scmnd = req ? srp_claim_req(ch, req, NULL, scmnd) :
> >+				NULL;
> >  		}
> >  		if (!scmnd) {
> >  			shost_printk(KERN_ERR, target->scsi_host,
> >

And if it is OK to assign NULL to scmd, will the error print above still
valid?

> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bart Van Assche April 27, 2016, 2:22 p.m. UTC | #3
On 04/26/16 23:20, Leon Romanovsky wrote:
> On Wed, Apr 27, 2016 at 12:04:52AM +0300, Sagi Grimberg wrote:
>>> Avoid that running xfstests on top of the SRP initiator triggers
>>> the call trace below. This patch has been tested by running the
>>> following shell command on an initiator system that has access
>>> to 3200 SRP LUNs:
>>
>> That's good to know, but the patch description needs to state where
>> the NULL deref originates i.e. when can req be NULL and why it is
>> OK to just assign to NULL and continue...
>>
>>>   		scmnd = scsi_host_find_tag(target->scsi_host, rsp->tag);
>>>   		if (scmnd) {
>>>   			req = (void *)scmnd->host_scribble;
>>> -			scmnd = srp_claim_req(ch, req, NULL, scmnd);
>>> +			scmnd = req ? srp_claim_req(ch, req, NULL, scmnd) :
>>> +				NULL;
>>>   		}
>>>   		if (!scmnd) {
>>>   			shost_printk(KERN_ERR, target->scsi_host,
>>>
>
> And if it is OK to assign NULL to scmd, will the error print above still
> valid?

Sorry Leon but I don't understand your question. Have you noticed the if 
(!scmnd) above that error print?

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bart Van Assche April 27, 2016, 3:39 p.m. UTC | #4
On 04/26/2016 11:20 PM, Leon Romanovsky wrote:
> On Wed, Apr 27, 2016 at 12:04:52AM +0300, Sagi Grimberg wrote:
>>> Avoid that running xfstests on top of the SRP initiator triggers
>>> the call trace below. This patch has been tested by running the
>>> following shell command on an initiator system that has access
>>> to 3200 SRP LUNs:
>>
>> That's good to know, but the patch description needs to state where
>> the NULL deref originates i.e. when can req be NULL and why it is
>> OK to just assign to NULL and continue...
>>
>>>   		scmnd = scsi_host_find_tag(target->scsi_host, rsp->tag);
>>>   		if (scmnd) {
>>>   			req = (void *)scmnd->host_scribble;
>>> -			scmnd = srp_claim_req(ch, req, NULL, scmnd);
>>> +			scmnd = req ? srp_claim_req(ch, req, NULL, scmnd) :
>>> +				NULL;
>>>   		}
>>>   		if (!scmnd) {
>>>   			shost_printk(KERN_ERR, target->scsi_host,
>>>
>
> And if it is OK to assign NULL to scmd, will the error print above still
> valid?

Yes. The purpose of that printk() statement is to print a message 
whenever an SRP response has been received but that response is not 
passed to the SCSI mid-layer through scsi_done().

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bart Van Assche April 27, 2016, 7:50 p.m. UTC | #5
On 04/26/2016 02:04 PM, Sagi Grimberg wrote:
>> Avoid that running xfstests on top of the SRP initiator triggers
>> the call trace below. This patch has been tested by running the
>> following shell command on an initiator system that has access
>> to 3200 SRP LUNs:
>
> That's good to know, but the patch description needs to state where
> the NULL deref originates i.e. when can req be NULL and why it is
> OK to just assign to NULL and continue...

I think the issue fixed by this patch is a use-after-free of scmnd. If 
an rport is deleted srp_finish_req() calls scsi_done(). A concurrent 
scsi_queue_rq() call can reuse that scmnd before srp_process_rsp() gets 
called. This wouldn't occur if srp_wait_for_queuecommand() would also 
wait for scsi_queue_rq(). Since I expect it will be hard to convince the 
block layer maintainer to fix the root cause I came up with this patch.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig May 3, 2016, 9:30 a.m. UTC | #6
On Fri, Apr 22, 2016 at 02:15:30PM -0700, Bart Van Assche wrote:
> Avoid that running xfstests on top of the SRP initiator triggers
> the call trace below. This patch has been tested by running the
> following shell command on an initiator system that has access
> to 3200 SRP LUNs:

As mentioned by the others a better changelog would be useful.

> 
> /etc/init.d/multipathd start
> while true; do
>   /etc/init.d/srpd start
>   sleep 400
>   /etc/init.d/srpd stop
>   for p in /sys/class/srp_remote_ports/*; do
>     echo 1 >$p/delete &
>   done
>   wait
>   dmsetup remove_all
> done

Any chance you could have a git repo with all your srp tests?  Having them
avaiable in a single place for others to use would really help with
test coverage I think.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bart Van Assche May 3, 2016, 8:57 p.m. UTC | #7
On 05/03/2016 02:30 AM, Christoph Hellwig wrote:
> Any chance you could have a git repo with all your srp tests?  Having them
> available in a single place for others to use would really help with
> test coverage I think.

I will try to free up some time to work on this. I will need an SRP 
testsuite anyway before I simplify the SRP initiator by removing support 
for SG-lists with gaps (by setting q->limits.virt_boundary_mask).

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Laurence Oberman May 3, 2016, 9:01 p.m. UTC | #8
----- Original Message -----
From: "Bart Van Assche" <bart.vanassche@sandisk.com>
To: "Christoph Hellwig" <hch@lst.de>
Cc: "Doug Ledford" <dledford@redhat.com>, "Sagi Grimberg" <sagi@grimberg.me>, "Laurence Oberman" <loberman@redhat.com>, linux-rdma@vger.kernel.org
Sent: Tuesday, May 3, 2016 4:57:02 PM
Subject: Re: [PATCH 09/11] IB/srp: Fix a NULL pointer dereference

On 05/03/2016 02:30 AM, Christoph Hellwig wrote:
> Any chance you could have a git repo with all your srp tests?  Having them
> available in a single place for others to use would really help with
> test coverage I think.

I will try to free up some time to work on this. I will need an SRP 
testsuite anyway before I simplify the SRP initiator by removing support 
for SG-lists with gaps (by setting q->limits.virt_boundary_mask).

Bart.

I am always happy to help test anything if needed.
I have a standing configuration in my lab always setup.

Thanks
Laurence
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index caefd1a..f4003f6 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1814,7 +1814,8 @@  static void srp_process_rsp(struct srp_rdma_ch *ch, struct srp_rsp *rsp)
 		scmnd = scsi_host_find_tag(target->scsi_host, rsp->tag);
 		if (scmnd) {
 			req = (void *)scmnd->host_scribble;
-			scmnd = srp_claim_req(ch, req, NULL, scmnd);
+			scmnd = req ? srp_claim_req(ch, req, NULL, scmnd) :
+				NULL;
 		}
 		if (!scmnd) {
 			shost_printk(KERN_ERR, target->scsi_host,