diff mbox series

[fix] scsi_lib: make sure scsi_request.sense valid

Message ID 20190116155700.28967-1-dgilbert@interlog.com (mailing list archive)
State Changes Requested
Headers show
Series [fix] scsi_lib: make sure scsi_request.sense valid | expand

Commit Message

Douglas Gilbert Jan. 16, 2019, 3:57 p.m. UTC
The block layer assumes scsi_request:sense is always a valid
pointer. This is set up once in scsi_mq_init_request() and the
containing scsi_cmnd object is used often, being re-initialized
by scsi_init_command(). That works unless some code re-purposes
part of the scsi_cmnd object for something else. And that is
what bidi handling does in scsi_mq_prep_fn(). The result is an
oops at some later time when the partly overwritten object is
re-used. The overwrite is from d285203cf647d but 'git blame'
does not show removed code, so that commit may not be the
culprit.

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---

This was found while injecting errors (thus generating sense data)
into a sequence of bidi commands. At some later time the block
layer blew up with a scsi_request::sense NULL dereference in
sg_rq_end_io(). Without testing I'm confident the bsg driver,
the osd ULD and exofs are exposed to this bug.

 drivers/scsi/scsi_lib.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Bart Van Assche Jan. 16, 2019, 11:56 p.m. UTC | #1
On Wed, 2019-01-16 at 10:57 -0500, Douglas Gilbert wrote:
> The block layer assumes scsi_request:sense is always a valid
> pointer. This is set up once in scsi_mq_init_request() and the
> containing scsi_cmnd object is used often, being re-initialized
> by scsi_init_command(). That works unless some code re-purposes
> part of the scsi_cmnd object for something else. And that is
> what bidi handling does in scsi_mq_prep_fn(). The result is an
> oops at some later time when the partly overwritten object is
> re-used. The overwrite is from d285203cf647d but 'git blame'
> does not show removed code, so that commit may not be the
> culprit.
> 
> Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
> ---
> 
> This was found while injecting errors (thus generating sense data)
> into a sequence of bidi commands. At some later time the block
> layer blew up with a scsi_request::sense NULL dereference in
> sg_rq_end_io(). Without testing I'm confident the bsg driver,
> the osd ULD and exofs are exposed to this bug.
> 
>  drivers/scsi/scsi_lib.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index b13cc9288ba0..71259bd4040a 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1175,6 +1175,7 @@ void scsi_init_command(struct scsi_device *dev, struct scsi_cmnd *cmd)
>  
>  	cmd->device = dev;
>  	cmd->sense_buffer = buf;
> +	cmd->req.sense = buf;
>  	cmd->prot_sdb = prot;
>  	cmd->flags = flags;
>  	INIT_DELAYED_WORK(&cmd->abort_work, scmd_eh_abort_handler);

Hi Doug,

The description of this patch does not look correct to me. scsi_init_command()
does not overwrite the sense pointer. From the body of that function:

	/* zero out the cmd, except for the embedded scsi_request */
	memset((char *)cmd + sizeof(cmd->req), 0,
		sizeof(*cmd) - sizeof(cmd->req) + dev->host->hostt->cmd_size);

It is not clear to me which code overwrites the sense pointer. I think that
needs to be figured out before discussion of this patch can continue.

Thanks,

Bart.
Douglas Gilbert Jan. 17, 2019, 12:54 a.m. UTC | #2
On 2019-01-16 6:56 p.m., Bart Van Assche wrote:
> On Wed, 2019-01-16 at 10:57 -0500, Douglas Gilbert wrote:
>> The block layer assumes scsi_request:sense is always a valid
>> pointer. This is set up once in scsi_mq_init_request() and the
>> containing scsi_cmnd object is used often, being re-initialized
>> by scsi_init_command(). That works unless some code re-purposes
>> part of the scsi_cmnd object for something else. And that is
>> what bidi handling does in scsi_mq_prep_fn(). The result is an
>> oops at some later time when the partly overwritten object is
>> re-used. The overwrite is from d285203cf647d but 'git blame'
>> does not show removed code, so that commit may not be the
>> culprit.
>>
>> Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
>> ---
>>
>> This was found while injecting errors (thus generating sense data)
>> into a sequence of bidi commands. At some later time the block
>> layer blew up with a scsi_request::sense NULL dereference in
>> sg_rq_end_io(). Without testing I'm confident the bsg driver,
>> the osd ULD and exofs are exposed to this bug.
>>
>>   drivers/scsi/scsi_lib.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>> index b13cc9288ba0..71259bd4040a 100644
>> --- a/drivers/scsi/scsi_lib.c
>> +++ b/drivers/scsi/scsi_lib.c
>> @@ -1175,6 +1175,7 @@ void scsi_init_command(struct scsi_device *dev, struct scsi_cmnd *cmd)
>>   
>>   	cmd->device = dev;
>>   	cmd->sense_buffer = buf;
>> +	cmd->req.sense = buf;
>>   	cmd->prot_sdb = prot;
>>   	cmd->flags = flags;
>>   	INIT_DELAYED_WORK(&cmd->abort_work, scmd_eh_abort_handler);
> 
> Hi Doug,
> 
> The description of this patch does not look correct to me. scsi_init_command()
> does not overwrite the sense pointer. From the body of that function:
> 
> 	/* zero out the cmd, except for the embedded scsi_request */
> 	memset((char *)cmd + sizeof(cmd->req), 0,
> 		sizeof(*cmd) - sizeof(cmd->req) + dev->host->hostt->cmd_size);
> 
> It is not clear to me which code overwrites the sense pointer. I think that
> needs to be figured out before discussion of this patch can continue.

Bart,
Please re-read the explanation. The two sentences in the middle should tell
you that you are looking at the wrong memset().

And I'm waiting for the person who wrote the questionable code to comment.


If you don't believe me, try sending a device reset to a scsi_debug device.
Then use sg_raw *** to send a XDWRITREAD(10) bidi command to the same scsi_debug
device, you should get a Unit Attention concerning that device reset. If
you do, keep sending that bidi command. Make sure you have no important
files open because that machine will lock solid. Basically what happens
when a rq_end_io() callback de-references a NULL pointer. It looks like
it has been there since 2014 and took me 2 days to find. Sorry that I can't
explain it in one simple sentence.

Douglas


*** Use 'man sg_raw' and see the EXAMPLES section (second to last example)
     You can use a bsg device as shown.
Bart Van Assche Jan. 17, 2019, 1:06 a.m. UTC | #3
On Wed, 2019-01-16 at 19:54 -0500, Douglas Gilbert wrote:
> On 2019-01-16 6:56 p.m., Bart Van Assche wrote:
> > On Wed, 2019-01-16 at 10:57 -0500, Douglas Gilbert wrote:
> > > The block layer assumes scsi_request:sense is always a valid
> > > pointer. This is set up once in scsi_mq_init_request() and the
> > > containing scsi_cmnd object is used often, being re-initialized
> > > by scsi_init_command(). That works unless some code re-purposes
> > > part of the scsi_cmnd object for something else. And that is
> > > what bidi handling does in scsi_mq_prep_fn(). The result is an
> > > oops at some later time when the partly overwritten object is
> > > re-used. The overwrite is from d285203cf647d but 'git blame'
> > > does not show removed code, so that commit may not be the
> > > culprit.
> > > 
> > > Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
> > > ---
> > > 
> > > This was found while injecting errors (thus generating sense data)
> > > into a sequence of bidi commands. At some later time the block
> > > layer blew up with a scsi_request::sense NULL dereference in
> > > sg_rq_end_io(). Without testing I'm confident the bsg driver,
> > > the osd ULD and exofs are exposed to this bug.
> > > 
> > >   drivers/scsi/scsi_lib.c | 1 +
> > >   1 file changed, 1 insertion(+)
> > > 
> > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > > index b13cc9288ba0..71259bd4040a 100644
> > > --- a/drivers/scsi/scsi_lib.c
> > > +++ b/drivers/scsi/scsi_lib.c
> > > @@ -1175,6 +1175,7 @@ void scsi_init_command(struct scsi_device *dev, struct scsi_cmnd *cmd)
> > >   
> > >   	cmd->device = dev;
> > >   	cmd->sense_buffer = buf;
> > > +	cmd->req.sense = buf;
> > >   	cmd->prot_sdb = prot;
> > >   	cmd->flags = flags;
> > >   	INIT_DELAYED_WORK(&cmd->abort_work, scmd_eh_abort_handler);
> > 
> > Hi Doug,
> > 
> > The description of this patch does not look correct to me. scsi_init_command()
> > does not overwrite the sense pointer. From the body of that function:
> > 
> > 	/* zero out the cmd, except for the embedded scsi_request */
> > 	memset((char *)cmd + sizeof(cmd->req), 0,
> > 		sizeof(*cmd) - sizeof(cmd->req) + dev->host->hostt->cmd_size);
> > 
> > It is not clear to me which code overwrites the sense pointer. I think that
> > needs to be figured out before discussion of this patch can continue.
> 
> Bart,
> Please re-read the explanation. The two sentences in the middle should tell
> you that you are looking at the wrong memset().
> 
> And I'm waiting for the person who wrote the questionable code to comment.
> 
> 
> If you don't believe me, try sending a device reset to a scsi_debug device.
> Then use sg_raw *** to send a XDWRITREAD(10) bidi command to the same scsi_debug
> device, you should get a Unit Attention concerning that device reset. If
> you do, keep sending that bidi command. Make sure you have no important
> files open because that machine will lock solid. Basically what happens
> when a rq_end_io() callback de-references a NULL pointer. It looks like
> it has been there since 2014 and took me 2 days to find. Sorry that I can't
> explain it in one simple sentence.

Hi Doug,

Is this perhaps the memset you are referring to?

		memset(bidi_sdb, 0, sizeof(struct scsi_data_buffer));

Bart.
Bart Van Assche Jan. 17, 2019, 9:48 p.m. UTC | #4
On Thu, 2019-01-17 at 16:05 -0500, Douglas Gilbert wrote:
> On 2019-01-16 8:06 p.m., Bart Van Assche wrote:
> > On Wed, 2019-01-16 at 19:54 -0500, Douglas Gilbert wrote:
> > > On 2019-01-16 6:56 p.m., Bart Van Assche wrote:
> > > > On Wed, 2019-01-16 at 10:57 -0500, Douglas Gilbert wrote:
> > > > > The block layer assumes scsi_request:sense is always a valid
> > > > > pointer. This is set up once in scsi_mq_init_request() and the
> > > > > containing scsi_cmnd object is used often, being re-initialized
> > > > > by scsi_init_command(). That works unless some code re-purposes
> > > > > part of the scsi_cmnd object for something else. And that is
> > > > > what bidi handling does in scsi_mq_prep_fn(). The result is an
> > > > > oops at some later time when the partly overwritten object is
> > > > > re-used. The overwrite is from d285203cf647d but 'git blame'
> > > > > does not show removed code, so that commit may not be the
> > > > > culprit.
> > > > > 
> > > > > Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
> > > > > ---
> > > > > 
> > > > > This was found while injecting errors (thus generating sense data)
> > > > > into a sequence of bidi commands. At some later time the block
> > > > > layer blew up with a scsi_request::sense NULL dereference in
> > > > > sg_rq_end_io(). Without testing I'm confident the bsg driver,
> > > > > the osd ULD and exofs are exposed to this bug.
> > > > > 
> > > > >    drivers/scsi/scsi_lib.c | 1 +
> > > > >    1 file changed, 1 insertion(+)
> > > > > 
> > > > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > > > > index b13cc9288ba0..71259bd4040a 100644
> > > > > --- a/drivers/scsi/scsi_lib.c
> > > > > +++ b/drivers/scsi/scsi_lib.c
> > > > > @@ -1175,6 +1175,7 @@ void scsi_init_command(struct scsi_device *dev, struct scsi_cmnd *cmd)
> > > > >    
> > > > >    	cmd->device = dev;
> > > > >    	cmd->sense_buffer = buf;
> > > > > +	cmd->req.sense = buf;
> > > > >    	cmd->prot_sdb = prot;
> > > > >    	cmd->flags = flags;
> > > > >    	INIT_DELAYED_WORK(&cmd->abort_work, scmd_eh_abort_handler);
> > > > 
> > > > Hi Doug,
> > > > 
> > > > The description of this patch does not look correct to me. scsi_init_command()
> > > > does not overwrite the sense pointer. From the body of that function:
> > > > 
> > > > 	/* zero out the cmd, except for the embedded scsi_request */
> > > > 	memset((char *)cmd + sizeof(cmd->req), 0,
> > > > 		sizeof(*cmd) - sizeof(cmd->req) + dev->host->hostt->cmd_size);
> > > > 
> > > > It is not clear to me which code overwrites the sense pointer. I think that
> > > > needs to be figured out before discussion of this patch can continue.
> > > 
> > > Bart,
> > > Please re-read the explanation. The two sentences in the middle should tell
> > > you that you are looking at the wrong memset().
> > > 
> > > And I'm waiting for the person who wrote the questionable code to comment.
> > > 
> > > 
> > > If you don't believe me, try sending a device reset to a scsi_debug device.
> > > Then use sg_raw *** to send a XDWRITREAD(10) bidi command to the same scsi_debug
> > > device, you should get a Unit Attention concerning that device reset. If
> > > you do, keep sending that bidi command. Make sure you have no important
> > > files open because that machine will lock solid. Basically what happens
> > > when a rq_end_io() callback de-references a NULL pointer. It looks like
> > > it has been there since 2014 and took me 2 days to find. Sorry that I can't
> > > explain it in one simple sentence.
> > 
> > Hi Doug,
> > 
> > Is this perhaps the memset you are referring to?
> > 
> > 		memset(bidi_sdb, 0, sizeof(struct scsi_data_buffer));
> 
> Yes.

Hi Doug,

I think it's worse than what you explained in your patch description: some code
interprets blk_mq_rq_to_pdu(cmd->request->next_rq) as a struct scsi_data_buffer,
e.g. scsi_mq_prep_fn(). Other code in the SCSI core interprets the same data
structure as a struct scsi_request, e.g. scsi_io_completion(). So I don't think
that your patch is sufficient. I have already started working on a patch series
that clears up this confusion.

Bart.
Douglas Gilbert Jan. 17, 2019, 11 p.m. UTC | #5
On 2019-01-17 4:48 p.m., Bart Van Assche wrote:
> On Thu, 2019-01-17 at 16:05 -0500, Douglas Gilbert wrote:
>> On 2019-01-16 8:06 p.m., Bart Van Assche wrote:
>>> On Wed, 2019-01-16 at 19:54 -0500, Douglas Gilbert wrote:
>>>> On 2019-01-16 6:56 p.m., Bart Van Assche wrote:
>>>>> On Wed, 2019-01-16 at 10:57 -0500, Douglas Gilbert wrote:
>>>>>> The block layer assumes scsi_request:sense is always a valid
>>>>>> pointer. This is set up once in scsi_mq_init_request() and the
>>>>>> containing scsi_cmnd object is used often, being re-initialized
>>>>>> by scsi_init_command(). That works unless some code re-purposes
>>>>>> part of the scsi_cmnd object for something else. And that is
>>>>>> what bidi handling does in scsi_mq_prep_fn(). The result is an
>>>>>> oops at some later time when the partly overwritten object is
>>>>>> re-used. The overwrite is from d285203cf647d but 'git blame'
>>>>>> does not show removed code, so that commit may not be the
>>>>>> culprit.
>>>>>>
>>>>>> Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
>>>>>> ---
>>>>>>
>>>>>> This was found while injecting errors (thus generating sense data)
>>>>>> into a sequence of bidi commands. At some later time the block
>>>>>> layer blew up with a scsi_request::sense NULL dereference in
>>>>>> sg_rq_end_io(). Without testing I'm confident the bsg driver,
>>>>>> the osd ULD and exofs are exposed to this bug.
>>>>>>
>>>>>>     drivers/scsi/scsi_lib.c | 1 +
>>>>>>     1 file changed, 1 insertion(+)
>>>>>>
>>>>>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>>>>>> index b13cc9288ba0..71259bd4040a 100644
>>>>>> --- a/drivers/scsi/scsi_lib.c
>>>>>> +++ b/drivers/scsi/scsi_lib.c
>>>>>> @@ -1175,6 +1175,7 @@ void scsi_init_command(struct scsi_device *dev, struct scsi_cmnd *cmd)
>>>>>>     
>>>>>>     	cmd->device = dev;
>>>>>>     	cmd->sense_buffer = buf;
>>>>>> +	cmd->req.sense = buf;
>>>>>>     	cmd->prot_sdb = prot;
>>>>>>     	cmd->flags = flags;
>>>>>>     	INIT_DELAYED_WORK(&cmd->abort_work, scmd_eh_abort_handler);
>>>>>
>>>>> Hi Doug,
>>>>>
>>>>> The description of this patch does not look correct to me. scsi_init_command()
>>>>> does not overwrite the sense pointer. From the body of that function:
>>>>>
>>>>> 	/* zero out the cmd, except for the embedded scsi_request */
>>>>> 	memset((char *)cmd + sizeof(cmd->req), 0,
>>>>> 		sizeof(*cmd) - sizeof(cmd->req) + dev->host->hostt->cmd_size);
>>>>>
>>>>> It is not clear to me which code overwrites the sense pointer. I think that
>>>>> needs to be figured out before discussion of this patch can continue.
>>>>
>>>> Bart,
>>>> Please re-read the explanation. The two sentences in the middle should tell
>>>> you that you are looking at the wrong memset().
>>>>
>>>> And I'm waiting for the person who wrote the questionable code to comment.
>>>>
>>>>
>>>> If you don't believe me, try sending a device reset to a scsi_debug device.
>>>> Then use sg_raw *** to send a XDWRITREAD(10) bidi command to the same scsi_debug
>>>> device, you should get a Unit Attention concerning that device reset. If
>>>> you do, keep sending that bidi command. Make sure you have no important
>>>> files open because that machine will lock solid. Basically what happens
>>>> when a rq_end_io() callback de-references a NULL pointer. It looks like
>>>> it has been there since 2014 and took me 2 days to find. Sorry that I can't
>>>> explain it in one simple sentence.
>>>
>>> Hi Doug,
>>>
>>> Is this perhaps the memset you are referring to?
>>>
>>> 		memset(bidi_sdb, 0, sizeof(struct scsi_data_buffer));
>>
>> Yes.
> 
> Hi Doug,
> 
> I think it's worse than what you explained in your patch description: some code
> interprets blk_mq_rq_to_pdu(cmd->request->next_rq) as a struct scsi_data_buffer,
> e.g. scsi_mq_prep_fn(). Other code in the SCSI core interprets the same data
> structure as a struct scsi_request, e.g. scsi_io_completion(). So I don't think
> that your patch is sufficient. I have already started working on a patch series
> that clears up this confusion.

Great.

A reason from CH about why he did that would be useful. There is an instance
of struct scsi_data_buffer already in a scsi_cmnd object (called sdb), so
why not use it and keep the scsi_cmnd object "clean" ??

There should be a coding rule: if you abuse a structure (i.e. blasting another
object over what the type system is indicating in a obscure fashion) then that
must be noted in a comment including a rationale. This is such a case.

Doug Gilbert
Bart Van Assche Jan. 17, 2019, 11:39 p.m. UTC | #6
On Thu, 2019-01-17 at 18:00 -0500, Douglas Gilbert wrote:
> A reason from CH about why he did that would be useful. There is an instance
> of struct scsi_data_buffer already in a scsi_cmnd object (called sdb), so
> why not use it and keep the scsi_cmnd object "clean" ??
> 
> There should be a coding rule: if you abuse a structure (i.e. blasting another
> object over what the type system is indicating in a obscure fashion) then that
> must be noted in a comment including a rationale. This is such a case.

My guess is that the current state of the code is the result of an oversight.

Bart.
diff mbox series

Patch

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index b13cc9288ba0..71259bd4040a 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1175,6 +1175,7 @@  void scsi_init_command(struct scsi_device *dev, struct scsi_cmnd *cmd)
 
 	cmd->device = dev;
 	cmd->sense_buffer = buf;
+	cmd->req.sense = buf;
 	cmd->prot_sdb = prot;
 	cmd->flags = flags;
 	INIT_DELAYED_WORK(&cmd->abort_work, scmd_eh_abort_handler);