diff mbox

megaraid: add scsi_cmnd NULL check before use

Message ID 1462668011.32105.7.camel@petros-ultrathin (mailing list archive)
State Superseded, archived
Headers show

Commit Message

Petros Koutoupis May 8, 2016, 12:40 a.m. UTC
The current state of the code checks to see if the reference to scsi_cmnd is
not null, but it never checks to see if it is null and always assumes it is valid
before its use in below switch statement. This patch addresses that.



--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Finn Thain May 8, 2016, 3:32 a.m. UTC | #1
On Sat, 7 May 2016, Petros Koutoupis wrote:

> The current state of the code checks to see if the reference to 
> scsi_cmnd is not null, but it never checks to see if it is null and 
> always assumes it is valid before its use in below switch statement. 
> This patch addresses that.
> 

There is no sign-off here.

> --- linux/drivers/scsi/megaraid/megaraid_sas_fusion.c.orig	2016-05-07 09:12:56.748969851 -0500
> +++ linux/drivers/scsi/megaraid/megaraid_sas_fusion.c	2016-05-07 09:15:29.612967113 -0500
> @@ -2277,6 +2277,10 @@ complete_cmd_fusion(struct megasas_insta
>  
>  		if (cmd_fusion->scmd)
>  			cmd_fusion->scmd->SCp.ptr = NULL;
> +		else if ((!cmd_fusion->scmd) &&
> +			 ((scsi_io_req->Function == MPI2_FUNCTION_SCSI_IO_REQUEST) ||
> +			 (scsi_io_req->Function == MEGASAS_MPI2_FUNCTION_LD_IO_REQUEST)))
> +			goto next;

That contains a tautology.

>  
>  		scmd_local = cmd_fusion->scmd;
>  		status = scsi_io_req->RaidContext.status;
> @@ -2336,7 +2340,7 @@ complete_cmd_fusion(struct megasas_insta
>  				megasas_complete_cmd(instance, cmd_mfi, DID_OK);
>  			break;
>  		}
> -
> +next:
>  		fusion->last_reply_idx[MSIxIndex]++;
>  		if (fusion->last_reply_idx[MSIxIndex] >=
>  		    fusion->reply_q_depth)
> 
> 

Your patch is equivalent to this one:

@@ -2294,6 +2294,8 @@
 			complete(&cmd_fusion->done);
 			break;
 		case MPI2_FUNCTION_SCSI_IO_REQUEST:  /*Fast Path IO.*/
+			if (unlikely(!scmd_local))
+				break;
 			/* Update load balancing info */
 			device_id = MEGASAS_DEV_INDEX(scmd_local);
 			lbinfo = &fusion->load_balance_info[device_id];
@@ -2311,6 +2313,8 @@
 			}
 			/* Fall thru and complete IO */
 		case MEGASAS_MPI2_FUNCTION_LD_IO_REQUEST: /* LD-IO Path */
+			if (unlikely(!scmd_local))
+				break;
 			/* Map the FW Cmd Status */
 			map_cmd_status(cmd_fusion, status, extStatus);
 			scsi_io_req->RaidContext.status = 0;


I don't know anything about this driver or hardware so I'm not actually 
advocating any of these changes, just pointing out that your patch has 
issues.

I would have expected the 'smatch' checker to detect either the potential 
NULL pointer derefs or alternatively the redundant test for a non-NULL 
pointer. Maybe it has detected this already (?) or maybe the NULL pointer 
deref can be shown to be impossible?
Petros Koutoupis May 8, 2016, 12:08 p.m. UTC | #2
On Sun, 2016-05-08 at 13:32 +1000, Finn Thain wrote:
> On Sat, 7 May 2016, Petros Koutoupis wrote:
> 
> > The current state of the code checks to see if the reference to 
> > scsi_cmnd is not null, but it never checks to see if it is null and 
> > always assumes it is valid before its use in below switch statement. 
> > This patch addresses that.
> > 
> 
> There is no sign-off here.
> 

Oops. Will resend.

> > --- linux/drivers/scsi/megaraid/megaraid_sas_fusion.c.orig	2016-05-07 09:12:56.748969851 -0500
> > +++ linux/drivers/scsi/megaraid/megaraid_sas_fusion.c	2016-05-07 09:15:29.612967113 -0500
> > @@ -2277,6 +2277,10 @@ complete_cmd_fusion(struct megasas_insta
> >  
> >  		if (cmd_fusion->scmd)
> >  			cmd_fusion->scmd->SCp.ptr = NULL;
> > +		else if ((!cmd_fusion->scmd) &&
> > +			 ((scsi_io_req->Function == MPI2_FUNCTION_SCSI_IO_REQUEST) ||
> > +			 (scsi_io_req->Function == MEGASAS_MPI2_FUNCTION_LD_IO_REQUEST)))
> > +			goto next;
> 
> That contains a tautology.
> 

How so? The very definition of tautology implies that I am repeating
the same kind of functionality in two different ways. The thing is,
this function routine does not always need to have a valid scsi_cmnd
reference, which is why in the first it checks if it is valid. In
the second, it only checks if it is null under the only two cases
which make use of it.

> >  
> >  		scmd_local = cmd_fusion->scmd;
> >  		status = scsi_io_req->RaidContext.status;
> > @@ -2336,7 +2340,7 @@ complete_cmd_fusion(struct megasas_insta
> >  				megasas_complete_cmd(instance, cmd_mfi, DID_OK);
> >  			break;
> >  		}
> > -
> > +next:
> >  		fusion->last_reply_idx[MSIxIndex]++;
> >  		if (fusion->last_reply_idx[MSIxIndex] >=
> >  		    fusion->reply_q_depth)
> > 
> > 
> 
> Your patch is equivalent to this one:
> 
> @@ -2294,6 +2294,8 @@
>  			complete(&cmd_fusion->done);
>  			break;
>  		case MPI2_FUNCTION_SCSI_IO_REQUEST:  /*Fast Path IO.*/
> +			if (unlikely(!scmd_local))
> +				break;
>  			/* Update load balancing info */
>  			device_id = MEGASAS_DEV_INDEX(scmd_local);
>  			lbinfo = &fusion->load_balance_info[device_id];
> @@ -2311,6 +2313,8 @@
>  			}
>  			/* Fall thru and complete IO */
>  		case MEGASAS_MPI2_FUNCTION_LD_IO_REQUEST: /* LD-IO Path */
> +			if (unlikely(!scmd_local))
> +				break;
>  			/* Map the FW Cmd Status */
>  			map_cmd_status(cmd_fusion, status, extStatus);
>  			scsi_io_req->RaidContext.status = 0;
> 
> 

Yeah, except your patch introduces duplication of the same code and
even worse, if the command starts in the MPI2_FUNCTION_SCSI_IO_REQUEST
case, it will check the status of the pointer twice which seems pretty
inefficient for systems running such high loads (i.e. the "Fall
thru").


> I don't know anything about this driver or hardware so I'm not actually 
> advocating any of these changes, just pointing out that your patch has 
> issues.
> 
> I would have expected the 'smatch' checker to detect either the potential 
> NULL pointer derefs or alternatively the redundant test for a non-NULL 
> pointer. Maybe it has detected this already (?) or maybe the NULL pointer 
> deref can be shown to be impossible?
> 

It is very possible. We have had customers in the field hit pretty
hard by this.

> -- 


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Finn Thain May 8, 2016, 12:22 p.m. UTC | #3
On Sun, 8 May 2016, Petros Koutoupis wrote:

> > 
> > That contains a tautology.
> > 
> 
> How so?

if (x)
	/* ... */
else if (!x && (whatever))
	/* ... */
Petros Koutoupis May 8, 2016, 4:34 p.m. UTC | #4
On Sun, 2016-05-08 at 22:22 +1000, Finn Thain wrote:
> On Sun, 8 May 2016, Petros Koutoupis wrote:
> 
> > > 
> > > That contains a tautology.
> > > 
> > 
> > How so?
> 
> if (x)
> 	/* ... */
> else if (!x && (whatever))
> 	/* ... */
> 
> -- 

Thank you but I know the logic of what I wrote. A tautology
will yield the same results no matter what the interpretation.
That is not a tautology. The two conditionals in my case check
different states and serve different purposes.


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Julian Calaby May 9, 2016, 1:35 a.m. UTC | #5
Hi Petros,

On Mon, May 9, 2016 at 2:34 AM, Petros Koutoupis
<petros@petroskoutoupis.com> wrote:
> On Sun, 2016-05-08 at 22:22 +1000, Finn Thain wrote:
>> On Sun, 8 May 2016, Petros Koutoupis wrote:
>>
>> > >
>> > > That contains a tautology.
>> > >
>> >
>> > How so?
>>
>> if (x)
>>       /* ... */
>> else if (!x && (whatever))
>>       /* ... */
>>
>> --
>
> Thank you but I know the logic of what I wrote. A tautology
> will yield the same results no matter what the interpretation.
> That is not a tautology. The two conditionals in my case check
> different states and serve different purposes.

You're missing the point.

Execution will only reach the else branch if "!cmd_fusion->scmd",
hence checking that is unnecessary. Removing that test (and all the
unnecessary parentheses) will reduce the second test to:

else if (scsi_io_req->Function == MPI2_FUNCTION_SCSI_IO_REQUEST ||
        scsi_io_req->Function == MEGASAS_MPI2_FUNCTION_LD_IO_REQUEST)

which is much cleaner.

Thanks,
Petros Koutoupis May 9, 2016, 2:59 a.m. UTC | #6
On Mon, 2016-05-09 at 11:35 +1000, Julian Calaby wrote:
> Hi Petros,
> 
> On Mon, May 9, 2016 at 2:34 AM, Petros Koutoupis
> <petros@petroskoutoupis.com> wrote:
> > On Sun, 2016-05-08 at 22:22 +1000, Finn Thain wrote:
> >> On Sun, 8 May 2016, Petros Koutoupis wrote:
> >>
> >> > >
> >> > > That contains a tautology.
> >> > >
> >> >
> >> > How so?
> >>
> >> if (x)
> >>       /* ... */
> >> else if (!x && (whatever))
> >>       /* ... */
> >>
> >> --
> >
> > Thank you but I know the logic of what I wrote. A tautology
> > will yield the same results no matter what the interpretation.
> > That is not a tautology. The two conditionals in my case check
> > different states and serve different purposes.
> 
> You're missing the point.
> 
> Execution will only reach the else branch if "!cmd_fusion->scmd",
> hence checking that is unnecessary. Removing that test (and all the
> unnecessary parentheses) will reduce the second test to:
> 
> else if (scsi_io_req->Function == MPI2_FUNCTION_SCSI_IO_REQUEST ||
>         scsi_io_req->Function == MEGASAS_MPI2_FUNCTION_LD_IO_REQUEST)
> 
> which is much cleaner.
> 
> Thanks,
> 

Julian,

I agree. I will clean up and resend. Thank you.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dan Carpenter May 9, 2016, 8:05 a.m. UTC | #7
Smatch doesn't quite catch it because we check "cmd_fusion->scmd" for
NULL then assign "scmd_local = cmd_fusion->scmd;" and dereference
scmd_local unconditionally...

It does catch part of the bug if you have cross function analysis:

  drivers/scsi/megaraid/megaraid_sas_fusion.c:2318 complete_cmd_fusion()
  error: we previously assumed 'cmd_fusion->scmd' could be null (see line 2281)

But that code was from 2010 so I never reported it to the original
author or the list.

regards,
dan carpenter

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dan Carpenter May 9, 2016, 9:43 a.m. UTC | #8
I've updated Smatch to catch these and I'm testing it now.  We'll see
how it goes.  If my quick and dirty method doesn't has too many false
positives then it's will take some months before I get around to doing
it properly...

Thanks for notifying me on this.

regards,
dan carpenter

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sumit Saxena May 9, 2016, 9:48 a.m. UTC | #9
> -----Original Message-----
> From: Dan Carpenter [mailto:dan.carpenter@oracle.com]
> Sent: Monday, May 09, 2016 1:36 PM
> To: Finn Thain
> Cc: Petros Koutoupis; kashyap.desai@avagotech.com;
> sumit.saxena@avagotech.com; uday.lingala@avagotech.com;
> megaraidlinux.pdl@avagotech.com; linux-scsi@vger.kernel.org
> Subject: Re: [PATCH] megaraid: add scsi_cmnd NULL check before use
>
> Smatch doesn't quite catch it because we check "cmd_fusion->scmd" for
NULL
> then assign "scmd_local = cmd_fusion->scmd;" and dereference scmd_local
> unconditionally...
>
> It does catch part of the bug if you have cross function analysis:
>
>   drivers/scsi/megaraid/megaraid_sas_fusion.c:2318 complete_cmd_fusion()
>   error: we previously assumed 'cmd_fusion->scmd' could be null (see
line 2281)
>
> But that code was from 2010 so I never reported it to the original
author or the
> list.
"cmd_fusion->scmd" should not be NULL if scsi_io_req->Function is set to
MPI2_FUNCTION_SCSI_IO_REQUEST (OR) MEGASAS_MPI2_FUNCTION_LD_IO_REQUES
(inside these two cases only, cmd_fusion->scmd will be dereferenced). If
cmd_fusion->scmd is NULL for these "scsi_io_req->Function", that will a
BUG and
should not continue with other commands processing in that case.

>
> regards,
> dan carpenter
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dan Carpenter May 9, 2016, 7:29 p.m. UTC | #10
I'm confused what you are saying.  According to Petros, these bugs are
causing failures in real life, I was only added to the CC list because
Smatch should have warned about them.

regards,
dan carpenter

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Petros Koutoupis May 9, 2016, 9:28 p.m. UTC | #11
On Mon, 2016-05-09 at 15:18 +0530, Sumit Saxena wrote:
> > 
> > -----Original Message-----
> > From: Dan Carpenter [mailto:dan.carpenter@oracle.com]
> > Sent: Monday, May 09, 2016 1:36 PM
> > To: Finn Thain
> > Cc: Petros Koutoupis; kashyap.desai@avagotech.com;
> > sumit.saxena@avagotech.com; uday.lingala@avagotech.com;
> > megaraidlinux.pdl@avagotech.com; linux-scsi@vger.kernel.org
> > Subject: Re: [PATCH] megaraid: add scsi_cmnd NULL check before use
> > 
> > Smatch doesn't quite catch it because we check "cmd_fusion->scmd" for
> NULL
> > 
> > then assign "scmd_local = cmd_fusion->scmd;" and dereference scmd_local
> > unconditionally...
> > 
> > It does catch part of the bug if you have cross function analysis:
> > 
> >   drivers/scsi/megaraid/megaraid_sas_fusion.c:2318 complete_cmd_fusion()
> >   error: we previously assumed 'cmd_fusion->scmd' could be null (see
> line 2281)
> > 
> > 
> > But that code was from 2010 so I never reported it to the original
> author or the
> > 
> > list.
> "cmd_fusion->scmd" should not be NULL if scsi_io_req->Function is set to
> MPI2_FUNCTION_SCSI_IO_REQUEST (OR) MEGASAS_MPI2_FUNCTION_LD_IO_REQUES
> (inside these two cases only, cmd_fusion->scmd will be dereferenced). If
> cmd_fusion->scmd is NULL for these "scsi_io_req->Function", that will a
> BUG and
> should not continue with other commands processing in that case.
> 

Sumit,

To clarify, a detection of cmd_fusion->scmd being NULL with
scsi_io_req->Function set to MPI2_FUNCTION_SCSI_IO_REQUEST or
MEGASAS_MPI2_FUNCTION_LD_IO_REQUEST should instead trigger a
BUG() and not attempt to iterate to the next command in the
list. Thank you.

--
Petros
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sumit Saxena May 11, 2016, 9:41 a.m. UTC | #12
> -----Original Message-----
> From: Petros Koutoupis [mailto:petros@petroskoutoupis.com]
> Sent: Tuesday, May 10, 2016 2:59 AM
> To: Sumit Saxena; Dan Carpenter; Finn Thain
> Cc: kashyap.desai@avagotech.com; sumit.saxena@avagotech.com;
> uday.lingala@avagotech.com; megaraidlinux.pdl@avagotech.com; linux-
> scsi@vger.kernel.org
> Subject: Re: [PATCH] megaraid: add scsi_cmnd NULL check before use
>
> On Mon, 2016-05-09 at 15:18 +0530, Sumit Saxena wrote:
> > >
> > > -----Original Message-----
> > > From: Dan Carpenter [mailto:dan.carpenter@oracle.com]
> > > Sent: Monday, May 09, 2016 1:36 PM
> > > To: Finn Thain
> > > Cc: Petros Koutoupis; kashyap.desai@avagotech.com;
> > > sumit.saxena@avagotech.com; uday.lingala@avagotech.com;
> > > megaraidlinux.pdl@avagotech.com; linux-scsi@vger.kernel.org
> > > Subject: Re: [PATCH] megaraid: add scsi_cmnd NULL check before use
> > >
> > > Smatch doesn't quite catch it because we check "cmd_fusion->scmd"
> > > for
> > NULL
> > >
> > > then assign "scmd_local = cmd_fusion->scmd;" and dereference
> > > scmd_local unconditionally...
> > >
> > > It does catch part of the bug if you have cross function analysis:
> > >
> > >   drivers/scsi/megaraid/megaraid_sas_fusion.c:2318
> > > complete_cmd_fusion()
> > >   error: we previously assumed 'cmd_fusion->scmd' could be null (see
> > line 2281)
> > >
> > >
> > > But that code was from 2010 so I never reported it to the original
> > author or the
> > >
> > > list.
> > "cmd_fusion->scmd" should not be NULL if scsi_io_req->Function is set
> > to MPI2_FUNCTION_SCSI_IO_REQUEST (OR)
> > MEGASAS_MPI2_FUNCTION_LD_IO_REQUES
> > (inside these two cases only, cmd_fusion->scmd will be dereferenced).
> > If cmd_fusion->scmd is NULL for these "scsi_io_req->Function", that
> > will a BUG and should not continue with other commands processing in
> > that case.
> >
>
> Sumit,
>
> To clarify, a detection of cmd_fusion->scmd being NULL with scsi_io_req-
> >Function set to MPI2_FUNCTION_SCSI_IO_REQUEST or
> MEGASAS_MPI2_FUNCTION_LD_IO_REQUEST should instead trigger a
> BUG() and not attempt to iterate to the next command in the list. Thank
> you.

Petros,

WARN_ON() can be used in this case. Upstream may have concerns on using
BUG_ON() and also BUG_ON() won't help in this case. In production
environment we never encountered this.

Thanks,
Sumit
>
> --
> Petros
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Petros Koutoupis May 12, 2016, 1:49 a.m. UTC | #13
On Wed, 2016-05-11 at 15:11 +0530, Sumit Saxena wrote:
> > -----Original Message-----
> > From: Petros Koutoupis [mailto:petros@petroskoutoupis.com]
> > Sent: Tuesday, May 10, 2016 2:59 AM
> > To: Sumit Saxena; Dan Carpenter; Finn Thain
> > Cc: kashyap.desai@avagotech.com; sumit.saxena@avagotech.com;
> > uday.lingala@avagotech.com; megaraidlinux.pdl@avagotech.com; linux-
> > scsi@vger.kernel.org
> > Subject: Re: [PATCH] megaraid: add scsi_cmnd NULL check before use
> >
> > On Mon, 2016-05-09 at 15:18 +0530, Sumit Saxena wrote:
> > > >
> > > > -----Original Message-----
> > > > From: Dan Carpenter [mailto:dan.carpenter@oracle.com]
> > > > Sent: Monday, May 09, 2016 1:36 PM
> > > > To: Finn Thain
> > > > Cc: Petros Koutoupis; kashyap.desai@avagotech.com;
> > > > sumit.saxena@avagotech.com; uday.lingala@avagotech.com;
> > > > megaraidlinux.pdl@avagotech.com; linux-scsi@vger.kernel.org
> > > > Subject: Re: [PATCH] megaraid: add scsi_cmnd NULL check before use
> > > >
> > > > Smatch doesn't quite catch it because we check "cmd_fusion->scmd"
> > > > for
> > > NULL
> > > >
> > > > then assign "scmd_local = cmd_fusion->scmd;" and dereference
> > > > scmd_local unconditionally...
> > > >
> > > > It does catch part of the bug if you have cross function analysis:
> > > >
> > > >   drivers/scsi/megaraid/megaraid_sas_fusion.c:2318
> > > > complete_cmd_fusion()
> > > >   error: we previously assumed 'cmd_fusion->scmd' could be null (see
> > > line 2281)
> > > >
> > > >
> > > > But that code was from 2010 so I never reported it to the original
> > > author or the
> > > >
> > > > list.
> > > "cmd_fusion->scmd" should not be NULL if scsi_io_req->Function is set
> > > to MPI2_FUNCTION_SCSI_IO_REQUEST (OR)
> > > MEGASAS_MPI2_FUNCTION_LD_IO_REQUES
> > > (inside these two cases only, cmd_fusion->scmd will be dereferenced).
> > > If cmd_fusion->scmd is NULL for these "scsi_io_req->Function", that
> > > will a BUG and should not continue with other commands processing in
> > > that case.
> > >
> >
> > Sumit,
> >
> > To clarify, a detection of cmd_fusion->scmd being NULL with scsi_io_req-
> > >Function set to MPI2_FUNCTION_SCSI_IO_REQUEST or
> > MEGASAS_MPI2_FUNCTION_LD_IO_REQUEST should instead trigger a
> > BUG() and not attempt to iterate to the next command in the list. Thank
> > you.
> 
> Petros,
> 
> WARN_ON() can be used in this case. Upstream may have concerns on using
> BUG_ON() and also BUG_ON() won't help in this case. In production
> environment we never encountered this.
> 
> Thanks,
> Sumit
> >
> > --
> > Petros

Sumit,

I will resubmit the patch with all the recommendations. Thank you. In case
you are interested, I have a crash file showcasing the error. I can always
provide this outside of this mailing thread.

--
Petros


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dan Carpenter May 12, 2016, 6:35 a.m. UTC | #14
On Wed, May 11, 2016 at 08:49:51PM -0500, Petros Koutoupis wrote:
> Sumit,
> 
> I will resubmit the patch with all the recommendations. Thank you. In case
> you are interested, I have a crash file showcasing the error. I can always
> provide this outside of this mailing thread.
> 

Please send it to the thread.

To be honest, I totally can't understand this thread.  Sumit says it is
impossible and you are saying that you have seen it happen in real life.
Are you using out of tree code or something?  What are you doing that is
unexpected?

I don't see the point of adding a WARN_ON().  NULL derefs normally
generate a pretty clear stack trace already (unless they are caused by
memory corruption).  Why are we not just fixing the bugs instead of
warning and then crashing.

Also when I'm doing static analysis people always tell me that "that bug
is impossible, trust me." and instead of trusting people I really wish
they would just show me the relevant code that prevents it from
happening.

regards,
dan carpenter

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sumit Saxena May 12, 2016, 12:35 p.m. UTC | #15
> -----Original Message-----
> From: Dan Carpenter [mailto:dan.carpenter@oracle.com]
> Sent: Thursday, May 12, 2016 12:05 PM
> To: Petros Koutoupis
> Cc: Sumit Saxena; Finn Thain; kashyap.desai@avagotech.com;
> sumit.saxena@avagotech.com; uday.lingala@avagotech.com;
> megaraidlinux.pdl@avagotech.com; linux-scsi@vger.kernel.org
> Subject: Re: [PATCH] megaraid: add scsi_cmnd NULL check before use
>
> On Wed, May 11, 2016 at 08:49:51PM -0500, Petros Koutoupis wrote:
> > Sumit,
> >
> > I will resubmit the patch with all the recommendations. Thank you. In
> > case you are interested, I have a crash file showcasing the error. I
> > can always provide this outside of this mailing thread.
> >
>
> Please send it to the thread.
>
> To be honest, I totally can't understand this thread.  Sumit says it is
impossible
> and you are saying that you have seen it happen in real life.
> Are you using out of tree code or something?  What are you doing that is
> unexpected?
>
> I don't see the point of adding a WARN_ON().  NULL derefs normally
generate a
> pretty clear stack trace already (unless they are caused by memory
corruption).
> Why are we not just fixing the bugs instead of warning and then
crashing.
Agree, if there scsi_cmnd is coming as NULL, please attach logs. I will
look into them.
>
> Also when I'm doing static analysis people always tell me that "that bug
is
> impossible, trust me." and instead of trusting people I really wish they
would just
> show me the relevant code that prevents it from happening.
Inside megasas_build_io_fusion() function,  driver sets "cmd->scmd"
pointer(SCSI command pointer received from SCSI mid layer). Functions
called inside megasas_build_io_fusion()(which actually builds frame to be
sent to firmware)
are setting Function type- MPI2_FUNCTION_SCSI_IO_REQUEST (or)
MEGASAS_MPI2_FUNCTION_LD_IO_REQUEST. So in case Function type set to any
one these two, there must be valid "cmd->scmd".

Thanks,
Sumit

>
> regards,
> dan carpenter
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Finn Thain May 13, 2016, 7:43 a.m. UTC | #16
On Thu, 12 May 2016, Sumit Saxena wrote:

> > From: Dan Carpenter [mailto:dan.carpenter@oracle.com]
> >
> > Also when I'm doing static analysis people always tell me that "that 
> > bug is impossible, trust me." and instead of trusting people I really 
> > wish they would just show me the relevant code that prevents it from 
> > happening.
> 
> Inside megasas_build_io_fusion() function, driver sets "cmd->scmd" 
> pointer(SCSI command pointer received from SCSI mid layer). Functions 
> called inside megasas_build_io_fusion()(which actually builds frame to 
> be sent to firmware) are setting Function type- 
> MPI2_FUNCTION_SCSI_IO_REQUEST (or) MEGASAS_MPI2_FUNCTION_LD_IO_REQUEST. 
> So in case Function type set to any one these two, there must be valid 
> "cmd->scmd".

That doesn't show what prevents the bug. It merely shows that the bug does 
not always manifest.

For example, you might check whether anything prevents 
megasas_build_io_fusion() from returning before assigning cmd->scmd, like 
so:

2112 	if (sge_count > instance->max_num_sge) {
2113 		dev_err(&instance->pdev->dev, "Error. sge_count (0x%x) exceeds "
2114 		       "max (0x%x) allowed\n", sge_count,
2115 		       instance->max_num_sge);
2116 		return 1;
2117 	}

Another possibility: cmd->io_request->Function is valid yet cmd->scmd is 
NULL when seen from the interrupt handler if it intervenes between the two 
statements in megasas_return_cmd_fusion():

180 inline void megasas_return_cmd_fusion(struct megasas_instance *instance,
181 	struct megasas_cmd_fusion *cmd)
182 {
183 	cmd->scmd = NULL;
184 	memset(cmd->io_request, 0, sizeof(struct MPI2_RAID_SCSI_IO_REQUEST));
185 }

You might want to confirm that locking always prevents that.

OTOH, without an actual backtrace, I too might be reluctant to pursue this 
kind of speculation.
Sumit Saxena May 13, 2016, 7:43 a.m. UTC | #17
> -----Original Message-----
> From: Finn Thain [mailto:fthain@telegraphics.com.au]
> Sent: Friday, May 13, 2016 1:14 PM
> To: Sumit Saxena
> Cc: Dan Carpenter; Petros Koutoupis; kashyap.desai@avagotech.com;
> sumit.saxena@avagotech.com; uday.lingala@avagotech.com;
> megaraidlinux.pdl@avagotech.com; linux-scsi@vger.kernel.org
> Subject: RE: [PATCH] megaraid: add scsi_cmnd NULL check before use
>
>
> On Thu, 12 May 2016, Sumit Saxena wrote:
>
> > > From: Dan Carpenter [mailto:dan.carpenter@oracle.com]
> > >
> > > Also when I'm doing static analysis people always tell me that "that
> > > bug is impossible, trust me." and instead of trusting people I
> > > really wish they would just show me the relevant code that prevents
> > > it from happening.
> >
> > Inside megasas_build_io_fusion() function, driver sets "cmd->scmd"
> > pointer(SCSI command pointer received from SCSI mid layer). Functions
> > called inside megasas_build_io_fusion()(which actually builds frame to
> > be sent to firmware) are setting Function type-
> > MPI2_FUNCTION_SCSI_IO_REQUEST (or)
> MEGASAS_MPI2_FUNCTION_LD_IO_REQUEST.
> > So in case Function type set to any one these two, there must be valid
> > "cmd->scmd".
>
> That doesn't show what prevents the bug. It merely shows that the bug
does not
> always manifest.
>
> For example, you might check whether anything prevents
> megasas_build_io_fusion() from returning before assigning cmd->scmd,
like
> so:
>
> 2112 	if (sge_count > instance->max_num_sge) {
> 2113 		dev_err(&instance->pdev->dev, "Error. sge_count (0x%x)
> exceeds "
> 2114 		       "max (0x%x) allowed\n", sge_count,
> 2115 		       instance->max_num_sge);
> 2116 		return 1;
> 2117 	}
>
> Another possibility: cmd->io_request->Function is valid yet cmd->scmd is
NULL
> when seen from the interrupt handler if it intervenes between the two
> statements in megasas_return_cmd_fusion():

For IOs returned by above error are not actually fired to firmware so
there will be no interrupt handler called for the same.
Anyways, if anyone has logs pertaining to the failure, please attach..
>
> 180 inline void megasas_return_cmd_fusion(struct megasas_instance
*instance,
> 181 	struct megasas_cmd_fusion *cmd)
> 182 {
> 183 	cmd->scmd = NULL;
> 184 	memset(cmd->io_request, 0, sizeof(struct
> MPI2_RAID_SCSI_IO_REQUEST));
> 185 }
>
> You might want to confirm that locking always prevents that.
>
> OTOH, without an actual backtrace, I too might be reluctant to pursue
this kind
> of speculation.
>
> --
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Finn Thain May 13, 2016, 9:25 a.m. UTC | #18
On Fri, 13 May 2016, Sumit Saxena wrote:

> 
> For IOs returned by above error are not actually fired to firmware so 
> there will be no interrupt handler called for the same.

You don't need both possibilities to hold. Either one would do it!
Petros Koutoupis May 13, 2016, 11:34 p.m. UTC | #19
On Thu, 2016-05-12 at 09:35 +0300, Dan Carpenter wrote:
> On Wed, May 11, 2016 at 08:49:51PM -0500, Petros Koutoupis wrote:
> > Sumit,
> > 
> > I will resubmit the patch with all the recommendations. Thank you. In case
> > you are interested, I have a crash file showcasing the error. I can always
> > provide this outside of this mailing thread.
> > 
> 
> Please send it to the thread.
> 
> To be honest, I totally can't understand this thread.  Sumit says it is
> impossible and you are saying that you have seen it happen in real life.
> Are you using out of tree code or something?  What are you doing that is
> unexpected?
> 
> I don't see the point of adding a WARN_ON().  NULL derefs normally
> generate a pretty clear stack trace already (unless they are caused by
> memory corruption).  Why are we not just fixing the bugs instead of
> warning and then crashing.
> 
> Also when I'm doing static analysis people always tell me that "that bug
> is impossible, trust me." and instead of trusting people I really wish
> they would just show me the relevant code that prevents it from
> happening.
> 
> regards,
> dan carpenter
> 
All,

You will find the tarball containing the crash and kernel images here:
https://drive.google.com/open?id=0B771xrWtHcpWWFI3Yl93WFBfT28

Note that it is a 7GB file.

--
Petros

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

--- linux/drivers/scsi/megaraid/megaraid_sas_fusion.c.orig	2016-05-07 09:12:56.748969851 -0500
+++ linux/drivers/scsi/megaraid/megaraid_sas_fusion.c	2016-05-07 09:15:29.612967113 -0500
@@ -2277,6 +2277,10 @@  complete_cmd_fusion(struct megasas_insta
 
 		if (cmd_fusion->scmd)
 			cmd_fusion->scmd->SCp.ptr = NULL;
+		else if ((!cmd_fusion->scmd) &&
+			 ((scsi_io_req->Function == MPI2_FUNCTION_SCSI_IO_REQUEST) ||
+			 (scsi_io_req->Function == MEGASAS_MPI2_FUNCTION_LD_IO_REQUEST)))
+			goto next;
 
 		scmd_local = cmd_fusion->scmd;
 		status = scsi_io_req->RaidContext.status;
@@ -2336,7 +2340,7 @@  complete_cmd_fusion(struct megasas_insta
 				megasas_complete_cmd(instance, cmd_mfi, DID_OK);
 			break;
 		}
-
+next:
 		fusion->last_reply_idx[MSIxIndex]++;
 		if (fusion->last_reply_idx[MSIxIndex] >=
 		    fusion->reply_q_depth)