diff mbox

[03/15] mpt3sas: Add sanity checks for scsi tracker before accessing it.

Message ID 1522402644-3016-4-git-send-email-chaitra.basappa@broadcom.com (mailing list archive)
State Changes Requested
Headers show

Commit Message

Chaitra P B March 30, 2018, 9:37 a.m. UTC
Check scsi tracker for NULL before accessing it.
And in some places there are possibilities for getting valid st
but still other fields are not set.

Signed-off-by: Chaitra P B <chaitra.basappa@broadcom.com>
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
---
 drivers/scsi/mpt3sas/mpt3sas_ctl.c   | 5 ++++-
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 9 ++++++++-
 2 files changed, 12 insertions(+), 2 deletions(-)

Comments

Christoph Hellwig March 30, 2018, 11:59 a.m. UTC | #1
On Fri, Mar 30, 2018 at 03:07:12PM +0530, Chaitra P B wrote:
> Check scsi tracker for NULL before accessing it.
> And in some places there are possibilities for getting valid st
> but still other fields are not set.

Please explain how that could ever happen.  You should never see
a scsi_cmnd without the device pointer.
Bart Van Assche March 30, 2018, 3:54 p.m. UTC | #2
On Fri, 2018-03-30 at 15:07 +0530, Chaitra P B wrote:
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_ctl.c b/drivers/scsi/mpt3sas/mpt3sas_ctl.c

> index c1b17d6..2f27d5c 100644

> --- a/drivers/scsi/mpt3sas/mpt3sas_ctl.c

> +++ b/drivers/scsi/mpt3sas/mpt3sas_ctl.c

> @@ -590,7 +590,8 @@ _ctl_set_task_mid(struct MPT3SAS_ADAPTER *ioc, struct mpt3_ioctl_command *karg,

>  		struct scsiio_tracker *st;

>  

>  		scmd = mpt3sas_scsih_scsi_lookup_get(ioc, smid);

> -		if (!scmd)

> +		if (scmd == NULL || scmd->device == NULL ||

> +				scmd->device->hostdata == NULL)

>  			continue;

>  		if (lun != scmd->device->lun)

>  			continue;


Is _ctl_set_task_mid() always called from the I/O completion path? As
Christoph already wrote, these checks do not make sense in the completion
path.

> @@ -600,6 +601,8 @@ _ctl_set_task_mid(struct MPT3SAS_ADAPTER *ioc, struct mpt3_ioctl_command *karg,

>  		if (priv_data->sas_target->handle != handle)

>  			continue;

>  		st = scsi_cmd_priv(scmd);

> +		if ((!st) || (st->smid == 0))

> +			continue;

>  		tm_request->TaskMID = cpu_to_le16(st->smid);

>  		found = 1;

>  	}


Since the I/O submission path guarantees that st->smid != 0, how could
st->smid ever be zero in the I/O completion path?

> diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c

> index c9cce65..6b1aaa0 100644

> --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c

> +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c

> @@ -1465,7 +1465,7 @@ mpt3sas_scsih_scsi_lookup_get(struct MPT3SAS_ADAPTER *ioc, u16 smid)

>  		scmd = scsi_host_find_tag(ioc->shost, unique_tag);

>  		if (scmd) {

>  			st = scsi_cmd_priv(scmd);

> -			if (st->cb_idx == 0xFF)

> +			if ((!st) || (st->cb_idx == 0xFF) || (st->smid == 0))

>  				scmd = NULL;

>  		}

>  	}

> @@ -4451,6 +4451,13 @@ _scsih_flush_running_cmds(struct MPT3SAS_ADAPTER *ioc)

>  		count++;

>  		_scsih_set_satl_pending(scmd, false);

>  		st = scsi_cmd_priv(scmd);

> +		/*

> +		 * It may be possible that SCSI scmd got prepared by SML

> +		 * but it has not issued to the driver, for these type of

> +		 * scmd's don't do anything"

> +		 */

> +		if (st && st->smid == 0)

> +			continue;


This seems wrong to me. If a SCSI command has not been submitted to the
firmware skipping it in this function will introduce a delay because the
command will only be completed after it has timed out and after the SCSI
error handler has finished its processing. I think it's better to complete
the command from this function instead of waiting until for the SCSI error
handler.

Bart.
Sreekanth Reddy April 2, 2018, 6:23 a.m. UTC | #3
On Fri, Mar 30, 2018 at 5:29 PM, Christoph Hellwig <hch@infradead.org> wrote:
> On Fri, Mar 30, 2018 at 03:07:12PM +0530, Chaitra P B wrote:
>> Check scsi tracker for NULL before accessing it.
>> And in some places there are possibilities for getting valid st
>> but still other fields are not set.
>
> Please explain how that could ever happen.  You should never see
> a scsi_cmnd without the device pointer.

Chris,

Here is one example, During Host reset operation time driver will
flush out all the outstanding IO's to the SML in below function,

static void
_scsih_flush_running_cmds(struct MPT3SAS_ADAPTER *ioc)
{
        struct scsi_cmnd *scmd;
        struct scsiio_tracker *st;
        u16 smid;
        int count = 0;

[SR] Here driver is looping starting from smid one to HBA queue depth.
        for (smid = 1; smid <= ioc->scsiio_depth; smid++) {

[SR] Some times it is possible that scsi_cmnd might have created at
SML but it might not be issued to the driver as host reset operation
is going on, So here we will get non-NULL scmd.
                scmd = mpt3sas_scsih_scsi_lookup_get(ioc, smid);
                if (!scmd)
                        continue;
                count++;
                _scsih_set_satl_pending(scmd, false);
[SR] Here we are trying to get the scsi tracker 'st' for the above
scmd (which is not received by the driver) and so fields of this 'st'
are uninitialized.
                st = scsi_cmd_priv(scmd);
[SR] And here we are trying to clear the scsi tracker 'st' which is
not yet all initialized by the driver, in other terms we are trying to
flush out the scmd command which is not yet all received by the
driver.
                mpt3sas_base_clear_st(ioc, st);
                scsi_dma_unmap(scmd);
                if (ioc->pci_error_recovery || ioc->remove_host)
                        scmd->result = DID_NO_CONNECT << 16;
                else
                        scmd->result = DID_RESET << 16;
                scmd->scsi_done(scmd);
        }
        dtmprintk(ioc, pr_info(MPT3SAS_FMT "completing %d cmds\n",
            ioc->name, count));
}

Thanks,
Sreekanth
Bart Van Assche April 2, 2018, 3:25 p.m. UTC | #4
On Mon, 2018-04-02 at 11:53 +0530, Sreekanth Reddy wrote:
> On Fri, Mar 30, 2018 at 5:29 PM, Christoph Hellwig <hch@infradead.org> wrote:

> > On Fri, Mar 30, 2018 at 03:07:12PM +0530, Chaitra P B wrote:

> > > Check scsi tracker for NULL before accessing it.

> > > And in some places there are possibilities for getting valid st

> > > but still other fields are not set.

> > 

> > Please explain how that could ever happen.  You should never see

> > a scsi_cmnd without the device pointer.

> 

> Chris,

> 

> Here is one example, During Host reset operation time driver will

> flush out all the outstanding IO's to the SML in below function,

> 

> static void

> _scsih_flush_running_cmds(struct MPT3SAS_ADAPTER *ioc)

> {

>         struct scsi_cmnd *scmd;

>         struct scsiio_tracker *st;

>         u16 smid;

>         int count = 0;

> 

> [SR] Here driver is looping starting from smid one to HBA queue depth.

>         for (smid = 1; smid <= ioc->scsiio_depth; smid++) {

> 

> [SR] Some times it is possible that scsi_cmnd might have created at

> SML but it might not be issued to the driver as host reset operation

> is going on, So here we will get non-NULL scmd.

>                 scmd = mpt3sas_scsih_scsi_lookup_get(ioc, smid);

>                 if (!scmd)

>                         continue;

>                 count++;

>                 _scsih_set_satl_pending(scmd, false);

> [SR] Here we are trying to get the scsi tracker 'st' for the above

> scmd (which is not received by the driver) and so fields of this 'st'

> are uninitialized.

>                 st = scsi_cmd_priv(scmd);

> [SR] And here we are trying to clear the scsi tracker 'st' which is

> not yet all initialized by the driver, in other terms we are trying to

> flush out the scmd command which is not yet all received by the

> driver.

>                 mpt3sas_base_clear_st(ioc, st);

>                 scsi_dma_unmap(scmd);

>                 if (ioc->pci_error_recovery || ioc->remove_host)

>                         scmd->result = DID_NO_CONNECT << 16;

>                 else

>                         scmd->result = DID_RESET << 16;

>                 scmd->scsi_done(scmd);

>         }

>         dtmprintk(ioc, pr_info(MPT3SAS_FMT "completing %d cmds\n",

>             ioc->name, count));

> }


Hello Sreekanth,

From mpt3sas_scsih_scsi_lookup_get():

			st = scsi_cmd_priv(scmd);
			if (st->cb_idx == 0xFF)
				scmd = NULL;

From mpt3sas_base_get_smid(), mpt3sas_base_get_smid_scsiio() and
mpt3sas_base_get_smid_hpr():

	request->cb_idx = cb_idx;

Can _scsih_flush_running_cmds() be executed concurrently with scsih_qcmd()?
In other words, can it happen that mpt3sas_scsih_scsi_lookup_get() sees that
st->cb_idx == 0xff just before scsih_qcmd() assigns a value to .cb_idx? Can
this cause _scsih_flush_running_cmds() to skip commands that it shouldn't
skip?

Can it happen that _scsih_flush_running_cmds() calls .scsi_done() for a SCSI
command just after scsih_qcmd() transferred control for that command to the
firmware? Can that cause .scsi_done() to be called twice for the same command?

Thanks,

Bart.
Sreekanth Reddy April 3, 2018, 4:41 a.m. UTC | #5
On Mon, Apr 2, 2018 at 8:55 PM, Bart Van Assche <Bart.VanAssche@wdc.com> wrote:
> On Mon, 2018-04-02 at 11:53 +0530, Sreekanth Reddy wrote:
>> On Fri, Mar 30, 2018 at 5:29 PM, Christoph Hellwig <hch@infradead.org> wrote:
>> > On Fri, Mar 30, 2018 at 03:07:12PM +0530, Chaitra P B wrote:
>> > > Check scsi tracker for NULL before accessing it.
>> > > And in some places there are possibilities for getting valid st
>> > > but still other fields are not set.
>> >
>> > Please explain how that could ever happen.  You should never see
>> > a scsi_cmnd without the device pointer.
>>
>> Chris,
>>
>> Here is one example, During Host reset operation time driver will
>> flush out all the outstanding IO's to the SML in below function,
>>
>> static void
>> _scsih_flush_running_cmds(struct MPT3SAS_ADAPTER *ioc)
>> {
>>         struct scsi_cmnd *scmd;
>>         struct scsiio_tracker *st;
>>         u16 smid;
>>         int count = 0;
>>
>> [SR] Here driver is looping starting from smid one to HBA queue depth.
>>         for (smid = 1; smid <= ioc->scsiio_depth; smid++) {
>>
>> [SR] Some times it is possible that scsi_cmnd might have created at
>> SML but it might not be issued to the driver as host reset operation
>> is going on, So here we will get non-NULL scmd.
>>                 scmd = mpt3sas_scsih_scsi_lookup_get(ioc, smid);
>>                 if (!scmd)
>>                         continue;
>>                 count++;
>>                 _scsih_set_satl_pending(scmd, false);
>> [SR] Here we are trying to get the scsi tracker 'st' for the above
>> scmd (which is not received by the driver) and so fields of this 'st'
>> are uninitialized.
>>                 st = scsi_cmd_priv(scmd);
>> [SR] And here we are trying to clear the scsi tracker 'st' which is
>> not yet all initialized by the driver, in other terms we are trying to
>> flush out the scmd command which is not yet all received by the
>> driver.
>>                 mpt3sas_base_clear_st(ioc, st);
>>                 scsi_dma_unmap(scmd);
>>                 if (ioc->pci_error_recovery || ioc->remove_host)
>>                         scmd->result = DID_NO_CONNECT << 16;
>>                 else
>>                         scmd->result = DID_RESET << 16;
>>                 scmd->scsi_done(scmd);
>>         }
>>         dtmprintk(ioc, pr_info(MPT3SAS_FMT "completing %d cmds\n",
>>             ioc->name, count));
>> }
>
> Hello Sreekanth,
>
> From mpt3sas_scsih_scsi_lookup_get():
>
>                         st = scsi_cmd_priv(scmd);
>                         if (st->cb_idx == 0xFF)
>                                 scmd = NULL;
>
> From mpt3sas_base_get_smid(), mpt3sas_base_get_smid_scsiio() and
> mpt3sas_base_get_smid_hpr():
>
>         request->cb_idx = cb_idx;
>
> Can _scsih_flush_running_cmds() be executed concurrently with scsih_qcmd()?

[SR] No, driver calls _scsih_flush_running_cmds during Host reset time
and during host reset time driver will set 'ioc->shost_recovery' flag.
So in the scsih_qcmd() driver will return the incoming SCSI cmds with
"SCSI_MLQUEUE_HOST_BUSY" whenever 'ioc->shost_recovery' flag is set as
shown below,

/* host recovery or link resets sent via IOCTLs */
        if (ioc->shost_recovery || ioc->ioc_link_reset_in_progress)
                return SCSI_MLQUEUE_HOST_BUSY;

> In other words, can it happen that mpt3sas_scsih_scsi_lookup_get() sees that
> st->cb_idx == 0xff just before scsih_qcmd() assigns a value to .cb_idx? Can
> this cause _scsih_flush_running_cmds() to skip commands that it shouldn't
> skip?
[SR] No, it won't happen. as I explained above during host reset time
driver return the incoming SCSI commands with host busy status and
_scsih_flush_running_cmds called during the host reset time.

>
> Can it happen that _scsih_flush_running_cmds() calls .scsi_done() for a SCSI
> command just after scsih_qcmd() transferred control for that command to the
> firmware? Can that cause .scsi_done() to be called twice for the same command?
[SR] No, while _scsih_flush_running_cmds() function is getting
executed no SCSI commands are issued to the firmware.

>
> Thanks,
>
> Bart.
>
>
Bart Van Assche April 3, 2018, 3:56 p.m. UTC | #6
On Tue, 2018-04-03 at 10:11 +0530, Sreekanth Reddy wrote:
> [SR] No, driver calls _scsih_flush_running_cmds during Host reset time

> and during host reset time driver will set 'ioc->shost_recovery' flag.

> So in the scsih_qcmd() driver will return the incoming SCSI cmds with

> "SCSI_MLQUEUE_HOST_BUSY" whenever 'ioc->shost_recovery' flag is set as

> shown below,

> 

> /* host recovery or link resets sent via IOCTLs */

>         if (ioc->shost_recovery || ioc->ioc_link_reset_in_progress)

>                 return SCSI_MLQUEUE_HOST_BUSY;


The ioc->shost_recovery flag is involved in at least the following race:
* From one context a SCSI command is submitted and scsih_qcmd() gets called.
* At the same time sg_reset is invoked from a shell and triggers a call to
  scsih_host_reset(). That function in turn will call
  mpt3sas_base_hard_reset_handler().

I think this scenario can cause ioc->shost_recovery to be set by the mpt3sas
driver after it has been checked by the scsih_qcmd() function.

Anyway, let's get back to patch 03/15 at the start of this e-mail thread.
That patch looks to me like an incomplete attempt to work around a race
condition in the mpt3sas driver. I don't expect that anyone will trust that
patch without further explanation. Which race condition does that patch
address? And why do the mpt3sas maintainers believe that this patch is
sufficient to address that race condition?

Thanks,

Bart.
Sreekanth Reddy April 4, 2018, 12:28 p.m. UTC | #7
On Tue, Apr 3, 2018 at 9:26 PM, Bart Van Assche <Bart.VanAssche@wdc.com> wrote:
> On Tue, 2018-04-03 at 10:11 +0530, Sreekanth Reddy wrote:
>> [SR] No, driver calls _scsih_flush_running_cmds during Host reset time
>> and during host reset time driver will set 'ioc->shost_recovery' flag.
>> So in the scsih_qcmd() driver will return the incoming SCSI cmds with
>> "SCSI_MLQUEUE_HOST_BUSY" whenever 'ioc->shost_recovery' flag is set as
>> shown below,
>>
>> /* host recovery or link resets sent via IOCTLs */
>>         if (ioc->shost_recovery || ioc->ioc_link_reset_in_progress)
>>                 return SCSI_MLQUEUE_HOST_BUSY;
>
> The ioc->shost_recovery flag is involved in at least the following race:
> * From one context a SCSI command is submitted and scsih_qcmd() gets called.
> * At the same time sg_reset is invoked from a shell and triggers a call to
>   scsih_host_reset(). That function in turn will call
>   mpt3sas_base_hard_reset_handler().
>
> I think this scenario can cause ioc->shost_recovery to be set by the mpt3sas
> driver after it has been checked by the scsih_qcmd() function.

Then these outstanding commands will be get flush by the driver in
_scsih_flush_running_cmds() function which we call as a part of host
reset operation.

>
> Anyway, let's get back to patch 03/15 at the start of this e-mail thread.
> That patch looks to me like an incomplete attempt to work around a race
> condition in the mpt3sas driver. I don't expect that anyone will trust that
> patch without further explanation. Which race condition does that patch
> address? And why do the mpt3sas maintainers believe that this patch is
> sufficient to address that race condition?

Sure Bart, we will add proper description with the information which I
explained in this mail thread on how this patch will fix below issue,

During Host reset operation time driver will
flush out all the outstanding IO's to the SML in below function,

static void
_scsih_flush_running_cmds(struct MPT3SAS_ADAPTER *ioc)
{
        struct scsi_cmnd *scmd;
        struct scsiio_tracker *st;
        u16 smid;
        int count = 0;

[SR] Here driver is looping starting from smid one to HBA queue depth.
        for (smid = 1; smid <= ioc->scsiio_depth; smid++) {

[SR] Some times it is possible that scsi_cmnd might have created at
SML but it might not be issued to the driver as host reset operation
is going on, So here we will get non-NULL scmd.
                scmd = mpt3sas_scsih_scsi_lookup_get(ioc, smid);
                if (!scmd)
                        continue;
                count++;
                _scsih_set_satl_pending(scmd, false);
[SR] Here we are trying to get the scsi tracker 'st' for the above
scmd (which is not received by the driver) and so fields of this 'st'
are uninitialized.
                st = scsi_cmd_priv(scmd);
[SR] And here we are trying to clear the scsi tracker 'st' which is
not yet all initialized by the driver, in other terms we are trying to
flush out the scmd command which is not yet all received by the
driver.
                mpt3sas_base_clear_st(ioc, st);
                scsi_dma_unmap(scmd);
                if (ioc->pci_error_recovery || ioc->remove_host)
                        scmd->result = DID_NO_CONNECT << 16;
                else
                        scmd->result = DID_RESET << 16;
                scmd->scsi_done(scmd);
        }
        dtmprintk(ioc, pr_info(MPT3SAS_FMT "completing %d cmds\n",
            ioc->name, count));
}

>
> Thanks,
>
> Bart.
>
diff mbox

Patch

diff --git a/drivers/scsi/mpt3sas/mpt3sas_ctl.c b/drivers/scsi/mpt3sas/mpt3sas_ctl.c
index c1b17d6..2f27d5c 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_ctl.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_ctl.c
@@ -590,7 +590,8 @@  _ctl_set_task_mid(struct MPT3SAS_ADAPTER *ioc, struct mpt3_ioctl_command *karg,
 		struct scsiio_tracker *st;
 
 		scmd = mpt3sas_scsih_scsi_lookup_get(ioc, smid);
-		if (!scmd)
+		if (scmd == NULL || scmd->device == NULL ||
+				scmd->device->hostdata == NULL)
 			continue;
 		if (lun != scmd->device->lun)
 			continue;
@@ -600,6 +601,8 @@  _ctl_set_task_mid(struct MPT3SAS_ADAPTER *ioc, struct mpt3_ioctl_command *karg,
 		if (priv_data->sas_target->handle != handle)
 			continue;
 		st = scsi_cmd_priv(scmd);
+		if ((!st) || (st->smid == 0))
+			continue;
 		tm_request->TaskMID = cpu_to_le16(st->smid);
 		found = 1;
 	}
diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index c9cce65..6b1aaa0 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -1465,7 +1465,7 @@  mpt3sas_scsih_scsi_lookup_get(struct MPT3SAS_ADAPTER *ioc, u16 smid)
 		scmd = scsi_host_find_tag(ioc->shost, unique_tag);
 		if (scmd) {
 			st = scsi_cmd_priv(scmd);
-			if (st->cb_idx == 0xFF)
+			if ((!st) || (st->cb_idx == 0xFF) || (st->smid == 0))
 				scmd = NULL;
 		}
 	}
@@ -4451,6 +4451,13 @@  _scsih_flush_running_cmds(struct MPT3SAS_ADAPTER *ioc)
 		count++;
 		_scsih_set_satl_pending(scmd, false);
 		st = scsi_cmd_priv(scmd);
+		/*
+		 * It may be possible that SCSI scmd got prepared by SML
+		 * but it has not issued to the driver, for these type of
+		 * scmd's don't do anything"
+		 */
+		if (st && st->smid == 0)
+			continue;
 		mpt3sas_base_clear_st(ioc, st);
 		scsi_dma_unmap(scmd);
 		if (ioc->pci_error_recovery)