megaraid_sas: move command counter to correct place

Message ID 20170728140359.15424-1-thenzl@redhat.com (mailing list archive)
State Accepted, archived

Commit Message

Tomas Henzl July 28, 2017, 2:03 p.m. UTC
The EH reset function returns success when fw_outstanding equals zero,
which means the counter shouldn't be decremented
while the driver still owns the command.


Signed-off-by: Tomas Henzl <thenzl@redhat.com>
---
 drivers/scsi/megaraid/megaraid_sas_fusion.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Martin K. Petersen Aug. 7, 2017, 5:20 p.m. UTC | #1
Tomas,

> the eh reset function returns success when fw_outstanding equals zero,
> that means that the counter shouldn't be decremented
> when the driver still owns the command

Kashyap? Sumit?
Sumit Saxena Aug. 7, 2017, 5:31 p.m. UTC | #2
>-----Original Message-----
>From: Tomas Henzl [mailto:thenzl@redhat.com]
>Sent: Friday, July 28, 2017 7:34 PM
>To: linux-scsi@vger.kernel.org
>Cc: sumit.saxena@broadcom.com; kashyap.desai@broadcom.com
>Subject: [PATCH] megaraid_sas: move command counter to correct place
>
>the eh reset function returns success when fw_outstanding equals zero, that
>means that the counter shouldn't be decremented when the driver still owns
>the command
>
>
>Signed-off-by: Tomas Henzl <thenzl@redhat.com>
>---
> drivers/scsi/megaraid/megaraid_sas_fusion.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c
>index f990ab4d45..c615aadb2b 100644
>--- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
>+++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
>@@ -3046,7 +3046,6 @@ complete_cmd_fusion(struct megasas_instance *instance, u32 MSIxIndex)
> 			}
> 			//Fall thru and complete IO
> 		case MEGASAS_MPI2_FUNCTION_LD_IO_REQUEST: /* LD-IO Path */
>-			atomic_dec(&instance->fw_outstanding);
> 			if (cmd_fusion->r1_alt_dev_handle == MR_DEVHANDLE_INVALID) {
> 				map_cmd_status(fusion, scmd_local, status,
> 					       extStatus, le32_to_cpu(data_length),
>@@ -3060,6 +3059,7 @@ complete_cmd_fusion(struct megasas_instance *instance, u32 MSIxIndex)
> 				scmd_local->scsi_done(scmd_local);
> 			} else	/* Optimal VD - R1 FP command completion. */
> 				megasas_complete_r1_command(instance, cmd_fusion);
>+			atomic_dec(&instance->fw_outstanding);
> 			break;
> 		case MEGASAS_MPI2_FUNCTION_PASSTHRU_IO_REQUEST: /*MFI command */
> 			cmd_mfi = instance->cmd_list[cmd_fusion->sync_cmd_idx];

Acked-by: Sumit Saxena <sumit.saxena@broadcom.com>

>--
>2.9.4
Martin K. Petersen Aug. 7, 2017, 5:36 p.m. UTC | #3
Tomas,

> the eh reset function returns success when fw_outstanding equals zero,
> that means that the counter shouldn't be decremented when the driver
> still owns the command

Applied to 4.13/scsi-fixes. Thank you!
Sumit Saxena Aug. 8, 2017, 7:37 a.m. UTC | #4
>-----Original Message-----
>From: Martin K. Petersen [mailto:martin.petersen@oracle.com]
>Sent: Monday, August 07, 2017 11:07 PM
>To: Tomas Henzl
>Cc: linux-scsi@vger.kernel.org; sumit.saxena@broadcom.com;
>kashyap.desai@broadcom.com
>Subject: Re: [PATCH] megaraid_sas: move command counter to correct place
>
>
>Tomas,
>
>> the eh reset function returns success when fw_outstanding equals zero,
>> that means that the counter shouldn't be decremented when the driver
>> still owns the command
>
>Applied to 4.13/scsi-fixes. Thank you!

I just realized that this patch may cause a performance regression.
With this patch applied, the following scenario can occur:

- Outstanding IOs reach the controller's queue depth.
- The driver frees a command and completes it back to the SCSI mid layer
  (SML).
- Since host_busy has been decremented, the SML can issue one new IO to
  the driver.
- If the driver has not yet decremented "fw_outstanding" by that time, it
  returns the newly submitted IO to the SML with status
  SCSI_MLQUEUE_HOST_BUSY because of the following code in the
  megaraid_sas driver's IO submission path:

    if (atomic_inc_return(&instance->fw_outstanding) >
            instance->host->can_queue) {
        atomic_dec(&instance->fw_outstanding);
        return SCSI_MLQUEUE_HOST_BUSY;
    }

This situation is more likely when RAID1 fastpath IOs are running, because
in that case the driver issues two IOs to the firmware for each IO
issued by the SML.
It can be avoided if this patch is removed.
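
The submission check above can be modelled in userspace. This is only a minimal sketch, assuming C11 <stdatomic.h> in place of the kernel's atomic_t helpers; try_submit(), complete_one(), SUBMIT_OK and SUBMIT_HOST_BUSY are hypothetical names and are not taken from the driver:

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for the kernel return codes (assumption). */
enum { SUBMIT_OK = 0, SUBMIT_HOST_BUSY = 1 };

/*
 * Userspace model of the admission check in the IO submission path.
 * Not the driver code: C11 atomics replace the kernel's atomic_t API.
 */
int try_submit(atomic_int *fw_outstanding, int can_queue)
{
    assert(can_queue > 0);

    /*
     * atomic_fetch_add() returns the previous value, so adding 1
     * mirrors what the kernel's atomic_inc_return() would yield.
     */
    if (atomic_fetch_add(fw_outstanding, 1) + 1 > can_queue) {
        /* Over the queue depth: undo the reservation and report busy. */
        atomic_fetch_sub(fw_outstanding, 1);
        return SUBMIT_HOST_BUSY;
    }
    return SUBMIT_OK;
}

/* Models the completion path's atomic_dec() on fw_outstanding. */
void complete_one(atomic_int *fw_outstanding)
{
    atomic_fetch_sub(fw_outstanding, 1);
}
```

With can_queue set to 2, two calls to try_submit() succeed and a third returns SUBMIT_HOST_BUSY; a further submission succeeds only after complete_one() has run. A command that is handed back to the SML before the counter is decremented can therefore trigger a busy return for the next IO.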

Regarding Tomas' concern, there should not be any problem, because the
driver calls "synchronize_irq" before checking "fw_outstanding". Only after
"fw_outstanding" has been decremented and the driver has freed the command
does the driver check whether "fw_outstanding" equals zero, so these steps
always happen in sequence and cannot cause the problem Tomas described.
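
The ordering argument above can be illustrated with a deliberately simplified, single-threaded model. Everything in it is an assumption made for illustration: struct cmd_model, handler_step() and model_synchronize_irq() are hypothetical, and the real synchronize_irq() waits for in-flight interrupt handlers rather than running them.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Progress of the (modelled) completion handler for one command. */
enum handler_state { HANDLER_IDLE, HANDLER_DECREMENTED, HANDLER_DONE };

/* Hypothetical one-command model; field names are illustrative only. */
struct cmd_model {
    atomic_int fw_outstanding;   /* commands the firmware still holds */
    bool driver_owns_cmd;        /* true until the command is completed */
    enum handler_state state;
};

void model_init(struct cmd_model *m)
{
    atomic_init(&m->fw_outstanding, 1);
    m->driver_owns_cmd = true;
    m->state = HANDLER_IDLE;
}

/* Advance the handler by one step: decrement first, then complete. */
void handler_step(struct cmd_model *m)
{
    if (m->state == HANDLER_IDLE) {
        assert(atomic_load(&m->fw_outstanding) > 0);
        atomic_fetch_sub(&m->fw_outstanding, 1);
        m->state = HANDLER_DECREMENTED;
    } else if (m->state == HANDLER_DECREMENTED) {
        m->driver_owns_cmd = false;   /* command completed and freed */
        m->state = HANDLER_DONE;
    }
}

/* What the reset path tests: zero outstanding commands means success. */
bool reset_sees_idle(struct cmd_model *m)
{
    return atomic_load(&m->fw_outstanding) == 0;
}

/* Stand-in for synchronize_irq(): let a handler that is mid-flight finish. */
void model_synchronize_irq(struct cmd_model *m)
{
    while (m->state == HANDLER_DECREMENTED)
        handler_step(m);
}
```

In this model, a check made right after the first handler_step() would see a zero counter while driver_owns_cmd is still true, which is the window Tomas pointed at. Calling model_synchronize_irq() before reset_sees_idle() closes that window, because the handler has either not started (counter still 1) or has finished (command released).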

I am sorry for the confusion and would like to request that this patch be reverted.

Thanks,
Sumit

>
>--
>Martin K. Petersen	Oracle Linux Engineering
James Bottomley Aug. 8, 2017, 3:39 p.m. UTC | #5
On Tue, 2017-08-08 at 13:07 +0530, Sumit Saxena wrote:
> > 
> > -----Original Message-----
> > From: Martin K. Petersen [mailto:martin.petersen@oracle.com]
> > Sent: Monday, August 07, 2017 11:07 PM
> > To: Tomas Henzl
> > Cc: linux-scsi@vger.kernel.org; sumit.saxena@broadcom.com;
> > kashyap.desai@broadcom.com
> > Subject: Re: [PATCH] megaraid_sas: move command counter to correct
> > place
> > 
> > 
> > Tomas,
> > 
> > > 
> > > the eh reset function returns success when fw_outstanding equals
> > > zero,
> > > that means that the counter shouldn't be decremented when the
> > > driver
> > > still owns the command
> > 
> > Applied to 4.13/scsi-fixes. Thank you!
> 
> Just realized that this patch may cause performance regression.
> With this patch below scenario may occur-
> 
> -Consider outstanding IOs reaches to controller's Queue depth.
> -Driver frees up command and complete it back to SML.
> -Since host_busy is decremented, SML can issue one new  IO to driver.
> -By the time, if "fw_outstanding" is not decremented by driver, then
> driver will return newly submitted IO back to SML with return status
> SCSI_MLQUEUE_HOST_BUSY because of below code
>   in megaraid_sas driver's IO submission path-
> 
>     if (atomic_inc_return(&instance->fw_outstanding) >
>             instance->host->can_queue) {
>         atomic_dec(&instance->fw_outstanding);
>         return SCSI_MLQUEUE_HOST_BUSY;
>     }
> 
> This situation will be more evident when RAID1 fastpath IOs are
> running as in that case driver will be issuing two IOs to firmware
> for single IO issued from SML. Above situation can be avoided, if
> this patch is removed.
> 
> Regarding Tomas' concern, there should not be any problem as driver
> calls "synchronize_irq" before checking "fw_outstanding". Once
> fw_outstanding is decremented and driver frees up command, then only
> driver will be checking "fw_outstanding" equals to zero or not so all
> this will always fall in a sequence and will not cause the problem
> stated by Tomas.
> 
> I am sorry for confusion and would request to revert this patch.

OK, I've taken it out of the fixes tree.

Martin: this means my fixes branch got rebased; can you rebase your
fixes branch on top of mine before we all get a beating from Stephen
Rothwell because of a mismerge in linux-next due to the tree
differences?

Thanks,

James
Martin K. Petersen Aug. 8, 2017, 3:50 p.m. UTC | #6
James,

> Martin: this means my fixes branch got rebased; can you rebase your
> fixes branch on top of mine before we all get a beating from Stephen
> Rothwell because of a mismerge in linux-next due to the tree
> differences?

Done.
Tomas Henzl Aug. 9, 2017, 2:09 p.m. UTC | #7
On 8.8.2017 09:37, Sumit Saxena wrote:
>> -----Original Message-----
>> From: Martin K. Petersen [mailto:martin.petersen@oracle.com]
>> Sent: Monday, August 07, 2017 11:07 PM
>> To: Tomas Henzl
>> Cc: linux-scsi@vger.kernel.org; sumit.saxena@broadcom.com;
>> kashyap.desai@broadcom.com
>> Subject: Re: [PATCH] megaraid_sas: move command counter to correct place
>>
>>
>> Tomas,
>>
>>> the eh reset function returns success when fw_outstanding equals zero,
>>> that means that the counter shouldn't be decremented when the driver
>>> still owns the command
>> Applied to 4.13/scsi-fixes. Thank you!
> Just realized that this patch may cause performance regression.
> With this patch below scenario may occur-
>
> -Consider outstanding IOs reaches to controller's Queue depth.
> -Driver frees up command and complete it back to SML.
> -Since host_busy is decremented, SML can issue one new  IO to driver.
> -By the time, if "fw_outstanding" is not decremented by driver, then
> driver will return newly submitted IO back to SML with return status
> SCSI_MLQUEUE_HOST_BUSY because of below code
>   in megaraid_sas driver's IO submission path-
>
>     if (atomic_inc_return(&instance->fw_outstanding) >
>             instance->host->can_queue) {
>         atomic_dec(&instance->fw_outstanding);
>         return SCSI_MLQUEUE_HOST_BUSY;
>     }
>
> This situation will be more evident when RAID1 fastpath IOs are running as
> in that case driver will be issuing two IOs to firmware for single IO
> issued from SML.
> Above situation can be avoided, if this patch is removed.
>
> Regarding Tomas' concern, there should not be any problem as driver calls
> "synchronize_irq" before checking "fw_outstanding". Once fw_outstanding is
> decremented and
> driver frees up command, then only driver will be checking
> "fw_outstanding" equals to zero or not so all this will always fall in a
> sequence and will not
> cause the problem stated by Tomas.

I didn't expect this to fix a real issue in the latest upstream code;
I just wanted to follow the correct ordering.
If it creates a performance issue, reverting the patch is fine with me.

>
> I am sorry for confusion and would request to revert this patch.
>
> Thanks,
> Sumit
>
>> --
>> Martin K. Petersen	Oracle Linux Engineering
Patch

diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c
index f990ab4d45..c615aadb2b 100644
--- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
+++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
@@ -3046,7 +3046,6 @@  complete_cmd_fusion(struct megasas_instance *instance, u32 MSIxIndex)
 			}
 			//Fall thru and complete IO
 		case MEGASAS_MPI2_FUNCTION_LD_IO_REQUEST: /* LD-IO Path */
-			atomic_dec(&instance->fw_outstanding);
 			if (cmd_fusion->r1_alt_dev_handle == MR_DEVHANDLE_INVALID) {
 				map_cmd_status(fusion, scmd_local, status,
 					       extStatus, le32_to_cpu(data_length),
@@ -3060,6 +3059,7 @@  complete_cmd_fusion(struct megasas_instance *instance, u32 MSIxIndex)
 				scmd_local->scsi_done(scmd_local);
 			} else	/* Optimal VD - R1 FP command completion. */
 				megasas_complete_r1_command(instance, cmd_fusion);
+			atomic_dec(&instance->fw_outstanding);
 			break;
 		case MEGASAS_MPI2_FUNCTION_PASSTHRU_IO_REQUEST: /*MFI command */
 			cmd_mfi = instance->cmd_list[cmd_fusion->sync_cmd_idx];