diff mbox

[3.10,141/319] scsi: mpt3sas: Fix secure erase premature termination

Message ID ae8ffc4f86c5af1caf97dfa9a6493443@mail.gmail.com (mailing list archive)
State Not Applicable, archived
Headers show

Commit Message

Sathya Prakash Veerichetty Feb. 6, 2017, 4:21 p.m. UTC
Willy,
I think this patch had a problem and later modified to a different
blocking mechanism.  Could you please pull in the latest change for this?

Thanks
Sathya

-----Original Message-----
From: Willy Tarreau [mailto:w@1wt.eu]
Sent: Sunday, February 05, 2017 12:19 PM
To: linux-kernel@vger.kernel.org; stable@vger.kernel.org;
linux@roeck-us.net
Cc: Andrey Grodzovsky; linux-scsi@vger.kernel.org; Sathya Prakash; Chaitra
P B; Suganath Prabu Subramani; Sreekanth Reddy; Hannes Reinecke; Martin K
. Petersen; Willy Tarreau
Subject: [PATCH 3.10 141/319] scsi: mpt3sas: Fix secure erase premature
termination

From: Andrey Grodzovsky <andrey2805@gmail.com>

commit 18f6084a989ba1b38702f9af37a2e4049a924be6 upstream.

This is a work around for a bug with LSI Fusion MPT SAS2 when perfoming
secure erase. Due to the very long time the operation takes, commands
issued during the erase will time out and will trigger execution of the
abort hook. Even though the abort hook is called for the specific command
which timed out, this leads to entire device halt (scsi_state terminated)
and premature termination of the secure erase.

Set device state to busy while ATA passthrough commands are in progress.

[mkp: hand applied to 4.9/scsi-fixes, tweaked patch description]

Signed-off-by: Andrey Grodzovsky <andrey2805@gmail.com>
Acked-by: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com>
Cc: <linux-scsi@vger.kernel.org>
Cc: Sathya Prakash <sathya.prakash@broadcom.com>
Cc: Chaitra P B <chaitra.basappa@broadcom.com>
Cc: Suganath Prabu Subramani <suganath-prabu.subramani@broadcom.com>
Cc: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

 		scsi_print_command(scmd);
 #endif

+	/*
+	 * Lock the device for any subsequent command until command is
+	 * done.
+	 */
+	if (ata_12_16_cmd(scmd))
+		scsi_internal_device_block(scmd->device);
+
 	scmd->scsi_done = done;
 	sas_device_priv_data = scmd->device->hostdata;
 	if (!sas_device_priv_data || !sas_device_priv_data->sas_target) {
@@ -4046,6 +4057,9 @@ _scsih_io_done(struct MPT3SAS_ADAPTER *ioc, u16
smid, u8 msix_index, u32 reply)
 	if (scmd == NULL)
 		return 1;

+	if (ata_12_16_cmd(scmd))
+		scsi_internal_device_unblock(scmd->device, SDEV_RUNNING);
+
 	mpi_request = mpt3sas_base_get_msg_frame(ioc, smid);

 	if (mpi_reply == NULL) {
--
2.8.0.rc2.1.gbe9624a

Comments

Willy Tarreau Feb. 6, 2017, 10:26 p.m. UTC | #1
Hi Sathya,

On Mon, Feb 06, 2017 at 09:21:44AM -0700, Sathya Prakash Veerichetty wrote:
> Willy,
> I think this patch had a problem and later modified to a different
> blocking mechanism.  Could you please pull in the latest change for this?

Much appreciated, thanks. I've checked and found the patch you're
talking about :

  commit ffb58456589443ca572221fabbdef3db8483a779
  Author: James Bottomley <James.Bottomley@HansenPartnership.com>
  Date:   Sun Jan 1 09:39:24 2017 -0800

    scsi: mpt3sas: fix hang on ata passthrough commands
    
    mpt3sas has a firmware failure where it can only handle one pass through
    ATA command at a time.  If another comes in, contrary to the SAT
    standard, it will hang until the first one completes (causing long
    commands like secure erase to timeout).  The original fix was to block
    the device when an ATA command came in, but this caused a regression
    with
    
    commit 669f044170d8933c3d66d231b69ea97cb8447338
    Author: Bart Van Assche <bart.vanassche@sandisk.com>
    Date:   Tue Nov 22 16:17:13 2016 -0800
    
        scsi: srp_transport: Move queuecommand() wait code to SCSI core
    
    So fix the original fix of the secure erase timeout by properly
    returning SAM_STAT_BUSY like the SAT recommends.  The original patch
    also had a concurrency problem since scsih_qcmd is lockless at that
    point (this is fixed by using atomic bitops to set and test the flag).
    
    [mkp: addressed feedback wrt. test_bit and fixed whitespace]
    
    Fixes: 18f6084a989ba1b (mpt3sas: Fix secure erase premature termination)
    Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
    Acked-by: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reported-by: Ingo Molnar <mingo@kernel.org>
    Tested-by: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

We don't have the referenced commit above in 3.10 so we should be safe.
Additionally I checked that neither 4.4 nor 3.12 have them either, so
that makes me feel confident that we can skip it in 3.10 as well.

Thanks!
Willy
James Bottomley Feb. 7, 2017, 6:38 a.m. UTC | #2
On Mon, 2017-02-06 at 23:26 +0100, Willy Tarreau wrote:
> Hi Sathya,
> 
> On Mon, Feb 06, 2017 at 09:21:44AM -0700, Sathya Prakash Veerichetty
> wrote:
> > Willy,
> > I think this patch had a problem and later modified to a different
> > blocking mechanism.  Could you please pull in the latest change for
> > this?
> 
> Much appreciated, thanks. I've checked and found the patch you're
> talking about :
> 
>   commit ffb58456589443ca572221fabbdef3db8483a779
>   Author: James Bottomley <James.Bottomley@HansenPartnership.com>
>   Date:   Sun Jan 1 09:39:24 2017 -0800
> 
>     scsi: mpt3sas: fix hang on ata passthrough commands
>     
>     mpt3sas has a firmware failure where it can only handle one pass
> through
>     ATA command at a time.  If another comes in, contrary to the SAT
>     standard, it will hang until the first one completes (causing
> long
>     commands like secure erase to timeout).  The original fix was to
> block
>     the device when an ATA command came in, but this caused a
> regression
>     with
>     
>     commit 669f044170d8933c3d66d231b69ea97cb8447338
>     Author: Bart Van Assche <bart.vanassche@sandisk.com>
>     Date:   Tue Nov 22 16:17:13 2016 -0800
>     
>         scsi: srp_transport: Move queuecommand() wait code to SCSI
> core
>     
>     So fix the original fix of the secure erase timeout by properly
>     returning SAM_STAT_BUSY like the SAT recommends.  The original
> patch
>     also had a concurrency problem since scsih_qcmd is lockless at
> that
>     point (this is fixed by using atomic bitops to set and test the
> flag).
>     
>     [mkp: addressed feedback wrt. test_bit and fixed whitespace]
>     
>     Fixes: 18f6084a989ba1b (mpt3sas: Fix secure erase premature
> termination)
>     Signed-off-by: James Bottomley <
> James.Bottomley@HansenPartnership.com>
>     Acked-by: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com>
>     Reviewed-by: Christoph Hellwig <hch@lst.de>
>     Reported-by: Ingo Molnar <mingo@kernel.org>
>     Tested-by: Ingo Molnar <mingo@kernel.org>
>     Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
> 
> We don't have the referenced commit above in 3.10 so we should be 
> safe. Additionally I checked that neither 4.4 nor 3.12 have them 
> either, so that makes me feel confident that we can skip it in 3.10
> as well.

The original was also racy with respect to multiple commands, so the
above fixed the race as well.

James
diff mbox

Patch

diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index f8c4b85..e414b71 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -3515,6 +3515,10 @@  _scsih_eedp_error_handling(struct scsi_cmnd *scmd,
u16 ioc_status)
 	    SAM_STAT_CHECK_CONDITION;
 }

+static inline bool ata_12_16_cmd(struct scsi_cmnd *scmd) {
+	return (scmd->cmnd[0] == ATA_12 || scmd->cmnd[0] == ATA_16); }

 /**
  * _scsih_qcmd_lck - main scsi request entry point @@ -3543,6 +3547,13 @@
_scsih_qcmd_lck(struct scsi_cmnd *scmd, void (*done)(struct scsi_cmnd *))