[v2,0/2] Fix SCSI async abort handling when eh_deadline is active

Message ID

20211029194311.17504-1-emilne@redhat.com (mailing list archive)

Headers

From: "Ewan D. Milne" <emilne@redhat.com>
To: linux-scsi@vger.kernel.org
Subject: [PATCH v2 0/2] Fix SCSI async abort handling when eh_deadline is
 active
Date: Fri, 29 Oct 2021 15:43:09 -0400
Message-Id: <20211029194311.17504-1-emilne@redhat.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk

Series

Fix SCSI async abort handling when eh_deadline is active

Message

Ewan Milne Oct. 29, 2021, 7:43 p.m. UTC

There is a code path in the SCSI async abort handling that can cause error
handling of subsequent scsi_cmnds to proceed immediately to host reset with no
other attempt at recovery.

This can be seen by the following:

    modprobe scsi_debug every_nth=10 opts=4
    echo 7 > /sys/module/scsi_mod/parameters/scsi_logging_level
    echo 10 > /sys/devices/pseudo_0/adapter0/host<N>/scsi_host/host<N>/eh_deadline

and then performing I/O to the scsi_debug device.  The host will get reset
because prior aborts succeeded without ->last_reset being invalidated.
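
As a rough illustration of the failure mode described above, here is a
simplified user-space model.  It is not the kernel code: the model_* names,
the integer "ticks" and the "fixed" flag are all hypothetical, and the flag
only stands in for whatever invalidation of ->last_reset the fix performs.
It assumes the timestamp is set when an abort is scheduled, and only if it
is not already set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two host fields involved. */
struct model_host {
	long last_reset;	/* tick at which an abort was first scheduled; 0 = unset */
	long eh_deadline;	/* deadline in ticks; -1 = disabled */
};

/* True once 'now' has reached last_reset + eh_deadline. */
static bool model_past_deadline(const struct model_host *h, long now)
{
	if (h->last_reset == 0 || h->eh_deadline == -1)
		return false;
	return now >= h->last_reset + h->eh_deadline;
}

/* Scheduling an abort stamps last_reset only if it is not already set. */
static void model_schedule_abort(struct model_host *h, long now)
{
	if (h->eh_deadline != -1 && h->last_reset == 0)
		h->last_reset = now;
}

/* The abort succeeded and EH never ran; 'fixed' invalidates the stamp. */
static void model_abort_done(struct model_host *h, bool fixed)
{
	if (fixed)
		h->last_reset = 0;
}

/* Does a second timeout, long after the first, start out past the deadline? */
static bool model_second_timeout_escalates(bool fixed)
{
	struct model_host h = { .last_reset = 0, .eh_deadline = 10 };

	model_schedule_abort(&h, 100);	/* first command times out at t=100 */
	model_abort_done(&h, fixed);	/* abort succeeds, EH does not run */
	model_schedule_abort(&h, 500);	/* second command times out at t=500 */
	return model_past_deadline(&h, 500);
}

/* Stale stamp escalates immediately; invalidated stamp does not. */
static void model_self_check(void)
{
	assert(model_second_timeout_escalates(false));
	assert(!model_second_timeout_escalates(true));
}
```

In this model the stale stamp from t=100 makes the second timeout look as
if the deadline expired long ago, so recovery would skip straight to the
last resort.  Once the stamp is invalidated, the second timeout gets a
fresh deadline.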

The patch series contains a fix, followed by a simplification
of the control flow to remove duplicate code.  Only the first patch
is Cc: stable, as the second doesn't qualify.  Yes, I know the
first patch is >100 lines; unfortunately I couldn't make it smaller.

Signed-off-by: Ewan D. Milne <emilne@redhat.com>

v2:
    - Introduced scsi_eh_abort_cleanup() in patch 1/2 to factor out code
      (This is then removed in patch 2/2, since the refactoring there
       leaves only one place where it is called.)
    - Moved introduction of local "shost" to cleanup patch 2/2

Ewan D. Milne (2):
  scsi: core: avoid leaving shost->last_reset with stale value if EH
    does not run
  scsi: core: simplify control flow in scmd_eh_abort_handler()

 drivers/scsi/hosts.c      |  1 +
 drivers/scsi/scsi_error.c | 92 ++++++++++++++++++++++++++++++-----------------
 drivers/scsi/scsi_lib.c   |  1 +
 include/scsi/scsi_cmnd.h  |  2 +-
 include/scsi/scsi_host.h  |  1 +
 5 files changed, 63 insertions(+), 34 deletions(-)