[3/3] scsi: handle zone resources errors

Message ID	20200910073952.212130-4-damien.lemoal@wdc.com (mailing list archive)
State	Superseded
Series	Improve error handling: [0/3] Improve error handling, [1/3] scsi: Cleanup scsi_noretry_cmd(), [2/3] scsi: update additional sense codes list, [3/3] scsi: handle zone resources errors

Headers

From: Damien Le Moal <damien.lemoal@wdc.com>
To: linux-scsi@vger.kernel.org,
        "Martin K . Petersen" <martin.petersen@oracle.com>
Subject: [PATCH 3/3] scsi: handle zone resources errors
Date: Thu, 10 Sep 2020 16:39:52 +0900
Message-Id: <20200910073952.212130-4-damien.lemoal@wdc.com>
In-Reply-To: <20200910073952.212130-1-damien.lemoal@wdc.com>
References: <20200910073952.212130-1-damien.lemoal@wdc.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: linux-scsi-owner@vger.kernel.org
Precedence: bulk

Commit Message

Damien Le Moal Sept. 10, 2020, 7:39 a.m. UTC

ZBC or ZAC disks that have a limit on the number of open zones may fail
a zone open command or a write to a zone that is not already implicitly
or explicitly open if the total number of open zones is already at the
maximum allowed.

For these operations, instead of returning the generic BLK_STS_IOERR,
return BLK_STS_DEV_RESOURCE which is returned as -EBUSY to the I/O
issuer, allowing the device user to act appropriately on these
relatively benign zone resource errors.

With this change the NVMe (ZNS) and sd drivers both return the same
error code for zone resource errors, facilitating the implementation of
IO error handling by the user with a common code base for both device
types.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 drivers/scsi/scsi_lib.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)
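
As an illustration of what the -EBUSY return allows on the user side, here is a minimal sketch, assuming the errno of a failed zone write (or zone open) is passed in by the caller. The enum and function names are hypothetical and not part of this patch; only the -EBUSY versus -EIO distinction comes from the commit message.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical user-side policy, not part of the patch. */
enum zone_write_action {
	ZONE_WRITE_OK,		/* the write completed */
	ZONE_WRITE_FREE_ZONE,	/* close or finish a zone, then retry */
	ZONE_WRITE_FATAL,	/* report the failure to the caller */
};

/*
 * Map the errno of a zone write to an action. EBUSY is assumed to mean
 * "insufficient zone resources", as proposed for both sd and NVMe ZNS;
 * any other error, including EIO, is treated as a hard failure.
 */
static enum zone_write_action classify_zone_write_errno(int err)
{
	switch (err) {
	case 0:
		return ZONE_WRITE_OK;
	case EBUSY:
		return ZONE_WRITE_FREE_ZONE;
	default:
		return ZONE_WRITE_FATAL;
	}
}
```

A caller would typically invoke this with the value of errno after a failed pwrite() on the zoned block device and, on ZONE_WRITE_FREE_ZONE, close or finish one of its open zones before reissuing the write.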

Comments

Damien Le Moal Sept. 10, 2020, 7:45 a.m. UTC | #1

On 2020/09/10 16:40, Damien Le Moal wrote:
> ZBC or ZAC disks that have a limit on the number of open zones may fail
> a zone open command or a write to a zone that is not already implicitly
> or explicitly open if the total number of open zones is already at the
> maximum allowed.
> 
> For these operations, instead of returning the generic BLK_STS_IOERR,
> return BLK_STS_DEV_RESOURCE which is returned as -EBUSY to the I/O
> issuer, allowing the device user to act appropriately on these
> relatively benign zone resource errors.
> 
> With this change the NVMe (ZNS) and sd drivers both return the same
> error code for zone resource errors, facilitating the implementation of
> IO error handling by the user with a common code base for both device
> types.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> ---
>  drivers/scsi/scsi_lib.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 7c6dd6f75190..7eb4a80c3bbb 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -758,6 +758,18 @@ static void scsi_io_completion_action(struct scsi_cmnd *cmd, int result)
>  			/* See SSC3rXX or current. */
>  			action = ACTION_FAIL;
>  			break;
> +		case DATA_PROTECT:
> +			sdev_printk(KERN_INFO, cmd->device,
> +				    "asc/ascq = 0x%02x 0x%02x\n",
> +				    sshdr.asc, sshdr.ascq);

Oops... Forgot to remove my debug message. Re-sending without it.

> +			action = ACTION_FAIL;
> +			if ((sshdr.asc == 0x0C && sshdr.ascq == 0x12) ||
> +			    (sshdr.asc == 0x55 &&
> +			     (sshdr.ascq == 0x0E || sshdr.ascq == 0x0F))) {
> +				/* Insufficient zone resources */
> +				blk_stat = BLK_STS_DEV_RESOURCE;
> +			}
> +			break;
>  		default:
>  			action = ACTION_FAIL;
>  			break;
>

Christoph Hellwig Sept. 10, 2020, 5:53 p.m. UTC | #2

On Thu, Sep 10, 2020 at 04:39:52PM +0900, Damien Le Moal wrote:
> +		case DATA_PROTECT:
> +			sdev_printk(KERN_INFO, cmd->device,
> +				    "asc/ascq = 0x%02x 0x%02x\n",
> +				    sshdr.asc, sshdr.ascq);
> +			action = ACTION_FAIL;
> +			if ((sshdr.asc == 0x0C && sshdr.ascq == 0x12) ||
> +			    (sshdr.asc == 0x55 &&
> +			     (sshdr.ascq == 0x0E || sshdr.ascq == 0x0F))) {
> +				/* Insufficient zone resources */
> +				blk_stat = BLK_STS_DEV_RESOURCE;

BLK_STS_DEV_RESOURCE is a magic error code leading to a retry on the
particular request_queue once it isn't busy any more.  Please don't
abuse it for random other conditions.

Damien Le Moal Sept. 10, 2020, 10:16 p.m. UTC | #3

On 2020/09/11 2:54, Christoph Hellwig wrote:
> On Thu, Sep 10, 2020 at 04:39:52PM +0900, Damien Le Moal wrote:
>> +		case DATA_PROTECT:
>> +			sdev_printk(KERN_INFO, cmd->device,
>> +				    "asc/ascq = 0x%02x 0x%02x\n",
>> +				    sshdr.asc, sshdr.ascq);
>> +			action = ACTION_FAIL;
>> +			if ((sshdr.asc == 0x0C && sshdr.ascq == 0x12) ||
>> +			    (sshdr.asc == 0x55 &&
>> +			     (sshdr.ascq == 0x0E || sshdr.ascq == 0x0F))) {
>> +				/* Insufficient zone resources */
>> +				blk_stat = BLK_STS_DEV_RESOURCE;
> 
> BLK_STS_DEV_RESOURCE is a magic error code leading to a retry on the
> particular request_queue once it isn't busy any more.  Please don't
> abuse it for random other conditions.

Yes, but that is for the submission path, isn't it? This change is in the
completion path and action is set to ACTION_FAIL, so the request is terminated
right away without any retry (tested). More importantly, this leads to the block
layer returning -EBUSY which allows the user to differentiate this
temporary/trivial error from the potentially more serious -EIO.

Keith sent a patch for NVMe ZNS doing something similar, which will result in
the block layer returning -EBUSY for zone resource errors. I would like to unify
scsi and nvme behavior for these recoverable zone resource errors.

So should we define a new BLK_STS_BUSYERR status to differentiate from the
default BLK_STS_IOERR and not overload BLK_STS_DEV_RESOURCE (or
BLK_STS_ZONE_RESOURCE)?
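
The idea discussed above can be sketched outside the kernel as follows. This is only a standalone illustration: the sense key and ASC/ASCQ values are the ones tested in the patch, while the `sketch_*` names and the two-value status enum are hypothetical stand-ins for a dedicated block status and are not existing kernel symbols.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_DATA_PROTECT	0x07	/* SCSI sense key DATA PROTECT */

/* Hypothetical stand-in for a dedicated status distinct from the I/O error. */
enum sketch_status {
	SKETCH_STS_IOERR,	/* generic failure */
	SKETCH_STS_ZONE_BUSY,	/* insufficient zone resources */
};

/* True for the ASC/ASCQ pairs the patch treats as zone resource errors. */
static bool sketch_is_zone_resource_sense(uint8_t key, uint8_t asc,
					  uint8_t ascq)
{
	if (key != SK_DATA_PROTECT)
		return false;
	if (asc == 0x0C && ascq == 0x12)
		return true;
	return asc == 0x55 && (ascq == 0x0E || ascq == 0x0F);
}

/* Completion-path choice: the command fails either way, only the status differs. */
static enum sketch_status sketch_completion_status(uint8_t key, uint8_t asc,
						   uint8_t ascq)
{
	return sketch_is_zone_resource_sense(key, asc, ascq) ?
		SKETCH_STS_ZONE_BUSY : SKETCH_STS_IOERR;
}

/* Errno that the issuer would see for each status. */
static int sketch_status_to_errno(enum sketch_status status)
{
	return status == SKETCH_STS_ZONE_BUSY ? -EBUSY : -EIO;
}
```

With a separate status of this kind, the issuer still gets -EBUSY for zone resource errors, and the submission-path retry semantics of BLK_STS_DEV_RESOURCE are left untouched.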

Patch

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 7c6dd6f75190..7eb4a80c3bbb 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -758,6 +758,18 @@ static void scsi_io_completion_action(struct scsi_cmnd *cmd, int result)
 			/* See SSC3rXX or current. */
 			action = ACTION_FAIL;
 			break;
+		case DATA_PROTECT:
+			sdev_printk(KERN_INFO, cmd->device,
+				    "asc/ascq = 0x%02x 0x%02x\n",
+				    sshdr.asc, sshdr.ascq);
+			action = ACTION_FAIL;
+			if ((sshdr.asc == 0x0C && sshdr.ascq == 0x12) ||
+			    (sshdr.asc == 0x55 &&
+			     (sshdr.ascq == 0x0E || sshdr.ascq == 0x0F))) {
+				/* Insufficient zone resources */
+				blk_stat = BLK_STS_DEV_RESOURCE;
+			}
+			break;
 		default:
 			action = ACTION_FAIL;
 			break;
