Message ID | 20200910073952.212130-4-damien.lemoal@wdc.com (mailing list archive) |
---|---|
State | Superseded |
Series | Improve error handling |
On 2020/09/10 16:40, Damien Le Moal wrote:
> ZBC or ZAC disks that have a limit on the number of open zones may fail
> a zone open command or a write to a zone that is not already implicitly
> or explicitly open if the total number of open zones is already at the
> maximum allowed.
>
> For these operations, instead of returning the generic BLK_STS_IOERR,
> return BLK_STS_DEV_RESOURCE which is returned as -EBUSY to the I/O
> issuer, allowing the device user to act appropriately on these
> relatively benign zone resource errors.
>
> With this change the NVMe (ZNS) and sd drivers both return the same
> error code for zone resource errors, facilitating the implementation of
> IO error handling by the user with a common code base for both device
> types.
>
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> ---
>  drivers/scsi/scsi_lib.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 7c6dd6f75190..7eb4a80c3bbb 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -758,6 +758,18 @@ static void scsi_io_completion_action(struct scsi_cmnd *cmd, int result)
>  			/* See SSC3rXX or current. */
>  			action = ACTION_FAIL;
>  			break;
> +		case DATA_PROTECT:
> +			sdev_printk(KERN_INFO, cmd->device,
> +				    "asc/ascq = 0x%02x 0x%02x\n",
> +				    sshdr.asc, sshdr.ascq);

Oops... Forgot to remove my debug message. Re-sending without it.

> +			action = ACTION_FAIL;
> +			if ((sshdr.asc == 0x0C && sshdr.ascq == 0x12) ||
> +			    (sshdr.asc == 0x55 &&
> +			     (sshdr.ascq == 0x0E || sshdr.ascq == 0x0F))) {
> +				/* Insufficient zone resources */
> +				blk_stat = BLK_STS_DEV_RESOURCE;
> +			}
> +			break;
>  		default:
>  			action = ACTION_FAIL;
>  			break;
>
On Thu, Sep 10, 2020 at 04:39:52PM +0900, Damien Le Moal wrote:
> +		case DATA_PROTECT:
> +			sdev_printk(KERN_INFO, cmd->device,
> +				    "asc/ascq = 0x%02x 0x%02x\n",
> +				    sshdr.asc, sshdr.ascq);
> +			action = ACTION_FAIL;
> +			if ((sshdr.asc == 0x0C && sshdr.ascq == 0x12) ||
> +			    (sshdr.asc == 0x55 &&
> +			     (sshdr.ascq == 0x0E || sshdr.ascq == 0x0F))) {
> +				/* Insufficient zone resources */
> +				blk_stat = BLK_STS_DEV_RESOURCE;

BLK_STS_DEV_RESOURCE is a magic error code leading to a retry on the
particular request_queue once it isn't busy any more. Please don't
abuse it for random other conditions.
On 2020/09/11 2:54, Christoph Hellwig wrote:
> On Thu, Sep 10, 2020 at 04:39:52PM +0900, Damien Le Moal wrote:
>> +		case DATA_PROTECT:
>> +			sdev_printk(KERN_INFO, cmd->device,
>> +				    "asc/ascq = 0x%02x 0x%02x\n",
>> +				    sshdr.asc, sshdr.ascq);
>> +			action = ACTION_FAIL;
>> +			if ((sshdr.asc == 0x0C && sshdr.ascq == 0x12) ||
>> +			    (sshdr.asc == 0x55 &&
>> +			     (sshdr.ascq == 0x0E || sshdr.ascq == 0x0F))) {
>> +				/* Insufficient zone resources */
>> +				blk_stat = BLK_STS_DEV_RESOURCE;
>
> BLK_STS_DEV_RESOURCE is a magic error code leading to a retry on the
> particular request_queue once it isn't busy any more. Please don't
> abuse it for random other conditions.

Yes, but that is for the submission path, isn't it ? This change is in the
completion path and action is set to ACTION_FAIL, so the request is terminated
right away without any retry (tested).

More importantly, this leads to the block layer returning -EBUSY which allows
the user to differentiate this temporary/trivial error from the potentially
more serious -EIO. Keith sent a patch for NVMe ZNS doing something similar,
which will result in the block layer returning -EBUSY for zone resource errors.
I would like to unify scsi and nvme behavior for these recoverable zone
resource errors.

So should we define a new BLK_STS_BUSYERR status to differentiate from the
default BLK_STS_IOERR and not overload BLK_STS_DEV_RESOURCE (or
BLK_STS_ZONE_RESOURCE) ?
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 7c6dd6f75190..7eb4a80c3bbb 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -758,6 +758,18 @@ static void scsi_io_completion_action(struct scsi_cmnd *cmd, int result)
 			/* See SSC3rXX or current. */
 			action = ACTION_FAIL;
 			break;
+		case DATA_PROTECT:
+			sdev_printk(KERN_INFO, cmd->device,
+				    "asc/ascq = 0x%02x 0x%02x\n",
+				    sshdr.asc, sshdr.ascq);
+			action = ACTION_FAIL;
+			if ((sshdr.asc == 0x0C && sshdr.ascq == 0x12) ||
+			    (sshdr.asc == 0x55 &&
+			     (sshdr.ascq == 0x0E || sshdr.ascq == 0x0F))) {
+				/* Insufficient zone resources */
+				blk_stat = BLK_STS_DEV_RESOURCE;
+			}
+			break;
 		default:
 			action = ACTION_FAIL;
 			break;
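For readers who want to experiment with the sense-data test outside the kernel, the check in the hunk above can be pulled into a small stand-alone predicate. This is only a sketch: the helper name `is_zone_resource_sense` and the macro `SENSE_KEY_DATA_PROTECT` are made up for illustration (the patch open-codes the test inside `scsi_io_completion_action()`), and the ASC/ASCQ pairs are copied from the patch as posted, not checked against the ZBC specification here.

```c
#include <stdbool.h>
#include <stdint.h>

/* SCSI sense key DATA PROTECT; the kernel defines DATA_PROTECT as 0x07. */
#define SENSE_KEY_DATA_PROTECT 0x07

/*
 * Hypothetical helper (not part of the patch): returns true when the
 * sense data matches one of the "insufficient zone resources" cases
 * that the patch tests for under the DATA_PROTECT sense key.
 */
static bool is_zone_resource_sense(uint8_t sense_key, uint8_t asc, uint8_t ascq)
{
	if (sense_key != SENSE_KEY_DATA_PROTECT)
		return false;

	/* First pair tested by the patch: asc 0x0C, ascq 0x12. */
	if (asc == 0x0C && ascq == 0x12)
		return true;

	/* Second group tested by the patch: asc 0x55, ascq 0x0E or 0x0F. */
	return asc == 0x55 && (ascq == 0x0E || ascq == 0x0F);
}
```

Any other DATA_PROTECT sense data falls through to a plain failure, matching the patch, where `action = ACTION_FAIL` is set unconditionally and only `blk_stat` changes for the zone resource cases.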
ZBC or ZAC disks that have a limit on the number of open zones may fail
a zone open command or a write to a zone that is not already implicitly
or explicitly open if the total number of open zones is already at the
maximum allowed.

For these operations, instead of returning the generic BLK_STS_IOERR,
return BLK_STS_DEV_RESOURCE which is returned as -EBUSY to the I/O
issuer, allowing the device user to act appropriately on these
relatively benign zone resource errors.

With this change the NVMe (ZNS) and sd drivers both return the same
error code for zone resource errors, facilitating the implementation of
IO error handling by the user with a common code base for both device
types.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 drivers/scsi/scsi_lib.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)
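On the user side, the common error handling that the commit message aims for might look roughly like the sketch below. It assumes the behavior proposed in this thread, namely that a zone resource error reaches the I/O issuer as -EBUSY and everything else as another errno such as -EIO. Since this series is marked Superseded, the error code that was eventually merged may differ. The enum and function names are hypothetical.

```c
#include <errno.h>

/* Hypothetical verdicts an application could derive from an I/O result. */
enum zone_io_verdict {
	ZONE_IO_OK,               /* the I/O succeeded */
	ZONE_IO_RETRY_AFTER_FREE, /* free a zone resource, then retry */
	ZONE_IO_FATAL,            /* treat as a real I/O error */
};

/*
 * Classify a completion result given as 0 or a negative errno.
 * Assumption (from the thread, not from a merged kernel): zone
 * resource errors are reported as -EBUSY for both sd and NVMe ZNS.
 */
static enum zone_io_verdict classify_zone_io_error(int err)
{
	if (err == 0)
		return ZONE_IO_OK;
	if (err == -EBUSY)
		return ZONE_IO_RETRY_AFTER_FREE;
	return ZONE_IO_FATAL;
}
```

An application receiving `ZONE_IO_RETRY_AFTER_FREE` could, for example, close or finish one of its open zones and reissue the write, while still treating -EIO as a potentially serious failure.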