diff mbox

sd: Don't treat succeeded SYNC as error

Message ID CAMGffEnAYXdYNfN7aHZHLjnhxY2T2V6m4tPNi4eT4kDuarrYxQ@mail.gmail.com (mailing list archive)
State Changes Requested, archived
Headers show

Commit Message

Jinpu Wang April 25, 2016, 10:36 a.m. UTC
Hi, all

We hit IO error on fsync, it turns out was because sd treat succeeded
SYNC as error. From what I checked in SBC spec there is no indication
we should fail IO in this case, so we create this patch.


Best Regards,

Jack Wang

Comments

Bart Van Assche April 25, 2016, 8:28 p.m. UTC | #1
On 04/25/2016 03:36 AM, Jinpu Wang wrote:
> We hit IO error on fsync, it turns out was because sd treat succeeded
> SYNC as error. From what I checked in SBC spec there is no indication
> we should fail IO in this case, so we create this patch.

Please follow the rules in Documentation/SubmittingPatches and submit 
patches inline. Regarding the patch itself: why is the switch(op) 
needed? Can it be left out? And regarding (sshdr.asc == 0x2a && 
sshdr.ascq == 0x09): which other unit attentions should cause SCSI 
commands to succeed instead of to fail?

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jinpu Wang April 26, 2016, 10:57 a.m. UTC | #2
On Mon, Apr 25, 2016 at 10:28 PM, Bart Van Assche
<bart.vanassche@sandisk.com> wrote:
> On 04/25/2016 03:36 AM, Jinpu Wang wrote:
>>
>> We hit IO error on fsync, it turns out was because sd treat succeeded
>> SYNC as error. From what I checked in SBC spec there is no indication
>> we should fail IO in this case, so we create this patch.
>
>
> Please follow the rules in Documentation/SubmittingPatches and submit
> patches inline. Regarding the patch itself: why is the switch(op) needed?
> Can it be left out? And regarding (sshdr.asc == 0x2a && sshdr.ascq == 0x09):
> which other unit attentions should cause SCSI commands to succeed instead of
> to fail?
>
> Bart.

Hi Bart,

Thanks for looking into this.
Sure, I can resubmit inline.
Because our test only cover this, unconditional treat other commands
without test may break other application.

At least for READ/WRITE operation, ML will requeue it because not all
bytes transfered:
[ 1794.236001] sd 1:0:0:0: Capacity data has changed
[ 1794.236141] sd 1:0:0:0: [sdb] tag#0 Done: SUCCESS Result:
hostbyte=DID_OK driverbyte=DRIVER_OK
[ 1794.236381] sd 1:0:0:0: [sdb] tag#0 CDB: Read(10) 28 00 00 00 00 00
00 00 01 00
[ 1794.236609] sd 1:0:0:0: [sdb] tag#0 Sense Key : Unit Attention [current]
[ 1794.236725] sd 1:0:0:0: [sdb] tag#0 Add. Sense: Capacity data has changed
[ 1794.236839] sd 1:0:0:0: [sdb] tag#0 scsi host busy 1 failed 0
[ 1794.236981] sd 1:0:0:0: Notifying upper driver of completion (result 8000002)
[ 1794.237096] sd 1:0:0:0: [sdb] tag#0 sd_done: completed 0 of 512 bytes
[ 1794.237210] sd 1:0:0:0: [sdb] tag#0 1 sectors total, 0 bytes done.
[ 1794.237325] sd 1:0:0:0: [sdb] tag#0 Inserting command
ffff88022b9d4780 into mlqueue
[ 1794.238631] sd 1:0:0:0: [sdb] sd_open
[ 1794.238745] sd 1:0:0:0: scsi_block_when_processing_errors: rtn: 1
[ 1794.238864] sd 1:0:0:0: scsi_block_when_processing_errors: rtn: 1
[ 1794.238989] sd 1:0:0:0: scsi_block_when_processing_errors: rtn: 1
[ 1794.239107] sd 1:0:0:0: unblocking device at zero depth
[ 1794.239224] sd 1:0:0:0: [sdb] tag#0 Send: scmd 0xffff88022b9d4780
[ 1794.239338] sd 1:0:0:0: [sdb] tag#0 CDB: Read(10) 28 00 00 00 00 00
00 00 01 00
[ 1794.239592] sd 1:0:0:0: [sdb] tag#1 Send: scmd 0xffff8802296af380
[ 1794.239675] sd 1:0:0:0: [sdb] tag#0 Done: SUCCESS Result:
hostbyte=DID_OK driverbyte=DRIVER_OK
[ 1794.239677] sd 1:0:0:0: [sdb] tag#0 CDB: Read(10) 28 00 00 00 00 00
00 00 01 00
[ 1794.239678] sd 1:0:0:0: [sdb] tag#0 scsi host busy 2 failed 0
[ 1794.239679] sd 1:0:0:0: Notifying upper driver of completion (result 0)

But for SYNC CACHE:

[   84.450906] sd 6:0:0:1: [sdc] sd_open
[   84.451008] sd 6:0:0:1: scsi_block_when_processing_errors: rtn: 1
[   84.451084] sd 6:0:0:1: [sdc] tag#1 Send: scmd 0xffff880210357080
[   84.451143] sd 6:0:0:1: [sdc] tag#1 CDB: Synchronize Cache(10) 35
00 00 00 00 00 00 00 00 00
[   84.456081] sd 6:0:0:1: [sdc] tag#0 Done: SUCCESS Result:
hostbyte=DID_OK driverbyte=DRIVER_OK
[   84.456209] sd 6:0:0:1: [sdc] tag#0 CDB: Read(10) 28 00 00 00 01 10
00 00 08 00
[   84.456295] sd 6:0:0:1: [sdc] tag#0 scsi host busy 2 failed 0
[   84.456374] sd 6:0:0:1: Notifying upper driver of completion (result 0)
[   84.456438] sd 6:0:0:1: [sdc] tag#0 sd_done: completed 4096 of 4096 bytes
[   84.456498] sd 6:0:0:1: [sdc] tag#0 8 sectors total, 4096 bytes done.
[   84.456595] sd 6:0:0:1: Capacity data has changed
[   84.456612] sd 6:0:0:1: [sdc] sd_setup_read_write_cmnd: block=280, count=8
[   84.456616] sd 6:0:0:1: [sdc] block=280
[   84.456620] sd 6:0:0:1: [sdc] reading 8/8 512 byte blocks.
[   84.456628] sd 6:0:0:1: [sdc] tag#0 Send: scmd 0xffff8800ca41ed80
[   84.456634] sd 6:0:0:1: [sdc] tag#0 CDB: Read(10) 28 00 00 00 01 18
00 00 08 00
[   84.457003] sd 6:0:0:1: [sdc] tag#1 Done: SUCCESS Result:
hostbyte=DID_OK driverbyte=DRIVER_OK
[   84.457090] sd 6:0:0:1: [sdc] tag#1 CDB: Synchronize Cache(10) 35
00 00 00 00 00 00 00 00 00
[   84.457211] sd 6:0:0:1: [sdc] tag#1 Sense Key : Unit Attention [current]
[   84.457272] sd 6:0:0:1: [sdc] tag#1 Add. Sense: Capacity data has changed
[   84.457331] sd 6:0:0:1: [sdc] tag#1 scsi host busy 2 failed 0
[   84.457387] sd 6:0:0:1: Notifying upper driver of completion (result 8000002)
[   84.457445] sd 6:0:0:1: [sdc] tag#1 sd_done: completed 0 of 0 bytes
[   84.457503] sd 6:0:0:1: [sdc] tag#1 0 sectors total, 0 bytes done.
[   84.457562] blk_update_request: I/O error, dev sdc, sector 0

It completed with error directly.
We don't have failure case with our UA yet.

Thanks,
Jack
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

From 5d1f72d9643ce61cd9f3d312377378c43f171d0c Mon Sep 17 00:00:00 2001
From: Jack Wang <jinpu.wang@profitbricks.com>
Date: Mon, 25 Apr 2016 12:05:22 +0200
Subject: [PATCH] sd: Don't treat succeeded SYNC as error

We hit IO error in our production on multipath devices during resize
device on target side, the problem turns out sd driver passes up as IO
error when sense data is UNIT_ATTENTION and ASC && ASCQ indicate
Capacity data has changed, even storage side sync the data properly.

In order to fix this check in sd_done, report success if condition
matches.

Sebastian Parschauer report/analyze the bug here:
https://sourceforge.net/p/scst/mailman/message/34953416/

Signed-off-by: Sebastian Parschauer <s.parschauer@gmx.de>
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
 drivers/scsi/sd.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 5a5457a..e9bfe01 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1833,6 +1833,19 @@  static int sd_done(struct scsi_cmnd *SCpnt)
 			}
 		}
 		break;
+	case UNIT_ATTENTION:
+		/* Capacity data has changed */
+		if (sshdr.asc == 0x2a && sshdr.ascq == 0x09) {
+			switch (op) {
+			/* don't treat succeeded fsync() as error */
+			case SYNCHRONIZE_CACHE:
+			case SYNCHRONIZE_CACHE_16:
+				if (good_bytes == scsi_bufflen(SCpnt))
+					SCpnt->result = 0;
+					break;
+			}
+		}
+		break;
 	default:
 		break;
 	}
-- 
1.9.1