Message ID | 20220302053559.32147-11-martin.petersen@oracle.com (mailing list archive) |
---|---|
State | Changes Requested |
Headers | show |
Series | [01/14] scsi: mpt3sas: Use cached ATA Information VPD page | expand |
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
On 3/1/22 21:35, Martin K. Petersen wrote: > + if (mode == SD_ZERO_DEFAULT) > + sdkp->zeroing_override = false; > + else > + sdkp->zeroing_override = true; Hmm ... since sd_config_write_zeroes() special-cases mode == SD_ZERO_DEFAULT, does the value of zeroing_override matter if mode == SD_ZERO_DEFAULT? > +static void sd_config_write_zeroes(struct scsi_disk *sdkp, > + enum sd_zeroing_mode mode) > +{ > + struct request_queue *q = sdkp->disk->queue; > + unsigned int logical_block_size = sdkp->device->sector_size; > + > + if (mode == SD_ZERO_DEFAULT && !sdkp->zeroing_override) { > + if (sdkp->lbprz && sdkp->lbpws) > + mode = SD_ZERO_WS16_UNMAP; > + else if (sdkp->lbprz && sdkp->lbpws10) > + mode = SD_ZERO_WS10_UNMAP; > + else if (sdkp->max_ws_blocks) > + mode = SD_ZERO_WS; > + else > + mode = SD_ZERO_WRITE; > + } > + > + if (mode == SD_ZERO_DISABLE) > + sdkp->zeroing_override = true; What does "zeroing_override" mean? I would expect if mode == SD_ZERO_DISABLE that that choice from the user is not overridden. Thanks, Bart.
On Wed, Mar 02, 2022 at 12:35:55AM -0500, Martin K. Petersen wrote: > In preparation for adding support for the WRITE SAME(16) NDOB flag, > move configuration of the WRITE_ZEROES operation to a separate > function. This is done to facilitate fetching all VPD pages before > choosing the appropriate zeroing method for a given device. > > The deferred configuration also allows us to mirror the discard > behavior and permit the user to revert a device to the kernel default > configuration by echoing "default" to the sysfs file. > > Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> > --- > drivers/scsi/sd.c | 56 +++++++++++++++++++++++++++++++++-------------- > drivers/scsi/sd.h | 7 ++++-- > 2 files changed, 44 insertions(+), 19 deletions(-) > Hello Martin, somehow this patch triggers a regression on s390x with zFCP in `next-20220714`. In our daily regression test suite a simple: # mkfs.ext4 -F /dev/mapper/mpathc1 causes the block layer to trip with this when trying to discard blocks (at least that's my assumption about what its trying to do) from a SCSI disk: [ 33.042224] blk_insert_cloned_request: over max size limit. (4194304 > 65535) [ 33.042228] device-mapper: multipath: 251:0: Failing path 8:0. [ 33.042239] blk_insert_cloned_request: over max size limit. (4194304 > 65535) [ 33.042267] device-mapper: multipath: 251:0: Failing path 8:64. [ 33.197329] device-mapper: multipath: 251:0: Reinstating path 8:0. [ 33.198850] device-mapper: multipath: 251:0: Reinstating path 8:64. [ 33.210742] blk_insert_cloned_request: over max size limit. (4194304 > 65535) [ 33.210752] device-mapper: multipath: 251:0: Failing path 8:0. [ 33.210771] blk_insert_cloned_request: over max size limit. (4194304 > 65535) [ 33.210792] device-mapper: multipath: 251:0: Failing path 8:64. [ 38.200929] device-mapper: multipath: 251:0: Reinstating path 8:0. [ 38.201489] device-mapper: multipath: 251:0: Reinstating path 8:64. [ 38.220039] blk_insert_cloned_request: over max size limit. (4194304 > 65535) [ 38.220045] device-mapper: multipath: 251:0: Failing path 8:0. [ 38.220056] blk_insert_cloned_request: over max size limit. (4194304 > 65535) [ 38.220060] device-mapper: multipath: 251:0: Failing path 8:64. [ 43.202538] device-mapper: multipath: 251:0: Reinstating path 8:0. [ 43.203015] device-mapper: multipath: 251:0: Reinstating path 8:64. [ 43.219877] blk_insert_cloned_request: over max size limit. (4194304 > 65535) [ 43.219881] device-mapper: multipath: 251:0: Failing path 8:0. [ 43.219889] blk_insert_cloned_request: over max size limit. (4194304 > 65535) [ 43.219892] device-mapper: multipath: 251:0: Failing path 8:64. [ 48.204035] device-mapper: multipath: 251:0: Reinstating path 8:0. [ 48.204526] device-mapper: multipath: 251:0: Reinstating path 8:64. [ 48.219951] blk_insert_cloned_request: over max size limit. (4194304 > 65535) [ 48.219964] device-mapper: multipath: 251:0: Failing path 8:0. [ 48.219990] blk_insert_cloned_request: over max size limit. (4194304 > 65535) [ 48.219996] device-mapper: multipath: 251:0: Failing path 8:64. [ 53.205456] device-mapper: multipath: 251:0: Reinstating path 8:0. [ 53.206950] device-mapper: multipath: 251:0: Reinstating path 8:64. [ 53.219820] blk_insert_cloned_request: over max size limit. (4194304 > 65535) [ 53.219824] device-mapper: multipath: 251:0: Failing path 8:0. [ 53.219834] blk_insert_cloned_request: over max size limit. (4194304 > 65535) [ 53.219837] device-mapper: multipath: 251:0: Failing path 8:64. [ 58.209693] device-mapper: multipath: 251:0: Reinstating path 8:0. [ 58.210143] device-mapper: multipath: 251:0: Reinstating path 8:64. This continues ad infinitum. I suspected this patchset as it's new in next-20220714, and next-20220704 didn't have this bug in our regression runs. I didn't see any other 'obvious' patch in scsi or block that has a diff between those two tags. I started bisecting between: 9821106213c8 ("scsi: zfcp: Drop redundant "the" in the comments") as good, and f095c3cd1b69 ("scsi: qla2xxx: Update version to 10.02.07.800-k") as bad; and ended up at: 1bd95bb98f83 ("scsi: sd: Move WRITE_ZEROES configuration to a separate function") I ran this on Fedora 36, with mkfs.ext4 1.46.5 (30-Dec-2021). The multipath device: create: mpathc (36005076307ffc5e3000000000000805c) dm-0 IBM,2107900 size=20G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw `-+- policy='service-time 0' prio=50 status=active |- 0:0:0:1079787648 sda 8:0 active ready running `- 1:0:0:1079787648 sde 8:64 active ready running Some information on the block devices and topology: # lsblk /dev/sda /dev/sde NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS sda 8:0 0 20G 0 disk └─mpathc 251:0 0 20G 0 mpath └─mpathc1 251:2 0 20G 0 part sde 8:64 0 20G 0 disk └─mpathc 251:0 0 20G 0 mpath └─mpathc1 251:2 0 20G 0 part # lsblk -t /dev/sda /dev/sde NAME ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE RA WSAME sda 0 512 0 512 512 1 bfq 256 512 0B └─mpathc 0 512 0 512 512 1 mq-deadline 256 128 0B └─mpathc1 0 512 0 512 512 1 128 128 0B sde 0 512 0 512 512 1 bfq 256 512 0B └─mpathc 0 512 0 512 512 1 mq-deadline 256 128 0B └─mpathc1 0 512 0 512 512 1 128 128 0B # lsblk -D /dev/sda /dev/sde NAME DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO sda 0 1G 4G 0 └─mpathc 0 1G 4G 0 └─mpathc1 0 1G 4G 0 sde 0 1G 4G 0 └─mpathc 0 1G 4G 0 └─mpathc1 0 1G 4G 0 # lsblk -S /dev/sda /dev/sde NAME HCTL TYPE VENDOR MODEL REV SERIAL TRAN sda 0:0:0:1079787648 disk IBM 2107900 1060 75DL241805C fc sde 1:0:0:1079787648 disk IBM 2107900 1060 75DL241805C fc Any idea why this is happening? In case you need more details, this reproduces very reliably here.
Benjamin, > # lsblk -D /dev/sda /dev/sde > NAME DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO > sda 0 1G 4G 0 > └─mpathc 0 1G 4G 0 > └─mpathc1 0 1G 4G 0 > sde 0 1G 4G 0 > └─mpathc 0 1G 4G 0 > └─mpathc1 0 1G 4G 0 > # lsblk -S /dev/sda /dev/sde > NAME HCTL TYPE VENDOR MODEL REV SERIAL TRAN > sda 0:0:0:1079787648 disk IBM 2107900 1060 75DL241805C fc > sde 1:0:0:1079787648 disk IBM 2107900 1060 75DL241805C fc > > Any idea why this is happening? In case you need more details, this > reproduces very reliably here. Are the reported discard limits different on kernels predating that commit? Please send the output of: # grep . /sys/block/sdN/queue/discard_* /sys/block/sdN/device/scsi_disk/*/*_mode # sg_readcap -l /dev/sdN # sg_vpg -p bl /dev/sdN # sg_vpg -p lbpv /dev/sdN Ideally (for the grep) before and after the offending commit. Thanks!
On Mon, Jul 18, 2022 at 10:23:26PM -0400, Martin K. Petersen wrote: > Please send the output of: > > # grep . /sys/block/sdN/queue/discard_* /sys/block/sdN/device/scsi_disk/*/*_mode > # sg_readcap -l /dev/sdN > # sg_vpg -p bl /dev/sdN > # sg_vpg -p lbpv /dev/sdN > > Ideally (for the grep) before and after the offending commit. Sure, I assume with `sg_vpg` you mean `sg_vpd`. 1bd95bb98f83 ("scsi: sd: Move WRITE_ZEROES configuration to a separate function") --------------------------------------------------------------------------------- This is the first bad commit # lsblk -s /dev/mapper/mpathe1 NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS mpathe1 251:5 0 20G 0 part └─mpathe 251:2 0 20G 0 mpath ├─sde 8:64 0 20G 0 disk └─sdi 8:128 0 20G 0 disk # ll /dev/mapper/{mpathe1,mpathe} lrwxrwxrwx. 1 root root 7 Jul 19 12:52 /dev/mapper/mpathe -> ../dm-2 lrwxrwxrwx. 1 root root 7 Jul 19 12:52 /dev/mapper/mpathe1 -> ../dm-5 # lsblk -st /dev/mapper/mpathe1 NAME ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE RA WSAME mpathe1 0 512 0 512 512 1 128 128 0B └─mpathe 0 512 0 512 512 1 mq-deadline 256 128 0B ├─sde 0 512 0 512 512 1 bfq 256 512 0B └─sdi 0 512 0 512 512 1 bfq 256 512 0B # lsblk -sD /dev/mapper/mpathe1 NAME DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO mpathe1 0 1G 32M 0 └─mpathe 0 1G 32M 0 ├─sde 0 1G 32M 0 └─sdi 0 1G 32M 0 # grep -H . /sys/block/{sde,sdi,dm-2,dm-5}/queue/discard_* /sys/block/{sde,sdi}/device/scsi_disk/*/*_mode /sys/block/sde/queue/discard_granularity:1073741824 /sys/block/sde/queue/discard_max_bytes:33553920 /sys/block/sde/queue/discard_max_hw_bytes:33553920 /sys/block/sde/queue/discard_zeroes_data:0 /sys/block/sdi/queue/discard_granularity:1073741824 /sys/block/sdi/queue/discard_max_bytes:33553920 /sys/block/sdi/queue/discard_max_hw_bytes:33553920 /sys/block/sdi/queue/discard_zeroes_data:0 /sys/block/dm-2/queue/discard_granularity:1073741824 /sys/block/dm-2/queue/discard_max_bytes:33553920 /sys/block/dm-2/queue/discard_max_hw_bytes:33553920 /sys/block/dm-2/queue/discard_zeroes_data:0 /sys/block/dm-5/queue/discard_granularity:1073741824 /sys/block/dm-5/queue/discard_max_bytes:33553920 /sys/block/dm-5/queue/discard_max_hw_bytes:33553920 /sys/block/dm-5/queue/discard_zeroes_data:0 /sys/block/sde/device/scsi_disk/2:0:0:1083719810/protection_mode:none /sys/block/sde/device/scsi_disk/2:0:0:1083719810/provisioning_mode:writesame_16 /sys/block/sde/device/scsi_disk/2:0:0:1083719810/zeroing_mode:writesame_16_unmap /sys/block/sdi/device/scsi_disk/3:0:0:1083719810/protection_mode:none /sys/block/sdi/device/scsi_disk/3:0:0:1083719810/provisioning_mode:writesame_16 /sys/block/sdi/device/scsi_disk/3:0:0:1083719810/zeroing_mode:writesame_16_unmap # sg_readcap -l /dev/sde Read Capacity results: Protection: prot_en=0, p_type=0, p_i_exponent=0 Logical block provisioning: lbpme=1, lbprz=1 Last LBA=41943039 (0x27fffff), Number of logical blocks=41943040 Logical block length=512 bytes Logical blocks per physical block exponent=0 Lowest aligned LBA=0 Hence: Device size: 21474836480 bytes, 20480.0 MiB, 21.47 GB # dmesg | tail -n 5 [ 111.308428] sd 2:0:0:1083719810: [sde] tag#2053 Done: SUCCESS Result: hostbyte=DID_TARGET_FAILURE driverbyte=DRIVER_OK cmd_age=0s [ 111.308438] sd 2:0:0:1083719810: [sde] tag#2053 CDB: Inquiry 12 01 b9 00 04 00 [ 111.308441] sd 2:0:0:1083719810: [sde] tag#2053 Sense Key : Illegal Request [current] [ 111.308444] sd 2:0:0:1083719810: [sde] tag#2053 Add. Sense: Invalid field in cdb [ 111.311099] sde: sde1 # sg_readcap -l /dev/sdi Read Capacity results: Protection: prot_en=0, p_type=0, p_i_exponent=0 Logical block provisioning: lbpme=1, lbprz=1 Last LBA=41943039 (0x27fffff), Number of logical blocks=41943040 Logical block length=512 bytes Logical blocks per physical block exponent=0 Lowest aligned LBA=0 Hence: Device size: 21474836480 bytes, 20480.0 MiB, 21.47 GB # dmesg | tail -n 5 [ 125.621343] sd 3:0:0:1083719810: [sdi] tag#2325 Done: SUCCESS Result: hostbyte=DID_TARGET_FAILURE driverbyte=DRIVER_OK cmd_age=0s [ 125.621352] sd 3:0:0:1083719810: [sdi] tag#2325 CDB: Inquiry 12 01 b9 00 04 00 [ 125.621355] sd 3:0:0:1083719810: [sdi] tag#2325 Sense Key : Illegal Request [current] [ 125.621358] sd 3:0:0:1083719810: [sdi] tag#2325 Add. Sense: Invalid field in cdb [ 125.623898] sdi: sdi1 # sg_vpd -p bl /dev/sde Block limits VPD page (SBC): Write same non-zero (WSNZ): 0 Maximum compare and write length: 1 blocks Optimal transfer length granularity: 0 blocks [not reported] Maximum transfer length: 0 blocks [not reported] Optimal transfer length: 0 blocks [not reported] Maximum prefetch transfer length: 0 blocks [ignored] Maximum unmap LBA count: -1 [unbounded] Maximum unmap block descriptor count: 0 [Unmap command not implemented] Optimal unmap granularity: 2097152 blocks Unmap granularity alignment valid: true Unmap granularity alignment: 0 Maximum write same length: 0 blocks [not reported] Maximum atomic transfer length: 0 blocks [not reported] Atomic alignment: 0 [unaligned atomic writes permitted] Atomic transfer length granularity: 0 [no granularity requirement Maximum atomic transfer length with atomic boundary: 0 blocks [not reported] Maximum atomic boundary size: 0 blocks [can only write atomic 1 block] # sg_vpd -p lbpv /dev/sde Logical block provisioning VPD page (SBC): Unmap command supported (LBPU): 0 Write same (16) with unmap bit supported (LBPWS): 1 Write same (10) with unmap bit supported (LBPWS10): 0 Logical block provisioning read zeros (LBPRZ): 0 Anchored LBAs supported (ANC_SUP): 0 Threshold exponent: 21 Descriptor present (DP): 0 Minimum percentage: 0 [not reported] Provisioning type: 0 (not known or fully provisioned) Threshold percentage: 0 [percentages not supported] # sg_vpd -p bl /dev/sdi Block limits VPD page (SBC): Write same non-zero (WSNZ): 0 Maximum compare and write length: 1 blocks Optimal transfer length granularity: 0 blocks [not reported] Maximum transfer length: 0 blocks [not reported] Optimal transfer length: 0 blocks [not reported] Maximum prefetch transfer length: 0 blocks [ignored] Maximum unmap LBA count: -1 [unbounded] Maximum unmap block descriptor count: 0 [Unmap command not implemented] Optimal unmap granularity: 2097152 blocks Unmap granularity alignment valid: true Unmap granularity alignment: 0 Maximum write same length: 0 blocks [not reported] Maximum atomic transfer length: 0 blocks [not reported] Atomic alignment: 0 [unaligned atomic writes permitted] Atomic transfer length granularity: 0 [no granularity requirement Maximum atomic transfer length with atomic boundary: 0 blocks [not reported] Maximum atomic boundary size: 0 blocks [can only write atomic 1 block] # sg_vpd -p lbpv /dev/sdi Logical block provisioning VPD page (SBC): Unmap command supported (LBPU): 0 Write same (16) with unmap bit supported (LBPWS): 1 Write same (10) with unmap bit supported (LBPWS10): 0 Logical block provisioning read zeros (LBPRZ): 0 Anchored LBAs supported (ANC_SUP): 0 Threshold exponent: 21 Descriptor present (DP): 0 Minimum percentage: 0 [not reported] Provisioning type: 0 (not known or fully provisioned) Threshold percentage: 0 [percentages not supported] # mkfs.ext4 -F /dev/mapper/mpathe1 ... [ 307.192885] blk_insert_cloned_request: over max size limit. (4194304 > 65535) [ 307.192892] device-mapper: multipath: 251:2: Failing path 8:128. [ 307.192938] blk_insert_cloned_request: over max size limit. (4194304 > 65535) [ 307.192941] device-mapper: multipath: 251:2: Failing path 8:64. [ 311.548555] device-mapper: multipath: 251:2: Reinstating path 8:128. [ 311.548883] device-mapper: multipath: 251:2: Reinstating path 8:64. [ 311.562499] blk_insert_cloned_request: over max size limit. (4194304 > 65535) [ 311.562521] device-mapper: multipath: 251:2: Failing path 8:128. [ 311.562553] blk_insert_cloned_request: over max size limit. (4194304 > 65535) [ 311.562557] device-mapper: multipath: 251:2: Failing path 8:64. ... 5be0f08e9d95 ("scsi: sd: Fix discard errors during revalidate") --------------------------------------------------------------- This is the last good commit # lsblk -s /dev/mapper/mpathe1 NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS mpathe1 251:6 0 20G 0 part └─mpathe 251:2 0 20G 0 mpath ├─sde 8:64 0 20G 0 disk └─sdf 8:80 0 20G 0 disk # ll /dev/mapper/{mpathe1,mpathe} lrwxrwxrwx. 1 root root 7 Jul 19 12:29 /dev/mapper/mpathe -> ../dm-2 lrwxrwxrwx. 1 root root 7 Jul 19 12:37 /dev/mapper/mpathe1 -> ../dm-6 # lsblk -st /dev/mapper/mpathe1 NAME ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE RA WSAME mpathe1 0 512 0 512 512 1 128 128 0B └─mpathe 0 512 0 512 512 1 mq-deadline 256 128 0B ├─sde 0 512 0 512 512 1 bfq 256 512 0B └─sdf 0 512 0 512 512 1 bfq 256 512 0B # lsblk -sD /dev/mapper/mpathe1 NAME DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO mpathe1 0 1G 4G 0 └─mpathe 0 1G 4G 0 ├─sde 0 1G 4G 0 └─sdf 0 1G 4G 0 # grep -H . /sys/block/{sde,sdf,dm-2,dm-6}/queue/discard_* /sys/block/{sde,sdf}/device/scsi_disk/*/*_mode /sys/block/sde/queue/discard_granularity:1073741824 /sys/block/sde/queue/discard_max_bytes:4294966784 /sys/block/sde/queue/discard_max_hw_bytes:4294966784 /sys/block/sde/queue/discard_zeroes_data:0 /sys/block/sdf/queue/discard_granularity:1073741824 /sys/block/sdf/queue/discard_max_bytes:4294966784 /sys/block/sdf/queue/discard_max_hw_bytes:4294966784 /sys/block/sdf/queue/discard_zeroes_data:0 /sys/block/dm-2/queue/discard_granularity:1073741824 /sys/block/dm-2/queue/discard_max_bytes:4294966784 /sys/block/dm-2/queue/discard_max_hw_bytes:4294966784 /sys/block/dm-2/queue/discard_zeroes_data:0 /sys/block/dm-6/queue/discard_granularity:1073741824 /sys/block/dm-6/queue/discard_max_bytes:4294966784 /sys/block/dm-6/queue/discard_max_hw_bytes:4294966784 /sys/block/dm-6/queue/discard_zeroes_data:0 /sys/block/sde/device/scsi_disk/2:0:0:1083719810/protection_mode:none /sys/block/sde/device/scsi_disk/2:0:0:1083719810/provisioning_mode:writesame_16 /sys/block/sde/device/scsi_disk/2:0:0:1083719810/zeroing_mode:writesame_16_unmap /sys/block/sdf/device/scsi_disk/3:0:0:1083719810/protection_mode:none /sys/block/sdf/device/scsi_disk/3:0:0:1083719810/provisioning_mode:writesame_16 /sys/block/sdf/device/scsi_disk/3:0:0:1083719810/zeroing_mode:writesame_16_unmap # mkfs.ext4 -F /dev/mapper/mpathe1 mke2fs 1.46.5 (30-Dec-2021) Discarding device blocks: done Creating filesystem with 5242368 4k blocks and 1310720 inodes Filesystem UUID: 5d0dc4c2-445c-4a90-aaa1-0998459497c5 Superblock backups stored on blocks: 32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 4096000 Allocating group tables: done Writing inode tables: done Creating journal (32768 blocks): done Writing superblocks and filesystem accounting information: done This is a IBM DS8870 (first announced in 2012): https://www.ibm.com/common/ssi/rep_sm/4/877/ENUS2424-_h04/index.html This is one of the oldest storage boxes we have right now, and this regression it doesn't seem to happen on newer models as far as I can see.
Benjamin, > This is one of the oldest storage boxes we have right now, and this > regression it doesn't seem to happen on newer models as far as I can > see. I have not had much luck reproducing your results today despite reporting the same parameters in the VPD pages as your device. I would appreciate if you send me the output of: # grep . /sys/block/{sd,dm}*/queue/write_zeroes_max_bytes /sys/block/sd*/device/scsi_disk/*/max_write_same_blocks for the failing configuration. Thanks!
On Wed, Jul 20, 2022 at 10:29:06PM -0400, Martin K. Petersen wrote: > > This is one of the oldest storage boxes we have right now, and this > > regression it doesn't seem to happen on newer models as far as I can > > see. > > I have not had much luck reproducing your results today despite > reporting the same parameters in the VPD pages as your device. > > I would appreciate if you send me the output of: > > # grep . /sys/block/{sd,dm}*/queue/write_zeroes_max_bytes /sys/block/sd*/device/scsi_disk/*/max_write_same_blocks > > for the failing configuration. Again at commit 1bd95bb98f83 ("scsi: sd: Move WRITE_ZEROES configuration to a separate function") The device names keep changing like that because these are CI systems that are reset and re-installed every night, and I don't get the same system every day; just in case this looks strange. # lsblk -s -o '+KNAME' /dev/mapper/mpathd1 NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS KNAME mpathd1 251:2 0 20G 0 part dm-2 └─mpathd 251:0 0 20G 0 mpath dm-0 ├─sda 8:0 0 20G 0 disk sda └─sde 8:64 0 20G 0 disk sde # grep . /sys/block/{sda,sde,dm-0,dm-2}/queue/write_zeroes_max_bytes /sys/block/{sda,sde}/device/scsi_disk/*/max_write_same_blocks /sys/block/sda/queue/write_zeroes_max_bytes:33553920 /sys/block/sde/queue/write_zeroes_max_bytes:33553920 /sys/block/dm-0/queue/write_zeroes_max_bytes:33553920 /sys/block/dm-2/queue/write_zeroes_max_bytes:33553920 /sys/block/sda/device/scsi_disk/0:0:0:1079787648/max_write_same_blocks:65535 /sys/block/sde/device/scsi_disk/1:0:0:1079787648/max_write_same_blocks:65535 # for fl in /sys/block/sda/device/scsi_disk/*/device/inquiry /sys/block/sda/device/scsi_disk/*/device/vpd_*; do (set -x; xxd "${fl}"); done + xxd /sys/block/sda/device/scsi_disk/0:0:0:1079787648/device/inquiry 00000000: 0000 0532 9f10 1002 4942 4d20 2020 2020 ...2....IBM 00000010: 3231 3037 3930 3020 2020 2020 2020 2020 2107900 00000020: 3130 3630 3735 444c 3234 3138 3035 4320 106075DL241805C 00000030: 2020 2020 2020 2020 0000 0060 0da0 0a00 ...`.... 00000040: 0300 0320 0000 0000 0000 0000 0000 0000 ... ............ 00000050: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 00000060: 0101 3037 3500 3034 3731 3400 0000 8800 ..075.04714..... 00000070: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 00000080: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 00000090: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 000000a0: 0002 0800 .... + xxd /sys/block/sda/device/scsi_disk/0:0:0:1079787648/device/vpd_pg0 00000000: 0000 000a 0080 8386 b0b1 b2c0 c1c2 .............. + xxd /sys/block/sda/device/scsi_disk/0:0:0:1079787648/device/vpd_pg80 00000000: 0080 0010 3735 444c 3234 3138 3035 4320 ....75DL241805C 00000010: 2020 2020 + xxd /sys/block/sda/device/scsi_disk/0:0:0:1079787648/device/vpd_pg83 00000000: 0083 0024 0103 0010 6005 0763 07ff c5e3 ...$....`..c.... 00000010: 0000 0000 0000 805c 0114 0004 0000 0101 .......\........ 00000020: 0115 0004 0000 0000 ........ + xxd /sys/block/sda/device/scsi_disk/0:0:0:1079787648/device/vpd_pgb0 00000000: 00b0 003c 0001 0000 0000 0000 0000 0000 ...<............ 00000010: 0000 0000 ffff ffff 0000 0000 0020 0000 ............. .. 00000020: 8000 0000 0000 0000 0000 0000 0000 0000 ................ 00000030: 0000 0000 0000 0000 0000 0000 0000 0000 ................ + xxd /sys/block/sda/device/scsi_disk/0:0:0:1079787648/device/vpd_pgb1 00000000: 00b1 003c 1c20 0000 0000 0000 0000 0000 ...<. .......... 00000010: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 00000020: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 00000030: 0000 0000 0000 0000 0000 0000 0000 0000 ................ + xxd /sys/block/sda/device/scsi_disk/0:0:0:1079787648/device/vpd_pgb2 00000000: 00b2 0004 1540 0000 .....@.. Maybe this also helps, I took a snapshot of dmesg when the devices are sensed: ... [ 3.228175] SCSI subsystem initialized ... [ 3.942878] device-mapper: core: CONFIG_IMA_DISABLE_HTABLE is disabled. Duplicate IMA measurements will not be recorded in the IMA log. [ 3.942912] device-mapper: uevent: version 1.0.3 [ 3.942981] device-mapper: ioctl: 4.46.0-ioctl (2022-02-22) initialised: dm-devel@redhat.com ... [ 31.563828] zfcp 0.0.1900: qdio: ZFCP on SC 15b using AI:1 QEBSM:0 PRI:1 TDD:1 SIGA: W [ 32.688192] scsi host0: scsi_eh_0: sleeping [ 32.688238] scsi host0: zfcp ... [ 32.725595] scsi 0:0:0:0: scsi scan: INQUIRY pass 1 length 36 [ 32.725921] scsi 0:0:0:0: scsi scan: INQUIRY successful with code 0x0 [ 32.725931] scsi 0:0:0:0: scsi scan: INQUIRY pass 2 length 164 [ 32.726133] scsi 0:0:0:0: scsi scan: INQUIRY successful with code 0x0 [ 32.726139] scsi 0:0:0:0: scsi scan: peripheral device type of 31, no device added [ 32.726560] scsi 0:0:0:0: scsi scan: Sending REPORT LUNS to (try 0) [ 32.727220] scsi 0:0:0:0: scsi scan: REPORT LUNS successful (try 0) result 0x0 [ 32.727222] scsi 0:0:0:0: scsi scan: REPORT LUN scan [ 32.727475] scsi 0:0:0:1079787648: scsi scan: INQUIRY pass 1 length 36 [ 32.727717] scsi 0:0:0:1079787648: scsi scan: INQUIRY successful with code 0x0 [ 32.727723] scsi 0:0:0:1079787648: scsi scan: INQUIRY pass 2 length 164 [ 32.727927] scsi 0:0:0:1079787648: scsi scan: INQUIRY successful with code 0x0 [ 32.727935] scsi 0:0:0:1079787648: Direct-Access IBM 2107900 1060 PQ: 0 ANSI: 5 [ 32.729982] scsi 0:0:0:1079787648: sg_alloc: dev=0 [ 32.730027] sd 0:0:0:1079787648: Attached scsi generic sg0 type 0 ... [ 32.730502] sd 0:0:0:1079787648: Power-on or device reset occurred [ 32.730508] sd 0:0:0:1079787648: [sda] tag#2176 Done: SUCCESS Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s [ 32.730511] sd 0:0:0:1079787648: [sda] tag#2176 CDB: Test Unit Ready 00 00 00 00 00 00 [ 32.730514] sd 0:0:0:1079787648: [sda] tag#2176 Sense Key : Unit Attention [current] [ 32.730516] sd 0:0:0:1079787648: [sda] tag#2176 Add. Sense: Power on, reset, or bus device reset occurred ... [ 32.730774] sd 0:0:0:1079787648: [sda] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB) [ 32.730928] sd 0:0:0:1079787648: [sda] Write Protect is off [ 32.730930] sd 0:0:0:1079787648: [sda] Mode Sense: ed 00 00 08 [ 32.731243] sd 0:0:0:1079787648: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 32.731376] sd 0:0:0:1079787648: [sda] tag#3781 Done: SUCCESS Result: hostbyte=DID_TARGET_FAILURE driverbyte=DRIVER_OK cmd_age=0s [ 32.731379] sd 0:0:0:1079787648: [sda] tag#3781 CDB: Report supported operation codes a3 0c 01 12 00 00 00 00 00 0a 00 00 [ 32.731381] sd 0:0:0:1079787648: [sda] tag#3781 Sense Key : Illegal Request [current] [ 32.731383] sd 0:0:0:1079787648: [sda] tag#3781 Add. Sense: Invalid field in cdb ... [ 32.734915] sd 0:0:0:1079787648: [sda] tag#3784 Done: SUCCESS Result: hostbyte=DID_TARGET_FAILURE driverbyte=DRIVER_OK cmd_age=0s [ 32.734917] sd 0:0:0:1079787648: [sda] tag#3784 CDB: Inquiry 12 01 b9 00 04 00 [ 32.734919] sd 0:0:0:1079787648: [sda] tag#3784 Sense Key : Illegal Request [current] [ 32.734921] sd 0:0:0:1079787648: [sda] tag#3784 Add. Sense: Invalid field in cdb ... [ 32.737032] sda: sda1 ... [ 32.737185] sd 0:0:0:1079787648: [sda] Attached SCSI disk ... [ 37.294939] alua: device handler registered [ 37.296390] emc: device handler registered [ 37.297819] rdac: device handler registered [ 37.351199] device-mapper: multipath service-time: version 0.3.0 loaded [ 37.351424] sd 0:0:0:1079787648: alua: supports implicit TPGS [ 37.351427] sd 0:0:0:1079787648: alua: device naa.6005076307ffc5e3000000000000805c port group 0 rel port 101 ... [ 37.378201] sd 0:0:0:1079787648: alua: transition timeout set to 60 seconds [ 37.378204] sd 0:0:0:1079787648: alua: port group 00 state A preferred supports tolusnA ... Thanks for the support Martin.
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index eae5c81ae515..ee4f4aea5f0f 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -101,6 +101,7 @@ MODULE_ALIAS_SCSI_DEVICE(TYPE_ZBC); #define SD_MINORS 16 static void sd_config_discard(struct scsi_disk *, enum sd_lbp_mode); +static void sd_config_write_zeroes(struct scsi_disk *, enum sd_zeroing_mode); static void sd_config_write_same(struct scsi_disk *); static int sd_revalidate_disk(struct gendisk *); static void sd_unlock_native_capacity(struct gendisk *disk); @@ -428,10 +429,12 @@ static DEVICE_ATTR_RW(provisioning_mode); /* sysfs_match_string() requires dense arrays */ static const char *zeroing_mode[] = { + [SD_ZERO_DEFAULT] = "default", [SD_ZERO_WRITE] = "write", [SD_ZERO_WS] = "writesame", [SD_ZERO_WS16_UNMAP] = "writesame_16_unmap", [SD_ZERO_WS10_UNMAP] = "writesame_10_unmap", + [SD_ZERO_DISABLE] = "disabled", }; static ssize_t @@ -457,7 +460,12 @@ zeroing_mode_store(struct device *dev, struct device_attribute *attr, if (mode < 0) return -EINVAL; - sdkp->zeroing_mode = mode; + if (mode == SD_ZERO_DEFAULT) + sdkp->zeroing_override = false; + else + sdkp->zeroing_override = true; + + sd_config_write_zeroes(sdkp, mode); return count; } @@ -1049,6 +1057,31 @@ static blk_status_t sd_setup_write_same10_cmnd(struct scsi_cmnd *cmd, return scsi_alloc_sgtables(cmd); } +static void sd_config_write_zeroes(struct scsi_disk *sdkp, + enum sd_zeroing_mode mode) +{ + struct request_queue *q = sdkp->disk->queue; + unsigned int logical_block_size = sdkp->device->sector_size; + + if (mode == SD_ZERO_DEFAULT && !sdkp->zeroing_override) { + if (sdkp->lbprz && sdkp->lbpws) + mode = SD_ZERO_WS16_UNMAP; + else if (sdkp->lbprz && sdkp->lbpws10) + mode = SD_ZERO_WS10_UNMAP; + else if (sdkp->max_ws_blocks) + mode = SD_ZERO_WS; + else + mode = SD_ZERO_WRITE; + } + + if (mode == SD_ZERO_DISABLE) + sdkp->zeroing_override = true; + + sdkp->zeroing_mode = mode; + blk_queue_max_write_zeroes_sectors(q, sdkp->max_ws_blocks * + (logical_block_size >> 9)); +} + static blk_status_t sd_setup_write_zeroes_cmnd(struct scsi_cmnd *cmd) { struct request *rq = scsi_cmd_to_rq(cmd); @@ -1079,12 +1112,11 @@ static blk_status_t sd_setup_write_zeroes_cmnd(struct scsi_cmnd *cmd) static void sd_config_write_same(struct scsi_disk *sdkp) { - struct request_queue *q = sdkp->disk->queue; unsigned int logical_block_size = sdkp->device->sector_size; if (sdkp->device->no_write_same) { sdkp->max_ws_blocks = 0; - goto out; + return; } /* Some devices can not handle block counts above 0xffff despite @@ -1103,15 +1135,6 @@ static void sd_config_write_same(struct scsi_disk *sdkp) sdkp->max_ws_blocks = 0; } - if (sdkp->lbprz && sdkp->lbpws) - sdkp->zeroing_mode = SD_ZERO_WS16_UNMAP; - else if (sdkp->lbprz && sdkp->lbpws10) - sdkp->zeroing_mode = SD_ZERO_WS10_UNMAP; - else if (sdkp->max_ws_blocks) - sdkp->zeroing_mode = SD_ZERO_WS; - else - sdkp->zeroing_mode = SD_ZERO_WRITE; - if (sdkp->max_ws_blocks && sdkp->physical_block_size > logical_block_size) { /* @@ -1131,10 +1154,6 @@ static void sd_config_write_same(struct scsi_disk *sdkp) bytes_to_logical(sdkp->device, sdkp->physical_block_size)); } - -out: - blk_queue_max_write_zeroes_sectors(q, sdkp->max_ws_blocks * - (logical_block_size >> 9)); } static blk_status_t sd_setup_flush_cmnd(struct scsi_cmnd *cmd) @@ -2126,6 +2145,8 @@ static int sd_done(struct scsi_cmnd *SCpnt) case WRITE_SAME: if (SCpnt->cmnd[1] & 8) { /* UNMAP */ sd_config_discard(sdkp, SD_LBP_DISABLE); + sd_config_write_zeroes(sdkp, + SD_ZERO_DISABLE); } else { sdkp->device->no_write_same = 1; sd_config_write_same(sdkp); @@ -3352,7 +3373,9 @@ static int sd_revalidate_disk(struct gendisk *disk) sd_read_write_same(sdkp, buffer); sd_read_security(sdkp, buffer); sd_read_cpr(sdkp); + sd_config_write_same(sdkp); sd_config_discard(sdkp, SD_LBP_DEFAULT); + sd_config_write_zeroes(sdkp, SD_ZERO_DEFAULT); } /* @@ -3398,7 +3421,6 @@ static int sd_revalidate_disk(struct gendisk *disk) sdkp->first_scan = 0; set_capacity_and_notify(disk, logical_to_sectors(sdp, sdkp->capacity)); - sd_config_write_same(sdkp); kfree(buffer); /* diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h index 57a8241163c5..e0ee4215a3b4 100644 --- a/drivers/scsi/sd.h +++ b/drivers/scsi/sd.h @@ -61,11 +61,13 @@ enum sd_lbp_mode { SD_LBP_DISABLE, /* Discard disabled due to failed cmd */ }; -enum { - SD_ZERO_WRITE = 0, /* Use WRITE(10/16) command */ +enum sd_zeroing_mode { + SD_ZERO_DEFAULT = 0, /* Default mode based on what device reports */ + SD_ZERO_WRITE, /* Use WRITE(10/16) command */ SD_ZERO_WS, /* Use WRITE SAME(10/16) command */ SD_ZERO_WS16_UNMAP, /* Use WRITE SAME(16) with UNMAP */ SD_ZERO_WS10_UNMAP, /* Use WRITE SAME(10) with UNMAP */ + SD_ZERO_DISABLE, /* Write Zeroes disabled due to failed cmd */ }; struct scsi_disk { @@ -111,6 +113,7 @@ struct scsi_disk { u8 nr_actuators; /* Number of actuators */ bool lblvpd; bool provisioning_override; + bool zeroing_override; unsigned ATO : 1; /* state of disk ATO bit */ unsigned cache_override : 1; /* temp override of WCE,RCD */ unsigned WCE : 1; /* state of disk WCE bit */
In preparation for adding support for the WRITE SAME(16) NDOB flag, move configuration of the WRITE_ZEROES operation to a separate function. This is done to facilitate fetching all VPD pages before choosing the appropriate zeroing method for a given device. The deferred configuration also allows us to mirror the discard behavior and permit the user to revert a device to the kernel default configuration by echoing "default" to the sysfs file. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> --- drivers/scsi/sd.c | 56 +++++++++++++++++++++++++++++++++-------------- drivers/scsi/sd.h | 7 ++++-- 2 files changed, 44 insertions(+), 19 deletions(-)