diff mbox series

[10/14] scsi: sd: Move WRITE_ZEROES configuration to a separate function

Message ID 20220302053559.32147-11-martin.petersen@oracle.com (mailing list archive)
State Changes Requested
Headers show
Series [01/14] scsi: mpt3sas: Use cached ATA Information VPD page | expand

Commit Message

Martin K. Petersen March 2, 2022, 5:35 a.m. UTC
In preparation for adding support for the WRITE SAME(16) NDOB flag,
move configuration of the WRITE_ZEROES operation to a separate
function. This is done to facilitate fetching all VPD pages before
choosing the appropriate zeroing method for a given device.

The deferred configuration also allows us to mirror the discard
behavior and permit the user to revert a device to the kernel default
configuration by echoing "default" to the sysfs file.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
---
 drivers/scsi/sd.c | 56 +++++++++++++++++++++++++++++++++--------------
 drivers/scsi/sd.h |  7 ++++--
 2 files changed, 44 insertions(+), 19 deletions(-)

Comments

Christoph Hellwig March 2, 2022, 9:53 a.m. UTC | #1
Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>
Bart Van Assche March 3, 2022, 12:52 a.m. UTC | #2
On 3/1/22 21:35, Martin K. Petersen wrote:
> +	if (mode == SD_ZERO_DEFAULT)
> +		sdkp->zeroing_override = false;
> +	else
> +		sdkp->zeroing_override = true;

Hmm ... since sd_config_write_zeroes() special-cases mode == 
SD_ZERO_DEFAULT, does the value of zeroing_override matter if mode == 
SD_ZERO_DEFAULT?

> +static void sd_config_write_zeroes(struct scsi_disk *sdkp,
> +				   enum sd_zeroing_mode mode)
> +{
> +	struct request_queue *q = sdkp->disk->queue;
> +	unsigned int logical_block_size = sdkp->device->sector_size;
> +
> +	if (mode == SD_ZERO_DEFAULT && !sdkp->zeroing_override) {
> +		if (sdkp->lbprz && sdkp->lbpws)
> +			mode = SD_ZERO_WS16_UNMAP;
> +		else if (sdkp->lbprz && sdkp->lbpws10)
> +			mode = SD_ZERO_WS10_UNMAP;
> +		else if (sdkp->max_ws_blocks)
> +			mode = SD_ZERO_WS;
> +		else
> +			mode = SD_ZERO_WRITE;
> +	}
> +
> +	if (mode == SD_ZERO_DISABLE)
> +		sdkp->zeroing_override = true;

What does "zeroing_override" mean? I would expect if mode == 
SD_ZERO_DISABLE that that choice from the user is not overridden.

Thanks,

Bart.
Benjamin Block July 18, 2022, 4:51 p.m. UTC | #3
On Wed, Mar 02, 2022 at 12:35:55AM -0500, Martin K. Petersen wrote:
> In preparation for adding support for the WRITE SAME(16) NDOB flag,
> move configuration of the WRITE_ZEROES operation to a separate
> function. This is done to facilitate fetching all VPD pages before
> choosing the appropriate zeroing method for a given device.
> 
> The deferred configuration also allows us to mirror the discard
> behavior and permit the user to revert a device to the kernel default
> configuration by echoing "default" to the sysfs file.
> 
> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
> ---
>  drivers/scsi/sd.c | 56 +++++++++++++++++++++++++++++++++--------------
>  drivers/scsi/sd.h |  7 ++++--
>  2 files changed, 44 insertions(+), 19 deletions(-)
> 

Hello Martin,

somehow this patch triggers a regression on s390x with zFCP in
`next-20220714`.

In our daily regression test suite a simple:

  # mkfs.ext4 -F /dev/mapper/mpathc1

causes the block layer to trip with this when trying to discard blocks
(at least that's my assumption about what its trying to do) from a SCSI
disk:

  [   33.042224] blk_insert_cloned_request: over max size limit. (4194304 > 65535)
  [   33.042228] device-mapper: multipath: 251:0: Failing path 8:0.
  [   33.042239] blk_insert_cloned_request: over max size limit. (4194304 > 65535)
  [   33.042267] device-mapper: multipath: 251:0: Failing path 8:64.
  [   33.197329] device-mapper: multipath: 251:0: Reinstating path 8:0.
  [   33.198850] device-mapper: multipath: 251:0: Reinstating path 8:64.
  [   33.210742] blk_insert_cloned_request: over max size limit. (4194304 > 65535)
  [   33.210752] device-mapper: multipath: 251:0: Failing path 8:0.
  [   33.210771] blk_insert_cloned_request: over max size limit. (4194304 > 65535)
  [   33.210792] device-mapper: multipath: 251:0: Failing path 8:64.
  [   38.200929] device-mapper: multipath: 251:0: Reinstating path 8:0.
  [   38.201489] device-mapper: multipath: 251:0: Reinstating path 8:64.
  [   38.220039] blk_insert_cloned_request: over max size limit. (4194304 > 65535)
  [   38.220045] device-mapper: multipath: 251:0: Failing path 8:0.
  [   38.220056] blk_insert_cloned_request: over max size limit. (4194304 > 65535)
  [   38.220060] device-mapper: multipath: 251:0: Failing path 8:64.
  [   43.202538] device-mapper: multipath: 251:0: Reinstating path 8:0.
  [   43.203015] device-mapper: multipath: 251:0: Reinstating path 8:64.
  [   43.219877] blk_insert_cloned_request: over max size limit. (4194304 > 65535)
  [   43.219881] device-mapper: multipath: 251:0: Failing path 8:0.
  [   43.219889] blk_insert_cloned_request: over max size limit. (4194304 > 65535)
  [   43.219892] device-mapper: multipath: 251:0: Failing path 8:64.
  [   48.204035] device-mapper: multipath: 251:0: Reinstating path 8:0.
  [   48.204526] device-mapper: multipath: 251:0: Reinstating path 8:64.
  [   48.219951] blk_insert_cloned_request: over max size limit. (4194304 > 65535)
  [   48.219964] device-mapper: multipath: 251:0: Failing path 8:0.
  [   48.219990] blk_insert_cloned_request: over max size limit. (4194304 > 65535)
  [   48.219996] device-mapper: multipath: 251:0: Failing path 8:64.
  [   53.205456] device-mapper: multipath: 251:0: Reinstating path 8:0.
  [   53.206950] device-mapper: multipath: 251:0: Reinstating path 8:64.
  [   53.219820] blk_insert_cloned_request: over max size limit. (4194304 > 65535)
  [   53.219824] device-mapper: multipath: 251:0: Failing path 8:0.
  [   53.219834] blk_insert_cloned_request: over max size limit. (4194304 > 65535)
  [   53.219837] device-mapper: multipath: 251:0: Failing path 8:64.
  [   58.209693] device-mapper: multipath: 251:0: Reinstating path 8:0.
  [   58.210143] device-mapper: multipath: 251:0: Reinstating path 8:64.

This continues ad infinitum.

I suspected this patchset as it's new in next-20220714, and next-20220704
didn't have this bug in our regression runs. I didn't see any other 'obvious'
patch in scsi or block that has a diff between those two tags.

I started bisecting between:
  9821106213c8 ("scsi: zfcp: Drop redundant "the" in the comments")
as good, and
  f095c3cd1b69 ("scsi: qla2xxx: Update version to 10.02.07.800-k")
as bad; and ended up at:
  1bd95bb98f83 ("scsi: sd: Move WRITE_ZEROES configuration to a separate function")

I ran this on Fedora 36, with mkfs.ext4 1.46.5 (30-Dec-2021).

The multipath device:

  create: mpathc (36005076307ffc5e3000000000000805c) dm-0 IBM,2107900
  size=20G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
  `-+- policy='service-time 0' prio=50 status=active
    |- 0:0:0:1079787648 sda 8:0   active ready running
    `- 1:0:0:1079787648 sde 8:64  active ready running

Some information on the block devices and topology:

  # lsblk /dev/sda /dev/sde
  NAME        MAJ:MIN RM SIZE RO TYPE  MOUNTPOINTS
  sda           8:0    0  20G  0 disk
  └─mpathc    251:0    0  20G  0 mpath
    └─mpathc1 251:2    0  20G  0 part
  sde           8:64   0  20G  0 disk
  └─mpathc    251:0    0  20G  0 mpath
    └─mpathc1 251:2    0  20G  0 part
  # lsblk -t /dev/sda /dev/sde
  NAME        ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED       RQ-SIZE  RA WSAME
  sda                 0    512      0     512     512    1 bfq             256 512    0B
  └─mpathc            0    512      0     512     512    1 mq-deadline     256 128    0B
    └─mpathc1         0    512      0     512     512    1                 128 128    0B
  sde                 0    512      0     512     512    1 bfq             256 512    0B
  └─mpathc            0    512      0     512     512    1 mq-deadline     256 128    0B
    └─mpathc1         0    512      0     512     512    1                 128 128    0B
  # lsblk -D /dev/sda /dev/sde
  NAME        DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
  sda                0        1G       4G         0
  └─mpathc           0        1G       4G         0
    └─mpathc1        0        1G       4G         0
  sde                0        1G       4G         0
  └─mpathc           0        1G       4G         0
    └─mpathc1        0        1G       4G         0
  # lsblk -S /dev/sda /dev/sde
  NAME HCTL             TYPE VENDOR   MODEL    REV SERIAL      TRAN
  sda  0:0:0:1079787648 disk IBM      2107900 1060 75DL241805C fc
  sde  1:0:0:1079787648 disk IBM      2107900 1060 75DL241805C fc

Any idea why this is happening? In case you need more details, this
reproduces very reliably here.
Martin K. Petersen July 19, 2022, 2:23 a.m. UTC | #4
Benjamin,

>   # lsblk -D /dev/sda /dev/sde
>   NAME        DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
>   sda                0        1G       4G         0
>   └─mpathc           0        1G       4G         0
>     └─mpathc1        0        1G       4G         0
>   sde                0        1G       4G         0
>   └─mpathc           0        1G       4G         0
>     └─mpathc1        0        1G       4G         0

>   # lsblk -S /dev/sda /dev/sde
>   NAME HCTL             TYPE VENDOR   MODEL    REV SERIAL      TRAN
>   sda  0:0:0:1079787648 disk IBM      2107900 1060 75DL241805C fc
>   sde  1:0:0:1079787648 disk IBM      2107900 1060 75DL241805C fc
>
> Any idea why this is happening? In case you need more details, this
> reproduces very reliably here.

Are the reported discard limits different on kernels predating that
commit?

Please send the output of:

# grep . /sys/block/sdN/queue/discard_* /sys/block/sdN/device/scsi_disk/*/*_mode
# sg_readcap -l /dev/sdN
# sg_vpg -p bl /dev/sdN
# sg_vpg -p lbpv /dev/sdN

Ideally (for the grep) before and after the offending commit.

Thanks!
Benjamin Block July 19, 2022, 11:37 a.m. UTC | #5
On Mon, Jul 18, 2022 at 10:23:26PM -0400, Martin K. Petersen wrote:
> Please send the output of:
> 
> # grep . /sys/block/sdN/queue/discard_* /sys/block/sdN/device/scsi_disk/*/*_mode
> # sg_readcap -l /dev/sdN
> # sg_vpg -p bl /dev/sdN
> # sg_vpg -p lbpv /dev/sdN
> 
> Ideally (for the grep) before and after the offending commit.

Sure,

I assume with `sg_vpg` you mean `sg_vpd`.

1bd95bb98f83 ("scsi: sd: Move WRITE_ZEROES configuration to a separate function")
---------------------------------------------------------------------------------

This is the first bad commit 

  # lsblk -s /dev/mapper/mpathe1
  NAME     MAJ:MIN RM SIZE RO TYPE  MOUNTPOINTS
  mpathe1  251:5    0  20G  0 part
  └─mpathe 251:2    0  20G  0 mpath
    ├─sde    8:64   0  20G  0 disk
    └─sdi    8:128  0  20G  0 disk
  
  # ll /dev/mapper/{mpathe1,mpathe}
  lrwxrwxrwx. 1 root root 7 Jul 19 12:52 /dev/mapper/mpathe -> ../dm-2
  lrwxrwxrwx. 1 root root 7 Jul 19 12:52 /dev/mapper/mpathe1 -> ../dm-5
  
  # lsblk -st /dev/mapper/mpathe1
  NAME     ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED       RQ-SIZE  RA WSAME
  mpathe1          0    512      0     512     512    1                 128 128    0B
  └─mpathe         0    512      0     512     512    1 mq-deadline     256 128    0B
    ├─sde          0    512      0     512     512    1 bfq             256 512    0B
    └─sdi          0    512      0     512     512    1 bfq             256 512    0B
  
  # lsblk -sD /dev/mapper/mpathe1
  NAME     DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
  mpathe1         0        1G      32M         0
  └─mpathe        0        1G      32M         0
    ├─sde         0        1G      32M         0
    └─sdi         0        1G      32M         0
  
  # grep -H . /sys/block/{sde,sdi,dm-2,dm-5}/queue/discard_* /sys/block/{sde,sdi}/device/scsi_disk/*/*_mode
  /sys/block/sde/queue/discard_granularity:1073741824
  /sys/block/sde/queue/discard_max_bytes:33553920
  /sys/block/sde/queue/discard_max_hw_bytes:33553920
  /sys/block/sde/queue/discard_zeroes_data:0
  /sys/block/sdi/queue/discard_granularity:1073741824
  /sys/block/sdi/queue/discard_max_bytes:33553920
  /sys/block/sdi/queue/discard_max_hw_bytes:33553920
  /sys/block/sdi/queue/discard_zeroes_data:0
  /sys/block/dm-2/queue/discard_granularity:1073741824
  /sys/block/dm-2/queue/discard_max_bytes:33553920
  /sys/block/dm-2/queue/discard_max_hw_bytes:33553920
  /sys/block/dm-2/queue/discard_zeroes_data:0
  /sys/block/dm-5/queue/discard_granularity:1073741824
  /sys/block/dm-5/queue/discard_max_bytes:33553920
  /sys/block/dm-5/queue/discard_max_hw_bytes:33553920
  /sys/block/dm-5/queue/discard_zeroes_data:0
  /sys/block/sde/device/scsi_disk/2:0:0:1083719810/protection_mode:none
  /sys/block/sde/device/scsi_disk/2:0:0:1083719810/provisioning_mode:writesame_16
  /sys/block/sde/device/scsi_disk/2:0:0:1083719810/zeroing_mode:writesame_16_unmap
  /sys/block/sdi/device/scsi_disk/3:0:0:1083719810/protection_mode:none
  /sys/block/sdi/device/scsi_disk/3:0:0:1083719810/provisioning_mode:writesame_16
  /sys/block/sdi/device/scsi_disk/3:0:0:1083719810/zeroing_mode:writesame_16_unmap
  
  # sg_readcap -l /dev/sde
  Read Capacity results:
     Protection: prot_en=0, p_type=0, p_i_exponent=0
     Logical block provisioning: lbpme=1, lbprz=1
     Last LBA=41943039 (0x27fffff), Number of logical blocks=41943040
     Logical block length=512 bytes
     Logical blocks per physical block exponent=0
     Lowest aligned LBA=0
  Hence:
     Device size: 21474836480 bytes, 20480.0 MiB, 21.47 GB
  
  # dmesg | tail -n 5
  [  111.308428] sd 2:0:0:1083719810: [sde] tag#2053 Done: SUCCESS Result: hostbyte=DID_TARGET_FAILURE driverbyte=DRIVER_OK cmd_age=0s
  [  111.308438] sd 2:0:0:1083719810: [sde] tag#2053 CDB: Inquiry 12 01 b9 00 04 00
  [  111.308441] sd 2:0:0:1083719810: [sde] tag#2053 Sense Key : Illegal Request [current]
  [  111.308444] sd 2:0:0:1083719810: [sde] tag#2053 Add. Sense: Invalid field in cdb
  [  111.311099]  sde: sde1
  
  # sg_readcap -l /dev/sdi
  Read Capacity results:
     Protection: prot_en=0, p_type=0, p_i_exponent=0
     Logical block provisioning: lbpme=1, lbprz=1
     Last LBA=41943039 (0x27fffff), Number of logical blocks=41943040
     Logical block length=512 bytes
     Logical blocks per physical block exponent=0
     Lowest aligned LBA=0
  Hence:
     Device size: 21474836480 bytes, 20480.0 MiB, 21.47 GB
  
  # dmesg | tail -n 5
  [  125.621343] sd 3:0:0:1083719810: [sdi] tag#2325 Done: SUCCESS Result: hostbyte=DID_TARGET_FAILURE driverbyte=DRIVER_OK cmd_age=0s
  [  125.621352] sd 3:0:0:1083719810: [sdi] tag#2325 CDB: Inquiry 12 01 b9 00 04 00
  [  125.621355] sd 3:0:0:1083719810: [sdi] tag#2325 Sense Key : Illegal Request [current]
  [  125.621358] sd 3:0:0:1083719810: [sdi] tag#2325 Add. Sense: Invalid field in cdb
  [  125.623898]  sdi: sdi1
  
  # sg_vpd -p bl /dev/sde
  Block limits VPD page (SBC):
    Write same non-zero (WSNZ): 0
    Maximum compare and write length: 1 blocks
    Optimal transfer length granularity: 0 blocks [not reported]
    Maximum transfer length: 0 blocks [not reported]
    Optimal transfer length: 0 blocks [not reported]
    Maximum prefetch transfer length: 0 blocks [ignored]
    Maximum unmap LBA count: -1 [unbounded]
    Maximum unmap block descriptor count: 0 [Unmap command not implemented]
    Optimal unmap granularity: 2097152 blocks
    Unmap granularity alignment valid: true
    Unmap granularity alignment: 0
    Maximum write same length: 0 blocks [not reported]
    Maximum atomic transfer length: 0 blocks [not reported]
    Atomic alignment: 0 [unaligned atomic writes permitted]
    Atomic transfer length granularity: 0 [no granularity requirement
    Maximum atomic transfer length with atomic boundary: 0 blocks [not reported]
    Maximum atomic boundary size: 0 blocks [can only write atomic 1 block]
  
  # sg_vpd -p lbpv /dev/sde
  Logical block provisioning VPD page (SBC):
    Unmap command supported (LBPU): 0
    Write same (16) with unmap bit supported (LBPWS): 1
    Write same (10) with unmap bit supported (LBPWS10): 0
    Logical block provisioning read zeros (LBPRZ): 0
    Anchored LBAs supported (ANC_SUP): 0
    Threshold exponent: 21
    Descriptor present (DP): 0
    Minimum percentage: 0 [not reported]
    Provisioning type: 0 (not known or fully provisioned)
    Threshold percentage: 0 [percentages not supported]
  
  # sg_vpd -p bl /dev/sdi
  Block limits VPD page (SBC):
    Write same non-zero (WSNZ): 0
    Maximum compare and write length: 1 blocks
    Optimal transfer length granularity: 0 blocks [not reported]
    Maximum transfer length: 0 blocks [not reported]
    Optimal transfer length: 0 blocks [not reported]
    Maximum prefetch transfer length: 0 blocks [ignored]
    Maximum unmap LBA count: -1 [unbounded]
    Maximum unmap block descriptor count: 0 [Unmap command not implemented]
    Optimal unmap granularity: 2097152 blocks
    Unmap granularity alignment valid: true
    Unmap granularity alignment: 0
    Maximum write same length: 0 blocks [not reported]
    Maximum atomic transfer length: 0 blocks [not reported]
    Atomic alignment: 0 [unaligned atomic writes permitted]
    Atomic transfer length granularity: 0 [no granularity requirement
    Maximum atomic transfer length with atomic boundary: 0 blocks [not reported]
    Maximum atomic boundary size: 0 blocks [can only write atomic 1 block]
  
  # sg_vpd -p lbpv /dev/sdi
  Logical block provisioning VPD page (SBC):
    Unmap command supported (LBPU): 0
    Write same (16) with unmap bit supported (LBPWS): 1
    Write same (10) with unmap bit supported (LBPWS10): 0
    Logical block provisioning read zeros (LBPRZ): 0
    Anchored LBAs supported (ANC_SUP): 0
    Threshold exponent: 21
    Descriptor present (DP): 0
    Minimum percentage: 0 [not reported]
    Provisioning type: 0 (not known or fully provisioned)
    Threshold percentage: 0 [percentages not supported]
  
  # mkfs.ext4 -F /dev/mapper/mpathe1
  ...
  [  307.192885] blk_insert_cloned_request: over max size limit. (4194304 > 65535)
  [  307.192892] device-mapper: multipath: 251:2: Failing path 8:128.
  [  307.192938] blk_insert_cloned_request: over max size limit. (4194304 > 65535)
  [  307.192941] device-mapper: multipath: 251:2: Failing path 8:64.
  [  311.548555] device-mapper: multipath: 251:2: Reinstating path 8:128.
  [  311.548883] device-mapper: multipath: 251:2: Reinstating path 8:64.
  [  311.562499] blk_insert_cloned_request: over max size limit. (4194304 > 65535)
  [  311.562521] device-mapper: multipath: 251:2: Failing path 8:128.
  [  311.562553] blk_insert_cloned_request: over max size limit. (4194304 > 65535)
  [  311.562557] device-mapper: multipath: 251:2: Failing path 8:64.
  ...

5be0f08e9d95 ("scsi: sd: Fix discard errors during revalidate")
---------------------------------------------------------------

This is the last good commit

  # lsblk -s /dev/mapper/mpathe1
  NAME     MAJ:MIN RM SIZE RO TYPE  MOUNTPOINTS
  mpathe1  251:6    0  20G  0 part
  └─mpathe 251:2    0  20G  0 mpath
    ├─sde    8:64   0  20G  0 disk
    └─sdf    8:80   0  20G  0 disk
  
  # ll /dev/mapper/{mpathe1,mpathe}
  lrwxrwxrwx. 1 root root 7 Jul 19 12:29 /dev/mapper/mpathe -> ../dm-2
  lrwxrwxrwx. 1 root root 7 Jul 19 12:37 /dev/mapper/mpathe1 -> ../dm-6
  
  # lsblk -st /dev/mapper/mpathe1
  NAME     ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED       RQ-SIZE  RA WSAME
  mpathe1          0    512      0     512     512    1                 128 128    0B
  └─mpathe         0    512      0     512     512    1 mq-deadline     256 128    0B
    ├─sde          0    512      0     512     512    1 bfq             256 512    0B
    └─sdf          0    512      0     512     512    1 bfq             256 512    0B
  
  # lsblk -sD /dev/mapper/mpathe1
  NAME     DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
  mpathe1         0        1G       4G         0
  └─mpathe        0        1G       4G         0
    ├─sde         0        1G       4G         0
    └─sdf         0        1G       4G         0
  
  # grep -H . /sys/block/{sde,sdf,dm-2,dm-6}/queue/discard_* /sys/block/{sde,sdf}/device/scsi_disk/*/*_mode
  /sys/block/sde/queue/discard_granularity:1073741824
  /sys/block/sde/queue/discard_max_bytes:4294966784
  /sys/block/sde/queue/discard_max_hw_bytes:4294966784
  /sys/block/sde/queue/discard_zeroes_data:0
  /sys/block/sdf/queue/discard_granularity:1073741824
  /sys/block/sdf/queue/discard_max_bytes:4294966784
  /sys/block/sdf/queue/discard_max_hw_bytes:4294966784
  /sys/block/sdf/queue/discard_zeroes_data:0
  /sys/block/dm-2/queue/discard_granularity:1073741824
  /sys/block/dm-2/queue/discard_max_bytes:4294966784
  /sys/block/dm-2/queue/discard_max_hw_bytes:4294966784
  /sys/block/dm-2/queue/discard_zeroes_data:0
  /sys/block/dm-6/queue/discard_granularity:1073741824
  /sys/block/dm-6/queue/discard_max_bytes:4294966784
  /sys/block/dm-6/queue/discard_max_hw_bytes:4294966784
  /sys/block/dm-6/queue/discard_zeroes_data:0
  /sys/block/sde/device/scsi_disk/2:0:0:1083719810/protection_mode:none
  /sys/block/sde/device/scsi_disk/2:0:0:1083719810/provisioning_mode:writesame_16
  /sys/block/sde/device/scsi_disk/2:0:0:1083719810/zeroing_mode:writesame_16_unmap
  /sys/block/sdf/device/scsi_disk/3:0:0:1083719810/protection_mode:none
  /sys/block/sdf/device/scsi_disk/3:0:0:1083719810/provisioning_mode:writesame_16
  /sys/block/sdf/device/scsi_disk/3:0:0:1083719810/zeroing_mode:writesame_16_unmap
  
  # mkfs.ext4 -F /dev/mapper/mpathe1
  mke2fs 1.46.5 (30-Dec-2021)
  Discarding device blocks: done
  Creating filesystem with 5242368 4k blocks and 1310720 inodes
  Filesystem UUID: 5d0dc4c2-445c-4a90-aaa1-0998459497c5
  Superblock backups stored on blocks:
  	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
  	4096000
  
  Allocating group tables: done
  Writing inode tables: done
  Creating journal (32768 blocks): done
  Writing superblocks and filesystem accounting information: done

This is a IBM DS8870 (first announced in 2012):
https://www.ibm.com/common/ssi/rep_sm/4/877/ENUS2424-_h04/index.html

This is one of the oldest storage boxes we have right now, and this
regression it doesn't seem to happen on newer models as far as I can
see.
Martin K. Petersen July 21, 2022, 2:29 a.m. UTC | #6
Benjamin,

> This is one of the oldest storage boxes we have right now, and this
> regression it doesn't seem to happen on newer models as far as I can
> see.

I have not had much luck reproducing your results today despite
reporting the same parameters in the VPD pages as your device.

I would appreciate if you send me the output of:

# grep . /sys/block/{sd,dm}*/queue/write_zeroes_max_bytes /sys/block/sd*/device/scsi_disk/*/max_write_same_blocks

for the failing configuration.

Thanks!
Benjamin Block July 21, 2022, 1:25 p.m. UTC | #7
On Wed, Jul 20, 2022 at 10:29:06PM -0400, Martin K. Petersen wrote:
> > This is one of the oldest storage boxes we have right now, and this
> > regression it doesn't seem to happen on newer models as far as I can
> > see.
> 
> I have not had much luck reproducing your results today despite
> reporting the same parameters in the VPD pages as your device.
> 
> I would appreciate if you send me the output of:
> 
> # grep . /sys/block/{sd,dm}*/queue/write_zeroes_max_bytes /sys/block/sd*/device/scsi_disk/*/max_write_same_blocks
> 
> for the failing configuration.

Again at commit
1bd95bb98f83 ("scsi: sd: Move WRITE_ZEROES configuration to a separate function")

The device names keep changing like that because these are CI systems
that are reset and re-installed every night, and I don't get the same
system every day; just in case this looks strange.

  # lsblk -s -o '+KNAME' /dev/mapper/mpathd1
  NAME     MAJ:MIN RM SIZE RO TYPE  MOUNTPOINTS KNAME
  mpathd1  251:2    0  20G  0 part              dm-2
  └─mpathd 251:0    0  20G  0 mpath             dm-0
    ├─sda    8:0    0  20G  0 disk              sda
    └─sde    8:64   0  20G  0 disk              sde
  
  # grep . /sys/block/{sda,sde,dm-0,dm-2}/queue/write_zeroes_max_bytes /sys/block/{sda,sde}/device/scsi_disk/*/max_write_same_blocks
  /sys/block/sda/queue/write_zeroes_max_bytes:33553920
  /sys/block/sde/queue/write_zeroes_max_bytes:33553920
  /sys/block/dm-0/queue/write_zeroes_max_bytes:33553920
  /sys/block/dm-2/queue/write_zeroes_max_bytes:33553920
  /sys/block/sda/device/scsi_disk/0:0:0:1079787648/max_write_same_blocks:65535
  /sys/block/sde/device/scsi_disk/1:0:0:1079787648/max_write_same_blocks:65535
  
  # for fl in /sys/block/sda/device/scsi_disk/*/device/inquiry /sys/block/sda/device/scsi_disk/*/device/vpd_*; do (set -x; xxd "${fl}"); done
  + xxd /sys/block/sda/device/scsi_disk/0:0:0:1079787648/device/inquiry
  00000000: 0000 0532 9f10 1002 4942 4d20 2020 2020  ...2....IBM
  00000010: 3231 3037 3930 3020 2020 2020 2020 2020  2107900
  00000020: 3130 3630 3735 444c 3234 3138 3035 4320  106075DL241805C
  00000030: 2020 2020 2020 2020 0000 0060 0da0 0a00          ...`....
  00000040: 0300 0320 0000 0000 0000 0000 0000 0000  ... ............
  00000050: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  00000060: 0101 3037 3500 3034 3731 3400 0000 8800  ..075.04714.....
  00000070: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  00000080: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  00000090: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  000000a0: 0002 0800                                ....
  + xxd /sys/block/sda/device/scsi_disk/0:0:0:1079787648/device/vpd_pg0
  00000000: 0000 000a 0080 8386 b0b1 b2c0 c1c2       ..............
  + xxd /sys/block/sda/device/scsi_disk/0:0:0:1079787648/device/vpd_pg80
  00000000: 0080 0010 3735 444c 3234 3138 3035 4320  ....75DL241805C
  00000010: 2020 2020
  + xxd /sys/block/sda/device/scsi_disk/0:0:0:1079787648/device/vpd_pg83
  00000000: 0083 0024 0103 0010 6005 0763 07ff c5e3  ...$....`..c....
  00000010: 0000 0000 0000 805c 0114 0004 0000 0101  .......\........
  00000020: 0115 0004 0000 0000                      ........
  + xxd /sys/block/sda/device/scsi_disk/0:0:0:1079787648/device/vpd_pgb0
  00000000: 00b0 003c 0001 0000 0000 0000 0000 0000  ...<............
  00000010: 0000 0000 ffff ffff 0000 0000 0020 0000  ............. ..
  00000020: 8000 0000 0000 0000 0000 0000 0000 0000  ................
  00000030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  + xxd /sys/block/sda/device/scsi_disk/0:0:0:1079787648/device/vpd_pgb1
  00000000: 00b1 003c 1c20 0000 0000 0000 0000 0000  ...<. ..........
  00000010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  00000020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  00000030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  + xxd /sys/block/sda/device/scsi_disk/0:0:0:1079787648/device/vpd_pgb2
  00000000: 00b2 0004 1540 0000                      .....@..

Maybe this also helps, I took a snapshot of dmesg when the devices are
sensed:

  ...
  [    3.228175] SCSI subsystem initialized
  ...
  [    3.942878] device-mapper: core: CONFIG_IMA_DISABLE_HTABLE is disabled. Duplicate IMA measurements will not be recorded in the IMA log.
  [    3.942912] device-mapper: uevent: version 1.0.3
  [    3.942981] device-mapper: ioctl: 4.46.0-ioctl (2022-02-22) initialised: dm-devel@redhat.com
  ...
  [   31.563828] zfcp 0.0.1900: qdio: ZFCP on SC 15b using AI:1 QEBSM:0 PRI:1 TDD:1 SIGA: W
  [   32.688192] scsi host0: scsi_eh_0: sleeping
  [   32.688238] scsi host0: zfcp
  ...
  [   32.725595] scsi 0:0:0:0: scsi scan: INQUIRY pass 1 length 36
  [   32.725921] scsi 0:0:0:0: scsi scan: INQUIRY successful with code 0x0
  [   32.725931] scsi 0:0:0:0: scsi scan: INQUIRY pass 2 length 164
  [   32.726133] scsi 0:0:0:0: scsi scan: INQUIRY successful with code 0x0
  [   32.726139] scsi 0:0:0:0: scsi scan: peripheral device type of 31, no device added
  [   32.726560] scsi 0:0:0:0: scsi scan: Sending REPORT LUNS to (try 0)
  [   32.727220] scsi 0:0:0:0: scsi scan: REPORT LUNS successful (try 0) result 0x0
  [   32.727222] scsi 0:0:0:0: scsi scan: REPORT LUN scan
  [   32.727475] scsi 0:0:0:1079787648: scsi scan: INQUIRY pass 1 length 36
  [   32.727717] scsi 0:0:0:1079787648: scsi scan: INQUIRY successful with code 0x0
  [   32.727723] scsi 0:0:0:1079787648: scsi scan: INQUIRY pass 2 length 164
  [   32.727927] scsi 0:0:0:1079787648: scsi scan: INQUIRY successful with code 0x0
  [   32.727935] scsi 0:0:0:1079787648: Direct-Access     IBM      2107900          1060 PQ: 0 ANSI: 5
  [   32.729982] scsi 0:0:0:1079787648: sg_alloc: dev=0
  [   32.730027] sd 0:0:0:1079787648: Attached scsi generic sg0 type 0
  ...
  [   32.730502] sd 0:0:0:1079787648: Power-on or device reset occurred
  [   32.730508] sd 0:0:0:1079787648: [sda] tag#2176 Done: SUCCESS Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
  [   32.730511] sd 0:0:0:1079787648: [sda] tag#2176 CDB: Test Unit Ready 00 00 00 00 00 00
  [   32.730514] sd 0:0:0:1079787648: [sda] tag#2176 Sense Key : Unit Attention [current]
  [   32.730516] sd 0:0:0:1079787648: [sda] tag#2176 Add. Sense: Power on, reset, or bus device reset occurred
  ...
  [   32.730774] sd 0:0:0:1079787648: [sda] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
  [   32.730928] sd 0:0:0:1079787648: [sda] Write Protect is off
  [   32.730930] sd 0:0:0:1079787648: [sda] Mode Sense: ed 00 00 08
  [   32.731243] sd 0:0:0:1079787648: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
  [   32.731376] sd 0:0:0:1079787648: [sda] tag#3781 Done: SUCCESS Result: hostbyte=DID_TARGET_FAILURE driverbyte=DRIVER_OK cmd_age=0s
  [   32.731379] sd 0:0:0:1079787648: [sda] tag#3781 CDB: Report supported operation codes a3 0c 01 12 00 00 00 00 00 0a 00 00
  [   32.731381] sd 0:0:0:1079787648: [sda] tag#3781 Sense Key : Illegal Request [current]
  [   32.731383] sd 0:0:0:1079787648: [sda] tag#3781 Add. Sense: Invalid field in cdb
  ...
  [   32.734915] sd 0:0:0:1079787648: [sda] tag#3784 Done: SUCCESS Result: hostbyte=DID_TARGET_FAILURE driverbyte=DRIVER_OK cmd_age=0s
  [   32.734917] sd 0:0:0:1079787648: [sda] tag#3784 CDB: Inquiry 12 01 b9 00 04 00
  [   32.734919] sd 0:0:0:1079787648: [sda] tag#3784 Sense Key : Illegal Request [current]
  [   32.734921] sd 0:0:0:1079787648: [sda] tag#3784 Add. Sense: Invalid field in cdb
  ...
  [   32.737032]  sda: sda1
  ...
  [   32.737185] sd 0:0:0:1079787648: [sda] Attached SCSI disk
  ...
  [   37.294939] alua: device handler registered
  [   37.296390] emc: device handler registered
  [   37.297819] rdac: device handler registered
  [   37.351199] device-mapper: multipath service-time: version 0.3.0 loaded
  [   37.351424] sd 0:0:0:1079787648: alua: supports implicit TPGS
  [   37.351427] sd 0:0:0:1079787648: alua: device naa.6005076307ffc5e3000000000000805c port group 0 rel port 101
  ...
  [   37.378201] sd 0:0:0:1079787648: alua: transition timeout set to 60 seconds
  [   37.378204] sd 0:0:0:1079787648: alua: port group 00 state A preferred supports tolusnA
  ...

Thanks for the support Martin.
diff mbox series

Patch

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index eae5c81ae515..ee4f4aea5f0f 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -101,6 +101,7 @@  MODULE_ALIAS_SCSI_DEVICE(TYPE_ZBC);
 #define SD_MINORS	16
 
 static void sd_config_discard(struct scsi_disk *, enum sd_lbp_mode);
+static void sd_config_write_zeroes(struct scsi_disk *, enum sd_zeroing_mode);
 static void sd_config_write_same(struct scsi_disk *);
 static int  sd_revalidate_disk(struct gendisk *);
 static void sd_unlock_native_capacity(struct gendisk *disk);
@@ -428,10 +429,12 @@  static DEVICE_ATTR_RW(provisioning_mode);
 
 /* sysfs_match_string() requires dense arrays */
 static const char *zeroing_mode[] = {
+	[SD_ZERO_DEFAULT]	= "default",
 	[SD_ZERO_WRITE]		= "write",
 	[SD_ZERO_WS]		= "writesame",
 	[SD_ZERO_WS16_UNMAP]	= "writesame_16_unmap",
 	[SD_ZERO_WS10_UNMAP]	= "writesame_10_unmap",
+	[SD_ZERO_DISABLE]	= "disabled",
 };
 
 static ssize_t
@@ -457,7 +460,12 @@  zeroing_mode_store(struct device *dev, struct device_attribute *attr,
 	if (mode < 0)
 		return -EINVAL;
 
-	sdkp->zeroing_mode = mode;
+	if (mode == SD_ZERO_DEFAULT)
+		sdkp->zeroing_override = false;
+	else
+		sdkp->zeroing_override = true;
+
+	sd_config_write_zeroes(sdkp, mode);
 
 	return count;
 }
@@ -1049,6 +1057,31 @@  static blk_status_t sd_setup_write_same10_cmnd(struct scsi_cmnd *cmd,
 	return scsi_alloc_sgtables(cmd);
 }
 
+static void sd_config_write_zeroes(struct scsi_disk *sdkp,
+				   enum sd_zeroing_mode mode)
+{
+	struct request_queue *q = sdkp->disk->queue;
+	unsigned int logical_block_size = sdkp->device->sector_size;
+
+	if (mode == SD_ZERO_DEFAULT && !sdkp->zeroing_override) {
+		if (sdkp->lbprz && sdkp->lbpws)
+			mode = SD_ZERO_WS16_UNMAP;
+		else if (sdkp->lbprz && sdkp->lbpws10)
+			mode = SD_ZERO_WS10_UNMAP;
+		else if (sdkp->max_ws_blocks)
+			mode = SD_ZERO_WS;
+		else
+			mode = SD_ZERO_WRITE;
+	}
+
+	if (mode == SD_ZERO_DISABLE)
+		sdkp->zeroing_override = true;
+
+	sdkp->zeroing_mode = mode;
+	blk_queue_max_write_zeroes_sectors(q, sdkp->max_ws_blocks *
+					   (logical_block_size >> 9));
+}
+
 static blk_status_t sd_setup_write_zeroes_cmnd(struct scsi_cmnd *cmd)
 {
 	struct request *rq = scsi_cmd_to_rq(cmd);
@@ -1079,12 +1112,11 @@  static blk_status_t sd_setup_write_zeroes_cmnd(struct scsi_cmnd *cmd)
 
 static void sd_config_write_same(struct scsi_disk *sdkp)
 {
-	struct request_queue *q = sdkp->disk->queue;
 	unsigned int logical_block_size = sdkp->device->sector_size;
 
 	if (sdkp->device->no_write_same) {
 		sdkp->max_ws_blocks = 0;
-		goto out;
+		return;
 	}
 
 	/* Some devices can not handle block counts above 0xffff despite
@@ -1103,15 +1135,6 @@  static void sd_config_write_same(struct scsi_disk *sdkp)
 		sdkp->max_ws_blocks = 0;
 	}
 
-	if (sdkp->lbprz && sdkp->lbpws)
-		sdkp->zeroing_mode = SD_ZERO_WS16_UNMAP;
-	else if (sdkp->lbprz && sdkp->lbpws10)
-		sdkp->zeroing_mode = SD_ZERO_WS10_UNMAP;
-	else if (sdkp->max_ws_blocks)
-		sdkp->zeroing_mode = SD_ZERO_WS;
-	else
-		sdkp->zeroing_mode = SD_ZERO_WRITE;
-
 	if (sdkp->max_ws_blocks &&
 	    sdkp->physical_block_size > logical_block_size) {
 		/*
@@ -1131,10 +1154,6 @@  static void sd_config_write_same(struct scsi_disk *sdkp)
 				   bytes_to_logical(sdkp->device,
 						    sdkp->physical_block_size));
 	}
-
-out:
-	blk_queue_max_write_zeroes_sectors(q, sdkp->max_ws_blocks *
-					 (logical_block_size >> 9));
 }
 
 static blk_status_t sd_setup_flush_cmnd(struct scsi_cmnd *cmd)
@@ -2126,6 +2145,8 @@  static int sd_done(struct scsi_cmnd *SCpnt)
 			case WRITE_SAME:
 				if (SCpnt->cmnd[1] & 8) { /* UNMAP */
 					sd_config_discard(sdkp, SD_LBP_DISABLE);
+					sd_config_write_zeroes(sdkp,
+							       SD_ZERO_DISABLE);
 				} else {
 					sdkp->device->no_write_same = 1;
 					sd_config_write_same(sdkp);
@@ -3352,7 +3373,9 @@  static int sd_revalidate_disk(struct gendisk *disk)
 		sd_read_write_same(sdkp, buffer);
 		sd_read_security(sdkp, buffer);
 		sd_read_cpr(sdkp);
+		sd_config_write_same(sdkp);
 		sd_config_discard(sdkp, SD_LBP_DEFAULT);
+		sd_config_write_zeroes(sdkp, SD_ZERO_DEFAULT);
 	}
 
 	/*
@@ -3398,7 +3421,6 @@  static int sd_revalidate_disk(struct gendisk *disk)
 	sdkp->first_scan = 0;
 
 	set_capacity_and_notify(disk, logical_to_sectors(sdp, sdkp->capacity));
-	sd_config_write_same(sdkp);
 	kfree(buffer);
 
 	/*
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 57a8241163c5..e0ee4215a3b4 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -61,11 +61,13 @@  enum sd_lbp_mode {
 	SD_LBP_DISABLE,		/* Discard disabled due to failed cmd */
 };
 
-enum {
-	SD_ZERO_WRITE = 0,	/* Use WRITE(10/16) command */
+enum sd_zeroing_mode {
+	SD_ZERO_DEFAULT = 0,	/* Default mode based on what device reports */
+	SD_ZERO_WRITE,		/* Use WRITE(10/16) command */
 	SD_ZERO_WS,		/* Use WRITE SAME(10/16) command */
 	SD_ZERO_WS16_UNMAP,	/* Use WRITE SAME(16) with UNMAP */
 	SD_ZERO_WS10_UNMAP,	/* Use WRITE SAME(10) with UNMAP */
+	SD_ZERO_DISABLE,	/* Write Zeroes disabled due to failed cmd */
 };
 
 struct scsi_disk {
@@ -111,6 +113,7 @@  struct scsi_disk {
 	u8		nr_actuators;		/* Number of actuators */
 	bool		lblvpd;
 	bool		provisioning_override;
+	bool		zeroing_override;
 	unsigned	ATO : 1;	/* state of disk ATO bit */
 	unsigned	cache_override : 1; /* temp override of WCE,RCD */
 	unsigned	WCE : 1;	/* state of disk WCE bit */