diff mbox

[2/2] sd_zbc: Disable zone locking with scsi-mq enabled

Message ID 20170801093917.4131-3-damien.lemoal@wdc.com (mailing list archive)
State Superseded, archived
Headers show

Commit Message

Damien Le Moal Aug. 1, 2017, 9:39 a.m. UTC
Unlike the legacy blk-sq infrastructure, blk-mq cannot provide any sort
of guarantees to well behaved users regarding write command ordering for
zoned block devices. This is due to the lockless manipulation of the dispatch
queue as well as the use of different work queues for requeuing requests and
running queues. Request in flight may exist and a combination of context
preemption and request requeue event can easily reorder a sequential
write sequence in the dispatch queue.

This problem is not solved by the zone write lock mechanism in place in
sd_zbc.c (called from sd_init_cmnd() and sd_done()). On the contrary this
locking makes the problem even worse due to the high number of requeue events
it causes under heavy write workloads. Furthermore, this zone locking can
generate dispatch queue deadlocks if a write command that acquired a
zone lock is requeued and ends up behind a write command which has not been
yet prepared but is directed to the same locked zone. In such case, trying to
dispatch this write command will systematically return BLKPREP_DEFER and force
a requeue at the head of the dispatch queue. Subsequent dispatch execution will
fail again to issue any request in the same manner, stalling request dispatch
to the device endlessly.

This patch does not solve the blk-mq reordering problem, but prevent this
deadlock problem from hapenning by disabling zone locking when scsi-mq is
enabled.

Take away from this: ZBC/ZAC host managed drive cannot be used safely
with scsi-mq enabled for now, unless the drive users limit the number of
writes per sequential zone to 1 at all times. Both f2fs and dm-zoned do not
do this and so cannot be used safely with scsi-mq enabled.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 drivers/scsi/sd_zbc.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Comments

Bart Van Assche Aug. 1, 2017, 1:46 p.m. UTC | #1
On Tue, 2017-08-01 at 18:39 +0900, Damien Le Moal wrote:
> diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c

> index 96855df9f49d..78fb51a37e86 100644

> --- a/drivers/scsi/sd_zbc.c

> +++ b/drivers/scsi/sd_zbc.c

> @@ -528,15 +528,20 @@ static int sd_zbc_check_zone_size(struct scsi_disk *sdkp)

>  

>  static int sd_zbc_setup(struct scsi_disk *sdkp)

>  {

> +	struct request_queue *q = sdkp->disk->queue;

>  

>  	/* chunk_sectors indicates the zone size */

> -	blk_queue_chunk_sectors(sdkp->disk->queue,

> +	blk_queue_chunk_sectors(q,

>  			logical_to_sectors(sdkp->device, sdkp->zone_blocks));

>  	sdkp->zone_shift = ilog2(sdkp->zone_blocks);

>  	sdkp->nr_zones = sdkp->capacity >> sdkp->zone_shift;

>  	if (sdkp->capacity & (sdkp->zone_blocks - 1))

>  		sdkp->nr_zones++;

>  

> +	/* Do not use zone locking in mq case */

> +	if (q->mq_ops)

> +		return 0;

> +

>  	if (!sdkp->zones_wlock) {

>  		sdkp->zones_wlock = kcalloc(BITS_TO_LONGS(sdkp->nr_zones),

>  					    sizeof(unsigned long),


Hello Damien,

Are you aware that the blk-sq / scsi-sq code will be removed once all block
drivers have been converted? Are you aware that we don't want any differences
in behavior like the above between the single queue and multiqueue paths? Can
you check whether the patch series Ming Lei posted earlier this week solves
the frequent requeueing? See also "[PATCH 00/14] blk-mq-sched: fix SCSI-MQ
performance regression" (http://marc.info/?l=linux-block&m=150151989915776).

Thanks,

Bart.
diff mbox

Patch

diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index 96855df9f49d..78fb51a37e86 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -528,15 +528,20 @@  static int sd_zbc_check_zone_size(struct scsi_disk *sdkp)
 
 static int sd_zbc_setup(struct scsi_disk *sdkp)
 {
+	struct request_queue *q = sdkp->disk->queue;
 
 	/* chunk_sectors indicates the zone size */
-	blk_queue_chunk_sectors(sdkp->disk->queue,
+	blk_queue_chunk_sectors(q,
 			logical_to_sectors(sdkp->device, sdkp->zone_blocks));
 	sdkp->zone_shift = ilog2(sdkp->zone_blocks);
 	sdkp->nr_zones = sdkp->capacity >> sdkp->zone_shift;
 	if (sdkp->capacity & (sdkp->zone_blocks - 1))
 		sdkp->nr_zones++;
 
+	/* Do not use zone locking in mq case */
+	if (q->mq_ops)
+		return 0;
+
 	if (!sdkp->zones_wlock) {
 		sdkp->zones_wlock = kcalloc(BITS_TO_LONGS(sdkp->nr_zones),
 					    sizeof(unsigned long),