Message ID | 20240815082755.105242-2-john.g.garry@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | block: Fix __blkdev_issue_write_zeroes() limit handling | expand |
On Thu, Aug 15, 2024 at 08:27:54AM +0000, John Garry wrote: > +/* > + * Pass bio_write_zeroes_limit() return value in @limit, as the return > + * value may change after a REQ_OP_WRITE_ZEROES is issued. > + */ I don't think that really helps all that much to explain the issue, which is about SCSI not having an ahead of time flag that reliably works for write same support, which makes it clear the limit to 0 on the first I/O completion. Maybe you can actually spell this out? Otherwise looks good: Reviewed-by: Christoph Hellwig <hch@lst.de>
On 15/08/2024 13:40, Christoph Hellwig wrote: > On Thu, Aug 15, 2024 at 08:27:54AM +0000, John Garry wrote: >> +/* >> + * Pass bio_write_zeroes_limit() return value in @limit, as the return >> + * value may change after a REQ_OP_WRITE_ZEROES is issued. >> + */ > I don't think that really helps all that much to explain the issue, > which is about SCSI not having an ahead of time flag that reliably > works for write same support, which makes it clear the limit to 0 on > the first I/O completion. Maybe you can actually spell this out? Please just tell me what you would like to see, and I will copy verbatim. cheers
On Thu, Aug 15, 2024 at 02:29:46PM +0100, John Garry wrote: > On 15/08/2024 13:40, Christoph Hellwig wrote: >> On Thu, Aug 15, 2024 at 08:27:54AM +0000, John Garry wrote: >>> +/* >>> + * Pass bio_write_zeroes_limit() return value in @limit, as the return >>> + * value may change after a REQ_OP_WRITE_ZEROES is issued. >>> + */ >> I don't think that really helps all that much to explain the issue, >> which is about SCSI not having an ahead of time flag that reliably >> works for write same support, which makes it clear the limit to 0 on >> the first I/O completion. Maybe you can actually spell this out? > > Please just tell me what you would like to see, and I will copy verbatim. Probably just what I just wrote. Unless Martin can come up with better language.
On 15/08/2024 16:26, Christoph Hellwig wrote: >>>> +/* >>>> + * Pass bio_write_zeroes_limit() return value in @limit, as the return >>>> + * value may change after a REQ_OP_WRITE_ZEROES is issued. >>>> + */ >>> I don't think that really helps all that much to explain the issue, >>> which is about SCSI not having an ahead of time flag that reliably >>> works for write same support, which makes it clear the limit to 0 on >>> the first I/O completion. Maybe you can actually spell this out? >> Please just tell me what you would like to see, and I will copy verbatim. > Probably just what I just wrote. Unless Martin can come up with > better language. /* * SCSI does not have a reliable ahead of time flag that reliably * works for write same support; instead, the limit is set to zero for * the first failed I/O completion. As such, only read the limit prior * to submission, as it may later change. */ ok?
Christoph, >>>> +/* >>>> + * Pass bio_write_zeroes_limit() return value in @limit, as the return >>>> + * value may change after a REQ_OP_WRITE_ZEROES is issued. >>>> + */ >>> I don't think that really helps all that much to explain the issue, >>> which is about SCSI not having an ahead of time flag that reliably >>> works for write same support, which makes it clear the limit to 0 on >>> the first I/O completion. Maybe you can actually spell this out? >> >> Please just tell me what you would like to see, and I will copy verbatim. > > Probably just what I just wrote. Unless Martin can come up with > better language. There is no reliable way for the SCSI subsystem to determine whether a device supports a WRITE SAME operation without actually performing a write to media. As a result, write_zeroes is enabled by default and will be disabled if a zeroing operation subsequently fails. This means that this queue limit is likely to change at runtime. Something like that, perhaps?
diff --git a/block/blk-lib.c b/block/blk-lib.c index 9f735efa6c94..00cee94e5d42 100644 --- a/block/blk-lib.c +++ b/block/blk-lib.c @@ -111,13 +111,17 @@ static sector_t bio_write_zeroes_limit(struct block_device *bdev) (UINT_MAX >> SECTOR_SHIFT) & ~bs_mask); } +/* + * Pass bio_write_zeroes_limit() return value in @limit, as the return + * value may change after a REQ_OP_WRITE_ZEROES is issued. + */ static void __blkdev_issue_write_zeroes(struct block_device *bdev, sector_t sector, sector_t nr_sects, gfp_t gfp_mask, - struct bio **biop, unsigned flags) + struct bio **biop, unsigned flags, sector_t limit) { + while (nr_sects) { - unsigned int len = min_t(sector_t, nr_sects, - bio_write_zeroes_limit(bdev)); + unsigned int len = min(nr_sects, limit); struct bio *bio; if ((flags & BLKDEV_ZERO_KILLABLE) && @@ -141,12 +145,14 @@ static void __blkdev_issue_write_zeroes(struct block_device *bdev, static int blkdev_issue_write_zeroes(struct block_device *bdev, sector_t sector, sector_t nr_sects, gfp_t gfp, unsigned flags) { + sector_t limit = bio_write_zeroes_limit(bdev); struct bio *bio = NULL; struct blk_plug plug; int ret = 0; blk_start_plug(&plug); - __blkdev_issue_write_zeroes(bdev, sector, nr_sects, gfp, &bio, flags); + __blkdev_issue_write_zeroes(bdev, sector, nr_sects, gfp, &bio, + flags, limit); if (bio) { if ((flags & BLKDEV_ZERO_KILLABLE) && fatal_signal_pending(current)) { @@ -165,7 +171,7 @@ static int blkdev_issue_write_zeroes(struct block_device *bdev, sector_t sector, * on an I/O error, in which case we'll turn any error into * "not supported" here. */ - if (ret && !bdev_write_zeroes_sectors(bdev)) + if (ret && !limit) return -EOPNOTSUPP; return ret; } @@ -265,12 +271,14 @@ int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector, sector_t nr_sects, gfp_t gfp_mask, struct bio **biop, unsigned flags) { + sector_t limit = bio_write_zeroes_limit(bdev); + if (bdev_read_only(bdev)) return -EPERM; - if (bdev_write_zeroes_sectors(bdev)) { + if (limit) { __blkdev_issue_write_zeroes(bdev, sector, nr_sects, - gfp_mask, biop, flags); + gfp_mask, biop, flags, limit); } else { if (flags & BLKDEV_ZERO_NOFALLBACK) return -EOPNOTSUPP;
As reported in [0], we may get a hang when formatting a XFS FS on a RAID0 drive. Commit 73a768d5f955 ("block: factor out a blk_write_zeroes_limit helper") changed __blkdev_issue_write_zeroes() to read the max write zeroes value in the loop. This is not safe as max write zeroes may change in value. Specifically for the case of [0], the value goes to 0, and we get an infinite loop. Lift the limit reading out of the loop. [0] https://lore.kernel.org/linux-xfs/4d31268f-310b-4220-88a2-e191c3932a82@oracle.com/T/#t Fixes: 73a768d5f955 ("block: factor out a blk_write_zeroes_limit helper") Signed-off-by: John Garry <john.g.garry@oracle.com> --- block/blk-lib.c | 22 +++++++++++++++------- 1 file changed, 15 insertions(+), 7 deletions(-)