Message ID | 20241208225758.219228-5-dlemoal@kernel.org (mailing list archive) |
---|---|
State | Superseded, archived |
Delegated to: | Mike Snitzer |
Series | Zone write plugging fixes |
On Mon, Dec 09, 2024 at 07:57:58AM +0900, Damien Le Moal wrote:
> +int blkdev_issue_zone_zeroout(struct block_device *bdev, sector_t sector,
> +		sector_t nr_sects, gfp_t gfp_mask)

Nit: Would blk_zone_issue_zeroout be a better name?

Also I think this needs to be re-ordered before the previous patch to
preserve bisectability.

> +EXPORT_SYMBOL(blkdev_issue_zone_zeroout);

EXPORT_SYMBOL_GPL is used for all other zoned code, I'd do the same
here for consistency.

On 12/9/24 16:44, Christoph Hellwig wrote:
> On Mon, Dec 09, 2024 at 07:57:58AM +0900, Damien Le Moal wrote:
>> +int blkdev_issue_zone_zeroout(struct block_device *bdev, sector_t sector,
>> +		sector_t nr_sects, gfp_t gfp_mask)
>
> Nit: Would blk_zone_issue_zeroout be a better name?

Yes.

> Also I think this needs to be re-ordered before the previous patch to
> preserve bisectability.

The problem with doing that is that there is absolutely nothing to patch/fix
before the previous patch, since the "recovery/report zones" was done
automatically. So if anything, maybe I should just squash this patch together
with the previous one to be consistent against bisect ? That does make sense
since this patch is needed *because* of the previous patch change.

>> +EXPORT_SYMBOL(blkdev_issue_zone_zeroout);
>
> EXPORT_SYMBOL_GPL is used for all other zoned code, I'd do the same
> here for consistency.

Indeed. Will do.

On Mon, Dec 09, 2024 at 05:38:40PM +0900, Damien Le Moal wrote:
>
> The problem with doing that is that there is absolutely nothing to patch/fix
> before the previous patch, since the "recovery/report zones" was done
> automatically. So if anything, maybe I should just squash this patch together
> with the previous one to be consistent against bisect ? That does make sense
> since this patch is needed *because* of the previous patch change.

But it's also harmless to do the extra zone report. So just state that
in the commit log.

On 12/9/24 17:39, Christoph Hellwig wrote:
> On Mon, Dec 09, 2024 at 05:38:40PM +0900, Damien Le Moal wrote:
>>
>> The problem with doing that is that there is absolutely nothing to patch/fix
>> before the previous patch, since the "recovery/report zones" was done
>> automatically. So if anything, maybe I should just squash this patch together
>> with the previous one to be consistent against bisect ? That does make sense
>> since this patch is needed *because* of the previous patch change.
>
> But it's also harmless to do the extra zone report. So just state that
> in the commit log.

OK.

diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index d709f12d5935..c5400792de13 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -129,6 +129,9 @@ static int disk_report_zones_cb(struct blk_zone *zone, unsigned int idx,
 	if (disk->zone_wplugs_hash)
 		disk_zone_wplug_sync_wp_offset(disk, zone);

+	if (!args->user_cb)
+		return 0;
+
 	return args->user_cb(zone, idx, args->user_data);
 }

@@ -663,6 +666,16 @@ static void disk_zone_wplug_sync_wp_offset(struct gendisk *disk,
 	disk_put_zone_wplug(zwplug);
 }

+static int disk_zone_sync_wp_offset(struct gendisk *disk, sector_t sector)
+{
+	struct disk_report_zones_cb_args args = {
+		.disk = disk,
+	};
+
+	return disk->fops->report_zones(disk, sector, 1,
+					disk_report_zones_cb, &args);
+}
+
 static bool blk_zone_wplug_handle_reset_or_finish(struct bio *bio,
 						  unsigned int wp_offset)
 {
@@ -1720,6 +1733,48 @@ int blk_revalidate_disk_zones(struct gendisk *disk)
 }
 EXPORT_SYMBOL_GPL(blk_revalidate_disk_zones);

+/**
+ * blkdev_issue_zone_zeroout - zero-fill a block range in a zone
+ * @bdev:	blockdev to write
+ * @sector:	start sector
+ * @nr_sects:	number of sectors to write
+ * @gfp_mask:	memory allocation flags (for bio_alloc)
+ *
+ * Description:
+ *  Zero-fill a block range in a zone (@sector must be equal to the zone write
+ *  pointer), handling potential errors due to the (initially unknown) lack of
+ *  hardware offload (See blkdev_issue_zeroout()).
+ */
+int blkdev_issue_zone_zeroout(struct block_device *bdev, sector_t sector,
+		sector_t nr_sects, gfp_t gfp_mask)
+{
+	int ret;
+
+	if (WARN_ON_ONCE(!bdev_is_zoned(bdev)))
+		return -EIO;
+
+	ret = blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask,
+				   BLKDEV_ZERO_NOFALLBACK);
+	if (ret != -EOPNOTSUPP)
+		return ret;
+
+	/*
+	 * The failed call to blkdev_issue_zeroout() advanced the zone write
+	 * pointer. Undo this using a report zone to update the zone write
+	 * pointer to the correct current value.
+	 */
+	ret = disk_zone_sync_wp_offset(bdev->bd_disk, sector);
+	if (ret != 1)
+		return ret < 0 ? ret : -EIO;
+
+	/*
+	 * Retry without BLKDEV_ZERO_NOFALLBACK to force the fallback to a
+	 * regular write with zero-pages.
+	 */
+	return blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask, 0);
+}
+EXPORT_SYMBOL(blkdev_issue_zone_zeroout);
+
 #ifdef CONFIG_BLK_DEBUG_FS

 int queue_zone_wplugs_show(void *data, struct seq_file *m)
diff --git a/drivers/md/dm-zoned-reclaim.c b/drivers/md/dm-zoned-reclaim.c
index d58db9a27e6c..b085a929e64e 100644
--- a/drivers/md/dm-zoned-reclaim.c
+++ b/drivers/md/dm-zoned-reclaim.c
@@ -76,9 +76,9 @@ static int dmz_reclaim_align_wp(struct dmz_reclaim *zrc, struct dm_zone *zone,
 	 * pointer and the requested position.
 	 */
 	nr_blocks = block - wp_block;
-	ret = blkdev_issue_zeroout(dev->bdev,
+	ret = blkdev_issue_zone_zeroout(dev->bdev,
 				dmz_start_sect(zmd, zone) + dmz_blk2sect(wp_block),
-				dmz_blk2sect(nr_blocks), GFP_NOIO, 0);
+				dmz_blk2sect(nr_blocks), GFP_NOIO);
 	if (ret) {
 		dmz_dev_err(dev,
 			    "Align zone %u wp %llu to %llu (wp+%u) blocks failed %d",
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 15e7dfc013b7..696fdafcfe91 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1419,6 +1419,9 @@ static inline bool bdev_zone_is_seq(struct block_device *bdev, sector_t sector)
 	return is_seq;
 }

+int blkdev_issue_zone_zeroout(struct block_device *bdev, sector_t sector,
+		sector_t nr_sects, gfp_t gfp_mask);
+
 static inline unsigned int queue_dma_alignment(const struct request_queue *q)
 {
 	return q->limits.dma_alignment;

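The `ret != 1` check in blkdev_issue_zone_zeroout() relies on the zone report returning the number of zones reported, with exactly one zone requested. As a reading aid only, the mapping can be sketched in userspace C as follows. The helper and self-test names are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper (not in the patch): map the return value of a
 * single-zone report to 0 ("write pointer resynced, safe to retry") or
 * to a negative errno. Exactly one zone is requested, so any
 * non-negative count other than 1 is treated as an I/O error.
 */
static int zone_report_ret_to_errno(int ret)
{
	if (ret == 1)
		return 0;
	return ret < 0 ? ret : -EIO;
}

/* Small self-check of the three cases handled above. */
static void zone_report_ret_selftest(void)
{
	assert(zone_report_ret_to_errno(1) == 0);            /* one zone reported */
	assert(zone_report_ret_to_errno(0) == -EIO);         /* nothing reported */
	assert(zone_report_ret_to_errno(-ENOMEM) == -ENOMEM); /* error passed through */
}
```

The patch itself open-codes this check rather than using a helper; the sketch only isolates the three cases.
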
The zone reclaim processing of the dm-zoned device mapper uses
blkdev_issue_zeroout() to align the write pointer of a zone being used
for reclaiming zones, to write valid data blocks at the correct zone
start relative offset. However, the first call to blkdev_issue_zeroout()
will try to use hardware offload through a REQ_OP_WRITE_ZEROES operation,
if the device reports a non-zero max zero sectors limit. If this
operation fails, blkdev_issue_zeroout() falls back to using a regular
write operation with zero-pages.

With the removal of the zone write plug automatic write pointer recovery
after a write error, the first attempt using a REQ_OP_WRITE_ZEROES leaves
the target zone write pointer out of sync with the drive current value,
as the zone write plugging code advanced the zone write plug write
pointer offset on submission of the REQ_OP_WRITE_ZEROES operation. The
target zone is marked with BLK_ZONE_WPLUG_NEED_WP_UPDATE, which causes
the fallback regular write operation to also fail.

blkdev_issue_zeroout() callers such as dmz_reclaim_align_wp() can recover
from such a situation by executing a report zones and retrying the call
to blkdev_issue_zeroout().

Given that such a pattern will be common to all users of
blkdev_issue_zeroout(), introduce the function
blkdev_issue_zone_zeroout() to automatically handle such recovery. This
function calls blkdev_issue_zeroout() with the BLKDEV_ZERO_NOFALLBACK
flag to intercept failures on the first execution, which attempts to use
the device hardware offload with a REQ_OP_WRITE_ZEROES. If this attempt
fails, blkdev_report_zones() is executed to recover the target zone to a
good state and blkdev_issue_zeroout() is executed again without the
BLKDEV_ZERO_NOFALLBACK flag.

Replacing the call to blkdev_issue_zeroout() with a call to
blkdev_issue_zone_zeroout() in dmz_reclaim_align_wp() thus solves
irrecoverable write errors triggered by the removal of the zone write
plugging automatic recovery (commit "block: Prevent potential deadlocks
in zone write plug error recovery").

Fixes: dd291d77cc90 ("block: Introduce zone write plugging")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
---
 block/blk-zoned.c             | 55 +++++++++++++++++++++++++++++++++++
 drivers/md/dm-zoned-reclaim.c |  4 +--
 include/linux/blkdev.h        |  3 ++
 3 files changed, 60 insertions(+), 2 deletions(-)
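
The failure and recovery sequence described in the commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: every `model_*` and `MODEL_*` name is hypothetical, the device is assumed to advertise write zeroes support and then reject the command, and the driver is assumed to clear the advertised limit after that first failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_ZERO_NOFALLBACK 1u	/* stands in for BLKDEV_ZERO_NOFALLBACK */

/* Hypothetical model of one sequential zone and its write plug state. */
struct model_zone {
	unsigned int dev_wp;		/* write pointer on the device */
	unsigned int plug_wp;		/* offset cached by the zone write plug */
	bool need_wp_update;		/* models BLK_ZONE_WPLUG_NEED_WP_UPDATE */
	unsigned int max_wz_sectors;	/* advertised write zeroes limit */
	bool hw_wz_works;		/* whether the offload really works */
};

/* Submit one write of nr sectors; the plug offset advances on submission. */
static int model_submit(struct model_zone *z, unsigned int nr, bool write_zeroes)
{
	assert(z->plug_wp >= z->dev_wp);
	if (z->need_wp_update)
		return -EIO;		/* plug is out of sync: refuse writes */
	z->plug_wp += nr;
	if (write_zeroes && !z->hw_wz_works) {
		z->max_wz_sectors = 0;	/* assumed: limit cleared on failure */
		z->need_wp_update = true;
		return -EIO;
	}
	z->dev_wp += nr;
	return 0;
}

/* Simplified model of blkdev_issue_zeroout(): offload first, then fallback. */
static int model_zeroout(struct model_zone *z, unsigned int nr, unsigned int flags)
{
	if (z->max_wz_sectors && !model_submit(z, nr, true))
		return 0;
	if (flags & MODEL_ZERO_NOFALLBACK)
		return -EOPNOTSUPP;
	return model_submit(z, nr, false);	/* regular write of zero pages */
}

/* Model of a single-zone report: resync the plug with the device. */
static int model_report_zone(struct model_zone *z)
{
	z->plug_wp = z->dev_wp;
	z->need_wp_update = false;
	return 1;			/* number of zones reported */
}

/* Model of the flow added by this patch. */
static int model_zone_zeroout(struct model_zone *z, unsigned int nr)
{
	int ret = model_zeroout(z, nr, MODEL_ZERO_NOFALLBACK);

	if (ret != -EOPNOTSUPP)
		return ret;
	ret = model_report_zone(z);
	if (ret != 1)
		return ret < 0 ? ret : -EIO;
	return model_zeroout(z, nr, 0);
}
```

In this model, calling `model_zeroout()` directly with no flags on a zone whose offload fails returns -EIO and leaves `plug_wp` ahead of `dev_wp`, which is the failure the commit message describes. `model_zone_zeroout()` on an identical zone succeeds, because the zone report resyncs the cached offset before the regular write is issued.
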