[4/4] dm: Fix dm-zoned-reclaim zone write pointer alignment

Message ID 20241206015240.6862-5-dlemoal@kernel.org (mailing list archive)
State New
Series Zone write plugging fixes

Commit Message

Damien Le Moal Dec. 6, 2024, 1:52 a.m. UTC
The zone reclaim processing of the dm-zoned device mapper uses
blkdev_issue_zeroout() to align the write pointer of a zone being used
for reclaiming zones, to write valid data blocks at the correct zone
start relative offset. However, the first call to blkdev_issue_zeroout()
will try to use hardware offload through a REQ_OP_WRITE_ZEROES
operation, if the device reports a non-zero max zero sectors limit. If
this operation fails, blkdev_issue_zeroout() falls back to using a
regular write operation with zero-pages.

With the removal of the zone write plug automatic write pointer recovery
after a write error, a failed first attempt using REQ_OP_WRITE_ZEROES
leaves the target zone write pointer out of sync with the drive's
current value, as the zone write plugging code advanced the zone write
plug write pointer offset on submission of the REQ_OP_WRITE_ZEROES
operation. The target zone is marked with BLK_ZONE_WPLUG_NEED_WP_UPDATE,
which causes the fallback regular write operation to also fail.

blkdev_issue_zeroout() callers such as dmz_reclaim_align_wp() can
recover from this situation by executing a report zones operation and
retrying the call to blkdev_issue_zeroout(). Given that this pattern
will be common to all users of blkdev_issue_zeroout(), introduce the
function blkdev_issue_zone_zeroout() to automatically handle such
recovery. This function calls blkdev_issue_zeroout() with the
BLKDEV_ZERO_NOFALLBACK flag to intercept failures of the first
execution, which attempts to use the device hardware offload with a
REQ_OP_WRITE_ZEROES operation. If this attempt fails, a report zones
operation is executed (through the new helper
disk_zone_sync_wp_offset()) to bring the target zone back to a good
state, and blkdev_issue_zeroout() is executed again without the
BLKDEV_ZERO_NOFALLBACK flag.
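
The sequence above can be illustrated with a small userspace model. This
is only a sketch, not kernel code: every toy_* name is hypothetical, the
zone is reduced to two write pointer values and a flag, and the model
assumes that a failed offload attempt also clears the advertised limit so
that the retry goes straight to regular writes.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy userspace model. Every toy_* name is hypothetical, not a kernel API. */
struct toy_zone {
	unsigned long long dev_wp;	/* write pointer on the device */
	unsigned long long plug_wp;	/* write pointer tracked by the write plug */
	bool need_wp_update;		/* stands in for BLK_ZONE_WPLUG_NEED_WP_UPDATE */
	bool advertises_offload;	/* non-zero max zero sectors limit */
	bool offload_works;		/* whether the device really executes it */
};

enum toy_op { TOY_WRITE_ZEROES, TOY_WRITE };

/* Submit one operation at the write pointer of the zone. */
int toy_submit(struct toy_zone *z, enum toy_op op,
	       unsigned long long sector, unsigned long long nr)
{
	assert(nr > 0);
	if (z->need_wp_update || sector != z->plug_wp)
		return -EIO;

	/* The plug write pointer is advanced on submission. */
	z->plug_wp += nr;

	if (op == TOY_WRITE_ZEROES && !z->offload_works) {
		/* Assumption: the failure also clears the advertised limit. */
		z->need_wp_update = true;
		z->advertises_offload = false;
		return -EOPNOTSUPP;
	}

	z->dev_wp += nr;
	return 0;
}

/* Model of blkdev_issue_zeroout(): offload first, then regular writes. */
int toy_zeroout(struct toy_zone *z, unsigned long long sector,
		unsigned long long nr, bool nofallback)
{
	if (z->advertises_offload) {
		int ret = toy_submit(z, TOY_WRITE_ZEROES, sector, nr);

		if (ret != -EOPNOTSUPP || nofallback)
			return ret;
	} else if (nofallback) {
		return -EOPNOTSUPP;
	}
	return toy_submit(z, TOY_WRITE, sector, nr);
}

/* Model of a zone report: resynchronize the plug with the device. */
void toy_report_zone(struct toy_zone *z)
{
	z->plug_wp = z->dev_wp;
	z->need_wp_update = false;
}

/* Model of the recovery sequence described for blkdev_issue_zone_zeroout(). */
int toy_zone_zeroout(struct toy_zone *z, unsigned long long sector,
		     unsigned long long nr)
{
	int ret = toy_zeroout(z, sector, nr, true);

	if (ret != -EOPNOTSUPP)
		return ret;
	toy_report_zone(z);
	return toy_zeroout(z, sector, nr, false);
}
```

In this model, calling toy_zeroout() directly on a zone whose offload is
advertised but not functional returns -EIO and leaves the device write
pointer untouched, while toy_zone_zeroout() on an identical zone succeeds
and advances both write pointers.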

Replacing the call to blkdev_issue_zeroout() with a call to
blkdev_issue_zone_zeroout() in dmz_reclaim_align_wp() thus solves
irrecoverable write errors triggered by the removal of the zone write
plugging automatic recovery (commit "block: Prevent potential deadlocks
in zone write plug error recovery").

Fixes: dd291d77cc90 ("block: Introduce zone write plugging")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
---
 block/blk-zoned.c             | 56 +++++++++++++++++++++++++++++++++++
 drivers/md/dm-zoned-reclaim.c |  4 +--
 include/linux/blkdev.h        |  3 ++
 3 files changed, 61 insertions(+), 2 deletions(-)

Comments

kernel test robot Dec. 6, 2024, 6:10 a.m. UTC | #1
Hi Damien,

kernel test robot noticed the following build warnings:

[auto build test WARNING on axboe-block/for-next]
[also build test WARNING on linus/master v6.13-rc1 next-20241205]
[cannot apply to device-mapper-dm/for-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Damien-Le-Moal/block-Use-a-zone-write-plug-BIO-work-for-REQ_NOWAIT-BIOs/20241206-095452
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
patch link:    https://lore.kernel.org/r/20241206015240.6862-5-dlemoal%40kernel.org
patch subject: [PATCH 4/4] dm: Fix dm-zoned-reclaim zone write pointer alignment
config: arc-randconfig-002-20241206 (https://download.01.org/0day-ci/archive/20241206/202412061404.Utec5WgH-lkp@intel.com/config)
compiler: arc-elf-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241206/202412061404.Utec5WgH-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202412061404.Utec5WgH-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> block/blk-zoned.c:1751: warning: Excess function parameter 'flags' description in 'blkdev_issue_zone_zeroout'


vim +1751 block/blk-zoned.c

  1735	
  1736	/**
  1737	 * blkdev_issue_zone_zeroout - zero-fill a block range in a zone
  1738	 * @bdev:	blockdev to write
  1739	 * @sector:	start sector
  1740	 * @nr_sects:	number of sectors to write
  1741	 * @gfp_mask:	memory allocation flags (for bio_alloc)
  1742	 * @flags:	controls detailed behavior
  1743	 *
  1744	 * Description:
  1745	 *  Zero-fill a block range in a zone (@sector must be equal to the zone write
  1746	 *  pointer), handling potential errors due to the (initially unknown) lack of
  1747	 *  hardware offload (See blkdev_issue_zeroout()).
  1748	 */
  1749	int blkdev_issue_zone_zeroout(struct block_device *bdev, sector_t sector,
  1750			sector_t nr_sects, gfp_t gfp_mask)
> 1751	{
  1752		int ret;
  1753	
  1754		if (WARN_ON_ONCE(!bdev_is_zoned(bdev)))
  1755			return -EIO;
  1756	
  1757		ret = blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask,
  1758					   BLKDEV_ZERO_NOFALLBACK);
  1759		if (ret != -EOPNOTSUPP)
  1760			return ret;
  1761	
  1762		/*
  1763		 * The failed call to blkdev_issue_zeroout() advanced the zone write
  1764		 * pointer. Undo this using a report zone to update the zone write
  1765		 * pointer to the correct current value.
  1766		 */
  1767		ret = disk_zone_sync_wp_offset(bdev->bd_disk, sector);
  1768		if (ret != 1)
  1769			return ret < 0 ? ret : -EIO;
  1770	
  1771		/*
  1772		 * Retry without BLKDEV_ZERO_NOFALLBACK to force the fallback to a
  1773		 * regular write with zero-pages.
  1774		 */
  1775		return blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask, 0);
  1776	}
  1777	EXPORT_SYMBOL(blkdev_issue_zone_zeroout);
  1778

Patch

diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index d709f12d5935..38a7cd3fefa9 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -129,6 +129,9 @@  static int disk_report_zones_cb(struct blk_zone *zone, unsigned int idx,
 	if (disk->zone_wplugs_hash)
 		disk_zone_wplug_sync_wp_offset(disk, zone);
 
+	if (!args->user_cb)
+		return 0;
+
 	return args->user_cb(zone, idx, args->user_data);
 }
 
@@ -663,6 +666,16 @@  static void disk_zone_wplug_sync_wp_offset(struct gendisk *disk,
 	disk_put_zone_wplug(zwplug);
 }
 
+static int disk_zone_sync_wp_offset(struct gendisk *disk, sector_t sector)
+{
+	struct disk_report_zones_cb_args args = {
+		.disk = disk,
+	};
+
+	return disk->fops->report_zones(disk, sector, 1,
+					disk_report_zones_cb, &args);
+}
+
 static bool blk_zone_wplug_handle_reset_or_finish(struct bio *bio,
 						  unsigned int wp_offset)
 {
@@ -1720,6 +1733,49 @@  int blk_revalidate_disk_zones(struct gendisk *disk)
 }
 EXPORT_SYMBOL_GPL(blk_revalidate_disk_zones);
 
+/**
+ * blkdev_issue_zone_zeroout - zero-fill a block range in a zone
+ * @bdev:	blockdev to write
+ * @sector:	start sector
+ * @nr_sects:	number of sectors to write
+ * @gfp_mask:	memory allocation flags (for bio_alloc)
+ * @flags:	controls detailed behavior
+ *
+ * Description:
+ *  Zero-fill a block range in a zone (@sector must be equal to the zone write
+ *  pointer), handling potential errors due to the (initially unknown) lack of
+ *  hardware offload (See blkdev_issue_zeroout()).
+ */
+int blkdev_issue_zone_zeroout(struct block_device *bdev, sector_t sector,
+		sector_t nr_sects, gfp_t gfp_mask)
+{
+	int ret;
+
+	if (WARN_ON_ONCE(!bdev_is_zoned(bdev)))
+		return -EIO;
+
+	ret = blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask,
+				   BLKDEV_ZERO_NOFALLBACK);
+	if (ret != -EOPNOTSUPP)
+		return ret;
+
+	/*
+	 * The failed call to blkdev_issue_zeroout() advanced the zone write
+	 * pointer. Undo this using a report zone to update the zone write
+	 * pointer to the correct current value.
+	 */
+	ret = disk_zone_sync_wp_offset(bdev->bd_disk, sector);
+	if (ret != 1)
+		return ret < 0 ? ret : -EIO;
+
+	/*
+	 * Retry without BLKDEV_ZERO_NOFALLBACK to force the fallback to a
+	 * regular write with zero-pages.
+	 */
+	return blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask, 0);
+}
+EXPORT_SYMBOL(blkdev_issue_zone_zeroout);
+
 #ifdef CONFIG_BLK_DEBUG_FS
 
 int queue_zone_wplugs_show(void *data, struct seq_file *m)
diff --git a/drivers/md/dm-zoned-reclaim.c b/drivers/md/dm-zoned-reclaim.c
index d58db9a27e6c..b085a929e64e 100644
--- a/drivers/md/dm-zoned-reclaim.c
+++ b/drivers/md/dm-zoned-reclaim.c
@@ -76,9 +76,9 @@  static int dmz_reclaim_align_wp(struct dmz_reclaim *zrc, struct dm_zone *zone,
 	 * pointer and the requested position.
 	 */
 	nr_blocks = block - wp_block;
-	ret = blkdev_issue_zeroout(dev->bdev,
+	ret = blkdev_issue_zone_zeroout(dev->bdev,
 				   dmz_start_sect(zmd, zone) + dmz_blk2sect(wp_block),
-				   dmz_blk2sect(nr_blocks), GFP_NOIO, 0);
+				   dmz_blk2sect(nr_blocks), GFP_NOIO);
 	if (ret) {
 		dmz_dev_err(dev,
 			    "Align zone %u wp %llu to %llu (wp+%u) blocks failed %d",
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 15e7dfc013b7..696fdafcfe91 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1419,6 +1419,9 @@  static inline bool bdev_zone_is_seq(struct block_device *bdev, sector_t sector)
 	return is_seq;
 }
 
+int blkdev_issue_zone_zeroout(struct block_device *bdev, sector_t sector,
+			      sector_t nr_sects, gfp_t gfp_mask);
+
 static inline unsigned int queue_dma_alignment(const struct request_queue *q)
 {
 	return q->limits.dma_alignment;