From patchwork Mon Jun 10 07:56:52 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Damien Le Moal X-Patchwork-Id: 13691629 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CD28A2CCB7; Mon, 10 Jun 2024 07:56:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718006218; cv=none; b=pOr0wlSJigL5rI56j663+i4Dd5FAqDKg1Wjh5WWHc9324qO5/vEZP0WxGYh8xYad3OQ353eFBWeO3/WLrz2ykA0Whbg9B9vG37gN58qq2wA67TRsvCW6tNVK8fMwyTe/d+dmlhtxN1idpBE0yhx7Yvj859qbQ74jS0t3fpHX39c= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718006218; c=relaxed/simple; bh=fU7TfXef3r8dQkWoSseolnUvXb1vhzxWz6xAwj65vPw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=SSJpXCLoq8xwRQqdd7lpF1Vdn+4eHUdhJh8dvTzDEqsEgjfaZz8A7rI1n2d1TlRDqBVWEFB8zvJGhl5szf97hvTvh/2ZFi0mhfusyQL+I1jF6WvKJB1QDhge3MgH4iWFcae4bY/l2m+wLd4xr42Ys+2Wdxd5LozAJUvNXAnyYxU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=b+olpxU2; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="b+olpxU2" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 836A9C4AF48; Mon, 10 Jun 2024 07:56:57 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1718006218; bh=fU7TfXef3r8dQkWoSseolnUvXb1vhzxWz6xAwj65vPw=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=b+olpxU2Y9AQYgkzuwmXywbMCMRZ8zDrgTjOuhSgJUwycOs5qc5HBx7ob+LBh9i4D 
HT8KNghb7CMwjEhd+ZpNssdmM8y27Cm7FM+GVUnqCrAnvMWfxFq4tRXMKgYCi3bTNG c+76n2u8qrVNCpIXrj+HXjitX9QAtKCz8VOecivZn7zyAMwTIJ6bg8DQDDLvqn6iUr zq0rR2QmJNVGSsF2QiZBtKPAAy6enwkmOsi8EgymFUl8hkqKPjUd7iRwsC1iyuOFYp EjgeZ/IW+y/S6e5vFnvTPjTDtwJ/wfwtkRz3TybVa5MMRBIAgs/9uXOIRKDJhlJeJx 1im45hx1otGMA== From: Damien Le Moal To: Jens Axboe , linux-block@vger.kernel.org, dm-devel@lists.linux.dev, Mike Snitzer , Mikulas Patocka Cc: Christoph Hellwig , Benjamin Marzinski Subject: [PATCH v7 1/4] block: Improve checks on zone resource limits Date: Mon, 10 Jun 2024 16:56:52 +0900 Message-ID: <20240610075655.249301-2-dlemoal@kernel.org> X-Mailer: git-send-email 2.45.2 In-Reply-To: <20240610075655.249301-1-dlemoal@kernel.org> References: <20240610075655.249301-1-dlemoal@kernel.org> Precedence: bulk X-Mailing-List: linux-block@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Make sure that the zone resource limits of a zoned block device are correct by checking that: (a) If the device has a max active zones limit, make sure that the max open zones limit is lower than the max active zones limit. (b) If the device has zone resource limits, check that the limits values are lower than the number of sequential zones of the device. If it is not, assume that the zoned device has no limits by setting the limits to 0. For (a), a check is added to blk_validate_zoned_limits() and an error returned if the max open zones limit exceeds the value of the max active zone limit (if there is one). For (b), given that we need the number of sequential zones of the zoned device, this check is added to disk_update_zone_resources(). This is safe to do as that function is executed with the disk queue frozen and the check executed after queue_limits_start_update() which takes the queue limits lock. Of note is that the early return in this function for zoned devices that do not use zone write plugging (e.g. 
DM devices using native zone append) is moved to after the new check and adjustment of the zone resource limits so that the check applies to any zoned device. Signed-off-by: Damien Le Moal Reviewed-by: Christoph Hellwig Reviewed-by: Johannes Thumshirn Reviewed-by: Niklas Cassel --- block/blk-settings.c | 8 ++++++++ block/blk-zoned.c | 20 ++++++++++++++++---- 2 files changed, 24 insertions(+), 4 deletions(-) diff --git a/block/blk-settings.c b/block/blk-settings.c index effeb9a639bb..607f888fe93b 100644 --- a/block/blk-settings.c +++ b/block/blk-settings.c @@ -80,6 +80,14 @@ static int blk_validate_zoned_limits(struct queue_limits *lim) if (WARN_ON_ONCE(!IS_ENABLED(CONFIG_BLK_DEV_ZONED))) return -EINVAL; + /* + * Given that active zones include open zones, the maximum number of + * open zones cannot be larger than the maximum number of active zones. + */ + if (lim->max_active_zones && + lim->max_open_zones > lim->max_active_zones) + return -EINVAL; + if (lim->zone_write_granularity < lim->logical_block_size) lim->zone_write_granularity = lim->logical_block_size; diff --git a/block/blk-zoned.c b/block/blk-zoned.c index 08d7dfe8bd93..137842dbb59a 100644 --- a/block/blk-zoned.c +++ b/block/blk-zoned.c @@ -1650,8 +1650,22 @@ static int disk_update_zone_resources(struct gendisk *disk, return -ENODEV; } + lim = queue_limits_start_update(q); + + /* + * Some devices can advertize zone resource limits that are larger than + * the number of sequential zones of the zoned block device, e.g. a + * small ZNS namespace. For such case, assume that the zoned device has + * no zone resource limits. 
+ */ + nr_seq_zones = disk->nr_zones - nr_conv_zones; + if (lim.max_open_zones >= nr_seq_zones) + lim.max_open_zones = 0; + if (lim.max_active_zones >= nr_seq_zones) + lim.max_active_zones = 0; + if (!disk->zone_wplugs_pool) - return 0; + goto commit; /* * If the device has no limit on the maximum number of open and active @@ -1660,9 +1674,6 @@ static int disk_update_zone_resources(struct gendisk *disk, * dynamic zone write plug allocation when simultaneously writing to * more zones than the size of the mempool. */ - lim = queue_limits_start_update(q); - - nr_seq_zones = disk->nr_zones - nr_conv_zones; pool_size = max(lim.max_open_zones, lim.max_active_zones); if (!pool_size) pool_size = min(BLK_ZONE_WPLUG_DEFAULT_POOL_SIZE, nr_seq_zones); @@ -1676,6 +1687,7 @@ static int disk_update_zone_resources(struct gendisk *disk, lim.max_open_zones = 0; } +commit: return queue_limits_commit_update(q, &lim); }

From patchwork Mon Jun 10 07:56:53 2024
X-Patchwork-Submitter: Damien Le Moal
X-Patchwork-Id: 13691630
From: Damien Le Moal
To: Jens Axboe, linux-block@vger.kernel.org, dm-devel@lists.linux.dev, Mike Snitzer, Mikulas Patocka
Cc: Christoph Hellwig, Benjamin Marzinski
Subject: [PATCH v7 2/4] dm: Call dm_revalidate_zones() after setting the queue limits
Date: Mon, 10 Jun 2024 16:56:53 +0900
Message-ID: <20240610075655.249301-3-dlemoal@kernel.org>
In-Reply-To: <20240610075655.249301-1-dlemoal@kernel.org>
References: <20240610075655.249301-1-dlemoal@kernel.org>

dm_revalidate_zones() is called from dm_set_zones_restrictions() when the mapped device queue limits are not yet set.
However, dm_revalidate_zones() calls blk_revalidate_disk_zones() and this function consults and modifies the mapped device queue limits. Thus, currently, blk_revalidate_disk_zones() operates on limits that are not yet initialized.

Fix this by moving the call to dm_revalidate_zones() out of dm_set_zones_restrictions() and into dm_table_set_restrictions() after executing queue_limits_set().

To further clean up dm_set_zones_restrictions(), the message about the type of zone append (native or emulated) is also moved inside dm_revalidate_zones().

Fixes: 1c0e720228ad ("dm: use queue_limits_set")
Signed-off-by: Damien Le Moal
Reviewed-by: Christoph Hellwig
Reviewed-by: Benjamin Marzinski
Reviewed-by: Niklas Cassel
---
drivers/md/dm-table.c | 15 +++++++++++----
drivers/md/dm-zone.c | 25 ++++++++++---------------
drivers/md/dm.h | 1 +
3 files changed, 22 insertions(+), 19 deletions(-)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c index b2d5246cff21..2fc847af254d 100644 --- a/drivers/md/dm-table.c +++ b/drivers/md/dm-table.c @@ -2028,10 +2028,7 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q, dm_table_any_dev_attr(t, device_is_not_random, NULL)) blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, q); - /* - * For a zoned target, setup the zones related queue attributes - * and resources necessary for zone append emulation if necessary. - */ + /* For a zoned table, setup the zone related queue attributes. */ if (IS_ENABLED(CONFIG_BLK_DEV_ZONED) && limits->zoned) { r = dm_set_zones_restrictions(t, q, limits); if (r) @@ -2042,6 +2039,16 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q, if (r) return r; + /* + * Now that the limits are set, check the zones mapped by the table + * and setup the resources for zone append emulation if necessary. 
+ */ + if (IS_ENABLED(CONFIG_BLK_DEV_ZONED) && limits->zoned) { + r = dm_revalidate_zones(t, q); + if (r) + return r; + } + dm_update_crypto_profile(q, t); /* diff --git a/drivers/md/dm-zone.c b/drivers/md/dm-zone.c index 5d66d916730e..75d0019a0649 100644 --- a/drivers/md/dm-zone.c +++ b/drivers/md/dm-zone.c @@ -166,14 +166,22 @@ static int dm_check_zoned_cb(struct blk_zone *zone, unsigned int idx, * blk_revalidate_disk_zones() function here as the mapped device is suspended * (this is called from __bind() context). */ -static int dm_revalidate_zones(struct mapped_device *md, struct dm_table *t) +int dm_revalidate_zones(struct dm_table *t, struct request_queue *q) { + struct mapped_device *md = t->md; struct gendisk *disk = md->disk; int ret; + if (!get_capacity(disk)) + return 0; + /* Revalidate only if something changed. */ - if (!disk->nr_zones || disk->nr_zones != md->nr_zones) + if (!disk->nr_zones || disk->nr_zones != md->nr_zones) { + DMINFO("%s using %s zone append", + disk->disk_name, + queue_emulates_zone_append(q) ? "emulated" : "native"); md->nr_zones = 0; + } if (md->nr_zones) return 0; @@ -240,9 +248,6 @@ int dm_set_zones_restrictions(struct dm_table *t, struct request_queue *q, lim->max_zone_append_sectors = 0; } - if (!get_capacity(md->disk)) - return 0; - /* * Count conventional zones to check that the mapped device will indeed * have sequential write required zones. @@ -269,16 +274,6 @@ int dm_set_zones_restrictions(struct dm_table *t, struct request_queue *q, return 0; } - if (!md->disk->nr_zones) { - DMINFO("%s using %s zone append", - md->disk->disk_name, - queue_emulates_zone_append(q) ? 
"emulated" : "native"); - } - - ret = dm_revalidate_zones(md, t); - if (ret < 0) - return ret; - if (!static_key_enabled(&zoned_enabled.key)) static_branch_enable(&zoned_enabled); return 0; diff --git a/drivers/md/dm.h b/drivers/md/dm.h index 53ef8207fe2c..c984ecb64b1e 100644 --- a/drivers/md/dm.h +++ b/drivers/md/dm.h @@ -103,6 +103,7 @@ int dm_setup_md_queue(struct mapped_device *md, struct dm_table *t); */ int dm_set_zones_restrictions(struct dm_table *t, struct request_queue *q, struct queue_limits *lim); +int dm_revalidate_zones(struct dm_table *t, struct request_queue *q); void dm_zone_endio(struct dm_io *io, struct bio *clone); #ifdef CONFIG_BLK_DEV_ZONED int dm_blk_report_zones(struct gendisk *disk, sector_t sector, From patchwork Mon Jun 10 07:56:54 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Damien Le Moal X-Patchwork-Id: 13691631 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 09FF22CCB7; Mon, 10 Jun 2024 07:57:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718006221; cv=none; b=sK8gDLam5xBVbGX/sMkHf+4n/fDoIs+twsfkonuv6QhchuTxZ8KJe/D+BUwOefmLKqd3EDDOSt7TjT/KGkTAf58iuowrPdRMITARhpWUs+BZq4himSZNbG6Q58osjEqatSI45N/uOiFj0Drqs4JF88VKV+lzm5TM09oNMVRrCDY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718006221; c=relaxed/simple; bh=Artj7voUvbLkdDZtJcuixaabN69fVR+zC5HYBC+Nx+Q=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=nHfkmWr81BsazNBAHTh6+KQa0f9aXksnl4dHvEogdH5DM1W8Zg6lGUQ8Kb7MbICpXyFJFa4tvUBP6cw3cMZSke0qVmv1u6kaeaYf6eu+jQQBVEsx4iY84ASS8vqtKPrdJA66oXUIWyhX3uEfvy5yxn94PLI5XU7EhEpXNZgveww= 
From: Damien Le Moal
To: Jens Axboe, linux-block@vger.kernel.org, dm-devel@lists.linux.dev, Mike Snitzer, Mikulas Patocka
Cc: Christoph Hellwig, Benjamin Marzinski
Subject: [PATCH v7 3/4] dm: Improve zone resource limits handling
Date: Mon, 10 Jun 2024 16:56:54 +0900
Message-ID: <20240610075655.249301-4-dlemoal@kernel.org>
In-Reply-To: <20240610075655.249301-1-dlemoal@kernel.org>
References: <20240610075655.249301-1-dlemoal@kernel.org>

The generic stacking of limits implemented in the block layer cannot correctly handle stacking of zone resource limits (max open zones and max active zones) because these limits are for an entire device but the stacking may be for a portion of that device (e.g. a dm-linear target that does not cover an entire block device).
As a result, when DM devices are created on top of zoned block devices, the DM device never has any zone resource limits advertised, which is only correct if all underlying target devices also have no zone resource limits. If at least one target device has resource limits, the user may see either performance issues (if the max open zones limit of the device is exceeded) or write I/O errors (if the max active zones limit of one of the underlying target devices is exceeded).

While it is very difficult to correctly and reliably stack zone resource limits in general, cases where targets do not share the zone resources of the same device can be dealt with relatively easily. Such a situation arises when a target maps all sequential zones of a zoned block device: for such a mapping, other targets mapping other parts of the same zoned block device can only contain conventional zones and thus do not require any zone resources to correctly handle write operations. For a mapped device constructed with such targets, which includes mapped devices constructed with targets mapping entire zoned block devices, the zone resource limits can be reliably determined using the non-zero minimum of the zone resource limits of all targets.

For mapped devices that include targets partially mapping the set of sequential write required zones of zoned block devices, instead of advertising no zone resource limits, it is also better to set the mapped device limits to the non-zero minimum of the limits of all targets. In this case the limits for a target depend on the number of sequential zones being mapped: if this number of zones is larger than a limit, then that device limit applies and can be used. If, on the other hand, the number of mapped zones does not exceed a limit, then that limit is not needed and we can assume that the target does not have it (limit set to 0).
This commit improves zone resource limits handling as described above by modifying dm_set_zones_restrictions() to iterate the targets of a mapped device to evaluate the max open and max active zone limits. This relies on an internal "stacking" of the limits of the target devices combined with a direct counting of the number of sequential zones mapped by the targets.
1) For a target mapping an entire zoned block device, the limits for the target are set to the limits of the device.
2) For a target partially mapping a zoned block device, the number of mapped sequential zones is used to determine the limits: if the target maps more sequential write required zones than the device limits, then the limits of the device are used as-is. If the number of mapped sequential zones is lower than the limits, then we assume that the target has no limits (limits set to 0).
As this evaluation is done for each target, the zone resource limits for the mapped device are evaluated as the non-zero minimum of the limits of all the targets.

For configurations resulting in unreliable limits, i.e. a table containing a target partially mapping a zoned device, a warning message is issued.

The counting of mapped sequential zones for the target is done using the new function dm_device_count_zones() which performs a report zones on the entire block device with the callback dm_device_count_zones_cb(). This count of mapped sequential zones is also used to determine if the mapped device contains only conventional zones. This allows simplifying dm_set_zones_restrictions() to not do a report zones just for this.

For mapped devices mapping only conventional zones, as before, the mapped device is changed to a regular device by setting its zoned limit to false and clearing all its zone related limits.
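
For illustration only, the per-target evaluation described above can be sketched as standalone C. This sketch is not part of the patch: struct zone_limits, min_not_zero_uint(), target_limit() and stack_target() are hypothetical stand-ins for struct queue_limits, min_not_zero() and device_get_zone_resource_limits(). The comparisons are assumed to follow the ones used in the diff of this patch, where a device limit that is greater than or equal to the number of mapped sequential zones is dropped.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the zone resource limits; not a kernel type. */
struct zone_limits {
	unsigned int max_open_zones;	/* 0 means "no limit" */
	unsigned int max_active_zones;	/* 0 means "no limit" */
};

/* Smallest non-zero value of a and b, or 0 if both are 0. */
static unsigned int min_not_zero_uint(unsigned int a, unsigned int b)
{
	if (!a)
		return b;
	if (!b)
		return a;
	return a < b ? a : b;
}

/*
 * Limit seen by one target: a device limit that is not lower than the
 * number of mapped sequential zones can never be exceeded through this
 * target, so it is treated as "no limit" (0).
 */
static unsigned int target_limit(unsigned int dev_limit,
				 unsigned int mapped_seq_zones)
{
	return dev_limit >= mapped_seq_zones ? 0 : dev_limit;
}

/*
 * Fold the limits of one target into the mapped device limits.
 * Returns false if the target maps only part of the sequential zones of
 * its device, i.e. if the resulting limits may be unreliable.
 */
static bool stack_target(struct zone_limits *md,
			 const struct zone_limits *dev,
			 unsigned int mapped_seq_zones,
			 unsigned int total_seq_zones)
{
	/* Only conventional zones mapped: no zone resources are needed. */
	if (!mapped_seq_zones)
		return true;

	md->max_open_zones = min_not_zero_uint(md->max_open_zones,
		target_limit(dev->max_open_zones, mapped_seq_zones));
	md->max_active_zones = min_not_zero_uint(md->max_active_zones,
		target_limit(dev->max_active_zones, mapped_seq_zones));

	return mapped_seq_zones == total_seq_zones;
}
```

Calling stack_target() once per target, starting from limits of 0, yields the non-zero minimum over all targets; any false return corresponds to the case where the warning about unreliable limits is issued.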
Signed-off-by: Damien Le Moal Reviewed-by: Christoph Hellwig Reviewed-by: Benjamin Marzinski Reviewed-by: Niklas Cassel --- drivers/md/dm-zone.c | 175 +++++++++++++++++++++++++++++++++++-------- 1 file changed, 145 insertions(+), 30 deletions(-) diff --git a/drivers/md/dm-zone.c b/drivers/md/dm-zone.c index 75d0019a0649..d9f8b7c0957a 100644 --- a/drivers/md/dm-zone.c +++ b/drivers/md/dm-zone.c @@ -145,21 +145,6 @@ bool dm_is_zone_write(struct mapped_device *md, struct bio *bio) } } -/* - * Count conventional zones of a mapped zoned device. If the device - * only has conventional zones, do not expose it as zoned. - */ -static int dm_check_zoned_cb(struct blk_zone *zone, unsigned int idx, - void *data) -{ - unsigned int *nr_conv_zones = data; - - if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL) - (*nr_conv_zones)++; - - return 0; -} - /* * Revalidate the zones of a mapped device to initialize resource necessary * for zone append emulation. Note that we cannot simply use the block layer @@ -228,13 +213,122 @@ static bool dm_table_supports_zone_append(struct dm_table *t) return true; } +struct dm_device_zone_count { + sector_t start; + sector_t len; + unsigned int total_nr_seq_zones; + unsigned int target_nr_seq_zones; +}; + +/* + * Count the total number of and the number of mapped sequential zones of a + * target zoned device. 
+ */ +static int dm_device_count_zones_cb(struct blk_zone *zone, + unsigned int idx, void *data) +{ + struct dm_device_zone_count *zc = data; + + if (zone->type != BLK_ZONE_TYPE_CONVENTIONAL) { + zc->total_nr_seq_zones++; + if (zone->start >= zc->start && + zone->start < zc->start + zc->len) + zc->target_nr_seq_zones++; + } + + return 0; +} + +static int dm_device_count_zones(struct dm_dev *dev, + struct dm_device_zone_count *zc) +{ + int ret; + + ret = blkdev_report_zones(dev->bdev, 0, BLK_ALL_ZONES, + dm_device_count_zones_cb, zc); + if (ret < 0) + return ret; + if (!ret) + return -EIO; + return 0; +} + +struct dm_zone_resource_limits { + unsigned int mapped_nr_seq_zones; + struct queue_limits *lim; + bool reliable_limits; +}; + +static int device_get_zone_resource_limits(struct dm_target *ti, + struct dm_dev *dev, sector_t start, + sector_t len, void *data) +{ + struct dm_zone_resource_limits *zlim = data; + struct gendisk *disk = dev->bdev->bd_disk; + unsigned int max_open_zones, max_active_zones; + int ret; + struct dm_device_zone_count zc = { + .start = start, + .len = len, + }; + + /* + * If the target is not the whole device, the device zone resources may + * be shared between different targets. Check this by counting the + * number of mapped sequential zones: if this number is smaller than the + * total number of sequential zones of the target device, then resource + * sharing may happen and the zone limits will not be reliable. + */ + ret = dm_device_count_zones(dev, &zc); + if (ret) { + DMERR("Count %s zones failed %d", disk->disk_name, ret); + return ret; + } + + zlim->mapped_nr_seq_zones += zc.target_nr_seq_zones; + + /* + * If the target does not map any sequential zones, then we do not need + * any zone resource limits. + */ + if (!zc.target_nr_seq_zones) + return 0; + + /* + * If the target does not map all sequential zones, the limits + * will not be reliable. 
+ */ + if (zc.target_nr_seq_zones < zc.total_nr_seq_zones) + zlim->reliable_limits = false; + + /* + * If the target maps less sequential zones than the limit values, then + * we do not have limits for this target. + */ + max_active_zones = disk->queue->limits.max_active_zones; + if (max_active_zones >= zc.target_nr_seq_zones) + max_active_zones = 0; + zlim->lim->max_active_zones = + min_not_zero(max_active_zones, zlim->lim->max_active_zones); + + max_open_zones = disk->queue->limits.max_open_zones; + if (max_open_zones >= zc.target_nr_seq_zones) + max_open_zones = 0; + zlim->lim->max_open_zones = + min_not_zero(max_open_zones, zlim->lim->max_open_zones); + + return 0; +} + int dm_set_zones_restrictions(struct dm_table *t, struct request_queue *q, struct queue_limits *lim) { struct mapped_device *md = t->md; struct gendisk *disk = md->disk; - unsigned int nr_conv_zones = 0; - int ret; + struct dm_zone_resource_limits zlim = { + .reliable_limits = true, + .lim = lim, + }; /* * Check if zone append is natively supported, and if not, set the @@ -249,32 +343,53 @@ int dm_set_zones_restrictions(struct dm_table *t, struct request_queue *q, } /* - * Count conventional zones to check that the mapped device will indeed - * have sequential write required zones. + * Determine the max open and max active zone limits for the mapped + * device by inspecting the zone resource limits and the zones mapped + * by each target. 
*/ - md->zone_revalidate_map = t; - ret = dm_blk_report_zones(disk, 0, UINT_MAX, - dm_check_zoned_cb, &nr_conv_zones); - md->zone_revalidate_map = NULL; - if (ret < 0) { - DMERR("Check zoned failed %d", ret); - return ret; + for (unsigned int i = 0; i < t->num_targets; i++) { + struct dm_target *ti = dm_table_get_target(t, i); + + if (!ti->type->iterate_devices || + ti->type->iterate_devices(ti, + device_get_zone_resource_limits, &zlim)) { + DMERR("Could not determine %s zone resource limits", + disk->disk_name); + return -ENODEV; + } } /* - * If we only have conventional zones, expose the mapped device as - * a regular device. + * If we only have conventional zones mapped, expose the mapped device + * as a regular device. */ - if (nr_conv_zones >= ret) { + if (!zlim.mapped_nr_seq_zones) { lim->max_open_zones = 0; lim->max_active_zones = 0; + lim->max_zone_append_sectors = 0; + lim->zone_write_granularity = 0; + lim->chunk_sectors = 0; lim->zoned = false; clear_bit(DMF_EMULATE_ZONE_APPEND, &md->flags); + md->nr_zones = 0; disk->nr_zones = 0; return 0; } - if (!static_key_enabled(&zoned_enabled.key)) + /* + * Warn once (when the capacity is not yet set) if the mapped device is + * partially using zone resources of the target devices as that leads to + * unreliable limits, i.e. if another mapped device uses the same + * underlying devices, we cannot enforce zone limits to guarantee that + * writing will not lead to errors. Note that we really should return + * an error for such case but there is no easy way to find out if + * another mapped device uses the same underlying zoned devices. 
+ */ + if (!get_capacity(disk) && !zlim.reliable_limits) + DMWARN("%s zone resource limits may be unreliable", + disk->disk_name); + + if (lim->zoned && !static_key_enabled(&zoned_enabled.key)) static_branch_enable(&zoned_enabled); return 0; }

From patchwork Mon Jun 10 07:56:55 2024
X-Patchwork-Submitter: Damien Le Moal
X-Patchwork-Id: 13691632
From: Damien Le Moal
To: Jens Axboe, linux-block@vger.kernel.org, dm-devel@lists.linux.dev, Mike Snitzer, Mikulas Patocka
Cc: Christoph Hellwig, Benjamin Marzinski
Subject: [PATCH v7 4/4] dm: Remove unused macro DM_ZONE_INVALID_WP_OFST
Date: Mon, 10 Jun 2024 16:56:55 +0900
Message-ID: <20240610075655.249301-5-dlemoal@kernel.org>
In-Reply-To: <20240610075655.249301-1-dlemoal@kernel.org>
References: <20240610075655.249301-1-dlemoal@kernel.org>

With the switch to using the zone append emulation of the block layer zone write plugging, the macro DM_ZONE_INVALID_WP_OFST is no longer used in dm-zone.c. Remove its definition.

Fixes: f211268ed1f9 ("dm: Use the block layer zone append emulation")
Signed-off-by: Damien Le Moal
Reviewed-by: Christoph Hellwig
Reviewed-by: Johannes Thumshirn
Reviewed-by: Benjamin Marzinski
Reviewed-by: Niklas Cassel
---
drivers/md/dm-zone.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/md/dm-zone.c b/drivers/md/dm-zone.c index d9f8b7c0957a..70719bf32a2e 100644 --- a/drivers/md/dm-zone.c +++ b/drivers/md/dm-zone.c @@ -13,8 +13,6 @@ #define DM_MSG_PREFIX "zone" -#define DM_ZONE_INVALID_WP_OFST UINT_MAX - /* * For internal zone reports bypassing the top BIO submission path. 
*/ @@ -285,8 +283,6 @@ static int device_get_zone_resource_limits(struct dm_target *ti, return ret; } - zlim->mapped_nr_seq_zones += zc.target_nr_seq_zones; - /* * If the target does not map any sequential zones, then we do not need * any zone resource limits. @@ -317,6 +313,13 @@ static int device_get_zone_resource_limits(struct dm_target *ti, zlim->lim->max_open_zones = min_not_zero(max_open_zones, zlim->lim->max_open_zones); + /* + * Also count the total number of sequential zones for the mapped + * device so that when we are done inspecting all its targets, we are + * able to check if the mapped device actually has any sequential zones. + */ + zlim->mapped_nr_seq_zones += zc.target_nr_seq_zones; + return 0; }