diff mbox series

[v10,05/13] nvme: zns: Allow ZNS drives that have non-power_of_2 zone size

Message ID 20220811143043.126029-6-p.raghav@samsung.com (mailing list archive)
State New, archived
Headers show
Series support zoned block devices with non-power-of-2 zone sizes | expand

Commit Message

Pankaj Raghav Aug. 11, 2022, 2:30 p.m. UTC
Remove the condition which disallows non-power_of_2 zone size ZNS drive
to be updated and use generic method to calculate number of zones
instead of relying on log and shift based calculation on zone size.

The power_of_2 calculation has been replaced directly with generic
calculation without special handling. Both modified functions are not
used in hot paths, they are only used during initialization &
revalidation of the ZNS device.

As rounddown macro from math.h does not work for 32 bit architectures,
round down operation is open coded.

Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed by: Adam Manzanares <a.manzanares@samsung.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
---
 drivers/nvme/host/zns.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

Comments

Keith Busch Aug. 16, 2022, 9:14 p.m. UTC | #1
On Thu, Aug 11, 2022 at 04:30:35PM +0200, Pankaj Raghav wrote:
> @@ -101,13 +101,6 @@ int nvme_update_zone_info(struct nvme_ns *ns, unsigned lbaf)
>  	}
>  
>  	ns->zsze = nvme_lba_to_sect(ns, le64_to_cpu(id->lbafe[lbaf].zsze));
> -	if (!is_power_of_2(ns->zsze)) {
> -		dev_warn(ns->ctrl->device,
> -			"invalid zone size:%llu for namespace:%u\n",
> -			ns->zsze, ns->head->ns_id);
> -		status = -ENODEV;
> -		goto free_data;
> -	}

We used to bail out early if the format wasn't supported, which wouldn't create
the namespace.

Now you are relying on blk_revalidate_disk_zones() to report an error for
invalid format, but the handling for an error there is different: the namespace
still gets created. I'm not sure if that's a problem, but it's different.
Pankaj Raghav Aug. 17, 2022, 7:28 a.m. UTC | #2
Hi Keith,

On 2022-08-16 23:14, Keith Busch wrote:
> On Thu, Aug 11, 2022 at 04:30:35PM +0200, Pankaj Raghav wrote:
>> @@ -101,13 +101,6 @@ int nvme_update_zone_info(struct nvme_ns *ns, unsigned lbaf)
>>  	}
>>  
>>  	ns->zsze = nvme_lba_to_sect(ns, le64_to_cpu(id->lbafe[lbaf].zsze));
>> -	if (!is_power_of_2(ns->zsze)) {
>> -		dev_warn(ns->ctrl->device,
>> -			"invalid zone size:%llu for namespace:%u\n",
>> -			ns->zsze, ns->head->ns_id);
>> -		status = -ENODEV;
>> -		goto free_data;
>> -	}
> 
> We used to bail out early if the format wasn't supported, which wouldn't create
> the namespace.
> 
> Now you are relying on blk_revalidate_disk_zones() to report an error for
> invalid format, but the handling for an error there is different: the namespace
> still gets created. I'm not sure if that's a problem, but it's different.
That is right but it is not a problem. Once the block layer supports the
non-po2 zone sizes, we don't need to bail out here because it will be
compatible. If unequal zone sizes are found, block layer will report an
error during blk_revalidate_disk_zones() which is the current behavior anyway.

Other checks such as zone operation support are still in
nvme_update_zone_info() resulting in an early bail out if violated.
diff mbox series

Patch

diff --git a/drivers/nvme/host/zns.c b/drivers/nvme/host/zns.c
index 12316ab51bda..73e4ad495ae8 100644
--- a/drivers/nvme/host/zns.c
+++ b/drivers/nvme/host/zns.c
@@ -101,13 +101,6 @@  int nvme_update_zone_info(struct nvme_ns *ns, unsigned lbaf)
 	}
 
 	ns->zsze = nvme_lba_to_sect(ns, le64_to_cpu(id->lbafe[lbaf].zsze));
-	if (!is_power_of_2(ns->zsze)) {
-		dev_warn(ns->ctrl->device,
-			"invalid zone size:%llu for namespace:%u\n",
-			ns->zsze, ns->head->ns_id);
-		status = -ENODEV;
-		goto free_data;
-	}
 
 	disk_set_zoned(ns->disk, BLK_ZONED_HM);
 	blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, q);
@@ -129,7 +122,7 @@  static void *nvme_zns_alloc_report_buffer(struct nvme_ns *ns,
 				   sizeof(struct nvme_zone_descriptor);
 
 	nr_zones = min_t(unsigned int, nr_zones,
-			 get_capacity(ns->disk) >> ilog2(ns->zsze));
+			 div64_u64(get_capacity(ns->disk), ns->zsze));
 
 	bufsize = sizeof(struct nvme_zone_report) +
 		nr_zones * sizeof(struct nvme_zone_descriptor);
@@ -182,6 +175,7 @@  int nvme_ns_report_zones(struct nvme_ns *ns, sector_t sector,
 	int ret, zone_idx = 0;
 	unsigned int nz, i;
 	size_t buflen;
+	u64 remainder = 0;
 
 	if (ns->head->ids.csi != NVME_CSI_ZNS)
 		return -EINVAL;
@@ -197,7 +191,11 @@  int nvme_ns_report_zones(struct nvme_ns *ns, sector_t sector,
 	c.zmr.zrasf = NVME_ZRASF_ZONE_REPORT_ALL;
 	c.zmr.pr = NVME_REPORT_ZONE_PARTIAL;
 
-	sector &= ~(ns->zsze - 1);
+	/*
+	 * Round down the sector value to the nearest zone start
+	 */
+	div64_u64_rem(sector, ns->zsze, &remainder);
+	sector -= remainder;
 	while (zone_idx < nr_zones && sector < get_capacity(ns->disk)) {
 		memset(report, 0, buflen);