diff mbox series

dm-table:fix zone block_device not aligned with zonesize

Message ID 20240704151549.1365-1-lidong@vivo.com (mailing list archive)
State New
Headers show
Series dm-table:fix zone block_device not aligned with zonesize | expand

Commit Message

Li Dong July 4, 2024, 3:15 p.m. UTC
For zone block devices, device_area_is_invalid may return an incorrect 
value.

Failure log:
[   19.337657]: device-mapper: table: 254:56: len=836960256 not aligned to
h/w zone size 3244032 of sde
[   19.337665]: device-mapper: core: Cannot calculate initial queue limits
[   19.337667]: device-mapper: ioctl: unable to set up device queue for 
new table.

Actually, the device's zone length is aligned to the zonesize.

Fixes: 5dea271b6d87 ("dm table: pass correct dev area size to device_area_is_valid")
Signed-off-by: Li Dong <lidong@vivo.com>
---
 drivers/md/dm-table.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Mikulas Patocka July 4, 2024, 3:29 p.m. UTC | #1
On Thu, 4 Jul 2024, Li Dong wrote:

> For zone block devices, device_area_is_invalid may return an incorrect 
> value.
> 
> Failure log:
> [   19.337657]: device-mapper: table: 254:56: len=836960256 not aligned to
> h/w zone size 3244032 of sde
> [   19.337665]: device-mapper: core: Cannot calculate initial queue limits
> [   19.337667]: device-mapper: ioctl: unable to set up device queue for 
> new table.
> 
> Actually, the device's zone length is aligned to the zonesize.
> 
> Fixes: 5dea271b6d87 ("dm table: pass correct dev area size to device_area_is_valid")
> Signed-off-by: Li Dong <lidong@vivo.com>
> ---
>  drivers/md/dm-table.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> index 33b7a1844ed4..0bddae0bee3c 100644
> --- a/drivers/md/dm-table.c
> +++ b/drivers/md/dm-table.c
> @@ -257,7 +257,7 @@ static int device_area_is_invalid(struct dm_target *ti, struct dm_dev *dev,
>  	if (bdev_is_zoned(bdev)) {
>  		unsigned int zone_sectors = bdev_zone_sectors(bdev);
>  
> -		if (start & (zone_sectors - 1)) {
> +		if (start % zone_sectors) {
>  			DMERR("%s: start=%llu not aligned to h/w zone size %u of %pg",
>  			      dm_device_name(ti->table->md),
>  			      (unsigned long long)start,
> @@ -274,7 +274,7 @@ static int device_area_is_invalid(struct dm_target *ti, struct dm_dev *dev,
>  		 * devices do not end up with a smaller zone in the middle of
>  		 * the sector range.
>  		 */
> -		if (len & (zone_sectors - 1)) {
> +		if (len % zone_sectors) {
>  			DMERR("%s: len=%llu not aligned to h/w zone size %u of %pg",
>  			      dm_device_name(ti->table->md),
>  			      (unsigned long long)len,
> -- 
> 2.31.1.windows.1

Hi

This probably won't compile on 32-bit architectures because the operators 
for 64-bit divide and modulo don't work there.

Please, rework the patch using dm_sector_div64.

Mikulas
Mikulas Patocka July 4, 2024, 7 p.m. UTC | #2
> On Thu, 4 Jul 2024, Li Dong wrote:
> 
> > For zone block devices, device_area_is_invalid may return an incorrect 
> > value.
> > 
> > Failure log:
> > [   19.337657]: device-mapper: table: 254:56: len=836960256 not aligned to
> > h/w zone size 3244032 of sde
> > [   19.337665]: device-mapper: core: Cannot calculate initial queue limits
> > [   19.337667]: device-mapper: ioctl: unable to set up device queue for 
> > new table.
> > 
> > Actually, the device's zone length is aligned to the zonesize.
> > 
> > Fixes: 5dea271b6d87 ("dm table: pass correct dev area size to device_area_is_valid")
> > Signed-off-by: Li Dong <lidong@vivo.com>
> > ---
> >  drivers/md/dm-table.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> > index 33b7a1844ed4..0bddae0bee3c 100644
> > --- a/drivers/md/dm-table.c
> > +++ b/drivers/md/dm-table.c
> > @@ -257,7 +257,7 @@ static int device_area_is_invalid(struct dm_target *ti, struct dm_dev *dev,
> >  	if (bdev_is_zoned(bdev)) {
> >  		unsigned int zone_sectors = bdev_zone_sectors(bdev);
> >  
> > -		if (start & (zone_sectors - 1)) {
> > +		if (start % zone_sectors) {
> >  			DMERR("%s: start=%llu not aligned to h/w zone size %u of %pg",
> >  			      dm_device_name(ti->table->md),
> >  			      (unsigned long long)start,
> > @@ -274,7 +274,7 @@ static int device_area_is_invalid(struct dm_target *ti, struct dm_dev *dev,
> >  		 * devices do not end up with a smaller zone in the middle of
> >  		 * the sector range.
> >  		 */
> > -		if (len & (zone_sectors - 1)) {
> > +		if (len % zone_sectors) {
> >  			DMERR("%s: len=%llu not aligned to h/w zone size %u of %pg",
> >  			      dm_device_name(ti->table->md),
> >  			      (unsigned long long)len,
> > -- 
> > 2.31.1.windows.1

I grepped the kernel for bdev_zone_sectors and there are more assumptions 
that bdev_zone_sectors is a power of 2.

drivers/md/dm-zone.c:           sector_t mask = bdev_zone_sectors(disk->part0) - 1
drivers/nvme/target/zns.c:      if (get_capacity(bd_disk) & (bdev_zone_sectors(ns->bdev) - 1))
drivers/nvme/target/zns.c:      if (sect & (bdev_zone_sectors(req->ns->bdev) - 1)) {
fs/zonefs/super.c:      sbi->s_zone_sectors_shift = ilog2(bdev_zone_sectors(sb->s_bdev));
fs/btrfs/zoned.c:       return (sector_t)zone_number << ilog2(bdev_zone_sectors(bdev));
fs/btrfs/zoned.c:	zone_info->zone_size_shift = ilog2(zone_info->zone_size);
include/linux/blkdev.h: return sector & (bdev_zone_sectors(bdev) - 1);
fs/f2fs/super.c:	if (nr_sectors & (zone_sectors - 1))

So, if we want to support non-power-of-2 zone size, we need some 
systematic fix. Now it appears that Linux doesn't even attempt to support 
disks non-power-of-2 zone size.

I added Damien Le Moal so that he can help with testing disks with 
non-power-of-2 zone size (if WD is actually making them).

Mikulas
Pankaj Raghav (Samsung) July 4, 2024, 9:43 p.m. UTC | #3
> I grepped the kernel for bdev_zone_sectors and there are more assumptions 
> that bdev_zone_sectors is a power of 2.
> 
> drivers/md/dm-zone.c:           sector_t mask = bdev_zone_sectors(disk->part0) - 1
> drivers/nvme/target/zns.c:      if (get_capacity(bd_disk) & (bdev_zone_sectors(ns->bdev) - 1))
> drivers/nvme/target/zns.c:      if (sect & (bdev_zone_sectors(req->ns->bdev) - 1)) {
> fs/zonefs/super.c:      sbi->s_zone_sectors_shift = ilog2(bdev_zone_sectors(sb->s_bdev));
> fs/btrfs/zoned.c:       return (sector_t)zone_number << ilog2(bdev_zone_sectors(bdev));
> fs/btrfs/zoned.c:	zone_info->zone_size_shift = ilog2(zone_info->zone_size);
> include/linux/blkdev.h: return sector & (bdev_zone_sectors(bdev) - 1);
> fs/f2fs/super.c:	if (nr_sectors & (zone_sectors - 1))
> 
> So, if we want to support non-power-of-2 zone size, we need some 
> systematic fix. Now it appears that Linux doesn't even attempt to support 
> disks non-power-of-2 zone size.

FYI, I sent the patches to add non-power-of-2 zone size support to the
block layer [0] a long time back. Sadly, it was not finally merged upstream.
Android wanted this support, so for now it is in the Android tree.
Damien Le Moal July 5, 2024, 12:13 a.m. UTC | #4
On 7/5/24 04:00, Mikulas Patocka wrote:
> 
> 
>> On Thu, 4 Jul 2024, Li Dong wrote:
>>
>>> For zone block devices, device_area_is_invalid may return an incorrect 
>>> value.
>>>
>>> Failure log:
>>> [   19.337657]: device-mapper: table: 254:56: len=836960256 not aligned to
>>> h/w zone size 3244032 of sde
>>> [   19.337665]: device-mapper: core: Cannot calculate initial queue limits
>>> [   19.337667]: device-mapper: ioctl: unable to set up device queue for 
>>> new table.
>>>
>>> Actually, the device's zone length is aligned to the zonesize.
>>>
>>> Fixes: 5dea271b6d87 ("dm table: pass correct dev area size to device_area_is_valid")
>>> Signed-off-by: Li Dong <lidong@vivo.com>
>>> ---
>>>  drivers/md/dm-table.c | 4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
>>> index 33b7a1844ed4..0bddae0bee3c 100644
>>> --- a/drivers/md/dm-table.c
>>> +++ b/drivers/md/dm-table.c
>>> @@ -257,7 +257,7 @@ static int device_area_is_invalid(struct dm_target *ti, struct dm_dev *dev,
>>>  	if (bdev_is_zoned(bdev)) {
>>>  		unsigned int zone_sectors = bdev_zone_sectors(bdev);
>>>  
>>> -		if (start & (zone_sectors - 1)) {
>>> +		if (start % zone_sectors) {
>>>  			DMERR("%s: start=%llu not aligned to h/w zone size %u of %pg",
>>>  			      dm_device_name(ti->table->md),
>>>  			      (unsigned long long)start,
>>> @@ -274,7 +274,7 @@ static int device_area_is_invalid(struct dm_target *ti, struct dm_dev *dev,
>>>  		 * devices do not end up with a smaller zone in the middle of
>>>  		 * the sector range.
>>>  		 */
>>> -		if (len & (zone_sectors - 1)) {
>>> +		if (len % zone_sectors) {
>>>  			DMERR("%s: len=%llu not aligned to h/w zone size %u of %pg",
>>>  			      dm_device_name(ti->table->md),
>>>  			      (unsigned long long)len,
>>> -- 
>>> 2.31.1.windows.1
> 
> I grepped the kernel for bdev_zone_sectors and there are more assumptions 
> that bdev_zone_sectors is a power of 2.
> 
> drivers/md/dm-zone.c:           sector_t mask = bdev_zone_sectors(disk->part0) - 1
> drivers/nvme/target/zns.c:      if (get_capacity(bd_disk) & (bdev_zone_sectors(ns->bdev) - 1))
> drivers/nvme/target/zns.c:      if (sect & (bdev_zone_sectors(req->ns->bdev) - 1)) {
> fs/zonefs/super.c:      sbi->s_zone_sectors_shift = ilog2(bdev_zone_sectors(sb->s_bdev));
> fs/btrfs/zoned.c:       return (sector_t)zone_number << ilog2(bdev_zone_sectors(bdev));
> fs/btrfs/zoned.c:	zone_info->zone_size_shift = ilog2(zone_info->zone_size);
> include/linux/blkdev.h: return sector & (bdev_zone_sectors(bdev) - 1);
> fs/f2fs/super.c:	if (nr_sectors & (zone_sectors - 1))
> 
> So, if we want to support non-power-of-2 zone size, we need some 
> systematic fix. Now it appears that Linux doesn't even attempt to support 
> disks non-power-of-2 zone size.

Correct. We currently do not support zoned devices with a zone size that is not
a power of 2 number of LBAs. Such drives are rejected by the block layer. So I
am surprised that Li could even reach a DM error. It means that his kernel was
already patched to accept zoned drives with a zone size that is not a power of 2.

> I added Damien Le Moal so that he can help with testing disks with 
> non-power-of-2 zone size (if WD is actually making them).

No, I do not have such drive. The vast majority of zoned device users today are
SMR HDD users, and in that space, no one wants a non power of 2 zone size drive.
As far as I know, the main push is from zoned-UFS users for Android, as Pankaj
mentioned.

As you rightly guessed, and as Pankaj confirmed, supporting non power of 2 zone
size drives requires a lot more changes than just the fix Li sent for DM. Pankaj
initial patch set that did not make it upstream (but is in Android) would need
to be reworked given the extensive changes that happened since then (zone write
locking replaced with zone write plugging, queue limits changes, etc).
Damien Le Moal July 5, 2024, 1:08 a.m. UTC | #5
On 7/5/24 00:29, Mikulas Patocka wrote:
> 
> 
> On Thu, 4 Jul 2024, Li Dong wrote:
> 
>> For zone block devices, device_area_is_invalid may return an incorrect 
>> value.
>>
>> Failure log:
>> [   19.337657]: device-mapper: table: 254:56: len=836960256 not aligned to
>> h/w zone size 3244032 of sde
>> [   19.337665]: device-mapper: core: Cannot calculate initial queue limits
>> [   19.337667]: device-mapper: ioctl: unable to set up device queue for 
>> new table.
>>
>> Actually, the device's zone length is aligned to the zonesize.
>>
>> Fixes: 5dea271b6d87 ("dm table: pass correct dev area size to device_area_is_valid")
>> Signed-off-by: Li Dong <lidong@vivo.com>
>> ---
>>  drivers/md/dm-table.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
>> index 33b7a1844ed4..0bddae0bee3c 100644
>> --- a/drivers/md/dm-table.c
>> +++ b/drivers/md/dm-table.c
>> @@ -257,7 +257,7 @@ static int device_area_is_invalid(struct dm_target *ti, struct dm_dev *dev,
>>  	if (bdev_is_zoned(bdev)) {
>>  		unsigned int zone_sectors = bdev_zone_sectors(bdev);
>>  
>> -		if (start & (zone_sectors - 1)) {
>> +		if (start % zone_sectors) {
>>  			DMERR("%s: start=%llu not aligned to h/w zone size %u of %pg",
>>  			      dm_device_name(ti->table->md),
>>  			      (unsigned long long)start,
>> @@ -274,7 +274,7 @@ static int device_area_is_invalid(struct dm_target *ti, struct dm_dev *dev,
>>  		 * devices do not end up with a smaller zone in the middle of
>>  		 * the sector range.
>>  		 */
>> -		if (len & (zone_sectors - 1)) {
>> +		if (len % zone_sectors) {
>>  			DMERR("%s: len=%llu not aligned to h/w zone size %u of %pg",
>>  			      dm_device_name(ti->table->md),
>>  			      (unsigned long long)len,
>> -- 
>> 2.31.1.windows.1
> 
> Hi
> 
> This probably won't compile on 32-bit architectures because the operators 
> for 64-bit divide and modulo don't work there.
> 
> Please, rework the patch using dm_sector_div64.

This patch alone will not allow non power of 2 zone size drives to be supported
as the block layer will reject such drive in the first place. This patch should
be part of a larger series enabling such support. On its own, it is a NACK from
me for this change.
diff mbox series

Patch

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 33b7a1844ed4..0bddae0bee3c 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -257,7 +257,7 @@  static int device_area_is_invalid(struct dm_target *ti, struct dm_dev *dev,
 	if (bdev_is_zoned(bdev)) {
 		unsigned int zone_sectors = bdev_zone_sectors(bdev);
 
-		if (start & (zone_sectors - 1)) {
+		if (start % zone_sectors) {
 			DMERR("%s: start=%llu not aligned to h/w zone size %u of %pg",
 			      dm_device_name(ti->table->md),
 			      (unsigned long long)start,
@@ -274,7 +274,7 @@  static int device_area_is_invalid(struct dm_target *ti, struct dm_dev *dev,
 		 * devices do not end up with a smaller zone in the middle of
 		 * the sector range.
 		 */
-		if (len & (zone_sectors - 1)) {
+		if (len % zone_sectors) {
 			DMERR("%s: len=%llu not aligned to h/w zone size %u of %pg",
 			      dm_device_name(ti->table->md),
 			      (unsigned long long)len,