diff mbox series

[4/4] dm-zoned: allow for device size smaller than the capacity

Message ID 20200327071459.67796-5-hare@suse.de (mailing list archive)
State Superseded, archived
Delegated to: Mike Snitzer
Headers show
Series dm-zoned: Metadata V2 | expand

Commit Message

Hannes Reinecke March 27, 2020, 7:14 a.m. UTC
dm-zoned requires several zones for metadata and chunk bitmaps,
so it cannot expose the entire capacity as the device size.
Originally the code would check for the capacity being equal to
the device size, which is arguably wrong.
So relax this check and increase the interface version number
to signal to userspace that it can set a smaller device size.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/md/dm-zoned-target.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Damien Le Moal March 31, 2020, 12:49 a.m. UTC | #1
On 2020/03/27 16:15, Hannes Reinecke wrote:
> dm-zoned requires several zones for metadata and chunk bitmaps,
> so it cannot expose the entire capacity as the device size.
> Originally the code would check for the capacity being equal to
> the device size, which is arguably wrong.
> So relax this check and increase the interface version number
> to signal to userspace that it can set a smaller device size.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/md/dm-zoned-target.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/md/dm-zoned-target.c b/drivers/md/dm-zoned-target.c
> index 7ec9dde24516..89a825d1034e 100644
> --- a/drivers/md/dm-zoned-target.c
> +++ b/drivers/md/dm-zoned-target.c
> @@ -715,7 +715,7 @@ static int dmz_get_zoned_device(struct dm_target *ti, char *path)
>  	aligned_capacity = dev->capacity &
>  				~((sector_t)blk_queue_zone_sectors(q) - 1);
>  	if (ti->begin ||
> -	    ((ti->len != dev->capacity) && (ti->len != aligned_capacity))) {
> +	    ((ti->len > dev->capacity) && (ti->len > aligned_capacity))) {
>  		ti->error = "Partial mapping not supported";

The message is now wrong. Also, the condition can now be simplified to:

if ((ti->begin + ti->len) > aligned_capacity) {

Since aligned capacity is equal or smaller than dev capacity. And we have to
account for the potential non-zero begin.

>  		ret = -EINVAL;
>  		goto err;
> @@ -1008,7 +1008,7 @@ static int dmz_message(struct dm_target *ti, unsigned int argc, char **argv,
>  
>  static struct target_type dmz_type = {
>  	.name		 = "zoned",
> -	.version	 = {1, 2, 0},
> +	.version	 = {1, 3, 0},
>  	.features	 = DM_TARGET_SINGLETON | DM_TARGET_ZONED_HM,
>  	.module		 = THIS_MODULE,
>  	.ctr		 = dmz_ctr,
> 

I do not think this is nearly enough: dmz_init_zones() is still considering the
entire drive and does a zone report from 0 on the backend bdev. It should start
at ti->begin and up to ti->begin+ti->len, no ?

Furthermore, this introduce a change in the meaning of the zone ID. Since this
is set to the index of the zone in the report (patch 1), if the mapping is
partial and the zone report does not start at 0, then zone ID is not zone number
on the device anymore. So dmz_start_block() needs to be offset by ti->begin
otherwise IOs will go to the wrong zones.
Hannes Reinecke March 31, 2020, 8:53 a.m. UTC | #2
On 3/31/20 2:49 AM, Damien Le Moal wrote:
> On 2020/03/27 16:15, Hannes Reinecke wrote:
>> dm-zoned requires several zones for metadata and chunk bitmaps,
>> so it cannot expose the entire capacity as the device size.
>> Originally the code would check for the capacity being equal to
>> the device size, which is arguably wrong.
>> So relax this check and increase the interface version number
>> to signal to userspace that it can set a smaller device size.
>>
>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>> ---
>>   drivers/md/dm-zoned-target.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/md/dm-zoned-target.c b/drivers/md/dm-zoned-target.c
>> index 7ec9dde24516..89a825d1034e 100644
>> --- a/drivers/md/dm-zoned-target.c
>> +++ b/drivers/md/dm-zoned-target.c
>> @@ -715,7 +715,7 @@ static int dmz_get_zoned_device(struct dm_target *ti, char *path)
>>   	aligned_capacity = dev->capacity &
>>   				~((sector_t)blk_queue_zone_sectors(q) - 1);
>>   	if (ti->begin ||
>> -	    ((ti->len != dev->capacity) && (ti->len != aligned_capacity))) {
>> +	    ((ti->len > dev->capacity) && (ti->len > aligned_capacity))) {
>>   		ti->error = "Partial mapping not supported";
> 
> The message is now wrong. Also, the condition can now be simplified to:
> 
> if ((ti->begin + ti->len) > aligned_capacity) {
> 
> Since aligned capacity is equal or smaller than dev capacity. And we have to
> account for the potential non-zero begin.
> 
_Actually_ I would forbid for 'ti->begin' to be anything other than 0.
For a zoned device there is no point in allowing for partial handling at 
all.
Problem is that 'dev->capacity' is the capacity of the zoned block 
device, whereas 'ti-len' is the exported capacity of the device-mapper 
device, which is smaller than the device capacity by the number of 
metadata blocks/zones.

>>   		ret = -EINVAL;
>>   		goto err;
>> @@ -1008,7 +1008,7 @@ static int dmz_message(struct dm_target *ti, unsigned int argc, char **argv,
>>   
>>   static struct target_type dmz_type = {
>>   	.name		 = "zoned",
>> -	.version	 = {1, 2, 0},
>> +	.version	 = {1, 3, 0},
>>   	.features	 = DM_TARGET_SINGLETON | DM_TARGET_ZONED_HM,
>>   	.module		 = THIS_MODULE,
>>   	.ctr		 = dmz_ctr,
>>
> 
> I do not think this is nearly enough: dmz_init_zones() is still considering the
> entire drive and does a zone report from 0 on the backend bdev. It should start
> at ti->begin and up to ti->begin+ti->len, no ?
> 
Yes, and no. I want to disallow 'ti->begin' to be anything else than 0 
as ti->begin and ti->len are relative to exported device-mapper device, 
and we always want to have block 0 mapped :-)

And as such dmz_init_zones() needs to cover all zones, as this is 
relative to the zoned block device.

> Furthermore, this introduce a change in the meaning of the zone ID. Since this
> is set to the index of the zone in the report (patch 1), if the mapping is
> partial and the zone report does not start at 0, then zone ID is not zone number
> on the device anymore. So dmz_start_block() needs to be offset by ti->begin
> otherwise IOs will go to the wrong zones.
> 
As I said: We will never do a partial mapping.
What this patch does it to bring the device-mapper mapping in-line with 
the exported device size.

Originally we would export a device-mapper mapping for blocks up to the 
zone-device capacity. As the resulting device-mapper block device has a 
smaller capacity than the mapping would allow for those 'spare' blocks 
would never been used, thus the invalid mapping was never triggered.

What this patch does is to bring the device-mapper mapping in-line with 
the exported block device capacity, so that we don't have an invalid 
mapping. Nothing else has (and should) be changed.

Especially not the partial zoned device handling :-)

Cheers,

Hannes
Damien Le Moal April 2, 2020, 2:45 a.m. UTC | #3
On 2020/03/31 17:53, Hannes Reinecke wrote:
> On 3/31/20 2:49 AM, Damien Le Moal wrote:
>> On 2020/03/27 16:15, Hannes Reinecke wrote:
>>> dm-zoned requires several zones for metadata and chunk bitmaps,
>>> so it cannot expose the entire capacity as the device size.
>>> Originally the code would check for the capacity being equal to
>>> the device size, which is arguably wrong.
>>> So relax this check and increase the interface version number
>>> to signal to userspace that it can set a smaller device size.
>>>
>>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>>> ---
>>>   drivers/md/dm-zoned-target.c | 4 ++--
>>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/md/dm-zoned-target.c b/drivers/md/dm-zoned-target.c
>>> index 7ec9dde24516..89a825d1034e 100644
>>> --- a/drivers/md/dm-zoned-target.c
>>> +++ b/drivers/md/dm-zoned-target.c
>>> @@ -715,7 +715,7 @@ static int dmz_get_zoned_device(struct dm_target *ti, char *path)
>>>   	aligned_capacity = dev->capacity &
>>>   				~((sector_t)blk_queue_zone_sectors(q) - 1);
>>>   	if (ti->begin ||
>>> -	    ((ti->len != dev->capacity) && (ti->len != aligned_capacity))) {
>>> +	    ((ti->len > dev->capacity) && (ti->len > aligned_capacity))) {
>>>   		ti->error = "Partial mapping not supported";
>>
>> The message is now wrong. Also, the condition can now be simplified to:
>>
>> if ((ti->begin + ti->len) > aligned_capacity) {
>>
>> Since aligned capacity is equal or smaller than dev capacity. And we have to
>> account for the potential non-zero begin.
>>
> _Actually_ I would forbid for 'ti->begin' to be anything other than 0.
> For a zoned device there is no point in allowing for partial handling at 
> all.
> Problem is that 'dev->capacity' is the capacity of the zoned block 
> device, whereas 'ti-len' is the exported capacity of the device-mapper 
> device, which is smaller than the device capacity by the number of 
> metadata blocks/zones.

OK. Got it. I misunderstood that. Sounds good to me.

> 
>>>   		ret = -EINVAL;
>>>   		goto err;
>>> @@ -1008,7 +1008,7 @@ static int dmz_message(struct dm_target *ti, unsigned int argc, char **argv,
>>>   
>>>   static struct target_type dmz_type = {
>>>   	.name		 = "zoned",
>>> -	.version	 = {1, 2, 0},
>>> +	.version	 = {1, 3, 0},
>>>   	.features	 = DM_TARGET_SINGLETON | DM_TARGET_ZONED_HM,
>>>   	.module		 = THIS_MODULE,
>>>   	.ctr		 = dmz_ctr,
>>>
>>
>> I do not think this is nearly enough: dmz_init_zones() is still considering the
>> entire drive and does a zone report from 0 on the backend bdev. It should start
>> at ti->begin and up to ti->begin+ti->len, no ?
>>
> Yes, and no. I want to disallow 'ti->begin' to be anything else than 0 
> as ti->begin and ti->len are relative to exported device-mapper device, 
> and we always want to have block 0 mapped :-)
> 
> And as such dmz_init_zones() needs to cover all zones, as this is 
> relative to the zoned block device.

Yes. Which is already the case since we need to see the metatdata and reserved
zones too. Makes sense.

> 
>> Furthermore, this introduce a change in the meaning of the zone ID. Since this
>> is set to the index of the zone in the report (patch 1), if the mapping is
>> partial and the zone report does not start at 0, then zone ID is not zone number
>> on the device anymore. So dmz_start_block() needs to be offset by ti->begin
>> otherwise IOs will go to the wrong zones.
>>
> As I said: We will never do a partial mapping.
> What this patch does it to bring the device-mapper mapping in-line with 
> the exported device size.

Got it !

> Originally we would export a device-mapper mapping for blocks up to the 
> zone-device capacity. As the resulting device-mapper block device has a 
> smaller capacity than the mapping would allow for those 'spare' blocks 
> would never been used, thus the invalid mapping was never triggered.
> 
> What this patch does is to bring the device-mapper mapping in-line with 
> the exported block device capacity, so that we don't have an invalid 
> mapping. Nothing else has (and should) be changed.
> 
> Especially not the partial zoned device handling :-)

OK. Thanks for doing that ! I  totally missed these points when I wrote the
mapper :)

Cheers.
diff mbox series

Patch

diff --git a/drivers/md/dm-zoned-target.c b/drivers/md/dm-zoned-target.c
index 7ec9dde24516..89a825d1034e 100644
--- a/drivers/md/dm-zoned-target.c
+++ b/drivers/md/dm-zoned-target.c
@@ -715,7 +715,7 @@  static int dmz_get_zoned_device(struct dm_target *ti, char *path)
 	aligned_capacity = dev->capacity &
 				~((sector_t)blk_queue_zone_sectors(q) - 1);
 	if (ti->begin ||
-	    ((ti->len != dev->capacity) && (ti->len != aligned_capacity))) {
+	    ((ti->len > dev->capacity) && (ti->len > aligned_capacity))) {
 		ti->error = "Partial mapping not supported";
 		ret = -EINVAL;
 		goto err;
@@ -1008,7 +1008,7 @@  static int dmz_message(struct dm_target *ti, unsigned int argc, char **argv,
 
 static struct target_type dmz_type = {
 	.name		 = "zoned",
-	.version	 = {1, 2, 0},
+	.version	 = {1, 3, 0},
 	.features	 = DM_TARGET_SINGLETON | DM_TARGET_ZONED_HM,
 	.module		 = THIS_MODULE,
 	.ctr		 = dmz_ctr,