
[0/4] dm: zoned block device fixes

Message ID 20170530202031.GA52190@redhat.com (mailing list archive)
State Accepted, archived
Delegated to: Mike Snitzer

Commit Message

Mike Snitzer May 30, 2017, 8:20 p.m. UTC
On Mon, May 29 2017 at  6:23P -0400,
Damien Le Moal <damien.lemoal@wdc.com> wrote:

> Mike,
> 
> The first 3 patches of this series are incremental fixes for the zoned block
> device support patches that you committed to the for-4.13/dm branch.
> 
> The first patch corrects the zone alignment checks so that the check is
> performed for any device, regardless of the device LBA size (it is skipped for
> 512B LBA devices otherwise).

I folded this first patch into the original commit (baf844bf4ae3).

> The second patch is a fix for commit baf844bf4ae3 "dm table: add zoned block
> devices validation". In that commit, the stacked limits zoned model was not
> set to the zoned model of the table target devices, leading to the device
> always being exposed as a regular block device. With this fix, dm-flakey
> and dm-linear work fine on top of host-managed zoned block devices.
> 
> The third patch fixes zoned model validation again to allow for target types
> emulating a different zoned model than the model of the table target devices,
> e.g. dm-zoned.

The 2nd and 3rd seem over-done to me.  After spending more time than
ideal, the following patch would seem to be the equivalent to what
you've done in patches 2 and 3 (sans the "cleanup" of passing limits to
validate_hardware_zoned_model):

Anyway, I've folded this into the original commit too.  If you could
verify it works with your scenarios I'd appreciate it.

FYI, any additional cosmetic cleanup can wait (I happen to think this
code is clearer than how you relied on the matches functions to
initialize a temporary value).

I also folded in a validate_hardware_zoned_model() optimization to
return early if zoned_model == BLK_ZONED_NONE, please see/test the
rolled-up end result here:
https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=for-4.13/dm&id=9a6b54360f147c2d25fba7debc31a3251b804cc2

Also, please note that I've forcibly rebased linux-dm.git's
'for-4.13/dm' and staged it in 'for-next'.

> The last patch is dm-zoned with various fixes (mainly crashes on setup error
> and handling of the metadata cache shrinker). For your review, please use this
> version.

Will do, thanks.

Mike

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

Comments

Damien Le Moal May 31, 2017, 4:29 a.m. UTC | #1
Mike,

On 5/31/17 05:20, Mike Snitzer wrote:
> On Mon, May 29 2017 at  6:23P -0400,
> Damien Le Moal <damien.lemoal@wdc.com> wrote:
> 
>> Mike,
>>
>> The first 3 patches of this series are incremental fixes for the zoned block
>> device support patches that you committed to the for-4.13/dm branch.
>>
>> The first patch corrects the zone alignment checks so that the check is
>> performed for any device, regardless of the device LBA size (it is skipped for
>> 512B LBA devices otherwise).
> 
> I folded this first patch into the original commit (baf844bf4ae3).

Great. Thanks.

>> The second patch is a fix for commit baf844bf4ae3 "dm table: add zoned block
>> devices validation". In that commit, the stacked limits zoned model was not
>> set to the zoned model of the table target devices, leading to the device
>> always being exposed as a regular block device. With this fix, dm-flakey
>> and dm-linear work fine on top of host-managed zoned block devices.
>>
>> The third patch fixes zoned model validation again to allow for target types
>> emulating a different zoned model than the model of the table target devices,
>> e.g. dm-zoned.
> 
> The 2nd and 3rd seem over-done to me.  After spending more time than
> ideal, the following patch would seem to be the equivalent to what
> you've done in patches 2 and 3 (sans the "cleanup" of passing limits to
> validate_hardware_zoned_model):
> 
> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> index 6545150..a39bcd9 100644
> --- a/drivers/md/dm-table.c
> +++ b/drivers/md/dm-table.c
> @@ -1523,19 +1524,39 @@ int dm_calculate_queue_limits(struct dm_table *table,
>  			       dm_device_name(table->md),
>  			       (unsigned long long) ti->begin,
>  			       (unsigned long long) ti->len);
> +
> +		/*
> +		 * FIXME: this should likely be moved to blk_stack_limits(), would
> +		 * also eliminate limits->zoned stacking hack in dm_set_device_limits()
> +		 */
> +		if (limits->zoned == BLK_ZONED_NONE && ti_limits.zoned != BLK_ZONED_NONE) {
> +			/*
> +			 * By default, the stacked limits zoned model is set to
> +			 * BLK_ZONED_NONE in blk_set_stacking_limits(). Update
> +			 * this model using the first target model reported
> +			 * that is not BLK_ZONED_NONE. This will be either the
> +			 * first target device zoned model or the model reported
> +			 * by the target .io_hints.
> +			 */
> +			limits->zoned = ti_limits.zoned;
> +		}
>  	}
>  
>  	/*
>  	 * Verify that the zoned model and zone sectors, as determined before
>  	 * any .io_hints override, are the same across all devices in the table.
> -	 * - but if limits->zoned is not BLK_ZONED_NONE validate match for it
> -	 * - simillarly, check all devices conform to limits->chunk_sectors if
> -	 *   .io_hints altered them
> +	 * - this is especially relevant if .io_hints is emulating a disk-managed
> +	 *   zoned model (aka BLK_ZONED_NONE) on host-managed zoned block devices.
> +	 * BUT...
>  	 */
> -	if (limits->zoned != BLK_ZONED_NONE)
> +	if (limits->zoned != BLK_ZONED_NONE) {
> +		/*
> +		 * ...IF the above limits stacking determined a zoned model
> +		 * validate that all of the table's devices conform to it.
> +		 */
>  		zoned_model = limits->zoned;
> -	if (limits->chunk_sectors != zone_sectors)
>  		zone_sectors = limits->chunk_sectors;
> +	}
>  	if (validate_hardware_zoned_model(table, zoned_model, zone_sectors))
>  		return -EINVAL;
>  
> Anyway, I've folded this into the original commit too.  If you could
> verify it works with your scenarios I'd appreciate it.

I tested with dm-linear, dm-flakey and dm-zoned. No problems detected,
the end device zone model and zone size were always correct. I also tried
all the invalid setups I could generate, and all were properly caught.

There is however one case that will not work: an HM (or HA) emulating
target on top of a regular (NONE) block device. In that case, we will
end up checking that the underlying devices are compatible HM/HA, which
will fail. But since none of the existing targets currently do this, I
guess the code is OK as is. What do you think?

> FYI, any additional cosmetic cleanup can wait (I happen to think this
> code is clearer than how you relied on the matches functions to
> initialize a temporary value).

OK. No problem.

> I also folded in a validate_hardware_zoned_model() optimization to
> return early if zoned_model == BLK_ZONED_NONE, please see/test the
> rolled-up end result here:
> https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=for-4.13/dm&id=9a6b54360f147c2d25fba7debc31a3251b804cc2
> 
> Also, please note that I've forcibly rebased linux-dm.git's
> 'for-4.13/dm' and staged it in 'for-next'.

I tested this tree unmodified. No problems detected.
Thank you.

Best regards.
Mike Snitzer May 31, 2017, 2:39 p.m. UTC | #2
On Wed, May 31 2017 at 12:29am -0400,
Damien Le Moal <damien.lemoal@wdc.com> wrote:

> Mike,
> 
> On 5/31/17 05:20, Mike Snitzer wrote:
> > On Mon, May 29 2017 at  6:23P -0400,
> > Damien Le Moal <damien.lemoal@wdc.com> wrote:
> > 
> >> Mike,
> >>
> >> The first 3 patches of this series are incremental fixes for the zoned block
> >> device support patches that you committed to the for-4.13/dm branch.
> >>
> >> The first patch corrects the zone alignment checks so that the check is
> >> performed for any device, regardless of the device LBA size (it is skipped for
> >> 512B LBA devices otherwise).
> > 
> > I folded this first patch into the original commit (baf844bf4ae3).
> 
> Great. Thanks.
> 
> >> The second patch is a fix for commit baf844bf4ae3 "dm table: add zoned block
> >> devices validation". In that commit, the stacked limits zoned model was not
> >> set to the zoned model of the table target devices, leading to the device
> >> always being exposed as a regular block device. With this fix, dm-flakey
> >> and dm-linear work fine on top of host-managed zoned block devices.
> >>
> >> The third patch fixes zoned model validation again to allow for target types
> >> emulating a different zoned model than the model of the table target devices,
> >> e.g. dm-zoned.
> > 
> > The 2nd and 3rd seem over-done to me.  After spending more time than
> > ideal, the following patch would seem to be the equivalent to what
> > you've done in patches 2 and 3 (sans the "cleanup" of passing limits to
> > validate_hardware_zoned_model):
> > 
> > diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> > index 6545150..a39bcd9 100644
> > --- a/drivers/md/dm-table.c
> > +++ b/drivers/md/dm-table.c
> > @@ -1523,19 +1524,39 @@ int dm_calculate_queue_limits(struct dm_table *table,
> >  			       dm_device_name(table->md),
> >  			       (unsigned long long) ti->begin,
> >  			       (unsigned long long) ti->len);
> > +
> > +		/*
> > +		 * FIXME: this should likely be moved to blk_stack_limits(), would
> > +		 * also eliminate limits->zoned stacking hack in dm_set_device_limits()
> > +		 */
> > +		if (limits->zoned == BLK_ZONED_NONE && ti_limits.zoned != BLK_ZONED_NONE) {
> > +			/*
> > +			 * By default, the stacked limits zoned model is set to
> > +			 * BLK_ZONED_NONE in blk_set_stacking_limits(). Update
> > +			 * this model using the first target model reported
> > +			 * that is not BLK_ZONED_NONE. This will be either the
> > +			 * first target device zoned model or the model reported
> > +			 * by the target .io_hints.
> > +			 */
> > +			limits->zoned = ti_limits.zoned;
> > +		}
> >  	}
> >  
> >  	/*
> >  	 * Verify that the zoned model and zone sectors, as determined before
> >  	 * any .io_hints override, are the same across all devices in the table.
> > -	 * - but if limits->zoned is not BLK_ZONED_NONE validate match for it
> > -	 * - simillarly, check all devices conform to limits->chunk_sectors if
> > -	 *   .io_hints altered them
> > +	 * - this is especially relevant if .io_hints is emulating a disk-managed
> > +	 *   zoned model (aka BLK_ZONED_NONE) on host-managed zoned block devices.
> > +	 * BUT...
> >  	 */
> > -	if (limits->zoned != BLK_ZONED_NONE)
> > +	if (limits->zoned != BLK_ZONED_NONE) {
> > +		/*
> > +		 * ...IF the above limits stacking determined a zoned model
> > +		 * validate that all of the table's devices conform to it.
> > +		 */
> >  		zoned_model = limits->zoned;
> > -	if (limits->chunk_sectors != zone_sectors)
> >  		zone_sectors = limits->chunk_sectors;
> > +	}
> >  	if (validate_hardware_zoned_model(table, zoned_model, zone_sectors))
> >  		return -EINVAL;
> >  
> > Anyway, I've folded this into the original commit too.  If you could
> > verify it works with your scenarios I'd appreciate it.
> 
> I tested with dm-linear, dm-flakey and dm-zoned. No problems detected,
> the end device zone model and zone size were always correct. I also tried
> all the invalid setups I could generate, and all were properly caught.
> 
> There is however one case that will not work: an HM (or HA) emulating
> target on top of a regular (NONE) block device. In that case, we will
> end up checking that the underlying devices are compatible HM/HA, which
> will fail. But since none of the existing targets currently do this, I
> guess the code is OK as is. What do you think?

Yeah, I think it best to fix that if/when there is a need.

> > FYI, any additional cosmetic cleanup can wait (I happen to think this
> > code is clearer than how you relied on the matches functions to
> > initialize a temporary value).
> 
> OK. No problem.
> 
> > I also folded in a validate_hardware_zoned_model() optimization to
> > return early if zoned_model == BLK_ZONED_NONE, please see/test the
> > rolled-up end result here:
> > https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=for-4.13/dm&id=9a6b54360f147c2d25fba7debc31a3251b804cc2
> > 
> > Also, please note that I've forcibly rebased linux-dm.git's
> > 'for-4.13/dm' and staged it in 'for-next'.
> 
> I tested this tree unmodified. No problems detected.
> Thank you.

Good news, thanks for the review/testing.

FYI: my review of dm-zoned will be focused on DM target correctness
(suspend/resume quirks, no allocations in the IO path that aren't backed
by a mempool, coding style nits, etc).  I don't know enough about zoned
block devices to weigh-in on those details.  Ultimately I'll be
deferring to you, others on your team, and others in the community that
are more invested in zoned block devices to steer and stabilize this
target.

Anyway, hopefully my review will be fairly quick and I can get dm-zoned
staged for 4.13 by end of day tomorrow.

Mike

Mike Snitzer June 2, 2017, 12:36 a.m. UTC | #3
On Wed, May 31 2017 at 10:39am -0400,
Mike Snitzer <snitzer@redhat.com> wrote:
 
> FYI: my review of dm-zoned will be focused on DM target correctness
> (suspend/resume quirks, no allocations in the IO path that aren't backed
> by a mempool, coding style nits, etc).  I don't know enough about zoned
> block devices to weigh-in on those details.  Ultimately I'll be
> deferring to you, others on your team, and others in the community that
> are more invested in zoned block devices to steer and stabilize this
> target.
> 
> Anyway, hopefully my review will be fairly quick and I can get dm-zoned
> staged for 4.13 by end of day tomorrow.

I made a go of it but I'm getting hung up on quite a lot of code that
doesn't conform to, what I'd like to think is, the cleaner nature of how
DM targets that are split across multiple files should be.

You basically slammed everything into 'struct dmz_target' and passed dmz
everywhere.  I tried to split out a 'struct dmz_metadata' (and got quite
far!) but finally gave up because affecting that churn was killing me
slowly.  Anyway, here is where I left off:
https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/log/?h=dm-zoned

In hindsight, maybe I should've just responded with the laundry list of
things I saw so that you could fix them.  But if you see changes that
you like in that branch feel free to pull them in to a new version of
dm-zoned that you resubmit.

As for splitting out a 'struct dmz_metadata'.. I'd really prefer _some_
separation but there is little point with doing so if we're going to
just half-ass it and add in a back-pointer to the 'struct dmz_target' to
access certain members.  I was left unhappy with my attempt.. again, was
a shit-show of churn.

I think this target needs a more critical eye on the various places IO
is being submitted and where allocations are occurring.  I allowed myself
to get hung up on code movement when I should've focused on more
constructive design choices you made.

Mike

Damien Le Moal June 5, 2017, 10:48 a.m. UTC | #4
Mike,

Thank you very much for the very detailed review and all the cleanup.

On 6/2/17 09:36, Mike Snitzer wrote:
> On Wed, May 31 2017 at 10:39am -0400,
> Mike Snitzer <snitzer@redhat.com> wrote:
>  
>> FYI: my review of dm-zoned will be focused on DM target correctness
>> (suspend/resume quirks, no allocations in the IO path that aren't backed
>> by a mempool, coding style nits, etc).  I don't know enough about zoned
>> block devices to weigh-in on those details.  Ultimately I'll be
>> deferring to you, others on your team, and others in the community that
>> are more invested in zoned block devices to steer and stabilize this
>> target.
>>
>> Anyway, hopefully my review will be fairly quick and I can get dm-zoned
>> staged for 4.13 by end of day tomorrow.
> 
> I made a go of it but I'm getting hung up on quite a lot of code that
> doesn't conform to, what I'd like to think is, the cleaner nature of how
> DM targets that are split across multiple files should be.
> 
> You basically slammed everything into 'struct dmz_target' and passed dmz
> everywhere.  I tried to split out a 'struct dmz_metadata' (and got quite
> far!) but finally gave up because effecting that churn was killing me
> slowly.  Anyway, here is where I left off:
> https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/log/?h=dm-zoned
> 
> In hindsight, maybe I should've just responded with the laundry list of
> things I saw so that you could fix them.  But if you see changes that
> you like in that branch feel free to pull them in to a new version of
> dm-zoned that you resubmit.
> 
> As for splitting out a 'struct dmz_metadata'.. I'd really prefer _some_
> separation but there is little point with doing so if we're going to
> just half-ass it and add in a back-pointer to the 'struct dmz_target' to
> access certain members.  I was left unhappy with my attempt.. again, was
> a shit-show of churn.

I continued and finished the separation. There is no back-pointer
anymore. All functions in dm-zoned-metadata.c take the metadata struct
pointer as argument. Other functions in dm-zoned-target.c and
dm-zoned-reclaim.c take the dmz_target pointer as argument. To do this,
I had to add a "struct dmz_dev" to store the static information about
the device being used (bdev, name, capacity, number of zones, zone
size). This further cleans up the initialization and tear-down path.
I also moved around and renamed some functions to further clean up the
code and make it easier to read. At this point, I think it would be easy
to also separate all fields needed for reclaim from the dmz_target
structure. But I have not pushed changes this far as the amount of data
needed for reclaim is rather small.

> I think this target needs a more critical eye on the various places IO
> is being submitted and where allocations are occurring.  I allowed myself
> to get hung up on code movement when I should've focused on more
> constructive design choices you made.

I did address some of the "FIXME" notes you added. The main one is the
BIO cloning in the I/O path. I removed most of that and added a .end_io
method for completion processing. The only place where I do not see how
to remove the call to bio_clone() is during read BIO processing: since a
read BIO may end up being split between buffer zone, sequential zone and
simple buffer zero-out, fragmentation of the read BIO is sometimes
necessary, and so a clone is needed.

There is one "FIXME" that I did not address: the allocation of metadata
blocks on cache miss. This is in the I/O path, but called only from the
context of the chunk workers, so a different context than the BIO submit
one. I do not see a problem with this. Please let me know if you would
prefer another solution.

I am running tests against the new version created with all these
changes. If everything goes well, I will send it out tomorrow.

Best regards.
Mike Snitzer June 6, 2017, 2:18 p.m. UTC | #5
On Mon, Jun 05 2017 at  6:48am -0400,
Damien Le Moal <damien.lemoal@wdc.com> wrote:

> Mike,
> 
> Thank you very much for the very detailed review and all the cleanup.
> 
> On 6/2/17 09:36, Mike Snitzer wrote:
> > On Wed, May 31 2017 at 10:39am -0400,
> > Mike Snitzer <snitzer@redhat.com> wrote:
> >  
> >> FYI: my review of dm-zoned will be focused on DM target correctness
> >> (suspend/resume quirks, no allocations in the IO path that aren't backed
> >> by a mempool, coding style nits, etc).  I don't know enough about zoned
> >> block devices to weigh-in on those details.  Ultimately I'll be
> >> deferring to you, others on your team, and others in the community that
> >> are more invested in zoned block devices to steer and stabilize this
> >> target.
> >>
> >> Anyway, hopefully my review will be fairly quick and I can get dm-zoned
> >> staged for 4.13 by end of day tomorrow.
> > 
> > I made a go of it but I'm getting hung up on quite a lot of code that
> > doesn't conform to, what I'd like to think is, the cleaner nature of how
> > DM targets that are split across multiple files should be.
> > 
> > You basically slammed everything into 'struct dmz_target' and passed dmz
> > everywhere.  I tried to split out a 'struct dmz_metadata' (and got quite
> > far!) but finally gave up because effecting that churn was killing me
> > slowly.  Anyway, here is where I left off:
> > https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/log/?h=dm-zoned
> > 
> > In hindsight, maybe I should've just responded with the laundry list of
> > things I saw so that you could fix them.  But if you see changes that
> > you like in that branch feel free to pull them in to a new version of
> > dm-zoned that you resubmit.
> > 
> > As for splitting out a 'struct dmz_metadata'.. I'd really prefer _some_
> > separation but there is little point with doing so if we're going to
> > just half-ass it and add in a back-pointer to the 'struct dmz_target' to
> > access certain members.  I was left unhappy with my attempt.. again, was
> > a shit-show of churn.
> 
> I continued and finished the separation. There is no back-pointer
> anymore. All functions in dm-zoned-metadata.c take the metadata struct
> pointer as argument. Other functions in dm-zoned-target.c and
> dm-zoned-reclaim.c take the dmz_target pointer as argument. To do this,
> I had to add a "struct dmz_dev" to store the static information about
> the device being used (bdev, name, capacity, number of zones, zone
> size). This further cleans up the initialization and tear-down path.
> I also moved around and renamed some functions to further clean up the
> code and make it easier to read. At this point, I think it would be easy
> to also separate all fields needed for reclaim from the dmz_target
> structure. But I have not pushed changes this far as the amount of data
> needed for reclaim is rather small.

Sounds good.

> > I think this target needs a more critical eye on the various places IO
> > is being submitted and where allocations are occurring.  I allowed myself
> > to get hung up on code movement when I should've focused on more
> > constructive design choices you made.
> 
> I did address some of the "FIXME" notes you added. The main one is the
> BIO cloning in the I/O path. I removed most of that and added a .end_io
> method for completion processing. The only place where I do not see how
> to remove the call to bio_clone() is during read BIO processing: since a
> read BIO may end up being split between buffer zone, sequential zone and
> simple buffer zero-out, fragmentation of the read BIO is sometimes
> necessary, and so a clone is needed.
> 
> There is one "FIXME" that I did not address: the allocation of metadata
> blocks on cache miss. This is in the I/O path, but called only from the
> context of the chunk workers, so a different context than the BIO submit
> one. I do not see a problem with this. Please let me know if you would
> prefer another solution.
> 
> I am running tests against the new version created with all these
> changes. If everything goes well, I will send it out tomorrow.

Great, thanks for carrying it forward.

Mike

Mike Snitzer June 8, 2017, 8:21 p.m. UTC | #6
On Mon, Jun 05 2017 at  6:48am -0400,
Damien Le Moal <damien.lemoal@wdc.com> wrote:
 
> I did address some of the "FIXME" notes you added. The main one is the
> BIO cloning in the I/O path. I removed most of that and added a .end_io
> method for completion processing. The only place where I do not see how
> to remove the call to bio_clone() is during read BIO processing: since a
> read BIO may end up being split between buffer zone, sequential zone and
> simple buffer zero-out, fragmentation of the read BIO is sometimes
> necessary, and so a clone is needed.

So shouldn't it be possible to not allow a given bio to cross zone
boundaries by using dm_accept_partial_bio()?

Like you're already doing in dmz_map() actually... so why do you need to
account for crossing zone boundaries on read later on in
dmz_submit_read_bio()?

Is it that these zones aren't easily known up front (in dmz_map)?

Mike

Damien Le Moal June 9, 2017, 4:25 a.m. UTC | #7
Mike,

On 6/9/17 05:21, Mike Snitzer wrote:
> On Mon, Jun 05 2017 at  6:48am -0400,
> Damien Le Moal <damien.lemoal@wdc.com> wrote:
>  
>> I did address some of the "FIXME" notes you added. The main one is the
>> BIO cloning in the I/O path. I removed most of that and added a .end_io
>> method for completion processing. The only place where I do not see how
>> to remove the call to bio_clone() is during read BIO processing: since a
>> read BIO may end up being split between buffer zone, sequential zone and
>> simple buffer zero-out, fragmentation of the read BIO is sometimes
>> necessary, and so a clone is needed.
> 
> So shouldn't it be possible to not allow a given bio to cross zone
> boundaries by using dm_accept_partial_bio()?
> 
> Like you're already doing in dmz_map() actually... so why do you need to
> account for crossing zone boundaries on read later on in
> dmz_submit_read_bio()?

It is not the crossing of zone or chunk boundaries that is being dealt with
here. When the read BIO is being processed, we already know that it does
not cross chunk boundaries thanks to dmz_map(). Since chunks are mapped
to entire zones, the BIO does not cross a zone boundary either.

But the blocks to read within the chunk may be (1) invalid if they were
never written, (2) valid in the chunk data zone, or (3) valid in the
chunk write buffer zone (this last case exists only if the chunk is
mapped to a sequential zone). So we need to examine the zones' block
bitmaps to discover this and, if needed, split the BIO into several
fragments. Hence the need for bio_clone().

If this processing were done within the context of dmz_map(), we could of
course use dm_accept_partial_bio() for splitting the BIO. But I avoided
this method as access to the zone bitmaps may trigger metadata I/Os,
which is not very nice in the context of the user bio_submit(). There is
also the fact that dm-zoned does not have per-zone locks to deal with
concurrent accesses to the same chunk. All BIO processing for a chunk
gets serialized in the chunk work. This is mandatory to ensure sequential
writes to sequential zones whenever possible.

> Is it that these zones aren't easily known up front (in dmz_map)?

dmz_map() does the mapping discovery using the mapping table that is
pinned down in memory, so no metadata I/O is triggered. But doing more
than that could trigger metadata I/Os, which I wanted to avoid.

On another note, I just posted an additional small patch fixing two
problems: an overflow in the target length calculation, and a bad
mistake in the suspend method setup patch that caused a compilation
error. My apologies for these mistakes; they were not in my local tree
when I tested. I think I messed up the merging of all my local patches.

I checked your tree again and can confirm now that I am in sync. I will
keep testing anyway to make sure I did not do anything else wrong.

Thank you.

Best regards.

Patch

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 6545150..a39bcd9 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1523,19 +1524,39 @@  int dm_calculate_queue_limits(struct dm_table *table,
 			       dm_device_name(table->md),
 			       (unsigned long long) ti->begin,
 			       (unsigned long long) ti->len);
+
+		/*
+		 * FIXME: this should likely be moved to blk_stack_limits(), would
+		 * also eliminate limits->zoned stacking hack in dm_set_device_limits()
+		 */
+		if (limits->zoned == BLK_ZONED_NONE && ti_limits.zoned != BLK_ZONED_NONE) {
+			/*
+			 * By default, the stacked limits zoned model is set to
+			 * BLK_ZONED_NONE in blk_set_stacking_limits(). Update
+			 * this model using the first target model reported
+			 * that is not BLK_ZONED_NONE. This will be either the
+			 * first target device zoned model or the model reported
+			 * by the target .io_hints.
+			 */
+			limits->zoned = ti_limits.zoned;
+		}
 	}
 
 	/*
 	 * Verify that the zoned model and zone sectors, as determined before
 	 * any .io_hints override, are the same across all devices in the table.
-	 * - but if limits->zoned is not BLK_ZONED_NONE validate match for it
-	 * - simillarly, check all devices conform to limits->chunk_sectors if
-	 *   .io_hints altered them
+	 * - this is especially relevant if .io_hints is emulating a disk-managed
+	 *   zoned model (aka BLK_ZONED_NONE) on host-managed zoned block devices.
+	 * BUT...
 	 */
-	if (limits->zoned != BLK_ZONED_NONE)
+	if (limits->zoned != BLK_ZONED_NONE) {
+		/*
+		 * ...IF the above limits stacking determined a zoned model
+		 * validate that all of the table's devices conform to it.
+		 */
 		zoned_model = limits->zoned;
-	if (limits->chunk_sectors != zone_sectors)
 		zone_sectors = limits->chunk_sectors;
+	}
 	if (validate_hardware_zoned_model(table, zoned_model, zone_sectors))
 		return -EINVAL;