Message ID:   20170530202031.GA52190@redhat.com (mailing list archive)
State:        Accepted, archived
Delegated to: Mike Snitzer
Mike,

On 5/31/17 05:20, Mike Snitzer wrote:
> On Mon, May 29 2017 at 6:23P -0400,
> Damien Le Moal <damien.lemoal@wdc.com> wrote:
>
>> Mike,
>>
>> The first 3 patches of this series are incremental fixes for the zoned block
>> device support patches that you committed to the for-4.13/dm branch.
>>
>> The first patch corrects the zone alignment checks so that the check is
>> performed for any device, regardless of the device LBA size (it is skipped for
>> 512B LBA devices otherwise).
>
> I folded this first patch into the original commit (baf844bf4ae3).

Great. Thanks.

>> The second patch is a fix for commit baf844bf4ae3 "dm table: add zoned block
>> devices validation". In that commit, the stacked limits zoned model was not
>> set to the zoned model of the table target devices, leading to the exposed
>> device always being exposed as a regular block device. With this fix, dm-flakey
>> and dm-linear work fine on top of host-managed zoned block devices.
>>
>> The third patch fixes zoned model validation again to allow for target types
>> emulating a different zoned model than the model of the table target devices,
>> e.g. dm-zoned.
>
> The 2nd and 3rd seem over-done to me. After spending more time than
> ideal, the following patch would seem to be the equivalent to what
> you've done in patches 2 and 3 (sans the "cleanup" of passing limits to
> validate_hardware_zoned_model):
>
> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> index 6545150..a39bcd9 100644
> --- a/drivers/md/dm-table.c
> +++ b/drivers/md/dm-table.c
> @@ -1523,19 +1524,39 @@ int dm_calculate_queue_limits(struct dm_table *table,
>  			       dm_device_name(table->md),
>  			       (unsigned long long) ti->begin,
>  			       (unsigned long long) ti->len);
> +
> +		/*
> +		 * FIXME: this should likely be moved to blk_stack_limits(), would
> +		 * also eliminate limits->zoned stacking hack in dm_set_device_limits()
> +		 */
> +		if (limits->zoned == BLK_ZONED_NONE && ti_limits.zoned != BLK_ZONED_NONE) {
> +			/*
> +			 * By default, the stacked limits zoned model is set to
> +			 * BLK_ZONED_NONE in blk_set_stacking_limits(). Update
> +			 * this model using the first target model reported
> +			 * that is not BLK_ZONED_NONE. This will be either the
> +			 * first target device zoned model or the model reported
> +			 * by the target .io_hints.
> +			 */
> +			limits->zoned = ti_limits.zoned;
> +		}
>  	}
>
>  	/*
>  	 * Verify that the zoned model and zone sectors, as determined before
>  	 * any .io_hints override, are the same across all devices in the table.
> -	 * - but if limits->zoned is not BLK_ZONED_NONE validate match for it
> -	 * - simillarly, check all devices conform to limits->chunk_sectors if
> -	 *   .io_hints altered them
> +	 * - this is especially relevant if .io_hints is emulating a disk-managed
> +	 *   zoned model (aka BLK_ZONED_NONE) on host-managed zoned block devices.
> +	 *   BUT...
>  	 */
> -	if (limits->zoned != BLK_ZONED_NONE)
> +	if (limits->zoned != BLK_ZONED_NONE) {
> +		/*
> +		 * ...IF the above limits stacking determined a zoned model
> +		 * validate that all of the table's devices conform to it.
> +		 */
>  		zoned_model = limits->zoned;
> -	if (limits->chunk_sectors != zone_sectors)
>  		zone_sectors = limits->chunk_sectors;
> +	}
>  	if (validate_hardware_zoned_model(table, zoned_model, zone_sectors))
>  		return -EINVAL;
>
> Anyway, I've folded this into the original commit too. If you could
> verify it works with your scenarios I'd appreciate it.

I tested with dm-linear, dm-flakey and dm-zoned. No problems detected;
the end device zone model and zone size were always correct. I also tried
all the invalid setups I could generate and all were properly caught.

There is however one case that will not work: an HM (or HA) emulating
target on top of a regular (NONE) block device. In that case, we will
end up checking that the underlying devices are compatible HM/HA, which
will fail. But since none of the existing targets currently do this, I
guess the code is OK as is. What do you think?

> FYI, any additional cosmetic cleanup can wait (I happen to think this
> code is clearer than how you relied on the matches functions to
> initialize a temporary value).

OK. No problem.

> I also folded in a validate_hardware_zoned_model() optimization to
> return early if zoned_model == BLK_ZONED_NONE, please see/test the
> rolled-up end result here:
> https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=for-4.13/dm&id=9a6b54360f147c2d25fba7debc31a3251b804cc2
>
> Also, please note that I've forcibly rebased linux-dm.git's
> 'for-4.13/dm' and staged it in 'for-next'.

I tested this tree unmodified. No problem detected.
Thank you.

Best regards.
On Wed, May 31 2017 at 12:29am -0400,
Damien Le Moal <damien.lemoal@wdc.com> wrote:

> Mike,
>
> On 5/31/17 05:20, Mike Snitzer wrote:
> > On Mon, May 29 2017 at 6:23P -0400,
> > Damien Le Moal <damien.lemoal@wdc.com> wrote:
> >
> >> Mike,
> >>
> >> The first 3 patches of this series are incremental fixes for the zoned block
> >> device support patches that you committed to the for-4.13/dm branch.
> >>
> >> The first patch corrects the zone alignment checks so that the check is
> >> performed for any device, regardless of the device LBA size (it is skipped for
> >> 512B LBA devices otherwise).
> >
> > I folded this first patch into the original commit (baf844bf4ae3).
>
> Great. Thanks.
>
> >> The second patch is a fix for commit baf844bf4ae3 "dm table: add zoned block
> >> devices validation". In that commit, the stacked limits zoned model was not
> >> set to the zoned model of the table target devices, leading to the exposed
> >> device always being exposed as a regular block device. With this fix, dm-flakey
> >> and dm-linear work fine on top of host-managed zoned block devices.
> >>
> >> The third patch fixes zoned model validation again to allow for target types
> >> emulating a different zoned model than the model of the table target devices,
> >> e.g. dm-zoned.
> >
> > The 2nd and 3rd seem over-done to me. After spending more time than
> > ideal, the following patch would seem to be the equivalent to what
> > you've done in patches 2 and 3 (sans the "cleanup" of passing limits to
> > validate_hardware_zoned_model):
> >
> > diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> > index 6545150..a39bcd9 100644
> > --- a/drivers/md/dm-table.c
> > +++ b/drivers/md/dm-table.c
> > @@ -1523,19 +1524,39 @@ int dm_calculate_queue_limits(struct dm_table *table,
> >  			       dm_device_name(table->md),
> >  			       (unsigned long long) ti->begin,
> >  			       (unsigned long long) ti->len);
> > +
> > +		/*
> > +		 * FIXME: this should likely be moved to blk_stack_limits(), would
> > +		 * also eliminate limits->zoned stacking hack in dm_set_device_limits()
> > +		 */
> > +		if (limits->zoned == BLK_ZONED_NONE && ti_limits.zoned != BLK_ZONED_NONE) {
> > +			/*
> > +			 * By default, the stacked limits zoned model is set to
> > +			 * BLK_ZONED_NONE in blk_set_stacking_limits(). Update
> > +			 * this model using the first target model reported
> > +			 * that is not BLK_ZONED_NONE. This will be either the
> > +			 * first target device zoned model or the model reported
> > +			 * by the target .io_hints.
> > +			 */
> > +			limits->zoned = ti_limits.zoned;
> > +		}
> >  	}
> >
> >  	/*
> >  	 * Verify that the zoned model and zone sectors, as determined before
> >  	 * any .io_hints override, are the same across all devices in the table.
> > -	 * - but if limits->zoned is not BLK_ZONED_NONE validate match for it
> > -	 * - simillarly, check all devices conform to limits->chunk_sectors if
> > -	 *   .io_hints altered them
> > +	 * - this is especially relevant if .io_hints is emulating a disk-managed
> > +	 *   zoned model (aka BLK_ZONED_NONE) on host-managed zoned block devices.
> > +	 *   BUT...
> >  	 */
> > -	if (limits->zoned != BLK_ZONED_NONE)
> > +	if (limits->zoned != BLK_ZONED_NONE) {
> > +		/*
> > +		 * ...IF the above limits stacking determined a zoned model
> > +		 * validate that all of the table's devices conform to it.
> > +		 */
> >  		zoned_model = limits->zoned;
> > -	if (limits->chunk_sectors != zone_sectors)
> >  		zone_sectors = limits->chunk_sectors;
> > +	}
> >  	if (validate_hardware_zoned_model(table, zoned_model, zone_sectors))
> >  		return -EINVAL;
> >
> > Anyway, I've folded this into the original commit too. If you could
> > verify it works with your scenarios I'd appreciate it.
>
> I tested with dm-linear, dm-flakey and dm-zoned. No problems detected;
> the end device zone model and zone size were always correct. I also tried
> all the invalid setups I could generate and all were properly caught.
>
> There is however one case that will not work: an HM (or HA) emulating
> target on top of a regular (NONE) block device. In that case, we will
> end up checking that the underlying devices are compatible HM/HA, which
> will fail. But since none of the existing targets currently do this, I
> guess the code is OK as is. What do you think?

Yeah, I think it best to fix that if/when there is a need.

> > FYI, any additional cosmetic cleanup can wait (I happen to think this
> > code is clearer than how you relied on the matches functions to
> > initialize a temporary value).
>
> OK. No problem.
>
> > I also folded in a validate_hardware_zoned_model() optimization to
> > return early if zoned_model == BLK_ZONED_NONE, please see/test the
> > rolled-up end result here:
> > https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=for-4.13/dm&id=9a6b54360f147c2d25fba7debc31a3251b804cc2
> >
> > Also, please note that I've forcibly rebased linux-dm.git's
> > 'for-4.13/dm' and staged it in 'for-next'.
>
> I tested this tree unmodified. No problem detected.
> Thank you.

Good news, thanks for the review/testing.

FYI: my review of dm-zoned will be focused on DM target correctness
(suspend/resume quirks, no allocations in the IO path that aren't backed
by a mempool, coding style nits, etc). I don't know enough about zoned
block devices to weigh in on those details. Ultimately I'll be
deferring to you, others on your team, and others in the community that
are more invested in zoned block devices to steer and stabilize this
target.

Anyway, hopefully my review will be fairly quick and I can get dm-zoned
staged for 4.13 by end of day tomorrow.

Mike

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
On Wed, May 31 2017 at 10:39am -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> FYI: my review of dm-zoned will be focused on DM target correctness
> (suspend/resume quirks, no allocations in the IO path that aren't backed
> by a mempool, coding style nits, etc). I don't know enough about zoned
> block devices to weigh in on those details. Ultimately I'll be
> deferring to you, others on your team, and others in the community that
> are more invested in zoned block devices to steer and stabilize this
> target.
>
> Anyway, hopefully my review will be fairly quick and I can get dm-zoned
> staged for 4.13 by end of day tomorrow.

I made a go of it but I'm getting hung up on quite a lot of code that
doesn't conform to, what I'd like to think is, the cleaner nature of how
DM targets that are split across multiple files should be.

You basically slammed everything into 'struct dmz_target' and passed dmz
everywhere. I tried to split out a 'struct dmz_metadata' (and got quite
far!) but finally gave up because effecting that churn was killing me
slowly. Anyway, here is where I left off:
https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/log/?h=dm-zoned

In hindsight, maybe I should've just responded with the laundry list of
things I saw so that you could fix them. But if you see changes that
you like in that branch, feel free to pull them into a new version of
dm-zoned that you resubmit.

As for splitting out a 'struct dmz_metadata': I'd really prefer _some_
separation, but there is little point in doing so if we're going to
just half-ass it and add a back-pointer to the 'struct dmz_target' to
access certain members. I was left unhappy with my attempt; again, it
was a shit-show of churn.

I think this target needs a more critical eye on the various places IO
is being submitted and where allocations are occurring. I allowed myself
to get hung up on code movement when I should've focused on the more
constructive design choices you made.

Mike
Mike,

Thank you very much for the very detailed review and all the cleanup.

On 6/2/17 09:36, Mike Snitzer wrote:
> On Wed, May 31 2017 at 10:39am -0400,
> Mike Snitzer <snitzer@redhat.com> wrote:
>
>> FYI: my review of dm-zoned will be focused on DM target correctness
>> (suspend/resume quirks, no allocations in the IO path that aren't backed
>> by a mempool, coding style nits, etc). I don't know enough about zoned
>> block devices to weigh in on those details. Ultimately I'll be
>> deferring to you, others on your team, and others in the community that
>> are more invested in zoned block devices to steer and stabilize this
>> target.
>>
>> Anyway, hopefully my review will be fairly quick and I can get dm-zoned
>> staged for 4.13 by end of day tomorrow.
>
> I made a go of it but I'm getting hung up on quite a lot of code that
> doesn't conform to, what I'd like to think is, the cleaner nature of how
> DM targets that are split across multiple files should be.
>
> You basically slammed everything into 'struct dmz_target' and passed dmz
> everywhere. I tried to split out a 'struct dmz_metadata' (and got quite
> far!) but finally gave up because effecting that churn was killing me
> slowly. Anyway, here is where I left off:
> https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/log/?h=dm-zoned
>
> In hindsight, maybe I should've just responded with the laundry list of
> things I saw so that you could fix them. But if you see changes that
> you like in that branch, feel free to pull them into a new version of
> dm-zoned that you resubmit.
>
> As for splitting out a 'struct dmz_metadata': I'd really prefer _some_
> separation, but there is little point in doing so if we're going to
> just half-ass it and add a back-pointer to the 'struct dmz_target' to
> access certain members. I was left unhappy with my attempt; again, it
> was a shit-show of churn.

I continued and finished the separation. There is no back-pointer
anymore. All functions in dm-zoned-metadata.c take the metadata struct
pointer as argument. Other functions in dm-zoned-target.c and
dm-zoned-reclaim.c take the dmz_target pointer as argument. To do this,
I had to add a "struct dmz_dev" to store the static information about
the device being used (bdev, name, capacity, number of zones, zone
size). This further cleans up the initialization and tear-down paths.
I also moved around and renamed some functions to further clean up the
code and make it easier to read. At this point, I think it would be easy
to also separate all the fields needed for reclaim from the dmz_target
structure. But I have not pushed the changes that far, as the amount of
data needed for reclaim is rather small.

> I think this target needs a more critical eye on the various places IO
> is being submitted and where allocations are occurring. I allowed myself
> to get hung up on code movement when I should've focused on the more
> constructive design choices you made.

I did address some of the "FIXME" notes you added. The main one is the
BIO cloning in the I/O path. I removed most of that and added a .end_io
method for completion processing. The only place where I do not see how
to remove the call to bio_clone() is during read BIO processing: since a
read BIO may end up being split between buffer zone, sequential zone and
simple buffer zero-out, fragmentation of the read BIO is sometimes
necessary and so needs a clone.

There is one "FIXME" that I did not address: the allocation of metadata
blocks on a cache miss. This is in the I/O path, but called only from the
context of the chunk workers, so a different context than the BIO submit
one. I do not see a problem with this. Please let me know if you would
prefer another solution.

I am running tests against the new version created with all these
changes. If everything goes well, I will send it out tomorrow.

Best regards.
On Mon, Jun 05 2017 at 6:48am -0400,
Damien Le Moal <damien.lemoal@wdc.com> wrote:

> Mike,
>
> Thank you very much for the very detailed review and all the cleanup.
>
> On 6/2/17 09:36, Mike Snitzer wrote:
> > On Wed, May 31 2017 at 10:39am -0400,
> > Mike Snitzer <snitzer@redhat.com> wrote:
> >
> >> FYI: my review of dm-zoned will be focused on DM target correctness
> >> (suspend/resume quirks, no allocations in the IO path that aren't backed
> >> by a mempool, coding style nits, etc). I don't know enough about zoned
> >> block devices to weigh in on those details. Ultimately I'll be
> >> deferring to you, others on your team, and others in the community that
> >> are more invested in zoned block devices to steer and stabilize this
> >> target.
> >>
> >> Anyway, hopefully my review will be fairly quick and I can get dm-zoned
> >> staged for 4.13 by end of day tomorrow.
> >
> > I made a go of it but I'm getting hung up on quite a lot of code that
> > doesn't conform to, what I'd like to think is, the cleaner nature of how
> > DM targets that are split across multiple files should be.
> >
> > You basically slammed everything into 'struct dmz_target' and passed dmz
> > everywhere. I tried to split out a 'struct dmz_metadata' (and got quite
> > far!) but finally gave up because effecting that churn was killing me
> > slowly. Anyway, here is where I left off:
> > https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/log/?h=dm-zoned
> >
> > In hindsight, maybe I should've just responded with the laundry list of
> > things I saw so that you could fix them. But if you see changes that
> > you like in that branch, feel free to pull them into a new version of
> > dm-zoned that you resubmit.
> >
> > As for splitting out a 'struct dmz_metadata': I'd really prefer _some_
> > separation, but there is little point in doing so if we're going to
> > just half-ass it and add a back-pointer to the 'struct dmz_target' to
> > access certain members. I was left unhappy with my attempt; again, it
> > was a shit-show of churn.
>
> I continued and finished the separation. There is no back-pointer
> anymore. All functions in dm-zoned-metadata.c take the metadata struct
> pointer as argument. Other functions in dm-zoned-target.c and
> dm-zoned-reclaim.c take the dmz_target pointer as argument. To do this,
> I had to add a "struct dmz_dev" to store the static information about
> the device being used (bdev, name, capacity, number of zones, zone
> size). This further cleans up the initialization and tear-down paths.
> I also moved around and renamed some functions to further clean up the
> code and make it easier to read. At this point, I think it would be easy
> to also separate all the fields needed for reclaim from the dmz_target
> structure. But I have not pushed the changes that far, as the amount of
> data needed for reclaim is rather small.

Sounds good.

> > I think this target needs a more critical eye on the various places IO
> > is being submitted and where allocations are occurring. I allowed myself
> > to get hung up on code movement when I should've focused on the more
> > constructive design choices you made.
>
> I did address some of the "FIXME" notes you added. The main one is the
> BIO cloning in the I/O path. I removed most of that and added a .end_io
> method for completion processing. The only place where I do not see how
> to remove the call to bio_clone() is during read BIO processing: since a
> read BIO may end up being split between buffer zone, sequential zone and
> simple buffer zero-out, fragmentation of the read BIO is sometimes
> necessary and so needs a clone.
>
> There is one "FIXME" that I did not address: the allocation of metadata
> blocks on a cache miss. This is in the I/O path, but called only from the
> context of the chunk workers, so a different context than the BIO submit
> one. I do not see a problem with this. Please let me know if you would
> prefer another solution.
>
> I am running tests against the new version created with all these
> changes. If everything goes well, I will send it out tomorrow.

Great, thanks for carrying it forward.

Mike
On Mon, Jun 05 2017 at 6:48am -0400,
Damien Le Moal <damien.lemoal@wdc.com> wrote:

> I did address some of the "FIXME" notes you added. The main one is the
> BIO cloning in the I/O path. I removed most of that and added a .end_io
> method for completion processing. The only place where I do not see how
> to remove the call to bio_clone() is during read BIO processing: since a
> read BIO may end up being split between buffer zone, sequential zone and
> simple buffer zero-out, fragmentation of the read BIO is sometimes
> necessary and so needs a clone.

So shouldn't it be possible to not allow a given bio to cross zone
boundaries by using dm_accept_partial_bio()?

Like you're already doing in dmz_map() actually... so why do you need to
account for crossing zone boundaries on read later on in
dmz_submit_read_bio()?

Is it that these zones aren't easily known up front (in dmz_map)?

Mike
Mike,

On 6/9/17 05:21, Mike Snitzer wrote:
> On Mon, Jun 05 2017 at 6:48am -0400,
> Damien Le Moal <damien.lemoal@wdc.com> wrote:
>
>> I did address some of the "FIXME" notes you added. The main one is the
>> BIO cloning in the I/O path. I removed most of that and added a .end_io
>> method for completion processing. The only place where I do not see how
>> to remove the call to bio_clone() is during read BIO processing: since a
>> read BIO may end up being split between buffer zone, sequential zone and
>> simple buffer zero-out, fragmentation of the read BIO is sometimes
>> necessary and so needs a clone.
>
> So shouldn't it be possible to not allow a given bio to cross zone
> boundaries by using dm_accept_partial_bio()?
>
> Like you're already doing in dmz_map() actually... so why do you need to
> account for crossing zone boundaries on read later on in
> dmz_submit_read_bio()?

It is not crossing of zone or chunk boundaries that is being dealt with
here. When the read BIO is being processed, we already know that it does
not cross chunk boundaries thanks to dmz_map(). Since chunks are mapped
to entire zones, the BIO does not cross a zone boundary either. But the
blocks to read within the chunk may be (1) invalid if they were never
written, (2) valid in the chunk data zone, or (3) valid in the chunk
write buffer zone (this last case exists only if the chunk is mapped to
a sequential zone). So we need to examine the zone block bitmaps to
discover this and possibly split the BIO into several fragments if
needed. Hence the need for bio_clone().

If this processing was done within the context of dmz_map(), we could of
course use dm_accept_partial_bio() for splitting the BIO. But I avoided
this method as access to the zone bitmaps may trigger metadata I/Os,
which is not very nice in the context of the user's submit_bio(). There
is also the fact that dm-zoned does not have per-zone locks to deal with
concurrent accesses to the same chunk. All BIO processing for a chunk
gets serialized in the chunk work. This is mandatory to ensure
sequential writes to sequential zones whenever possible.

> Is it that these zones aren't easily known up front (in dmz_map)?

dmz_map() does the mapping discovery using the mapping table that is
pinned down in memory, so no metadata I/O is triggered. But doing more
than that could trigger metadata I/Os, which I wanted to avoid.

On another note, I just posted an additional small patch fixing two
problems: an overflow in the target length calculation and a bad mistake
in the patch with the suspend method setup that caused a compilation
error. My apologies for these mistakes. They were not in my local tree
when I tested; I think I messed up the merging of all my local patches.
I checked your tree again and can confirm that I am now in sync. I will
keep testing anyway to make sure I did not do anything else wrong.

Thank you.

Best regards.
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 6545150..a39bcd9 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1523,19 +1524,39 @@ int dm_calculate_queue_limits(struct dm_table *table,
 			       dm_device_name(table->md),
 			       (unsigned long long) ti->begin,
 			       (unsigned long long) ti->len);
+
+		/*
+		 * FIXME: this should likely be moved to blk_stack_limits(), would
+		 * also eliminate limits->zoned stacking hack in dm_set_device_limits()
+		 */
+		if (limits->zoned == BLK_ZONED_NONE && ti_limits.zoned != BLK_ZONED_NONE) {
+			/*
+			 * By default, the stacked limits zoned model is set to
+			 * BLK_ZONED_NONE in blk_set_stacking_limits(). Update
+			 * this model using the first target model reported
+			 * that is not BLK_ZONED_NONE. This will be either the
+			 * first target device zoned model or the model reported
+			 * by the target .io_hints.
+			 */
+			limits->zoned = ti_limits.zoned;
+		}
 	}

 	/*
 	 * Verify that the zoned model and zone sectors, as determined before
 	 * any .io_hints override, are the same across all devices in the table.
-	 * - but if limits->zoned is not BLK_ZONED_NONE validate match for it
-	 * - simillarly, check all devices conform to limits->chunk_sectors if
-	 *   .io_hints altered them
+	 * - this is especially relevant if .io_hints is emulating a disk-managed
+	 *   zoned model (aka BLK_ZONED_NONE) on host-managed zoned block devices.
+	 *   BUT...
 	 */
-	if (limits->zoned != BLK_ZONED_NONE)
+	if (limits->zoned != BLK_ZONED_NONE) {
+		/*
+		 * ...IF the above limits stacking determined a zoned model
+		 * validate that all of the table's devices conform to it.
+		 */
 		zoned_model = limits->zoned;
-	if (limits->chunk_sectors != zone_sectors)
 		zone_sectors = limits->chunk_sectors;
+	}
 	if (validate_hardware_zoned_model(table, zoned_model, zone_sectors))
 		return -EINVAL;