Message ID | 20220427165710.225808-1-krisman@collabora.com (mailing list archive) |
---|---|
State | Accepted, archived |
Delegated to: | Mike Snitzer |
Headers | show |
Series | dm: dm-mpath: Provide high-resolution timer to HST with bio-mpath | expand |
On Wed, Apr 27 2022 at 12:57P -0400, Gabriel Krisman Bertazi <krisman@collabora.com> wrote: > The precision loss of reading IO start_time with jiffies_to_nsecs > instead of using a high resolution timer degrades HST path prediction > for BIO-based mpath on high load workloads. > > Below, I show the utilization percentage of a 10 disk multipath with > asymmetrical disk access cost, while being exercised by a randwrite FIO > benchmark with high submission queue depth (depth=64). It is possible > to see that the HST path selection degrades heavily for high-iops in > BIO-mpath, underutilizing the slower paths way beyond expected. This > seems to be caused by the start_time truncation, which makes some IO to > seem much slower than they actually is. In this scenario ST outperforms > HST for bio-mpath, but not for mq-mpath, which already uses ktime_get_ns(). > > The third column shows utilization with this patch applied. It is easy > to see that now HST prediction is much closer to the ideal distribution > (calculated considering the real cost of each path). > > | | ST | HST (orig) | HST(ktime) | Best | > | sdd | 0.17 | 0.20 | 0.17 | 0.18 | > | sde | 0.17 | 0.20 | 0.17 | 0.18 | > | sdf | 0.17 | 0.20 | 0.17 | 0.18 | > | sdg | 0.06 | 0.00 | 0.06 | 0.04 | > | sdh | 0.03 | 0.00 | 0.03 | 0.02 | > | sdi | 0.03 | 0.00 | 0.03 | 0.02 | > | sdj | 0.02 | 0.00 | 0.01 | 0.01 | > | sdk | 0.02 | 0.00 | 0.01 | 0.01 | > | sdl | 0.17 | 0.20 | 0.17 | 0.18 | > | sdm | 0.17 | 0.20 | 0.17 | 0.18 | > > This issue was originally discussed [1] when we first merged HST, and > this patch was left as a low hanging fruit to be solved later. I don't > think anyone is using HST with BIO mpath, but it'd be neat to get it > sorted out. > > Regarding the implementation, as suggested by Mike in that mail thread, > in order to avoid the overhead of ktime_get_ns for other selectors, this > patch adds a flag for the selector code to request the high-resolution > timer. > > I tested this using the same benchmark used in the original HST submission. > > Full test and benchmark scripts are available here: > > https://people.collabora.com/~krisman/HST-BIO-MPATH/ > > [1] https://lore.kernel.org/lkml/85tv0am9de.fsf@collabora.com/T/ > > Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> > Acked-by: Gabriel Krisman Bertazi <krisman@collabora.com> Overall your code was OK, but I nudged it a bit further to be inkeeping with how 'features' flags have been implemented elsewhere (e.g. dm_target_type's features) -- by using a healer to test the flag, etc. I also tweaked some other small implementation details. Please see: https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.19&id=c06dfd124d46df9c482fbd1319b5fe19bcb1a110 -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
On Mon, May 09 2022 at 3:47P -0400, Mike Snitzer <snitzer@redhat.com> wrote: > On Wed, Apr 27 2022 at 12:57P -0400, > Gabriel Krisman Bertazi <krisman@collabora.com> wrote: > > > The precision loss of reading IO start_time with jiffies_to_nsecs > > instead of using a high resolution timer degrades HST path prediction > > for BIO-based mpath on high load workloads. > > > > Below, I show the utilization percentage of a 10 disk multipath with > > asymmetrical disk access cost, while being exercised by a randwrite FIO > > benchmark with high submission queue depth (depth=64). It is possible > > to see that the HST path selection degrades heavily for high-iops in > > BIO-mpath, underutilizing the slower paths way beyond expected. This > > seems to be caused by the start_time truncation, which makes some IO to > > seem much slower than they actually is. In this scenario ST outperforms > > HST for bio-mpath, but not for mq-mpath, which already uses ktime_get_ns(). > > > > The third column shows utilization with this patch applied. It is easy > > to see that now HST prediction is much closer to the ideal distribution > > (calculated considering the real cost of each path). > > > > | | ST | HST (orig) | HST(ktime) | Best | > > | sdd | 0.17 | 0.20 | 0.17 | 0.18 | > > | sde | 0.17 | 0.20 | 0.17 | 0.18 | > > | sdf | 0.17 | 0.20 | 0.17 | 0.18 | > > | sdg | 0.06 | 0.00 | 0.06 | 0.04 | > > | sdh | 0.03 | 0.00 | 0.03 | 0.02 | > > | sdi | 0.03 | 0.00 | 0.03 | 0.02 | > > | sdj | 0.02 | 0.00 | 0.01 | 0.01 | > > | sdk | 0.02 | 0.00 | 0.01 | 0.01 | > > | sdl | 0.17 | 0.20 | 0.17 | 0.18 | > > | sdm | 0.17 | 0.20 | 0.17 | 0.18 | > > > > This issue was originally discussed [1] when we first merged HST, and > > this patch was left as a low hanging fruit to be solved later. I don't > > think anyone is using HST with BIO mpath, but it'd be neat to get it > > sorted out. > > > > Regarding the implementation, as suggested by Mike in that mail thread, > > in order to avoid the overhead of ktime_get_ns for other selectors, this > > patch adds a flag for the selector code to request the high-resolution > > timer. > > > > I tested this using the same benchmark used in the original HST submission. > > > > Full test and benchmark scripts are available here: > > > > https://people.collabora.com/~krisman/HST-BIO-MPATH/ > > > > [1] https://lore.kernel.org/lkml/85tv0am9de.fsf@collabora.com/T/ > > > > Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> > > Acked-by: Gabriel Krisman Bertazi <krisman@collabora.com> > > Overall your code was OK, but I nudged it a bit further to be > inkeeping with how 'features' flags have been implemented elsewhere > (e.g. dm_target_type's features) -- by using a healer to test the > flag, etc. s/healer/helper/ ;) -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Mike Snitzer <snitzer@redhat.com> writes: > Overall your code was OK, but I nudged it a bit further to be > inkeeping with how 'features' flags have been implemented elsewhere > (e.g. dm_target_type's features) -- by using a healer to test the > flag, etc. > > I also tweaked some other small implementation details. Please see: > https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.19&id=c06dfd124d46df9c482fbd1319b5fe19bcb1a110 looks good to me. thanks for the fixups.
diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c index f4719b65e5e3..c58cfcd87d04 100644 --- a/drivers/md/dm-mpath.c +++ b/drivers/md/dm-mpath.c @@ -105,6 +105,7 @@ struct multipath { struct dm_mpath_io { struct pgpath *pgpath; size_t nr_bytes; + u64 start_time_ns; }; typedef int (*action_fn) (struct pgpath *pgpath); @@ -647,6 +648,11 @@ static int __multipath_map_bio(struct multipath *m, struct bio *bio, mpio->pgpath = pgpath; + if (pgpath->pg->ps.type->flags & PS_NEED_HR_TIMER_FL) + mpio->start_time_ns = ktime_get_ns(); + else + mpio->start_time_ns = 0; + bio->bi_status = 0; bio_set_dev(bio, pgpath->path.dev->bdev); bio->bi_opf |= REQ_FAILFAST_TRANSPORT; @@ -1715,9 +1721,15 @@ static int multipath_end_io_bio(struct dm_target *ti, struct bio *clone, if (pgpath) { struct path_selector *ps = &pgpath->pg->ps; - if (ps->type->end_io) + if (ps->type->end_io) { + u64 st_time = mpio->start_time_ns; + + if (!st_time) + st_time = dm_start_time_ns_from_clone(clone); + ps->type->end_io(ps, &pgpath->path, mpio->nr_bytes, - dm_start_time_ns_from_clone(clone)); + st_time); + } } return r; diff --git a/drivers/md/dm-path-selector.h b/drivers/md/dm-path-selector.h index c47bc0e20275..46710f364286 100644 --- a/drivers/md/dm-path-selector.h +++ b/drivers/md/dm-path-selector.h @@ -26,6 +26,18 @@ struct path_selector { void *context; }; +/* If a path selector uses this flag, a more precise timer + * (ktime_get_ns) is used to account for IO start time in BIO-based + * mpath. This improves performance of some path selectors (i.e. HST), + * in exchange of a slightly higher overhead when submiting the + * BIO. The extra cost is usually offset by improved path selection for + * some benchmarks. + * + * This has no effect for request-based mpath, since it already uses a + * higher precision timer by default. + */ +#define PS_NEED_HR_TIMER_FL 0x1 + /* Information about a path selector type */ struct path_selector_type { char *name; @@ -33,6 +45,7 @@ struct path_selector_type { unsigned int table_args; unsigned int info_args; + unsigned int flags; /* * Constructs a path selector object, takes custom arguments diff --git a/drivers/md/dm-ps-historical-service-time.c b/drivers/md/dm-ps-historical-service-time.c index 82f2a06153dc..d2233cc4ea97 100644 --- a/drivers/md/dm-ps-historical-service-time.c +++ b/drivers/md/dm-ps-historical-service-time.c @@ -523,6 +523,7 @@ static int hst_end_io(struct path_selector *ps, struct dm_path *path, static struct path_selector_type hst_ps = { .name = "historical-service-time", .module = THIS_MODULE, + .flags = PS_NEED_HR_TIMER_FL, .table_args = 1, .info_args = 3, .create = hst_create,