Message ID | 20250207023505.86967-1-colyli@kernel.org (mailing list archive)
---|---
State | New
Series | [V3] md-linear: optimize which_dev() for small disks number
Context | Check | Description |
---|---|---|
mdraidci/vmtest-md-6_14-PR | fail | PR summary |
mdraidci/vmtest-md-6_14-VM_Test-0 | fail | Logs for per-patch-testing |
Hi,

On 2025/02/07 10:35, colyli@kernel.org wrote:
> From: Coly Li <colyli@kernel.org>
>
> which_dev() is a top hot function in md-linear.c; every I/O request will
> call it to find out which component disk the bio should be issued to.
>
> The current which_dev() performs a standard binary search, but this is
> not the fastest algorithm in practice. When the whole conf->disks array
> can be stored within a single cache line, a simple linear search is
> faster than binary search for a small number of disks.
>
> From micro-benchmarks, around 20%~30% latency reduction can be observed.
> Of course such a large improvement cannot be achieved in a real
> workload. In my benchmark with,
> 1) One md linear device assembled from 2 or 4 Intel Optane memory block
>    devices on a Lenovo ThinkSystem SR650 server.
> 2) Random write I/O issued by fio, with I/O depth 1 and 512-byte block
>    size.
>
> the percentage of I/O latencies completed within 750 nsec increases from
> 97.186% to 99.324% on average; as a rough estimate, the write latency
> improves (reduces) by around 2.138%.
>
> This is a near-ideal setup; I believe that on slow hard drives such a
> small code-path optimization will be overwhelmed by hardware latency and
> hard to detect.
>
> This patch goes back to binary search when the linear device grows and
> the conf->disks array can no longer be placed within a single cache
> line.
>
> Although the improvement is tiny in most cases, it is good to have
> since we don't pay any other cost for it.
>
> Signed-off-by: Coly Li <colyli@kernel.org>
> Cc: Song Liu <song@kernel.org>
> Cc: Yu Kuai <yukuai3@huawei.com>
> ---
> Changelog,
> v3: fix typo and email address reported by the raid kernel CI.
> v2: return the last item of conf->disks[] if the fast search missed.
> v1: initial version.
>
> drivers/md/md-linear.c | 45 +++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 44 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/md/md-linear.c b/drivers/md/md-linear.c
> index a382929ce7ba..cdb59f2b2a1c 100644
> --- a/drivers/md/md-linear.c
> +++ b/drivers/md/md-linear.c
> @@ -25,10 +25,12 @@ struct linear_conf {
>  	struct dev_info disks[] __counted_by(raid_disks);
>  };
>
> +static int prefer_linear_dev_search;

Why use a global variable instead of a per-device one? There is still a
hole in struct linear_conf that you can use.

> +
>  /*
>   * find which device holds a particular offset
>   */
> -static inline struct dev_info *which_dev(struct mddev *mddev, sector_t sector)
> +static inline struct dev_info *__which_dev(struct mddev *mddev, sector_t sector)
>  {
>  	int lo, mid, hi;
>  	struct linear_conf *conf;
> @@ -53,6 +55,34 @@ static inline struct dev_info *which_dev(struct mddev *mddev, sector_t sector)
>  	return conf->disks + lo;
>  }
>
> +/*
> + * If conf->disk[] can be held within an L1 cache line,
> + * linear search is faster than binary search.
> + */
> +static inline struct dev_info *which_dev(struct mddev *mddev, sector_t sector)
> +{
> +	int i;
> +
> +	if (prefer_linear_dev_search) {
> +		struct linear_conf *conf;
> +		struct dev_info *dev;
> +		int max;
> +
> +		conf = mddev->private;
> +		dev = conf->disks;
> +		max = conf->raid_disks;
> +		for (i = 0; i < max; i++, dev++) {
> +			if (sector < dev->end_sector)
> +				return dev;
> +		}
> +		return &conf->disks[max-1];
> +	}
> +
> +	/* slow path */
> +	return __which_dev(mddev, sector);
> +}
> +
> +
>  static sector_t linear_size(struct mddev *mddev, sector_t sectors, int raid_disks)
>  {
>  	struct linear_conf *conf;
> @@ -222,6 +252,18 @@ static int linear_add(struct mddev *mddev, struct md_rdev *rdev)
>  	md_set_array_sectors(mddev, linear_size(mddev, 0, 0));
>  	set_capacity_and_notify(mddev->gendisk, mddev->array_sectors);
>  	kfree_rcu(oldconf, rcu);
> +
> +	/*
> +	 * When elements in linear_conf->disks[] becomes large enough,
> +	 * set prefer_linear_dev_search as 0 to indicate linear search
> +	 * in which_dev() is not optimized. Slow path in __which_dev()
> +	 * might be faster.
> +	 */
> +	if ((mddev->raid_disks * sizeof(struct dev_info)) >
> +	     cache_line_size() &&

BTW, you said in the case of *a single cache line*; however, the disks
field is at a 32-byte offset in struct linear_conf. You might want to
align it to a cache line, or add sizeof(struct linear_conf) on the
left-hand side and align linear_conf to the cache line.

Thanks,
Kuai

> +	    prefer_linear_dev_search == 1)
> +		prefer_linear_dev_search = 0;
> +
>  	return 0;
>  }
>
> @@ -337,6 +379,7 @@ static struct md_personality linear_personality = {
>
>  static int __init linear_init(void)
>  {
> +	prefer_linear_dev_search = 1;
>  	return register_md_personality(&linear_personality);
>  }
> On 7 Feb 2025, at 17:50, Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> On 2025/02/07 10:35, colyli@kernel.org wrote:
>> [...]
>> +static int prefer_linear_dev_search;
>
> Why use a global variable instead of a per-device one? There is still a
> hole in struct linear_conf that you can use.

OK, I will use the space inside struct linear_conf.

>> [...]
>> +	if ((mddev->raid_disks * sizeof(struct dev_info)) >
>> +	     cache_line_size() &&
>
> BTW, you said in the case of *a single cache line*; however, the disks
> field is at a 32-byte offset in struct linear_conf. You might want to
> align it to a cache line, or add sizeof(struct linear_conf) on the
> left-hand side and align linear_conf to the cache line.

Yes, thanks for the hint. Let me do it in the next version.

> Thanks,
> Kuai
diff --git a/drivers/md/md-linear.c b/drivers/md/md-linear.c
index a382929ce7ba..cdb59f2b2a1c 100644
--- a/drivers/md/md-linear.c
+++ b/drivers/md/md-linear.c
@@ -25,10 +25,12 @@ struct linear_conf {
 	struct dev_info disks[] __counted_by(raid_disks);
 };
 
+static int prefer_linear_dev_search;
+
 /*
  * find which device holds a particular offset
  */
-static inline struct dev_info *which_dev(struct mddev *mddev, sector_t sector)
+static inline struct dev_info *__which_dev(struct mddev *mddev, sector_t sector)
 {
 	int lo, mid, hi;
 	struct linear_conf *conf;
@@ -53,6 +55,34 @@ static inline struct dev_info *which_dev(struct mddev *mddev, sector_t sector)
 	return conf->disks + lo;
 }
 
+/*
+ * If conf->disk[] can be held within an L1 cache line,
+ * linear search is faster than binary search.
+ */
+static inline struct dev_info *which_dev(struct mddev *mddev, sector_t sector)
+{
+	int i;
+
+	if (prefer_linear_dev_search) {
+		struct linear_conf *conf;
+		struct dev_info *dev;
+		int max;
+
+		conf = mddev->private;
+		dev = conf->disks;
+		max = conf->raid_disks;
+		for (i = 0; i < max; i++, dev++) {
+			if (sector < dev->end_sector)
+				return dev;
+		}
+		return &conf->disks[max-1];
+	}
+
+	/* slow path */
+	return __which_dev(mddev, sector);
+}
+
+
 static sector_t linear_size(struct mddev *mddev, sector_t sectors, int raid_disks)
 {
 	struct linear_conf *conf;
@@ -222,6 +252,18 @@ static int linear_add(struct mddev *mddev, struct md_rdev *rdev)
 	md_set_array_sectors(mddev, linear_size(mddev, 0, 0));
 	set_capacity_and_notify(mddev->gendisk, mddev->array_sectors);
 	kfree_rcu(oldconf, rcu);
+
+	/*
+	 * When elements in linear_conf->disks[] becomes large enough,
+	 * set prefer_linear_dev_search as 0 to indicate linear search
+	 * in which_dev() is not optimized. Slow path in __which_dev()
+	 * might be faster.
+	 */
+	if ((mddev->raid_disks * sizeof(struct dev_info)) >
+	     cache_line_size() &&
+	    prefer_linear_dev_search == 1)
+		prefer_linear_dev_search = 0;
+
 	return 0;
 }
 
@@ -337,6 +379,7 @@ static struct md_personality linear_personality = {
 
 static int __init linear_init(void)
 {
+	prefer_linear_dev_search = 1;
 	return register_md_personality(&linear_personality);
 }