Message ID | beb9ab5875427431b58e1001e481b7a43e9188eb.1606186717.git.baolin.wang@linux.alibaba.com (mailing list archive) |
---|---|
State | New, archived |
Series | Some cleanups and improvements for blk-iocost |
Hello,

On Tue, Nov 24, 2020 at 11:33:33AM +0800, Baolin Wang wrote:
> @@ -1445,7 +1447,8 @@ static void iocg_kick_waitq(struct ioc_gq *iocg, bool pay_debt,
>  	 * after the above debt payment.
>  	 */
>  	ctx.vbudget = vbudget;
> -	current_hweight(iocg, NULL, &ctx.hw_inuse);
> +	if (need_update_hwi)
> +		current_hweight(iocg, NULL, &ctx.hw_inuse);

So, if you look at the implementation of current_hweight(), it's

1. If nothing has changed, read out the cached values.
2. If something has changed, recalculate.

and the "something changed" test is single memory read (most likely L1 hot
at this point) and testing for equality. IOW, the change you're suggesting
isn't much of an optimization. Maybe the compiler can do a somewhat better
job of arranging the code and it's a register load than memory load but
given that it's already a relatively cold wait path, this is unlikely to
make any actual difference. And that's how current_hweight() is meant to be
used.

So, I'm not sure this is an improvement. It increases complication without
actually gaining anything.

Thanks.
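The two-step behaviour described above (return cached values when nothing changed, recalculate otherwise) can be sketched in user-space C. This is a toy model, not the kernel's implementation: every `toy_` name, the `nr_recalc` counter and the trivial "recalculation" are assumptions made for illustration. Only the idea of comparing a shared `hweight_gen` counter against a per-group copy comes from the thread.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared controller state. */
struct toy_ioc {
	atomic_int hweight_gen;		/* bumped whenever any weight changes */
};

/* Hypothetical stand-in for the per-group state. */
struct toy_iocg {
	struct toy_ioc *ioc;
	int hweight_gen;		/* generation the cached values belong to */
	uint32_t hweight_active;	/* cached result */
	uint32_t hweight_inuse;		/* cached result */
	uint32_t active;		/* toy input to the recalculation */
	uint32_t inuse;			/* toy input to the recalculation */
	unsigned int nr_recalc;		/* counts slow-path executions */
};

static void toy_current_hweight(struct toy_iocg *iocg,
				uint32_t *hw_activep, uint32_t *hw_inusep)
{
	/* The "something changed" test: one load and one comparison. */
	int gen = atomic_load(&iocg->ioc->hweight_gen);

	if (gen != iocg->hweight_gen) {
		/*
		 * Slow path. The real code derives the values from the
		 * cgroup hierarchy; this toy just copies its inputs.
		 */
		iocg->hweight_active = iocg->active;
		iocg->hweight_inuse = iocg->inuse;
		iocg->hweight_gen = gen;
		iocg->nr_recalc++;
	}

	/* Either output pointer may be NULL when the caller needs one value. */
	if (hw_activep)
		*hw_activep = iocg->hweight_active;
	if (hw_inusep)
		*hw_inusep = iocg->hweight_inuse;
}
```

In this model a second call with an unchanged generation costs one atomic load, one comparison and the copy-out, which is the cost being weighed against the extra flag in the patch.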
> Hello,
>
> On Tue, Nov 24, 2020 at 11:33:33AM +0800, Baolin Wang wrote:
>> @@ -1445,7 +1447,8 @@ static void iocg_kick_waitq(struct ioc_gq *iocg, bool pay_debt,
>>  	 * after the above debt payment.
>>  	 */
>>  	ctx.vbudget = vbudget;
>> -	current_hweight(iocg, NULL, &ctx.hw_inuse);
>> +	if (need_update_hwi)
>> +		current_hweight(iocg, NULL, &ctx.hw_inuse);
>
> So, if you look at the implementation of current_hweight(), it's
>
> 1. If nothing has changed, read out the cached values.
> 2. If something has changed, recalculate.

Yes, correct.

>
> and the "something changed" test is single memory read (most likely L1 hot
> at this point) and testing for equality. IOW, the change you're suggesting
> isn't much of an optimization. Maybe the compiler can do a somewhat better
> job of arranging the code and it's a register load than memory load but
> given that it's already a relatively cold wait path, this is unlikely to
> make any actual difference. And that's how current_hweight() is meant to be
> used.

What I want to avoid is the 'atomic_read(&ioc->hweight_gen)' in
current_hweight(), cause this is not a register load and is always a memory
load. But introducing a flag can be cached and more light than a memory
load.

But after thinking more, I think we can just move the "current_hweight(iocg,
NULL, &ctx.hw_inuse);" to the correct place without introducing new flag to
optimize the code. How do you think the below code?
diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index bbe86d1..db29200 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -1413,7 +1413,7 @@ static void iocg_kick_waitq(struct ioc_gq *iocg, bool pay_debt,

 	lockdep_assert_held(&iocg->waitq.lock);

-	current_hweight(iocg, &hwa, NULL);
+	current_hweight(iocg, &hwa, &ctx.hw_inuse);
 	vbudget = now->vnow - atomic64_read(&iocg->vtime);

 	/* pay off debt */
@@ -1428,6 +1428,11 @@ static void iocg_kick_waitq(struct ioc_gq *iocg, bool pay_debt,
 		atomic64_add(vpay, &iocg->done_vtime);
 		iocg_pay_debt(iocg, abs_vpay, now);
 		vbudget -= vpay;
+		/*
+		 * As paying off debt restores hw_inuse, it must be read after
+		 * the above debt payment.
+		 */
+		current_hweight(iocg, NULL, &ctx.hw_inuse);
 	}

 	if (iocg->abs_vdebt || iocg->delay)
@@ -1446,11 +1451,9 @@ static void iocg_kick_waitq(struct ioc_gq *iocg, bool pay_debt,

 	/*
 	 * Wake up the ones which are due and see how much vtime we'll need for
-	 * the next one. As paying off debt restores hw_inuse, it must be read
-	 * after the above debt payment.
+	 * the next one.
 	 */
 	ctx.vbudget = vbudget;
-	current_hweight(iocg, NULL, &ctx.hw_inuse);

 	__wake_up_locked_key(&iocg->waitq, TASK_NORMAL, &ctx);

>
> So, I'm not sure this is an improvement. It increases complication without
> actually gaining anything.
>
> Thanks.
>
On Wed, Nov 25, 2020 at 10:15:38PM +0800, Baolin Wang wrote:
>
> > Hello,
> >
> > On Tue, Nov 24, 2020 at 11:33:33AM +0800, Baolin Wang wrote:
> > > @@ -1445,7 +1447,8 @@ static void iocg_kick_waitq(struct ioc_gq *iocg, bool pay_debt,
> > >  	 * after the above debt payment.
> > >  	 */
> > >  	ctx.vbudget = vbudget;
> > > -	current_hweight(iocg, NULL, &ctx.hw_inuse);
> > > +	if (need_update_hwi)
> > > +		current_hweight(iocg, NULL, &ctx.hw_inuse);
> >
> > So, if you look at the implementation of current_hweight(), it's
> >
> > 1. If nothing has changed, read out the cached values.
> > 2. If something has changed, recalculate.
>
> Yes, correct.
>
> >
> > and the "something changed" test is single memory read (most likely L1 hot
> > at this point) and testing for equality. IOW, the change you're suggesting
> > isn't much of an optimization. Maybe the compiler can do a somewhat better
> > job of arranging the code and it's a register load than memory load but
> > given that it's already a relatively cold wait path, this is unlikely to
> > make any actual difference. And that's how current_hweight() is meant to be
> > used.
>
> What I want to avoid is the 'atomic_read(&ioc->hweight_gen)' in
> current_hweight(), cause this is not a register load and is always a memory
> load. But introducing a flag can be cached and more light than a memory
> load.
>
> But after thinking more, I think we can just move the "current_hweight(iocg,
> NULL, &ctx.hw_inuse);" to the correct place without introducing new flag to
> optimize the code. How do you think the below code?

I don't find this discussion very meaningful. We're talking about
theoretical one memory load optimization in a path which likely isn't hot
enough for such difference to make any difference. If you can show that this
matters, please do. Otherwise, what are we doing?

Thanks.
diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 089f3fe..5305afd 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -1405,10 +1405,11 @@ static void iocg_kick_waitq(struct ioc_gq *iocg, bool pay_debt,
 	u64 vshortage, expires, oexpires;
 	s64 vbudget;
 	u32 hwa;
+	bool need_update_hwi = false;

 	lockdep_assert_held(&iocg->waitq.lock);

-	current_hweight(iocg, &hwa, NULL);
+	current_hweight(iocg, &hwa, &ctx.hw_inuse);
 	vbudget = now->vnow - atomic64_read(&iocg->vtime);

 	/* pay off debt */
@@ -1423,6 +1424,7 @@ static void iocg_kick_waitq(struct ioc_gq *iocg, bool pay_debt,
 		atomic64_add(vpay, &iocg->done_vtime);
 		iocg_pay_debt(iocg, abs_vpay, now);
 		vbudget -= vpay;
+		need_update_hwi = true;
 	}

 	if (iocg->abs_vdebt || iocg->delay)
@@ -1445,7 +1447,8 @@ static void iocg_kick_waitq(struct ioc_gq *iocg, bool pay_debt,
 	 * after the above debt payment.
 	 */
 	ctx.vbudget = vbudget;
-	current_hweight(iocg, NULL, &ctx.hw_inuse);
+	if (need_update_hwi)
+		current_hweight(iocg, NULL, &ctx.hw_inuse);

 	__wake_up_locked_key(&iocg->waitq, TASK_NORMAL, &ctx);
We can get the hwa and hwi at one time if no debt need to pay off,
thus add a flag to indicate if the hw_inuse has been changed and
need to update, which can avoid calling current_hweight() twice
for no debt iocgs.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 block/blk-iocost.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)