[4/7] blk-iocost: Add a flag to indicate if need update hwi

Message ID beb9ab5875427431b58e1001e481b7a43e9188eb.1606186717.git.baolin.wang@linux.alibaba.com (mailing list archive)
State New, archived
Series Some cleanups and improvements for blk-iocost

Commit Message

Baolin Wang Nov. 24, 2020, 3:33 a.m. UTC
We can get both hwa and hwi in a single call when there is no debt to
pay off. Add a flag that indicates whether hw_inuse has changed and
needs to be re-read, which avoids calling current_hweight() twice for
iocgs without debt.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 block/blk-iocost.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Comments

Tejun Heo Nov. 25, 2020, 12:14 p.m. UTC | #1
Hello,

On Tue, Nov 24, 2020 at 11:33:33AM +0800, Baolin Wang wrote:
> @@ -1445,7 +1447,8 @@ static void iocg_kick_waitq(struct ioc_gq *iocg, bool pay_debt,
>  	 * after the above debt payment.
>  	 */
>  	ctx.vbudget = vbudget;
> -	current_hweight(iocg, NULL, &ctx.hw_inuse);
> +	if (need_update_hwi)
> +		current_hweight(iocg, NULL, &ctx.hw_inuse);

So, if you look at the implementation of current_hweight(), it's

1. If nothing has changed, read out the cached values.
2. If something has changed, recalculate.

and the "something changed" test is a single memory read (most likely L1 hot
at this point) and an equality test. IOW, the change you're suggesting
isn't much of an optimization. Maybe the compiler can do a somewhat better
job of arranging the code and it's a register load rather than a memory load but
given that it's already a relatively cold wait path, this is unlikely to
make any actual difference. And that's how current_hweight() is meant to be
used.

So, I'm not sure this is an improvement. It adds complexity without
actually gaining anything.

Thanks.
Baolin Wang Nov. 25, 2020, 2:15 p.m. UTC | #2
> Hello,
> 
> On Tue, Nov 24, 2020 at 11:33:33AM +0800, Baolin Wang wrote:
>> @@ -1445,7 +1447,8 @@ static void iocg_kick_waitq(struct ioc_gq *iocg, bool pay_debt,
>>   	 * after the above debt payment.
>>   	 */
>>   	ctx.vbudget = vbudget;
>> -	current_hweight(iocg, NULL, &ctx.hw_inuse);
>> +	if (need_update_hwi)
>> +		current_hweight(iocg, NULL, &ctx.hw_inuse);
> 
> So, if you look at the implementation of current_hweight(), it's
> 
> 1. If nothing has changed, read out the cached values.
> 2. If something has changed, recalculate.

Yes, correct.

> 
> and the "something changed" test is a single memory read (most likely L1 hot
> at this point) and an equality test. IOW, the change you're suggesting
> isn't much of an optimization. Maybe the compiler can do a somewhat better
> job of arranging the code and it's a register load rather than a memory load but
> given that it's already a relatively cold wait path, this is unlikely to
> make any actual difference. And that's how current_hweight() is meant to be
> used.

What I want to avoid is the 'atomic_read(&ioc->hweight_gen)' in
current_hweight(), because it is never a register load and is always a
memory load. A local flag, by contrast, can be cached in a register and
is lighter than a memory load.

But after thinking about it more, I think we can just move the
"current_hweight(iocg, NULL, &ctx.hw_inuse);" call to the correct place,
without introducing a new flag, to optimize the code. What do you think
of the code below?

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index bbe86d1..db29200 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -1413,7 +1413,7 @@ static void iocg_kick_waitq(struct ioc_gq *iocg, bool pay_debt,

         lockdep_assert_held(&iocg->waitq.lock);

-       current_hweight(iocg, &hwa, NULL);
+       current_hweight(iocg, &hwa, &ctx.hw_inuse);
         vbudget = now->vnow - atomic64_read(&iocg->vtime);

         /* pay off debt */
@@ -1428,6 +1428,11 @@ static void iocg_kick_waitq(struct ioc_gq *iocg, bool pay_debt,
                 atomic64_add(vpay, &iocg->done_vtime);
                 iocg_pay_debt(iocg, abs_vpay, now);
                 vbudget -= vpay;
+               /*
+                * As paying off debt restores hw_inuse, it must be read after
+                * the above debt payment.
+                */
+               current_hweight(iocg, NULL, &ctx.hw_inuse);
         }

         if (iocg->abs_vdebt || iocg->delay)
@@ -1446,11 +1451,9 @@ static void iocg_kick_waitq(struct ioc_gq *iocg, bool pay_debt,

         /*
          * Wake up the ones which are due and see how much vtime we'll need for
-        * the next one. As paying off debt restores hw_inuse, it must be read
-        * after the above debt payment.
+        * the next one.
          */
         ctx.vbudget = vbudget;
-       current_hweight(iocg, NULL, &ctx.hw_inuse);

         __wake_up_locked_key(&iocg->waitq, TASK_NORMAL, &ctx);

> 
> So, I'm not sure this is an improvement. It adds complexity without
> actually gaining anything.
> 
> Thanks.
>
Tejun Heo Nov. 25, 2020, 2:35 p.m. UTC | #3
On Wed, Nov 25, 2020 at 10:15:38PM +0800, Baolin Wang wrote:
> 
> > Hello,
> > 
> > On Tue, Nov 24, 2020 at 11:33:33AM +0800, Baolin Wang wrote:
> > > @@ -1445,7 +1447,8 @@ static void iocg_kick_waitq(struct ioc_gq *iocg, bool pay_debt,
> > >   	 * after the above debt payment.
> > >   	 */
> > >   	ctx.vbudget = vbudget;
> > > -	current_hweight(iocg, NULL, &ctx.hw_inuse);
> > > +	if (need_update_hwi)
> > > +		current_hweight(iocg, NULL, &ctx.hw_inuse);
> > 
> > So, if you look at the implementation of current_hweight(), it's
> > 
> > 1. If nothing has changed, read out the cached values.
> > 2. If something has changed, recalculate.
> 
> Yes, correct.
> 
> > 
> > and the "something changed" test is a single memory read (most likely L1 hot
> > at this point) and an equality test. IOW, the change you're suggesting
> > isn't much of an optimization. Maybe the compiler can do a somewhat better
> > job of arranging the code and it's a register load rather than a memory load but
> > given that it's already a relatively cold wait path, this is unlikely to
> > make any actual difference. And that's how current_hweight() is meant to be
> > used.
> 
> What I want to avoid is the 'atomic_read(&ioc->hweight_gen)' in
> current_hweight(), because it is never a register load and is always a
> memory load. A local flag, by contrast, can be cached in a register and
> is lighter than a memory load.
> 
> But after thinking about it more, I think we can just move the
> "current_hweight(iocg, NULL, &ctx.hw_inuse);" call to the correct place,
> without introducing a new flag, to optimize the code. What do you think
> of the code below?

I don't find this discussion very meaningful. We're talking about a
theoretical one-memory-load optimization in a path which likely isn't hot
enough for it to make any difference. If you can show that this matters,
please do. Otherwise, what are we doing?

Thanks.

Patch

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 089f3fe..5305afd 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -1405,10 +1405,11 @@  static void iocg_kick_waitq(struct ioc_gq *iocg, bool pay_debt,
 	u64 vshortage, expires, oexpires;
 	s64 vbudget;
 	u32 hwa;
+	bool need_update_hwi = false;
 
 	lockdep_assert_held(&iocg->waitq.lock);
 
-	current_hweight(iocg, &hwa, NULL);
+	current_hweight(iocg, &hwa, &ctx.hw_inuse);
 	vbudget = now->vnow - atomic64_read(&iocg->vtime);
 
 	/* pay off debt */
@@ -1423,6 +1424,7 @@  static void iocg_kick_waitq(struct ioc_gq *iocg, bool pay_debt,
 		atomic64_add(vpay, &iocg->done_vtime);
 		iocg_pay_debt(iocg, abs_vpay, now);
 		vbudget -= vpay;
+		need_update_hwi = true;
 	}
 
 	if (iocg->abs_vdebt || iocg->delay)
@@ -1445,7 +1447,8 @@  static void iocg_kick_waitq(struct ioc_gq *iocg, bool pay_debt,
 	 * after the above debt payment.
 	 */
 	ctx.vbudget = vbudget;
-	current_hweight(iocg, NULL, &ctx.hw_inuse);
+	if (need_update_hwi)
+		current_hweight(iocg, NULL, &ctx.hw_inuse);
 
 	__wake_up_locked_key(&iocg->waitq, TASK_NORMAL, &ctx);