diff mbox series

blk-iocost: fix false positive lagging

Message ID 20220526133554.21079-1-zhouchengming@bytedance.com (mailing list archive)
State New, archived
Headers show
Series blk-iocost: fix false positive lagging | expand

Commit Message

Chengming Zhou May 26, 2022, 1:35 p.m. UTC
I found many false positive lagging during iocost test.

Since iocg->vtime will be advanced to (vnow - margins.target)
in hweight_after_donation(), which called throw away excess,
the iocg->done_vtime will also be advanced that much.

       period_at_vtime  <--period_vtime-->  vnow
              |                              |
  --------------------------------------------------->
        |<--->|
     margins.target
        |->
  vtime, done_vtime

If that iocg has some inflight io when vnow, but its done_vtime
is before period_at_vtime, ioc_timer_fn() will think it has
lagging io, even these io maybe issued just before now.

This patch change the condition to check if vdone is before
(period_at_vtime - margins.target) instead of period_at_vtime.

But there is another problem that this patch doesn't fix.
Since vtime will be advanced, we can't check if vtime is
after (vnow - MAX_LAGGING_PERIODS * period_vtime) to tell
whether this iocg pin lagging for too long.

Maybe we can add lagging_periods in iocg to record how many
periods this iocg pin lagging, but I don't know when to clean it.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 block/blk-iocost.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Tejun Heo May 26, 2022, 5:43 p.m. UTC | #1
Hello,

On Thu, May 26, 2022 at 09:35:54PM +0800, Chengming Zhou wrote:
> I found many false positive lagging during iocost test.
> 
> Since iocg->vtime will be advanced to (vnow - margins.target)
> in hweight_after_donation(), which called throw away excess,
> the iocg->done_vtime will also be advanced that much.
> 
>        period_at_vtime  <--period_vtime-->  vnow
>               |                              |
>   --------------------------------------------------->
>         |<--->|
>      margins.target
>         |->
>   vtime, done_vtime

All it does is shifting the vtime (and done_vtime) within the current window
so that we don't build up budget too lage a budget possibly spanning
multiple periods. The lagging detection is supposed to detect IOs which are
issued two+ periods ago which didn't finish in the last period. So, I don't
think the above sliding up the window affects that detection given that the
lagging detection is done before the window sliding. All it's checking is
whether there still are in-flight IOs which were issued two+ windows ago, so
how the last window has been fast forwarded shouldn't affect the detection,
no?

Thanks.
Chengming Zhou May 26, 2022, 11:43 p.m. UTC | #2
Hello,

On 2022/5/27 01:43, Tejun Heo wrote:
> Hello,
> 
> On Thu, May 26, 2022 at 09:35:54PM +0800, Chengming Zhou wrote:
>> I found many false positive lagging during iocost test.
>>
>> Since iocg->vtime will be advanced to (vnow - margins.target)
>> in hweight_after_donation(), which called throw away excess,
>> the iocg->done_vtime will also be advanced that much.
>>
>>        period_at_vtime  <--period_vtime-->  vnow
>>               |                              |
>>   --------------------------------------------------->
>>         |<--->|
>>      margins.target
>>         |->
>>   vtime, done_vtime
> 
> All it does is shifting the vtime (and done_vtime) within the current window
> so that we don't build up budget too lage a budget possibly spanning
> multiple periods. 

Yes, this is necessary. Suppose in the last timer, the iocg doesn't have inflights
and have excess, then iocg->vtime = iocg->done_vtime = (period_at_vtime - margins.target)

> The lagging detection is supposed to detect IOs which are
> issued two+ periods ago which didn't finish in the last period. So, I don't

Yes, I understand.

> think the above sliding up the window affects that detection given that the
> lagging detection is done before the window sliding. All it's checking is
> whether there still are in-flight IOs which were issued two+ windows ago, so
> how the last window has been fast forwarded shouldn't affect the detection,
> no?

Right, the lagging detection is done before the window sliding in this period timer.
The conditions that it checks vtime, done_vtime have been slided in the last timer.

time_after64(vtime, vdone) &&
time_after64(vtime, now.vnow - MAX_LAGGING_PERIODS * period_vtime) &&
time_before64(vdone, now.vnow - period_vtime)

The first condition says it has some inflights, the second condition is always true
if vtime has been slided in the last timer, the third condition will be true if the
cost of io completed since last timer < ioc->margins.target.

So I think it doesn't check correctly whether it has inflights that were issued two+
windows ago.

Thanks.

> 
> Thanks.
>
Chengming Zhou May 28, 2022, 8:17 a.m. UTC | #3
On 2022/5/26 21:35, Chengming Zhou wrote:
> I found many false positive lagging during iocost test.
> 
> Since iocg->vtime will be advanced to (vnow - margins.target)
> in hweight_after_donation(), which called throw away excess,
> the iocg->done_vtime will also be advanced that much.
> 
>        period_at_vtime  <--period_vtime-->  vnow
>               |                              |
>   --------------------------------------------------->
>         |<--->|
>      margins.target
>         |->
>   vtime, done_vtime
> 
> If that iocg has some inflight io when vnow, but its done_vtime
> is before period_at_vtime, ioc_timer_fn() will think it has
> lagging io, even these io maybe issued just before now.
> 
> This patch change the condition to check if vdone is before
> (period_at_vtime - margins.target) instead of period_at_vtime.
> 
> But there is another problem that this patch doesn't fix.
> Since vtime will be advanced, we can't check if vtime is
> after (vnow - MAX_LAGGING_PERIODS * period_vtime) to tell
> whether this iocg pin lagging for too long.
> 
> Maybe we can add lagging_periods in iocg to record how many
> periods this iocg pin lagging, but I don't know when to clean it.

Hello tejun, I add lagging_periods in iocg based on the original patch,
to record how many periods this iocg pin lagging. So we can use it to
avoid letting cmds which take a very long time pin lagging for too long.

Thanks.


diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 33a11ba971ea..998bb38ffb37 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -541,6 +541,8 @@ struct ioc_gq {
        u64                             indebt_since;
        u64                             indelay_since;

+       int                             lagging_periods;
+
        /* this iocg's depth in the hierarchy and ancestors including self */
        int                             level;
        struct ioc_gq                   *ancestors[];
@@ -2257,10 +2259,13 @@ static void ioc_timer_fn(struct timer_list *timer)
                if ((ppm_rthr != MILLION || ppm_wthr != MILLION) &&
                    !atomic_read(&iocg_to_blkg(iocg)->use_delay) &&
                    time_after64(vtime, vdone) &&
-                   time_after64(vtime, now.vnow -
-                                MAX_LAGGING_PERIODS * period_vtime) &&
-                   time_before64(vdone, now.vnow - period_vtime))
-                       nr_lagging++;
+                   time_before64(vdone, ioc->period_at_vtime - ioc->margins.target)) {
+                       if (iocg->lagging_periods < MAX_LAGGING_PERIODS) {
+                               nr_lagging++;
+                               iocg->lagging_periods++;
+                       }
+               } else if (iocg->lagging_periods)
+                       iocg->lagging_periods = 0;

                /*
                 * Determine absolute usage factoring in in-flight IOs to avoid


> 
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> ---
>  block/blk-iocost.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/blk-iocost.c b/block/blk-iocost.c
> index 33a11ba971ea..42e301b7527b 100644
> --- a/block/blk-iocost.c
> +++ b/block/blk-iocost.c
> @@ -2259,7 +2259,7 @@ static void ioc_timer_fn(struct timer_list *timer)
>  		    time_after64(vtime, vdone) &&
>  		    time_after64(vtime, now.vnow -
>  				 MAX_LAGGING_PERIODS * period_vtime) &&
> -		    time_before64(vdone, now.vnow - period_vtime))
> +		    time_before64(vdone, ioc->period_at_vtime - ioc->margins.target))
>  			nr_lagging++;
>  
>  		/*
Chengming Zhou June 1, 2022, 12:32 p.m. UTC | #4
On 2022/5/28 16:17, Chengming Zhou wrote:
> On 2022/5/26 21:35, Chengming Zhou wrote:
>> I found many false positive lagging during iocost test.
>>
>> Since iocg->vtime will be advanced to (vnow - margins.target)
>> in hweight_after_donation(), which called throw away excess,
>> the iocg->done_vtime will also be advanced that much.
>>
>>        period_at_vtime  <--period_vtime-->  vnow
>>               |                              |
>>   --------------------------------------------------->
>>         |<--->|
>>      margins.target
>>         |->
>>   vtime, done_vtime
>>
>> If that iocg has some inflight io when vnow, but its done_vtime
>> is before period_at_vtime, ioc_timer_fn() will think it has
>> lagging io, even these io maybe issued just before now.
>>
>> This patch change the condition to check if vdone is before
>> (period_at_vtime - margins.target) instead of period_at_vtime.
>>
>> But there is another problem that this patch doesn't fix.
>> Since vtime will be advanced, we can't check if vtime is
>> after (vnow - MAX_LAGGING_PERIODS * period_vtime) to tell
>> whether this iocg pin lagging for too long.
>>
>> Maybe we can add lagging_periods in iocg to record how many
>> periods this iocg pin lagging, but I don't know when to clean it.
> 
> Hello tejun, I add lagging_periods in iocg based on the original patch,
> to record how many periods this iocg pin lagging. So we can use it to
> avoid letting cmds which take a very long time pin lagging for too long.
> 
> Thanks.
> 
> 
> diff --git a/block/blk-iocost.c b/block/blk-iocost.c
> index 33a11ba971ea..998bb38ffb37 100644
> --- a/block/blk-iocost.c
> +++ b/block/blk-iocost.c
> @@ -541,6 +541,8 @@ struct ioc_gq {
>         u64                             indebt_since;
>         u64                             indelay_since;
> 
> +       int                             lagging_periods;
> +
>         /* this iocg's depth in the hierarchy and ancestors including self */
>         int                             level;
>         struct ioc_gq                   *ancestors[];
> @@ -2257,10 +2259,13 @@ static void ioc_timer_fn(struct timer_list *timer)
>                 if ((ppm_rthr != MILLION || ppm_wthr != MILLION) &&
>                     !atomic_read(&iocg_to_blkg(iocg)->use_delay) &&
>                     time_after64(vtime, vdone) &&
> -                   time_after64(vtime, now.vnow -
> -                                MAX_LAGGING_PERIODS * period_vtime) &&
> -                   time_before64(vdone, now.vnow - period_vtime))
> -                       nr_lagging++;
> +                   time_before64(vdone, ioc->period_at_vtime - ioc->margins.target)) {
> +                       if (iocg->lagging_periods < MAX_LAGGING_PERIODS) {
> +                               nr_lagging++;
> +                               iocg->lagging_periods++;
> +                       }
> +               } else if (iocg->lagging_periods)
> +                       iocg->lagging_periods = 0;
> 
>                 /*
>                  * Determine absolute usage factoring in in-flight IOs to avoid
> 

Hi, I tested with this version, previous false laggings are gone. So I wonder
if I should send v2 for review?

Thanks!

> 
>>
>> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
>> ---
>>  block/blk-iocost.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/block/blk-iocost.c b/block/blk-iocost.c
>> index 33a11ba971ea..42e301b7527b 100644
>> --- a/block/blk-iocost.c
>> +++ b/block/blk-iocost.c
>> @@ -2259,7 +2259,7 @@ static void ioc_timer_fn(struct timer_list *timer)
>>  		    time_after64(vtime, vdone) &&
>>  		    time_after64(vtime, now.vnow -
>>  				 MAX_LAGGING_PERIODS * period_vtime) &&
>> -		    time_before64(vdone, now.vnow - period_vtime))
>> +		    time_before64(vdone, ioc->period_at_vtime - ioc->margins.target))
>>  			nr_lagging++;
>>  
>>  		/*
diff mbox series

Patch

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 33a11ba971ea..42e301b7527b 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -2259,7 +2259,7 @@  static void ioc_timer_fn(struct timer_list *timer)
 		    time_after64(vtime, vdone) &&
 		    time_after64(vtime, now.vnow -
 				 MAX_LAGGING_PERIODS * period_vtime) &&
-		    time_before64(vdone, now.vnow - period_vtime))
+		    time_before64(vdone, ioc->period_at_vtime - ioc->margins.target))
 			nr_lagging++;
 
 		/*