diff mbox series

[v2,net,2/7] net/sched: taprio: fix cycle time adjustment for next entry

Message ID 20231107112023.676016-3-faizal.abdul.rahim@linux.intel.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series qbv cycle time extension/truncation | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1312 this patch: 1312
netdev/cc_maintainers success CCed 9 of 9 maintainers
netdev/build_clang success Errors and warnings before: 1340 this patch: 1340
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 1340 this patch: 1340
netdev/checkpatch warning WARNING: line length of 81 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Abdul Rahim, Faizal Nov. 7, 2023, 11:20 a.m. UTC
According to IEEE Std. 802.1Q-2018 section Q.5 CycleTimeExtension:
"the Cycle Time Extension variable allows this extension of the last old
cycle to be done in a defined way. If the last complete old cycle would
normally end less than OperCycleTimeExtension nanoseconds before the new
base time, then the last complete cycle before AdminBaseTime is reached
is extended so that it ends at AdminBaseTime."

The current taprio implementation does not extend the last old cycle in
the defined manner specified in the Qbv Spec. This is part of the fix
covered in this patch.

Here are the changes made:

1. A new API, get_cycle_time_correction(), has been added to return the
correction value. If it returns a non-initialize value, it indicates
changes required for the next entry schedule, and upon the completion
of the next entry's duration, entries will be loaded from the new admin
schedule.

2. Store correction values in cycle_time_correction:
a) Positive correction value/extension
We calculate the correction between the last operational cycle and the
new admin base time. Note that for positive correction to take place,
the next entry should be the last entry from oper and the new admin base
time falls within the next cycle time of old oper.

b) Negative correction value
The new admin base time starts earlier than the next entry's end time.

c) Zero correction value
The new admin base time aligns exactly with the old cycle.

3. When cycle_time_correction is set to a non-initialized value, it is
used to:
a) Indicate that cycle correction is active to trigger adjustments in
affected fields like interval and cycle_time. A new API,
cycle_corr_active(), has been created to help with this purpose.

b) Transition to the new admin schedule at the beginning of
advance_sched(). A new API, should_change_sched(), has been introduced
for this purpose.

4. Remove the previous definition of should_change_scheds() API. A new
should_change_sched() API has been introduced, but it serves a
completely different purpose than the one that was removed.

Fixes: a3d43c0d56f1 ("taprio: Add support adding an admin schedule")
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
---
 net/sched/sch_taprio.c | 105 +++++++++++++++++++++++++++--------------
 1 file changed, 70 insertions(+), 35 deletions(-)

Comments

Vladimir Oltean Nov. 8, 2023, 11:20 p.m. UTC | #1
On Tue, Nov 07, 2023 at 06:20:18AM -0500, Faizal Rahim wrote:
> According to IEEE Std. 802.1Q-2018 section Q.5 CycleTimeExtension:
> "the Cycle Time Extension variable allows this extension of the last old
> cycle to be done in a defined way. If the last complete old cycle would
> normally end less than OperCycleTimeExtension nanoseconds before the new
> base time, then the last complete cycle before AdminBaseTime is reached
> is extended so that it ends at AdminBaseTime."

So far, so good.

> The current taprio implementation does not extend the last old cycle in
> the defined manner specified in the Qbv Spec. This is part of the fix
> covered in this patch.

In the discussion on v1, I said that prior to commit a1e6ad30fa19
("net/sched: taprio: calculate guard band against actual TC gate close
time"), the last entry's next->close_time actually used to include the
oper schedule's correction, but it no longer does. This points to a
regression, and not to something that was never there. Am I wrong?

> Here are the changes made:
> 
> 1. A new API, get_cycle_time_correction(), has been added to return the

I would call an "API" an interface between two distinct software layers,
based on an agreed-upon contract. Not a function in sch_taprio.c which
is called by another function in sch_taprio.c.

> correction value. If it returns a non-initialize value, it indicates
> changes required for the next entry schedule, and upon the completion
> of the next entry's duration, entries will be loaded from the new admin
> schedule.

This paragraph doesn't really help. It gets the reader lost in
irrelevant details which are actually not that hard to deduce from the
code with some good naming. Actually I find it poor naming to say
"non-initialize value" when what you mean is "!= INIT_CYCLE_TIME_CORRECTION".
I think I would name this a "specific" or "valid" cycle correction, when
it takes a value different from CYCLE_TIME_CORRECTION_UNSPEC.

> 
> 2. Store correction values in cycle_time_correction:
> a) Positive correction value/extension
> We calculate the correction between the last operational cycle and the
> new admin base time. Note that for positive correction to take place,
> the next entry should be the last entry from oper and the new admin base
> time falls within the next cycle time of old oper.
> 
> b) Negative correction value
> The new admin base time starts earlier than the next entry's end time.
> 
> c) Zero correction value
> The new admin base time aligns exactly with the old cycle.
> 
> 3. When cycle_time_correction is set to a non-initialized value, it is
> used to:
> a) Indicate that cycle correction is active to trigger adjustments in
> affected fields like interval and cycle_time. A new API,
> cycle_corr_active(), has been created to help with this purpose.

Again, it's exaggerated to call this an "API".
Although, what you can do is provide a kerneldoc-style comment above the
functions which you wish to describe, and remove this explanation from
the commit message.

> 
> b) Transition to the new admin schedule at the beginning of
> advance_sched(). A new API, should_change_sched(), has been introduced
> for this purpose.

This should have been done since patch 1, not here.

> 4. Remove the previous definition of should_change_scheds() API. A new
> should_change_sched() API has been introduced, but it serves a
> completely different purpose than the one that was removed.

So don't name it the same!

> 
> Fixes: a3d43c0d56f1 ("taprio: Add support adding an admin schedule")
> Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
> ---
>  net/sched/sch_taprio.c | 105 +++++++++++++++++++++++++++--------------
>  1 file changed, 70 insertions(+), 35 deletions(-)
> 
> diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
> index dee103647823..ed32654b46f5 100644
> --- a/net/sched/sch_taprio.c
> +++ b/net/sched/sch_taprio.c
> @@ -259,6 +259,14 @@ static int duration_to_length(struct taprio_sched *q, u64 duration)
>  	return div_u64(duration * PSEC_PER_NSEC, atomic64_read(&q->picos_per_byte));
>  }
>  
> +static bool cycle_corr_active(s64 cycle_time_correction)
> +{
> +	if (cycle_time_correction == INIT_CYCLE_TIME_CORRECTION)
> +		return false;
> +	else
> +		return true;
> +}
> +

Could look like this:

static bool cycle_corr_active(struct sched_gate_list *oper)
{
	return oper->cycle_time_correction != INIT_CYCLE_TIME_CORRECTION;
}

>  /* Sets sched->max_sdu[] and sched->max_frm_len[] to the minimum between the
>   * q->max_sdu[] requested by the user and the max_sdu dynamically determined by
>   * the maximum open gate durations at the given link speed.
> @@ -888,38 +896,59 @@ static bool should_restart_cycle(const struct sched_gate_list *oper,
>  	return false;
>  }
>  
> -static bool should_change_schedules(const struct sched_gate_list *admin,
> -				    const struct sched_gate_list *oper,
> -				    ktime_t end_time)
> +static bool should_change_sched(struct sched_gate_list *oper)
>  {
> -	ktime_t next_base_time, extension_time;
> +	bool change_to_admin_sched = false;
>  
> -	if (!admin)
> -		return false;
> +	if (oper->cycle_time_correction != INIT_CYCLE_TIME_CORRECTION) {
> +		/* The recent entry ran is the last one from oper */
> +		change_to_admin_sched = true;
> +		oper->cycle_time_correction = INIT_CYCLE_TIME_CORRECTION;

Don't make a function called should_change_sched() stateful. Don't make
this function reset the value of oper->cycle_time_correction, since that
practically equates with actually starting the schedule change.

The oper->cycle_time_correction assignment actually belongs to
switch_schedules(), in my opinion.

And if you make should_change_sched() stateless, then surprise-surprise,
it will contain the exact same logic, and return the exact same thing,
as cycle_corr_active(). I think that naming this single function
sched_change_pending() and providing a kerneldoc comment as to why it is
implemented the way it is should be sufficient.

> +	}
>  
> -	next_base_time = sched_base_time(admin);
> +	return change_to_admin_sched;
> +}
>  
> -	/* This is the simple case, the end_time would fall after
> -	 * the next schedule base_time.
> -	 */
> -	if (ktime_compare(next_base_time, end_time) <= 0)
> +static bool should_extend_cycle(const struct sched_gate_list *oper,
> +				ktime_t new_base_time,
> +				ktime_t entry_end_time,
> +				const struct sched_entry *entry)
> +{
> +	ktime_t next_cycle_end_time = ktime_add_ns(oper->cycle_end_time,
> +						   oper->cycle_time);
> +	bool extension_supported = oper->cycle_time_extension > 0 ? true : false;

"? true : false" is redundant. Just "extension_supported = oper->cycle_time_extension > 0"
is enough.

> +	s64 extension_limit = oper->cycle_time_extension;
> +
> +	if (extension_supported &&
> +	    list_is_last(&entry->list, &oper->entries) &&
> +	    ktime_before(new_base_time, next_cycle_end_time) &&
> +	    ktime_sub(new_base_time, entry_end_time) < extension_limit)
>  		return true;
> +	else
> +		return false;

Style nitpick:

	return extension_supported &&
	       list_is_last(&entry->list, &oper->entries) &&
	       ktime_before(new_base_time, next_cycle_end_time) &&
	       ktime_sub(new_base_time, entry_end_time) < extension_limit;

> +}
>  
> -	/* This is the cycle_time_extension case, if the end_time
> -	 * plus the amount that can be extended would fall after the
> -	 * next schedule base_time, we can extend the current schedule
> -	 * for that amount.
> -	 */
> -	extension_time = ktime_add_ns(end_time, oper->cycle_time_extension);
> +static s64 get_cycle_time_correction(const struct sched_gate_list *oper,
> +				     ktime_t new_base_time,
> +				     ktime_t entry_end_time,
> +				     const struct sched_entry *entry)
> +{
> +	s64 correction = INIT_CYCLE_TIME_CORRECTION;
>  
> -	/* FIXME: the IEEE 802.1Q-2018 Specification isn't clear about
> -	 * how precisely the extension should be made. So after
> -	 * conformance testing, this logic may change.
> -	 */
> -	if (ktime_compare(next_base_time, extension_time) <= 0)
> -		return true;
> +	if (!entry || !oper)
> +		return correction;

This function is called as follows:

	oper->cycle_time_correction = get_cycle_time_correction(oper, ...);

So, "oper" cannot be NULL if we dereference "oper" in the left hand side
of the assignment and expect the kernel not to crash, no?

"entry" - assigned from the "next" variable in advance_sched() - should
not be NULL either, from the way in which it is assigned.

>  
> -	return false;
> +	if (ktime_compare(new_base_time, entry_end_time) <= 0) {
> +		/* negative or zero correction */

At least for me, it would be helpful if you could transplant the
explanation from the commit message ("The new admin base time starts
earlier than the next entry's end time") into an expanded comment here.
I am easily confused about the "ktime_compare(a, b) <= 0" construction.

> +		correction = ktime_sub(new_base_time, entry_end_time);
> +	} else if (ktime_after(new_base_time, entry_end_time) &&
> +		   should_extend_cycle(oper, new_base_time,
> +				       entry_end_time, entry)) {
> +		/* positive correction */

Same here - move the explanation from the commit message to the comment,
please.

> +		correction = ktime_sub(new_base_time, entry_end_time);
> +	}
> +
> +	return correction;
>  }
>  
>  static enum hrtimer_restart advance_sched(struct hrtimer *timer)
> @@ -942,10 +971,8 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer)
>  	admin = rcu_dereference_protected(q->admin_sched,
>  					  lockdep_is_held(&q->current_entry_lock));
>  
> -	if (!oper || oper->cycle_time_correction != INIT_CYCLE_TIME_CORRECTION) {
> -		oper->cycle_time_correction = INIT_CYCLE_TIME_CORRECTION;
> +	if (!oper || should_change_sched(oper))
>  		switch_schedules(q, &admin, &oper);
> -	}
>  
>  	/* This can happen in two cases: 1. this is the very first run
>  	 * of this function (i.e. we weren't running any schedule
> @@ -972,6 +999,22 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer)
>  	end_time = ktime_add_ns(entry->end_time, next->interval);
>  	end_time = min_t(ktime_t, end_time, oper->cycle_end_time);
>  
> +	if (admin) {
> +		ktime_t new_base_time = sched_base_time(admin);
> +
> +		oper->cycle_time_correction =
> +			get_cycle_time_correction(oper, new_base_time,
> +						  end_time, next);
> +
> +		if (cycle_corr_active(oper->cycle_time_correction)) {
> +			/* The next entry is the last entry we will run from
> +			 * oper, subsequent ones will take from the new admin
> +			 */
> +			oper->cycle_end_time = new_base_time;
> +			end_time = new_base_time;
> +		}
> +	}
> +
>  	for (tc = 0; tc < num_tc; tc++) {
>  		if (next->gate_duration[tc] == oper->cycle_time)
>  			next->gate_close_time[tc] = KTIME_MAX;
> @@ -980,14 +1023,6 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer)
>  								 next->gate_duration[tc]);
>  	}
>  
> -	if (should_change_schedules(admin, oper, end_time)) {
> -		/* Set things so the next time this runs, the new
> -		 * schedule runs.
> -		 */
> -		end_time = sched_base_time(admin);
> -		oper->cycle_time_correction = 0;
> -	}
> -
>  	next->end_time = end_time;
>  	taprio_set_budgets(q, oper, next);
>  
> -- 
> 2.25.1
>
Abdul Rahim, Faizal Nov. 15, 2023, 11:55 a.m. UTC | #2
On 9/11/2023 7:20 am, Vladimir Oltean wrote:
> On Tue, Nov 07, 2023 at 06:20:18AM -0500, Faizal Rahim wrote:
>> According to IEEE Std. 802.1Q-2018 section Q.5 CycleTimeExtension:
>> "the Cycle Time Extension variable allows this extension of the last old
>> cycle to be done in a defined way. If the last complete old cycle would
>> normally end less than OperCycleTimeExtension nanoseconds before the new
>> base time, then the last complete cycle before AdminBaseTime is reached
>> is extended so that it ends at AdminBaseTime."
> 
> So far, so good.
> 
>> The current taprio implementation does not extend the last old cycle in
>> the defined manner specified in the Qbv Spec. This is part of the fix
>> covered in this patch.
> 
> In the discussion on v1, I said that prior to commit a1e6ad30fa19
> ("net/sched: taprio: calculate guard band against actual TC gate close
> time"), the last entry's next->close_time actually used to include the
> oper schedule's correction, but it no longer does. This points to a
> regression, and not to something that was never there. Am I wrong?

That is correct, that patch you mentioned is a regression to some of the 
extension correction logic.

The regression caused by the patch you mentioned is resolved by 
("net/sched: taprio: update impacted fields during cycle time adjustment").
Here's a snippet:
		if (cycle_corr_active(oper->cycle_time_correction) &&
		    (next->gate_mask & BIT(tc)))
			next->gate_close_time[tc] = end_time;

The `end_time` here is already corrected. The corrected `gate_close_time` 
is then used by taprio_dequeue_from_txq() -> taprio_entry_allows_tx(), 
which now takes dynamic scheduling into account.


However, the extension logic had other issues even before that patch. 
Example, in should_change_schedules(), it didn't consider if the next entry 
is the last one in the oper cycle before extending the schedule.
Although this comes as no surprise as there was already a FIXME tag in 
should_change_schedules().


This patch primarily addresses the should_change_schedules() logic using 
the new get_cycle_time_correction().


>> Here are the changes made:
>>
>> 1. A new API, get_cycle_time_correction(), has been added to return the
> 
> I would call an "API" an interface between two distinct software layers,
> based on an agreed-upon contract. Not a function in sch_taprio.c which
> is called by another function in sch_taprio.c.

Alright, my use of the term "API" was a bit casual without much 
consideration – my mistake.

>> correction value. If it returns a non-initialize value, it indicates
>> changes required for the next entry schedule, and upon the completion
>> of the next entry's duration, entries will be loaded from the new admin
>> schedule.
> 
> This paragraph doesn't really help. It gets the reader lost in
> irrelevant details which are actually not that hard to deduce from the
> code with some good naming. Actually I find it poor naming to say
> "non-initialize value" when what you mean is "!= INIT_CYCLE_TIME_CORRECTION".
> I think I would name this a "specific" or "valid" cycle correction, when
> it takes a value different from CYCLE_TIME_CORRECTION_UNSPEC.

Will rename

>>
>> 2. Store correction values in cycle_time_correction:
>> a) Positive correction value/extension
>> We calculate the correction between the last operational cycle and the
>> new admin base time. Note that for positive correction to take place,
>> the next entry should be the last entry from oper and the new admin base
>> time falls within the next cycle time of old oper.
>>
>> b) Negative correction value
>> The new admin base time starts earlier than the next entry's end time.
>>
>> c) Zero correction value
>> The new admin base time aligns exactly with the old cycle.
>>
>> 3. When cycle_time_correction is set to a non-initialized value, it is
>> used to:
>> a) Indicate that cycle correction is active to trigger adjustments in
>> affected fields like interval and cycle_time. A new API,
>> cycle_corr_active(), has been created to help with this purpose.
> 
> Again, it's exaggerated to call this an "API".
> Although, what you can do is provide a kerneldoc-style comment above the
> functions which you wish to describe, and remove this explanation from
> the commit message.

okay

>>
>> b) Transition to the new admin schedule at the beginning of
>> advance_sched(). A new API, should_change_sched(), has been introduced
>> for this purpose.
> 
> This should have been done since patch 1, not here.

Understood. My bad - that would have been a better choice.

>> 4. Remove the previous definition of should_change_scheds() API. A new
>> should_change_sched() API has been introduced, but it serves a
>> completely different purpose than the one that was removed.
> 
> So don't name it the same!
> 
>>
>> Fixes: a3d43c0d56f1 ("taprio: Add support adding an admin schedule")
>> Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
>> ---
>>   net/sched/sch_taprio.c | 105 +++++++++++++++++++++++++++--------------
>>   1 file changed, 70 insertions(+), 35 deletions(-)
>>
>> diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
>> index dee103647823..ed32654b46f5 100644
>> --- a/net/sched/sch_taprio.c
>> +++ b/net/sched/sch_taprio.c
>> @@ -259,6 +259,14 @@ static int duration_to_length(struct taprio_sched *q, u64 duration)
>>   	return div_u64(duration * PSEC_PER_NSEC, atomic64_read(&q->picos_per_byte));
>>   }
>>   
>> +static bool cycle_corr_active(s64 cycle_time_correction)
>> +{
>> +	if (cycle_time_correction == INIT_CYCLE_TIME_CORRECTION)
>> +		return false;
>> +	else
>> +		return true;
>> +}
>> +
> 
> Could look like this:
> 
> static bool cycle_corr_active(struct sched_gate_list *oper)
> {
> 	return oper->cycle_time_correction != INIT_CYCLE_TIME_CORRECTION;
> }
> 
>>   /* Sets sched->max_sdu[] and sched->max_frm_len[] to the minimum between the
>>    * q->max_sdu[] requested by the user and the max_sdu dynamically determined by
>>    * the maximum open gate durations at the given link speed.
>> @@ -888,38 +896,59 @@ static bool should_restart_cycle(const struct sched_gate_list *oper,
>>   	return false;
>>   }
>>   
>> -static bool should_change_schedules(const struct sched_gate_list *admin,
>> -				    const struct sched_gate_list *oper,
>> -				    ktime_t end_time)
>> +static bool should_change_sched(struct sched_gate_list *oper)
>>   {
>> -	ktime_t next_base_time, extension_time;
>> +	bool change_to_admin_sched = false;
>>   
>> -	if (!admin)
>> -		return false;
>> +	if (oper->cycle_time_correction != INIT_CYCLE_TIME_CORRECTION) {
>> +		/* The recent entry ran is the last one from oper */
>> +		change_to_admin_sched = true;
>> +		oper->cycle_time_correction = INIT_CYCLE_TIME_CORRECTION;
> 
> Don't make a function called should_change_sched() stateful. Don't make
> this function reset the value of oper->cycle_time_correction, since that
> practically equates with actually starting the schedule change.
> 
> The oper->cycle_time_correction assignment actually belongs to
> switch_schedules(), in my opinion.
> 
> And if you make should_change_sched() stateless, then surprise-surprise,
> it will contain the exact same logic, and return the exact same thing,
> as cycle_corr_active(). I think that naming this single function
> sched_change_pending() and providing a kerneldoc comment as to why it is
> implemented the way it is should be sufficient.
> 

I intentionally made should_change_schedule() slightly different from 
cycle_corr_active() and, unfortunately, stateful by resetting 
oper->cycle_time_correction. My aim was to have two functions with clear 
names reflecting their intentions based on usage.

For instance, this:
	if (should_change_schedule(oper)) {
    		oper->cycle_end_time = new_base_time;
    		end_time = new_base_time;

is less intuitive than this:
	if (cycle_corr_active(oper->cycle_time_correction)) {
    		oper->cycle_end_time = new_base_time;
    		end_time = new_base_time;

And this:
	if (!oper || should_change_sched(oper))
	   	switch_schedules(q, &admin, &oper);

reads clearer than:
	if (!oper || cycle_corr_active(oper->cycle_time_correction))
		switch_schedules(q, &admin, &oper);


Normally I prefer clear function names that don't need comments for 
explanation. But I probably overthink it, seems to have more cons than 
pros. Will replace with a single sched_change_pending() as suggested, thanks.


>> +	}
>>   
>> -	next_base_time = sched_base_time(admin);
>> +	return change_to_admin_sched;
>> +}
>>   
>> -	/* This is the simple case, the end_time would fall after
>> -	 * the next schedule base_time.
>> -	 */
>> -	if (ktime_compare(next_base_time, end_time) <= 0)
>> +static bool should_extend_cycle(const struct sched_gate_list *oper,
>> +				ktime_t new_base_time,
>> +				ktime_t entry_end_time,
>> +				const struct sched_entry *entry)
>> +{
>> +	ktime_t next_cycle_end_time = ktime_add_ns(oper->cycle_end_time,
>> +						   oper->cycle_time);
>> +	bool extension_supported = oper->cycle_time_extension > 0 ? true : false;
> 
> "? true : false" is redundant. Just "extension_supported = oper->cycle_time_extension > 0"
> is enough.

Ooops sorry for this blunder. Will change.

>> +	s64 extension_limit = oper->cycle_time_extension;
>> +
>> +	if (extension_supported &&
>> +	    list_is_last(&entry->list, &oper->entries) &&
>> +	    ktime_before(new_base_time, next_cycle_end_time) &&
>> +	    ktime_sub(new_base_time, entry_end_time) < extension_limit)
>>   		return true;
>> +	else
>> +		return false;
> 
> Style nitpick:
> 
> 	return extension_supported &&
> 	       list_is_last(&entry->list, &oper->entries) &&
> 	       ktime_before(new_base_time, next_cycle_end_time) &&
> 	       ktime_sub(new_base_time, entry_end_time) < extension_limit;
> 
>> +}
>>   
>> -	/* This is the cycle_time_extension case, if the end_time
>> -	 * plus the amount that can be extended would fall after the
>> -	 * next schedule base_time, we can extend the current schedule
>> -	 * for that amount.
>> -	 */
>> -	extension_time = ktime_add_ns(end_time, oper->cycle_time_extension);
>> +static s64 get_cycle_time_correction(const struct sched_gate_list *oper,
>> +				     ktime_t new_base_time,
>> +				     ktime_t entry_end_time,
>> +				     const struct sched_entry *entry)
>> +{
>> +	s64 correction = INIT_CYCLE_TIME_CORRECTION;
>>   
>> -	/* FIXME: the IEEE 802.1Q-2018 Specification isn't clear about
>> -	 * how precisely the extension should be made. So after
>> -	 * conformance testing, this logic may change.
>> -	 */
>> -	if (ktime_compare(next_base_time, extension_time) <= 0)
>> -		return true;
>> +	if (!entry || !oper)
>> +		return correction;
> 
> This function is called as follows:
> 
> 	oper->cycle_time_correction = get_cycle_time_correction(oper, ...);
> 
> So, "oper" cannot be NULL if we dereference "oper" in the left hand side
> of the assignment and expect the kernel not to crash, no?
> 
> "entry" - assigned from the "next" variable in advance_sched() - should
> not be NULL either, from the way in which it is assigned.

My bad on the oper null check.
Will remove both.

>>   
>> -	return false;
>> +	if (ktime_compare(new_base_time, entry_end_time) <= 0) {
>> +		/* negative or zero correction */
> 
> At least for me, it would be helpful if you could transplant the
> explanation from the commit message ("The new admin base time starts
> earlier than the next entry's end time") into an expanded comment here.
> I am easily confused about the "ktime_compare(a, b) <= 0" construction.
> 
>> +		correction = ktime_sub(new_base_time, entry_end_time);
>> +	} else if (ktime_after(new_base_time, entry_end_time) &&
>> +		   should_extend_cycle(oper, new_base_time,
>> +				       entry_end_time, entry)) {
>> +		/* positive correction */
> 
> Same here - move the explanation from the commit message to the comment,
> please.

Will do.

> 
>> +		correction = ktime_sub(new_base_time, entry_end_time);
>> +	}
>> +
>> +	return correction;
>>   }
>>   
>>   static enum hrtimer_restart advance_sched(struct hrtimer *timer)
>> @@ -942,10 +971,8 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer)
>>   	admin = rcu_dereference_protected(q->admin_sched,
>>   					  lockdep_is_held(&q->current_entry_lock));
>>   
>> -	if (!oper || oper->cycle_time_correction != INIT_CYCLE_TIME_CORRECTION) {
>> -		oper->cycle_time_correction = INIT_CYCLE_TIME_CORRECTION;
>> +	if (!oper || should_change_sched(oper))
>>   		switch_schedules(q, &admin, &oper);
>> -	}
>>   
>>   	/* This can happen in two cases: 1. this is the very first run
>>   	 * of this function (i.e. we weren't running any schedule
>> @@ -972,6 +999,22 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer)
>>   	end_time = ktime_add_ns(entry->end_time, next->interval);
>>   	end_time = min_t(ktime_t, end_time, oper->cycle_end_time);
>>   
>> +	if (admin) {
>> +		ktime_t new_base_time = sched_base_time(admin);
>> +
>> +		oper->cycle_time_correction =
>> +			get_cycle_time_correction(oper, new_base_time,
>> +						  end_time, next);
>> +
>> +		if (cycle_corr_active(oper->cycle_time_correction)) {
>> +			/* The next entry is the last entry we will run from
>> +			 * oper, subsequent ones will take from the new admin
>> +			 */
>> +			oper->cycle_end_time = new_base_time;
>> +			end_time = new_base_time;
>> +		}
>> +	}
>> +
>>   	for (tc = 0; tc < num_tc; tc++) {
>>   		if (next->gate_duration[tc] == oper->cycle_time)
>>   			next->gate_close_time[tc] = KTIME_MAX;
>> @@ -980,14 +1023,6 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer)
>>   								 next->gate_duration[tc]);
>>   	}
>>   
>> -	if (should_change_schedules(admin, oper, end_time)) {
>> -		/* Set things so the next time this runs, the new
>> -		 * schedule runs.
>> -		 */
>> -		end_time = sched_base_time(admin);
>> -		oper->cycle_time_correction = 0;
>> -	}
>> -
>>   	next->end_time = end_time;
>>   	taprio_set_budgets(q, oper, next);
>>   
>> -- 
>> 2.25.1
>>
diff mbox series

Patch

diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index dee103647823..ed32654b46f5 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -259,6 +259,14 @@  static int duration_to_length(struct taprio_sched *q, u64 duration)
 	return div_u64(duration * PSEC_PER_NSEC, atomic64_read(&q->picos_per_byte));
 }
 
+static bool cycle_corr_active(s64 cycle_time_correction)
+{
+	if (cycle_time_correction == INIT_CYCLE_TIME_CORRECTION)
+		return false;
+	else
+		return true;
+}
+
 /* Sets sched->max_sdu[] and sched->max_frm_len[] to the minimum between the
  * q->max_sdu[] requested by the user and the max_sdu dynamically determined by
  * the maximum open gate durations at the given link speed.
@@ -888,38 +896,59 @@  static bool should_restart_cycle(const struct sched_gate_list *oper,
 	return false;
 }
 
-static bool should_change_schedules(const struct sched_gate_list *admin,
-				    const struct sched_gate_list *oper,
-				    ktime_t end_time)
+static bool should_change_sched(struct sched_gate_list *oper)
 {
-	ktime_t next_base_time, extension_time;
+	bool change_to_admin_sched = false;
 
-	if (!admin)
-		return false;
+	if (oper->cycle_time_correction != INIT_CYCLE_TIME_CORRECTION) {
+		/* The recent entry ran is the last one from oper */
+		change_to_admin_sched = true;
+		oper->cycle_time_correction = INIT_CYCLE_TIME_CORRECTION;
+	}
 
-	next_base_time = sched_base_time(admin);
+	return change_to_admin_sched;
+}
 
-	/* This is the simple case, the end_time would fall after
-	 * the next schedule base_time.
-	 */
-	if (ktime_compare(next_base_time, end_time) <= 0)
+static bool should_extend_cycle(const struct sched_gate_list *oper,
+				ktime_t new_base_time,
+				ktime_t entry_end_time,
+				const struct sched_entry *entry)
+{
+	ktime_t next_cycle_end_time = ktime_add_ns(oper->cycle_end_time,
+						   oper->cycle_time);
+	bool extension_supported = oper->cycle_time_extension > 0 ? true : false;
+	s64 extension_limit = oper->cycle_time_extension;
+
+	if (extension_supported &&
+	    list_is_last(&entry->list, &oper->entries) &&
+	    ktime_before(new_base_time, next_cycle_end_time) &&
+	    ktime_sub(new_base_time, entry_end_time) < extension_limit)
 		return true;
+	else
+		return false;
+}
 
-	/* This is the cycle_time_extension case, if the end_time
-	 * plus the amount that can be extended would fall after the
-	 * next schedule base_time, we can extend the current schedule
-	 * for that amount.
-	 */
-	extension_time = ktime_add_ns(end_time, oper->cycle_time_extension);
+static s64 get_cycle_time_correction(const struct sched_gate_list *oper,
+				     ktime_t new_base_time,
+				     ktime_t entry_end_time,
+				     const struct sched_entry *entry)
+{
+	s64 correction = INIT_CYCLE_TIME_CORRECTION;
 
-	/* FIXME: the IEEE 802.1Q-2018 Specification isn't clear about
-	 * how precisely the extension should be made. So after
-	 * conformance testing, this logic may change.
-	 */
-	if (ktime_compare(next_base_time, extension_time) <= 0)
-		return true;
+	if (!entry || !oper)
+		return correction;
 
-	return false;
+	if (ktime_compare(new_base_time, entry_end_time) <= 0) {
+		/* negative or zero correction */
+		correction = ktime_sub(new_base_time, entry_end_time);
+	} else if (ktime_after(new_base_time, entry_end_time) &&
+		   should_extend_cycle(oper, new_base_time,
+				       entry_end_time, entry)) {
+		/* positive correction */
+		correction = ktime_sub(new_base_time, entry_end_time);
+	}
+
+	return correction;
 }
 
 static enum hrtimer_restart advance_sched(struct hrtimer *timer)
@@ -942,10 +971,8 @@  static enum hrtimer_restart advance_sched(struct hrtimer *timer)
 	admin = rcu_dereference_protected(q->admin_sched,
 					  lockdep_is_held(&q->current_entry_lock));
 
-	if (!oper || oper->cycle_time_correction != INIT_CYCLE_TIME_CORRECTION) {
-		oper->cycle_time_correction = INIT_CYCLE_TIME_CORRECTION;
+	if (!oper || should_change_sched(oper))
 		switch_schedules(q, &admin, &oper);
-	}
 
 	/* This can happen in two cases: 1. this is the very first run
 	 * of this function (i.e. we weren't running any schedule
@@ -972,6 +999,22 @@  static enum hrtimer_restart advance_sched(struct hrtimer *timer)
 	end_time = ktime_add_ns(entry->end_time, next->interval);
 	end_time = min_t(ktime_t, end_time, oper->cycle_end_time);
 
+	if (admin) {
+		ktime_t new_base_time = sched_base_time(admin);
+
+		oper->cycle_time_correction =
+			get_cycle_time_correction(oper, new_base_time,
+						  end_time, next);
+
+		if (cycle_corr_active(oper->cycle_time_correction)) {
+			/* The next entry is the last entry we will run from
+			 * oper, subsequent ones will take from the new admin
+			 */
+			oper->cycle_end_time = new_base_time;
+			end_time = new_base_time;
+		}
+	}
+
 	for (tc = 0; tc < num_tc; tc++) {
 		if (next->gate_duration[tc] == oper->cycle_time)
 			next->gate_close_time[tc] = KTIME_MAX;
@@ -980,14 +1023,6 @@  static enum hrtimer_restart advance_sched(struct hrtimer *timer)
 								 next->gate_duration[tc]);
 	}
 
-	if (should_change_schedules(admin, oper, end_time)) {
-		/* Set things so the next time this runs, the new
-		 * schedule runs.
-		 */
-		end_time = sched_base_time(admin);
-		oper->cycle_time_correction = 0;
-	}
-
 	next->end_time = end_time;
 	taprio_set_budgets(q, oper, next);