
[v2,net,4/7] net/sched: taprio: get corrected value of cycle_time and interval

Message ID 20231107112023.676016-5-faizal.abdul.rahim@linux.intel.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series qbv cycle time extension/truncation

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit fail Errors and warnings before: 1312 this patch: 1316
netdev/cc_maintainers success CCed 9 of 9 maintainers
netdev/build_clang fail Errors and warnings before: 1340 this patch: 1343
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn fail Errors and warnings before: 1340 this patch: 1344
netdev/checkpatch warning WARNING: line length of 81 exceeds 80 columns WARNING: line length of 82 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Abdul Rahim, Faizal Nov. 7, 2023, 11:20 a.m. UTC
Retrieve adjusted cycle_time and interval values through new APIs.
Note that in some cases where the original values are required,
such as in dump_schedule() and setup_first_end_time(), direct calls
to cycle_time and interval are retained without using the new APIs.

Added a new field, correction_active, in the sched_entry struct to
determine the entry's correction state. This field is required due
to specific flows like find_entry_to_transmit() -> get_interval_end_time()
which retrieves the interval for each entry. During positive cycle
time correction, it's known that the last entry interval requires
correction. However, for negative correction, the affected entry
is unknown, which is why this new field is necessary.

Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
---
 net/sched/sch_taprio.c | 50 ++++++++++++++++++++++++++++++------------
 1 file changed, 36 insertions(+), 14 deletions(-)

Comments

kernel test robot Nov. 7, 2023, 10:45 p.m. UTC | #1
Hi Faizal,

kernel test robot noticed the following build warnings:

[auto build test WARNING on net/main]

url:    https://github.com/intel-lab-lkp/linux/commits/Faizal-Rahim/net-sched-taprio-fix-too-early-schedules-switching/20231107-192843
base:   net/main
patch link:    https://lore.kernel.org/r/20231107112023.676016-5-faizal.abdul.rahim%40linux.intel.com
patch subject: [PATCH v2 net 4/7] net/sched: taprio: get corrected value of cycle_time and interval
config: powerpc-allmodconfig (https://download.01.org/0day-ci/archive/20231108/202311080506.qMlPx2WA-lkp@intel.com/config)
compiler: powerpc64-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231108/202311080506.qMlPx2WA-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202311080506.qMlPx2WA-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> net/sched/sch_taprio.c:227:5: warning: no previous prototype for 'get_interval' [-Wmissing-prototypes]
     227 | u32 get_interval(const struct sched_entry *entry,
         |     ^~~~~~~~~~~~
>> net/sched/sch_taprio.c:236:5: warning: no previous prototype for 'get_cycle_time' [-Wmissing-prototypes]
     236 | s64 get_cycle_time(const struct sched_gate_list *oper)
         |     ^~~~~~~~~~~~~~


vim +/get_interval +227 net/sched/sch_taprio.c

   226	
 > 227	u32 get_interval(const struct sched_entry *entry,
   228			 const struct sched_gate_list *oper)
   229	{
   230		if (entry->correction_active)
   231			return entry->interval + oper->cycle_time_correction;
   232		else
   233			return entry->interval;
   234	}
   235	
 > 236	s64 get_cycle_time(const struct sched_gate_list *oper)
   237	{
   238		if (cycle_corr_active(oper->cycle_time_correction))
   239			return oper->cycle_time + oper->cycle_time_correction;
   240		else
   241			return oper->cycle_time;
   242	}
   243
Vladimir Oltean Nov. 9, 2023, 11:11 a.m. UTC | #2
On Tue, Nov 07, 2023 at 06:20:20AM -0500, Faizal Rahim wrote:
> Retrieve adjusted cycle_time and interval values through new APIs.
> Note that in some cases where the original values are required,
> such as in dump_schedule() and setup_first_end_time(), direct calls
> to cycle_time and interval are retained without using the new APIs.
> 
> Added a new field, correction_active, in the sched_entry struct to
> determine the entry's correction state. This field is required due
> to specific flows like find_entry_to_transmit() -> get_interval_end_time()
> which retrieves the interval for each entry. During positive cycle
> time correction, it's known that the last entry interval requires
> correction. However, for negative correction, the affected entry
> is unknown, which is why this new field is necessary.

I agree with the motivation, but I'm not sure if the chosen solution is
correct.

static u32 get_interval(const struct sched_entry *entry,
			const struct sched_gate_list *oper)
{
	if (entry->correction_active)
		return entry->interval + oper->cycle_time_correction;

	return entry->interval;
}

What if the schedule looks like this:

	sched-entry S 0x01 125000000
	sched-entry S 0x02 125000000
	sched-entry S 0x04 125000000
	sched-entry S 0x08 125000000
	sched-entry S 0x10 125000000
	sched-entry S 0x20 125000000
	sched-entry S 0x40 125000000
	sched-entry S 0x80 125000000

and the calculated cycle_time_correction is -200000000? That would
eliminate the entire last sched-entry (0x80), and the previous one
(0x40) would run for just 75000000 ns. But your calculation would say
that its interval is -75000000 ns (actually reported as a u32 positive
integer, so it would be a completely bogus value).

So not only is the affected entry unknown, but also the amount of cycle
time correction that applies to it is unknown.

I'm looking at where we need get_interval(), and it's from:

taprio_enqueue_one()
-> is_valid_interval()
   -> find_entry_to_transmit()
      -> get_interval_end_time()
-> get_packet_txtime()
   -> find_entry_to_transmit()

I admit it's a part of taprio which I don't understand too well. Why do
we perform such complex calculations in get_interval_end_time() when we
should have struct sched_entry :: end_time precomputed and available for
this purpose (although it was primarily intended for advance_sched() and
not for enqueue())?

Vinicius, do you know?
Vladimir Oltean Nov. 9, 2023, 12:01 p.m. UTC | #3
On Tue, Nov 07, 2023 at 06:20:20AM -0500, Faizal Rahim wrote:
> @@ -215,6 +216,31 @@ static void switch_schedules(struct taprio_sched *q,
>  	*admin = NULL;
>  }
>  
> +static bool cycle_corr_active(s64 cycle_time_correction)
> +{
> +	if (cycle_time_correction == INIT_CYCLE_TIME_CORRECTION)
> +		return false;
> +	else
> +		return true;
> +}
> @@ -259,14 +286,6 @@ static int duration_to_length(struct taprio_sched *q, u64 duration)
>  	return div_u64(duration * PSEC_PER_NSEC, atomic64_read(&q->picos_per_byte));
>  }
>  
> -static bool cycle_corr_active(s64 cycle_time_correction)
> -{
> -	if (cycle_time_correction == INIT_CYCLE_TIME_CORRECTION)
> -		return false;
> -	else
> -		return true;
> -}
> -

Don't move code around that you've introduced in earlier changes. Just
place it where it needs to be from the beginning.
kernel test robot Nov. 10, 2023, 7:15 p.m. UTC | #4
Hi Faizal,

kernel test robot noticed the following build warnings:

[auto build test WARNING on net/main]

url:    https://github.com/intel-lab-lkp/linux/commits/Faizal-Rahim/net-sched-taprio-fix-too-early-schedules-switching/20231107-192843
base:   net/main
patch link:    https://lore.kernel.org/r/20231107112023.676016-5-faizal.abdul.rahim%40linux.intel.com
patch subject: [PATCH v2 net 4/7] net/sched: taprio: get corrected value of cycle_time and interval
config: arm64-allmodconfig (https://download.01.org/0day-ci/archive/20231111/202311110208.GT4trtEk-lkp@intel.com/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project.git 4a5ac14ee968ff0ad5d2cc1ffa0299048db4c88a)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231111/202311110208.GT4trtEk-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202311110208.GT4trtEk-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> net/sched/sch_taprio.c:227:5: warning: no previous prototype for function 'get_interval' [-Wmissing-prototypes]
     227 | u32 get_interval(const struct sched_entry *entry,
         |     ^
   net/sched/sch_taprio.c:227:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
     227 | u32 get_interval(const struct sched_entry *entry,
         | ^
         | static 
>> net/sched/sch_taprio.c:236:5: warning: no previous prototype for function 'get_cycle_time' [-Wmissing-prototypes]
     236 | s64 get_cycle_time(const struct sched_gate_list *oper)
         |     ^
   net/sched/sch_taprio.c:236:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
     236 | s64 get_cycle_time(const struct sched_gate_list *oper)
         | ^
         | static 
   2 warnings generated.


vim +/get_interval +227 net/sched/sch_taprio.c

   226	
 > 227	u32 get_interval(const struct sched_entry *entry,
   228			 const struct sched_gate_list *oper)
   229	{
   230		if (entry->correction_active)
   231			return entry->interval + oper->cycle_time_correction;
   232		else
   233			return entry->interval;
   234	}
   235	
 > 236	s64 get_cycle_time(const struct sched_gate_list *oper)
   237	{
   238		if (cycle_corr_active(oper->cycle_time_correction))
   239			return oper->cycle_time + oper->cycle_time_correction;
   240		else
   241			return oper->cycle_time;
   242	}
   243
Abdul Rahim, Faizal Nov. 15, 2023, 11:55 a.m. UTC | #5
On 9/11/2023 7:11 pm, Vladimir Oltean wrote:
> On Tue, Nov 07, 2023 at 06:20:20AM -0500, Faizal Rahim wrote:
>> Retrieve adjusted cycle_time and interval values through new APIs.
>> Note that in some cases where the original values are required,
>> such as in dump_schedule() and setup_first_end_time(), direct calls
>> to cycle_time and interval are retained without using the new APIs.
>>
>> Added a new field, correction_active, in the sched_entry struct to
>> determine the entry's correction state. This field is required due
>> to specific flows like find_entry_to_transmit() -> get_interval_end_time()
>> which retrieves the interval for each entry. During positive cycle
>> time correction, it's known that the last entry interval requires
>> correction. However, for negative correction, the affected entry
>> is unknown, which is why this new field is necessary.
> 
> I agree with the motivation, but I'm not sure if the chosen solution is
> correct.
> 
> static u32 get_interval(const struct sched_entry *entry,
> 			const struct sched_gate_list *oper)
> {
> 	if (entry->correction_active)
> 		return entry->interval + oper->cycle_time_correction;
> 
> 	return entry->interval;
> }
> 
> What if the schedule looks like this:
> 
> 	sched-entry S 0x01 125000000
> 	sched-entry S 0x02 125000000
> 	sched-entry S 0x04 125000000
> 	sched-entry S 0x08 125000000
> 	sched-entry S 0x10 125000000
> 	sched-entry S 0x20 125000000
> 	sched-entry S 0x40 125000000
> 	sched-entry S 0x80 125000000
> 
> and the calculated cycle_time_correction is -200000000? That would
> eliminate the entire last sched-entry (0x80), and the previous one
> (0x40) would run for just 75000000 ns. But your calculation would say
> that its interval is -75000000 ns (actually reported as a u32 positive
> integer, so it would be a completely bogus value).
> 
> So not only is the affected entry unknown, but also the amount of cycle
> time correction that applies to it is unknown.
> 

Just an FYI: my cycle time extension test for sending packets fails without 
updating the interval and cycle_time; the duration doesn't extend 
properly. I only observe proper extension when this patch is included.

In patch series v1, interval and cycle_time were updated directly. However, 
due to concerns in v1 comments about updating the fields directly, v2 
doesn't do that.

Regarding the concern about a negative correction exceeding the interval 
value: I've checked the logic in get_cycle_time_correction() that sets 
cycle_time_correction, and I don't see how this could happen. Still, if 
it does, it points to an error much earlier than the get_interval() call. 
So I propose a failure check in get_cycle_time_correction(): if the 
correction value is negative and consumes the entire entry interval or 
more, set the negative cycle_time_correction to some arbitrary value, 
maybe half of the interval, just to mitigate the impact of the unknown 
error that occurred earlier.

What do you think?

> I'm looking at where we need get_interval(), and it's from:
> 
> taprio_enqueue_one()
> -> is_valid_interval()
>     -> find_entry_to_transmit()
>        -> get_interval_end_time()
> -> get_packet_txtime()
>     -> find_entry_to_transmit()
> 
> I admit it's a part of taprio which I don't understand too well. Why do
> we perform such complex calculations in get_interval_end_time() when we
> should have struct sched_entry :: end_time precomputed and available for
> this purpose (although it was primarily intended for advance_sched() and
> not for enqueue())?
> 
> Vinicius, do you know?
Vinicius Costa Gomes Nov. 17, 2023, 2:36 a.m. UTC | #6
Hi Vladimir,

Vladimir Oltean <vladimir.oltean@nxp.com> writes:

> On Tue, Nov 07, 2023 at 06:20:20AM -0500, Faizal Rahim wrote:
>> Retrieve adjusted cycle_time and interval values through new APIs.
>> Note that in some cases where the original values are required,
>> such as in dump_schedule() and setup_first_end_time(), direct calls
>> to cycle_time and interval are retained without using the new APIs.
>> 
>> Added a new field, correction_active, in the sched_entry struct to
>> determine the entry's correction state. This field is required due
>> to specific flows like find_entry_to_transmit() -> get_interval_end_time()
>> which retrieves the interval for each entry. During positive cycle
>> time correction, it's known that the last entry interval requires
>> correction. However, for negative correction, the affected entry
>> is unknown, which is why this new field is necessary.
>
> I agree with the motivation, but I'm not sure if the chosen solution is
> correct.
>
> static u32 get_interval(const struct sched_entry *entry,
> 			const struct sched_gate_list *oper)
> {
> 	if (entry->correction_active)
> 		return entry->interval + oper->cycle_time_correction;
>
> 	return entry->interval;
> }
>
> What if the schedule looks like this:
>
> 	sched-entry S 0x01 125000000
> 	sched-entry S 0x02 125000000
> 	sched-entry S 0x04 125000000
> 	sched-entry S 0x08 125000000
> 	sched-entry S 0x10 125000000
> 	sched-entry S 0x20 125000000
> 	sched-entry S 0x40 125000000
> 	sched-entry S 0x80 125000000
>
> and the calculated cycle_time_correction is -200000000? That would
> eliminate the entire last sched-entry (0x80), and the previous one
> (0x40) would run for just 75000000 ns. But your calculation would say
> that its interval is -75000000 ns (actually reported as a u32 positive
> integer, so it would be a completely bogus value).
>
> So not only is the affected entry unknown, but also the amount of cycle
> time correction that applies to it is unknown.
>
> I'm looking at where we need get_interval(), and it's from:
>
> taprio_enqueue_one()
> -> is_valid_interval()
>    -> find_entry_to_transmit()
>       -> get_interval_end_time()
> -> get_packet_txtime()
>    -> find_entry_to_transmit()
>
> I admit it's a part of taprio which I don't understand too well. Why do
> we perform such complex calculations in get_interval_end_time() when we
> should have struct sched_entry :: end_time precomputed and available for
> this purpose (although it was primarily intended for advance_sched() and
> not for enqueue())?
>
> Vinicius, do you know?

Sorry for the delay, I thought that I went through all the messages in
this thread, but missed this one.

I think what is missing is some context, this series from Faizal also
includes fixes for taprio "txtime-assisted mode", where we try to
support 802.1Qbv schedules, including cycle-time extension and schedules
with an arbitrary number of entries.

The basic idea is that during enqueue, taprio will calculate the txtime
of a packet so it "follows" the configured schedule, and pass that
packet to ETF, which is running as a child of taprio. It is a bit of a hack,
but it works well enough.

And I agree with your opinion, that this part of the code is
complicated. I have one permanent item on my todo list to spend some
quality time looking at it, and trying to make it simpler.

But fixing it to make it work with cycle-time-extension comes first.
Then, it's on me to not break it later.

Sorry for the rambling. Does this answer your question?


Cheers,

Patch

diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 119dec3bbe88..f18a5fe12f0c 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -61,6 +61,7 @@  struct sched_entry {
 	u32 gate_mask;
 	u32 interval;
 	u8 command;
+	bool correction_active;
 };
 
 struct sched_gate_list {
@@ -215,6 +216,31 @@  static void switch_schedules(struct taprio_sched *q,
 	*admin = NULL;
 }
 
+static bool cycle_corr_active(s64 cycle_time_correction)
+{
+	if (cycle_time_correction == INIT_CYCLE_TIME_CORRECTION)
+		return false;
+	else
+		return true;
+}
+
+u32 get_interval(const struct sched_entry *entry,
+		 const struct sched_gate_list *oper)
+{
+	if (entry->correction_active)
+		return entry->interval + oper->cycle_time_correction;
+	else
+		return entry->interval;
+}
+
+s64 get_cycle_time(const struct sched_gate_list *oper)
+{
+	if (cycle_corr_active(oper->cycle_time_correction))
+		return oper->cycle_time + oper->cycle_time_correction;
+	else
+		return oper->cycle_time;
+}
+
 /* Get how much time has been already elapsed in the current cycle. */
 static s32 get_cycle_time_elapsed(struct sched_gate_list *sched, ktime_t time)
 {
@@ -222,7 +248,7 @@  static s32 get_cycle_time_elapsed(struct sched_gate_list *sched, ktime_t time)
 	s32 time_elapsed;
 
 	time_since_sched_start = ktime_sub(time, sched->base_time);
-	div_s64_rem(time_since_sched_start, sched->cycle_time, &time_elapsed);
+	div_s64_rem(time_since_sched_start, get_cycle_time(sched), &time_elapsed);
 
 	return time_elapsed;
 }
@@ -235,8 +261,9 @@  static ktime_t get_interval_end_time(struct sched_gate_list *sched,
 	s32 cycle_elapsed = get_cycle_time_elapsed(sched, intv_start);
 	ktime_t intv_end, cycle_ext_end, cycle_end;
 
-	cycle_end = ktime_add_ns(intv_start, sched->cycle_time - cycle_elapsed);
-	intv_end = ktime_add_ns(intv_start, entry->interval);
+	cycle_end = ktime_add_ns(intv_start,
+				 get_cycle_time(sched) - cycle_elapsed);
+	intv_end = ktime_add_ns(intv_start, get_interval(entry, sched));
 	cycle_ext_end = ktime_add(cycle_end, sched->cycle_time_extension);
 
 	if (ktime_before(intv_end, cycle_end))
@@ -259,14 +286,6 @@  static int duration_to_length(struct taprio_sched *q, u64 duration)
 	return div_u64(duration * PSEC_PER_NSEC, atomic64_read(&q->picos_per_byte));
 }
 
-static bool cycle_corr_active(s64 cycle_time_correction)
-{
-	if (cycle_time_correction == INIT_CYCLE_TIME_CORRECTION)
-		return false;
-	else
-		return true;
-}
-
 /* Sets sched->max_sdu[] and sched->max_frm_len[] to the minimum between the
  * q->max_sdu[] requested by the user and the max_sdu dynamically determined by
  * the maximum open gate durations at the given link speed.
@@ -351,7 +370,7 @@  static struct sched_entry *find_entry_to_transmit(struct sk_buff *skb,
 	if (!sched)
 		return NULL;
 
-	cycle = sched->cycle_time;
+	cycle = get_cycle_time(sched);
 	cycle_elapsed = get_cycle_time_elapsed(sched, time);
 	curr_intv_end = ktime_sub_ns(time, cycle_elapsed);
 	cycle_end = ktime_add_ns(curr_intv_end, cycle);
@@ -365,7 +384,7 @@  static struct sched_entry *find_entry_to_transmit(struct sk_buff *skb,
 			break;
 
 		if (!(entry->gate_mask & BIT(tc)) ||
-		    packet_transmit_time > entry->interval)
+		    packet_transmit_time > get_interval(entry, sched))
 			continue;
 
 		txtime = entry->next_txtime;
@@ -543,7 +562,8 @@  static long get_packet_txtime(struct sk_buff *skb, struct Qdisc *sch)
 		 * interval starts.
 		 */
 		if (ktime_after(transmit_end_time, interval_end))
-			entry->next_txtime = ktime_add(interval_start, sched->cycle_time);
+			entry->next_txtime =
+				ktime_add(interval_start, get_cycle_time(sched));
 	} while (sched_changed || ktime_after(transmit_end_time, interval_end));
 
 	entry->next_txtime = transmit_end_time;
@@ -1045,6 +1065,7 @@  static enum hrtimer_restart advance_sched(struct hrtimer *timer)
 
 			oper->cycle_end_time = new_base_time;
 			end_time = new_base_time;
+			next->correction_active = true;
 
 			update_open_gate_duration(next, oper, num_tc,
 						  new_gate_duration);
@@ -1146,6 +1167,7 @@  static int fill_sched_entry(struct taprio_sched *q, struct nlattr **tb,
 	}
 
 	entry->interval = interval;
+	entry->correction_active = false;
 
 	return 0;
 }