Message ID | 20231107112023.676016-5-faizal.abdul.rahim@linux.intel.com
---|---
State | Superseded |
Delegated to: | Netdev Maintainers |
Series | qbv cycle time extension/truncation
Hi Faizal,

kernel test robot noticed the following build warnings:

[auto build test WARNING on net/main]

url:    https://github.com/intel-lab-lkp/linux/commits/Faizal-Rahim/net-sched-taprio-fix-too-early-schedules-switching/20231107-192843
base:   net/main
patch link:    https://lore.kernel.org/r/20231107112023.676016-5-faizal.abdul.rahim%40linux.intel.com
patch subject: [PATCH v2 net 4/7] net/sched: taprio: get corrected value of cycle_time and interval
config: powerpc-allmodconfig (https://download.01.org/0day-ci/archive/20231108/202311080506.qMlPx2WA-lkp@intel.com/config)
compiler: powerpc64-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231108/202311080506.qMlPx2WA-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202311080506.qMlPx2WA-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> net/sched/sch_taprio.c:227:5: warning: no previous prototype for 'get_interval' [-Wmissing-prototypes]
     227 | u32 get_interval(const struct sched_entry *entry,
         |     ^~~~~~~~~~~~
>> net/sched/sch_taprio.c:236:5: warning: no previous prototype for 'get_cycle_time' [-Wmissing-prototypes]
     236 | s64 get_cycle_time(const struct sched_gate_list *oper)
         |     ^~~~~~~~~~~~~~

vim +/get_interval +227 net/sched/sch_taprio.c

   226	
 > 227	u32 get_interval(const struct sched_entry *entry,
   228			 const struct sched_gate_list *oper)
   229	{
   230		if (entry->correction_active)
   231			return entry->interval + oper->cycle_time_correction;
   232		else
   233			return entry->interval;
   234	}
   235	
 > 236	s64 get_cycle_time(const struct sched_gate_list *oper)
   237	{
   238		if (cycle_corr_active(oper->cycle_time_correction))
   239			return oper->cycle_time + oper->cycle_time_correction;
   240		else
   241			return oper->cycle_time;
   242	}
   243	
On Tue, Nov 07, 2023 at 06:20:20AM -0500, Faizal Rahim wrote:
> Retrieve adjusted cycle_time and interval values through new APIs.
> Note that in some cases where the original values are required,
> such as in dump_schedule() and setup_first_end_time(), direct calls
> to cycle_time and interval are retained without using the new APIs.
>
> Added a new field, correction_active, in the sched_entry struct to
> determine the entry's correction state. This field is required due
> to a specific flow like find_entry_to_transmit() -> get_interval_end_time(),
> which retrieves the interval for each entry. During positive cycle
> time correction, it's known that the last entry interval requires
> correction. However, for negative correction, the affected entry
> is unknown, which is why this new field is necessary.

I agree with the motivation, but I'm not sure if the chosen solution is
correct.

static u32 get_interval(const struct sched_entry *entry,
			const struct sched_gate_list *oper)
{
	if (entry->correction_active)
		return entry->interval + oper->cycle_time_correction;

	return entry->interval;
}

What if the schedule looks like this:

sched-entry S 0x01 125000000
sched-entry S 0x02 125000000
sched-entry S 0x04 125000000
sched-entry S 0x08 125000000
sched-entry S 0x10 125000000
sched-entry S 0x20 125000000
sched-entry S 0x40 125000000
sched-entry S 0x80 125000000

and the calculated cycle_time_correction is -200000000? That would
eliminate the entire last sched-entry (0x80), and the previous one
(0x40) would run for just 75000000 ns. But your calculation would say
that its interval is -75000000 ns (actually reported as a u32 positive
integer, so it would be a completely bogus value).

So not only is the affected entry unknown, but also the amount of cycle
time correction that applies to it is unknown.

I'm looking at where we need get_interval(), and it's from:

taprio_enqueue_one()
-> is_valid_interval()
   -> find_entry_to_transmit()
-> get_interval_end_time()
-> get_packet_txtime()
   -> find_entry_to_transmit()

I admit it's a part of taprio which I don't understand too well. Why do
we perform such complex calculations in get_interval_end_time() when we
should have struct sched_entry :: end_time precomputed and available for
this purpose (although it was primarily intended for advance_sched() and
not for enqueue())?

Vinicius, do you know?
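The u32 wraparound Vladimir describes is easy to demonstrate outside the
kernel. Below is a standalone userspace sketch (not kernel code; stdint
types stand in for the kernel's u32/s64) that mirrors the arithmetic in
get_interval() for the schedule above:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint32_t interval = 125000000;   /* 125 ms sched-entry, as in the example */
	int64_t correction = -200000000; /* the hypothetical cycle_time_correction */

	/* The s64 sum is -75000000 ns, but squeezing it back into the
	 * helper's u32 return type wraps it into a huge positive value.
	 */
	uint32_t bogus = (uint32_t)(interval + correction);

	printf("%u\n", bogus); /* prints 4219967296, not -75000000 */
	return 0;
}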
On Tue, Nov 07, 2023 at 06:20:20AM -0500, Faizal Rahim wrote:
> @@ -215,6 +216,31 @@ static void switch_schedules(struct taprio_sched *q,
>  	*admin = NULL;
>  }
>  
> +static bool cycle_corr_active(s64 cycle_time_correction)
> +{
> +	if (cycle_time_correction == INIT_CYCLE_TIME_CORRECTION)
> +		return false;
> +	else
> +		return true;
> +}

> @@ -259,14 +286,6 @@ static int duration_to_length(struct taprio_sched *q, u64 duration)
>  	return div_u64(duration * PSEC_PER_NSEC, atomic64_read(&q->picos_per_byte));
>  }
>  
> -static bool cycle_corr_active(s64 cycle_time_correction)
> -{
> -	if (cycle_time_correction == INIT_CYCLE_TIME_CORRECTION)
> -		return false;
> -	else
> -		return true;
> -}
> -

Don't move code around that you've introduced in earlier changes. Just
place it where it needs to be from the beginning.
Hi Faizal,

kernel test robot noticed the following build warnings:

[auto build test WARNING on net/main]

url:    https://github.com/intel-lab-lkp/linux/commits/Faizal-Rahim/net-sched-taprio-fix-too-early-schedules-switching/20231107-192843
base:   net/main
patch link:    https://lore.kernel.org/r/20231107112023.676016-5-faizal.abdul.rahim%40linux.intel.com
patch subject: [PATCH v2 net 4/7] net/sched: taprio: get corrected value of cycle_time and interval
config: arm64-allmodconfig (https://download.01.org/0day-ci/archive/20231111/202311110208.GT4trtEk-lkp@intel.com/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project.git 4a5ac14ee968ff0ad5d2cc1ffa0299048db4c88a)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231111/202311110208.GT4trtEk-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202311110208.GT4trtEk-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> net/sched/sch_taprio.c:227:5: warning: no previous prototype for function 'get_interval' [-Wmissing-prototypes]
     227 | u32 get_interval(const struct sched_entry *entry,
         |     ^
   net/sched/sch_taprio.c:227:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
     227 | u32 get_interval(const struct sched_entry *entry,
         | ^
         | static
>> net/sched/sch_taprio.c:236:5: warning: no previous prototype for function 'get_cycle_time' [-Wmissing-prototypes]
     236 | s64 get_cycle_time(const struct sched_gate_list *oper)
         |     ^
   net/sched/sch_taprio.c:236:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
     236 | s64 get_cycle_time(const struct sched_gate_list *oper)
         | ^
         | static
   2 warnings generated.

vim +/get_interval +227 net/sched/sch_taprio.c

   226	
 > 227	u32 get_interval(const struct sched_entry *entry,
   228			 const struct sched_gate_list *oper)
   229	{
   230		if (entry->correction_active)
   231			return entry->interval + oper->cycle_time_correction;
   232		else
   233			return entry->interval;
   234	}
   235	
 > 236	s64 get_cycle_time(const struct sched_gate_list *oper)
   237	{
   238		if (cycle_corr_active(oper->cycle_time_correction))
   239			return oper->cycle_time + oper->cycle_time_correction;
   240		else
   241			return oper->cycle_time;
   242	}
   243	
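Both robot reports flag the same root cause: get_interval() and
get_cycle_time() are defined with external linkage but no prototype is
visible anywhere. Since the helpers appear to be used only inside
sch_taprio.c, the fix clang spells out is to give them internal linkage; a
minimal sketch of the patched helpers under that assumption:

/* Marking the helpers static keeps them local to this translation unit
 * and silences -Wmissing-prototypes, per the clang note above.
 */
static u32 get_interval(const struct sched_entry *entry,
			const struct sched_gate_list *oper)
{
	if (entry->correction_active)
		return entry->interval + oper->cycle_time_correction;

	return entry->interval;
}

static s64 get_cycle_time(const struct sched_gate_list *oper)
{
	if (cycle_corr_active(oper->cycle_time_correction))
		return oper->cycle_time + oper->cycle_time_correction;

	return oper->cycle_time;
}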
On 9/11/2023 7:11 pm, Vladimir Oltean wrote:
> On Tue, Nov 07, 2023 at 06:20:20AM -0500, Faizal Rahim wrote:
>> Retrieve adjusted cycle_time and interval values through new APIs.
>> Note that in some cases where the original values are required,
>> such as in dump_schedule() and setup_first_end_time(), direct calls
>> to cycle_time and interval are retained without using the new APIs.
>>
>> Added a new field, correction_active, in the sched_entry struct to
>> determine the entry's correction state. This field is required due
>> to a specific flow like find_entry_to_transmit() -> get_interval_end_time(),
>> which retrieves the interval for each entry. During positive cycle
>> time correction, it's known that the last entry interval requires
>> correction. However, for negative correction, the affected entry
>> is unknown, which is why this new field is necessary.
>
> I agree with the motivation, but I'm not sure if the chosen solution is
> correct.
>
> static u32 get_interval(const struct sched_entry *entry,
> 			const struct sched_gate_list *oper)
> {
> 	if (entry->correction_active)
> 		return entry->interval + oper->cycle_time_correction;
>
> 	return entry->interval;
> }
>
> What if the schedule looks like this:
>
> sched-entry S 0x01 125000000
> sched-entry S 0x02 125000000
> sched-entry S 0x04 125000000
> sched-entry S 0x08 125000000
> sched-entry S 0x10 125000000
> sched-entry S 0x20 125000000
> sched-entry S 0x40 125000000
> sched-entry S 0x80 125000000
>
> and the calculated cycle_time_correction is -200000000? That would
> eliminate the entire last sched-entry (0x80), and the previous one
> (0x40) would run for just 75000000 ns. But your calculation would say
> that its interval is -75000000 ns (actually reported as a u32 positive
> integer, so it would be a completely bogus value).
>
> So not only is the affected entry unknown, but also the amount of cycle
> time correction that applies to it is unknown.
>

Just an FYI, my cycle time extension test for sending packets fails
without updating the interval and cycle_time: the duration doesn't extend
properly. I only observe proper extension when this patch is included.

In patch series v1, interval and cycle_time were updated directly.
However, due to concerns in the v1 comments about updating the fields
directly, v2 doesn't do that.

Regarding the concern about negative correction exceeding the interval
value: I've checked the logic in get_cycle_time_correction() that sets
cycle_time_correction, and I don't see how this can happen. Still, if it
does, it suggests an error much earlier than the get_interval() call.

So, I propose a failure check in get_cycle_time_correction(). If the
correction value is negative and consumes the entire entry interval or
more, we set the negative cycle_time_correction to some arbitrary value,
maybe half of the interval, just to mitigate the impact of the unknown
error that occurred earlier. What do you think?

> I'm looking at where we need get_interval(), and it's from:
>
> taprio_enqueue_one()
> -> is_valid_interval()
>    -> find_entry_to_transmit()
> -> get_interval_end_time()
> -> get_packet_txtime()
>    -> find_entry_to_transmit()
>
> I admit it's a part of taprio which I don't understand too well. Why do
> we perform such complex calculations in get_interval_end_time() when we
> should have struct sched_entry :: end_time precomputed and available for
> this purpose (although it was primarily intended for advance_sched() and
> not for enqueue())?
>
> Vinicius, do you know?
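The failure check Faizal proposes is not part of the posted patch; a rough
sketch of how it might look, following the wording above (the helper name,
its placement, and the choice of half the interval are all hypothetical):

/* Hypothetical guard for get_cycle_time_correction(): if a negative
 * correction would swallow the last entry's whole interval (or more),
 * an error must have occurred earlier, so clamp the correction to half
 * the interval to contain the damage, as proposed above.
 */
static s64 sanitize_cycle_time_correction(const struct sched_entry *last_entry,
					  s64 correction)
{
	if (correction < 0 && -correction >= last_entry->interval)
		return -((s64)last_entry->interval / 2);

	return correction;
}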
Hi Vladimir,

Vladimir Oltean <vladimir.oltean@nxp.com> writes:

> On Tue, Nov 07, 2023 at 06:20:20AM -0500, Faizal Rahim wrote:
>> Retrieve adjusted cycle_time and interval values through new APIs.
>> Note that in some cases where the original values are required,
>> such as in dump_schedule() and setup_first_end_time(), direct calls
>> to cycle_time and interval are retained without using the new APIs.
>>
>> Added a new field, correction_active, in the sched_entry struct to
>> determine the entry's correction state. This field is required due
>> to a specific flow like find_entry_to_transmit() -> get_interval_end_time(),
>> which retrieves the interval for each entry. During positive cycle
>> time correction, it's known that the last entry interval requires
>> correction. However, for negative correction, the affected entry
>> is unknown, which is why this new field is necessary.
>
> I agree with the motivation, but I'm not sure if the chosen solution is
> correct.
>
> static u32 get_interval(const struct sched_entry *entry,
> 			const struct sched_gate_list *oper)
> {
> 	if (entry->correction_active)
> 		return entry->interval + oper->cycle_time_correction;
>
> 	return entry->interval;
> }
>
> What if the schedule looks like this:
>
> sched-entry S 0x01 125000000
> sched-entry S 0x02 125000000
> sched-entry S 0x04 125000000
> sched-entry S 0x08 125000000
> sched-entry S 0x10 125000000
> sched-entry S 0x20 125000000
> sched-entry S 0x40 125000000
> sched-entry S 0x80 125000000
>
> and the calculated cycle_time_correction is -200000000? That would
> eliminate the entire last sched-entry (0x80), and the previous one
> (0x40) would run for just 75000000 ns. But your calculation would say
> that its interval is -75000000 ns (actually reported as a u32 positive
> integer, so it would be a completely bogus value).
>
> So not only is the affected entry unknown, but also the amount of cycle
> time correction that applies to it is unknown.
>
> I'm looking at where we need get_interval(), and it's from:
>
> taprio_enqueue_one()
> -> is_valid_interval()
>    -> find_entry_to_transmit()
> -> get_interval_end_time()
> -> get_packet_txtime()
>    -> find_entry_to_transmit()
>
> I admit it's a part of taprio which I don't understand too well. Why do
> we perform such complex calculations in get_interval_end_time() when we
> should have struct sched_entry :: end_time precomputed and available for
> this purpose (although it was primarily intended for advance_sched() and
> not for enqueue())?
>
> Vinicius, do you know?

Sorry for the delay, I thought that I went through all the messages in
this thread, but missed this one.

I think what is missing is some context: this series from Faizal also
includes fixes for taprio's "txtime-assisted mode", where we try to
support 802.1Qbv schedules, including cycle time extension and schedules
with an arbitrary number of entries.

The basic idea is that during enqueue, taprio will calculate the txtime
of a packet so it "follows" the configured schedule, and pass that packet
to ETF, which is running as a child of taprio. It is a bit of a hack, but
it works well enough.

And I agree with your opinion that this part of the code is complicated.
I have one permanent item on my todo list to spend some quality time
looking at it and trying to make it simpler. But fixing it to make it
work with cycle-time-extension comes first. Then, it's on me to not break
it later.

Sorry for the rambling. Does this answer your question?

Cheers,
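For readers unfamiliar with the mode Vinicius describes: txtime-assist is
selected with taprio's flags 0x1, and ETF is attached as a child qdisc to
enforce the launch times taprio computes at enqueue. A sketch of such a
setup, in the style of the sched-entry examples above (the interface name
and timing values are made up for illustration):

tc qdisc replace dev eth0 parent root handle 100 taprio \
	num_tc 3 map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
	queues 1@0 1@1 2@2 \
	base-time 1528743495910289987 \
	sched-entry S 01 300000 \
	sched-entry S 02 300000 \
	sched-entry S 04 400000 \
	flags 0x1 \
	txtime-delay 200000 \
	clockid CLOCK_TAI

tc qdisc replace dev eth0 parent 100:1 etf \
	clockid CLOCK_TAI delta 200000 offload skip_sock_check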
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 119dec3bbe88..f18a5fe12f0c 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -61,6 +61,7 @@ struct sched_entry {
 	u32 gate_mask;
 	u32 interval;
 	u8 command;
+	bool correction_active;
 };
 
 struct sched_gate_list {
@@ -215,6 +216,31 @@ static void switch_schedules(struct taprio_sched *q,
 	*admin = NULL;
 }
 
+static bool cycle_corr_active(s64 cycle_time_correction)
+{
+	if (cycle_time_correction == INIT_CYCLE_TIME_CORRECTION)
+		return false;
+	else
+		return true;
+}
+
+u32 get_interval(const struct sched_entry *entry,
+		 const struct sched_gate_list *oper)
+{
+	if (entry->correction_active)
+		return entry->interval + oper->cycle_time_correction;
+	else
+		return entry->interval;
+}
+
+s64 get_cycle_time(const struct sched_gate_list *oper)
+{
+	if (cycle_corr_active(oper->cycle_time_correction))
+		return oper->cycle_time + oper->cycle_time_correction;
+	else
+		return oper->cycle_time;
+}
+
 /* Get how much time has been already elapsed in the current cycle. */
 static s32 get_cycle_time_elapsed(struct sched_gate_list *sched, ktime_t time)
 {
@@ -222,7 +248,7 @@ static s32 get_cycle_time_elapsed(struct sched_gate_list *sched, ktime_t time)
 	s32 time_elapsed;
 
 	time_since_sched_start = ktime_sub(time, sched->base_time);
-	div_s64_rem(time_since_sched_start, sched->cycle_time, &time_elapsed);
+	div_s64_rem(time_since_sched_start, get_cycle_time(sched), &time_elapsed);
 
 	return time_elapsed;
 }
@@ -235,8 +261,9 @@ static ktime_t get_interval_end_time(struct sched_gate_list *sched,
 	s32 cycle_elapsed = get_cycle_time_elapsed(sched, intv_start);
 	ktime_t intv_end, cycle_ext_end, cycle_end;
 
-	cycle_end = ktime_add_ns(intv_start, sched->cycle_time - cycle_elapsed);
-	intv_end = ktime_add_ns(intv_start, entry->interval);
+	cycle_end = ktime_add_ns(intv_start,
+				 get_cycle_time(sched) - cycle_elapsed);
+	intv_end = ktime_add_ns(intv_start, get_interval(entry, sched));
 	cycle_ext_end = ktime_add(cycle_end, sched->cycle_time_extension);
 
 	if (ktime_before(intv_end, cycle_end))
@@ -259,14 +286,6 @@ static int duration_to_length(struct taprio_sched *q, u64 duration)
 	return div_u64(duration * PSEC_PER_NSEC, atomic64_read(&q->picos_per_byte));
 }
 
-static bool cycle_corr_active(s64 cycle_time_correction)
-{
-	if (cycle_time_correction == INIT_CYCLE_TIME_CORRECTION)
-		return false;
-	else
-		return true;
-}
-
 /* Sets sched->max_sdu[] and sched->max_frm_len[] to the minimum between the
  * q->max_sdu[] requested by the user and the max_sdu dynamically determined by
  * the maximum open gate durations at the given link speed.
@@ -351,7 +370,7 @@ static struct sched_entry *find_entry_to_transmit(struct sk_buff *skb,
 		if (!sched)
 			return NULL;
 
-		cycle = sched->cycle_time;
+		cycle = get_cycle_time(sched);
 		cycle_elapsed = get_cycle_time_elapsed(sched, time);
 		curr_intv_end = ktime_sub_ns(time, cycle_elapsed);
 		cycle_end = ktime_add_ns(curr_intv_end, cycle);
@@ -365,7 +384,7 @@ static struct sched_entry *find_entry_to_transmit(struct sk_buff *skb,
 				break;
 
 			if (!(entry->gate_mask & BIT(tc)) ||
-			    packet_transmit_time > entry->interval)
+			    packet_transmit_time > get_interval(entry, sched))
 				continue;
 
 			txtime = entry->next_txtime;
@@ -543,7 +562,8 @@ static long get_packet_txtime(struct sk_buff *skb, struct Qdisc *sch)
 		 * interval starts.
 		 */
 		if (ktime_after(transmit_end_time, interval_end))
-			entry->next_txtime = ktime_add(interval_start, sched->cycle_time);
+			entry->next_txtime =
+				ktime_add(interval_start, get_cycle_time(sched));
 	} while (sched_changed || ktime_after(transmit_end_time, interval_end));
 
 	entry->next_txtime = transmit_end_time;
@@ -1045,6 +1065,7 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer)
 
 			oper->cycle_end_time = new_base_time;
 			end_time = new_base_time;
+			next->correction_active = true;
 
 			update_open_gate_duration(next, oper, num_tc,
 						  new_gate_duration);
@@ -1146,6 +1167,7 @@ static int fill_sched_entry(struct taprio_sched *q, struct nlattr **tb,
 	}
 
 	entry->interval = interval;
+	entry->correction_active = false;
 
 	return 0;
 }
Retrieve adjusted cycle_time and interval values through new APIs.
Note that in some cases where the original values are required,
such as in dump_schedule() and setup_first_end_time(), direct calls
to cycle_time and interval are retained without using the new APIs.

Added a new field, correction_active, in the sched_entry struct to
determine the entry's correction state. This field is required due
to a specific flow like find_entry_to_transmit() -> get_interval_end_time(),
which retrieves the interval for each entry. During positive cycle
time correction, it's known that the last entry interval requires
correction. However, for negative correction, the affected entry
is unknown, which is why this new field is necessary.

Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
---
 net/sched/sch_taprio.c | 50 ++++++++++++++++++++++++++++++------------
 1 file changed, 36 insertions(+), 14 deletions(-)