Message ID | 20230530082541.495-1-muhammad.husaini.zulkifli@intel.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net,v1] net/sched: taprio: fix cycle time extension logic | expand |
On Tue, May 30, 2023 at 04:25:41PM +0800, Muhammad Husaini Zulkifli wrote: > From: Tan Tee Min <tee.min.tan@linux.intel.com> > > According to IEEE Std. 802.1Q-2018 section Q.5 CycleTimeExtension, > the Cycle Time Extension variable allows this extension of the last old > cycle to be done in a defined way. If the last complete old cycle would > normally end less than OperCycleTimeExtension nanoseconds before the new > base time, then the last complete cycle before AdminBaseTime is reached > is extended so that it ends at AdminBaseTime. > > This patch extends the last entry of last complete cycle to AdminBaseTime. > > Fixes: a3d43c0d56f1 ("taprio: Add support adding an admin schedule") > Signed-off-by: Tan Tee Min <tee.min.tan@linux.intel.com> > Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> > --- Thanks for the patch. I think that the commit message insufficiently describes the problems with the existing code. Also, the incorrect Fixes: tag suggests it may even have been incompletely characterized. Here are some additional thoughts of mine (also insufficient, since they are based on static code analysis, done now) which may nuance things a bit more, to change the Fixes tag and the shape of the proposed solution. Of the problems is that after my commit a1e6ad30fa19 ("net/sched: taprio: calculate guard band against actual TC gate close time"), the last entry of the old admin schedule stops being valid from taprio_dequeue_from_txq()'s perspective. Before that patch, taprio_dequeue_from_txq() would look at entry->close_time to decide whether packets are eligible to be sent or not. The old logic would take a cycle length correction into account like this: if (should_change_schedules(admin, oper, close_time)) { /* Set things so the next time this runs, the new * schedule runs. */ close_time = sched_base_time(admin); switch_schedules(q, &admin, &oper); } next->close_time = close_time; // contains the cycle length correction After that patch, taprio_dequeue_from_txq() -> taprio_entry_allows_tx() simply does not have logic to take a cycle time correction into consideration; it just looks at entry->gate_close_time[tc] which is determined from the previous entry->end_time plus the next->gate_duration[tc]. All entry->gate_duration[tc] values are calculated only once, during taprio_calculate_gate_durations(). Nowhere is a dynamic schedule change taken into account. Now, taprio_dequeue_from_txq() uses the following "entry" fields: gate_close_time[tc] and budget[tc]. They are both recalculated incorrectly by advance_sched(), which your change addresses. That is one issue which should be described properly, and a patch limited to that. Sure, you're modifying entry->gate_duration[] to account for the correction, but now it no longer matches this kind of checks: /* Traffic classes which never close have infinite budget */ if (entry->gate_duration[tc] == sched->cycle_time) budget = INT_MAX; so this is why your choice is to also update the cycle_time. An unfortunate consequence of that choice is that this might also become transiently visible to taprio_dump(), which I'm not totally convinced is desirable - because effectively, the cycle time shouldn't have changed. Although, true, the old oper schedule is going away soon, we don't really know how soon. The cycle times can be arbitrarily large. It would be great if we could save the correction into a dedicated "s64 correction" variable (ranging from -cycle_time+1 to +cycle_time_extension), and leave the cycle_time alone. So my patch is partly to blame for today's mishaps, which is something that this patch fails to identify properly. But taprio_enqueue() is also a problem, and that isn't addressed. It can be, that during a corrected cycle, frames which were oversized due to the queueMaxSDU restriction are transiently not oversized anymore, and should be allowed to pass - or the other way around (and this is worse): a frame which would have passed through a full-size window will not pass through a truncated last cycle (negative correction), and taprio_enqueue() will not detect this and will not drop the skb. taprio_update_queue_max_sdu() would need to be called, and that depends on an up-to-date sched->max_open_gate_duration[tc] which isn't currently recalculated. So, one way or another, taprio_calculate_gate_durations() also needs to be called again after a change to the schedule's correction. > net/sched/sch_taprio.c | 32 +++++++++++++++++++++----------- > 1 file changed, 21 insertions(+), 11 deletions(-) > > diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c > index 76db9a10ef504..ef487fef83fce 100644 > --- a/net/sched/sch_taprio.c > +++ b/net/sched/sch_taprio.c > @@ -99,6 +99,7 @@ struct taprio_sched { > u32 max_sdu[TC_MAX_QUEUE]; /* save info from the user */ > u32 fp[TC_QOPT_MAX_QUEUE]; /* only for dump and offloading */ > u32 txtime_delay; > + bool sched_changed; nitpick: sched_change_pending? > }; > > struct __tc_taprio_qopt_offload { > @@ -934,8 +935,10 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer) > admin = rcu_dereference_protected(q->admin_sched, > lockdep_is_held(&q->current_entry_lock)); > > - if (!oper) > + if (!oper || q->sched_changed) { > + q->sched_changed = false; > switch_schedules(q, &admin, &oper); > + } > > /* This can happen in two cases: 1. this is the very first run > * of this function (i.e. we weren't running any schedule > @@ -962,20 +965,27 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer) > end_time = ktime_add_ns(entry->end_time, next->interval); > end_time = min_t(ktime_t, end_time, oper->cycle_end_time); > > + if (should_change_schedules(admin, oper, oper->cycle_end_time) && > + list_is_last(&next->list, &oper->entries)) { > + u32 ori_interval = next->interval; The choice of operations seems unintuitive, when you could have simply done: ktime_t correction = ktime_sub(sched_base_time(admin), end_time); next->interval += correction; oper->cycle_time += correction; (or possibly save the correction instead, see the feedback above) > + > + next->interval += ktime_sub(sched_base_time(admin), end_time); > + oper->cycle_time += next->interval - ori_interval; > + end_time = sched_base_time(admin); > + oper->cycle_end_time = end_time; > + q->sched_changed = true; I see an issue here: you need to set q->sched_changed = true whenever should_change_schedules() returned true and not just for the last entry, correct? Because there could be a schedule change which needs to happens during entry 2 out of 4 of the current one (truncation case - negative correction). In that case, I believe that should_change_schedules() keeps shouting at you "change! change! change!" but you only call switch_schedules() when you've reached entry 4 with the "next" pointer, which is not what the standard suggests should be done. IIUC, the standard says that when an admin and an oper sched meet, the decision of what to do near the AdminBaseTime - whether to elongate the next-to-last cycle of the oper sched or to let the last cycle run but to cut it short - depends on the OperCycleTimeExtension. In a nutshell, that variable says what is the maximum positive correction applicable to the last sched entry and to the cycle time. If a positive correction larger than that would be necessary (relative to the next-to-last cycle), the decision is to just let the last cycle run for how long it can. > + } > + > for (tc = 0; tc < num_tc; tc++) { > - if (next->gate_duration[tc] == oper->cycle_time) > + if (next->gate_duration[tc] == oper->cycle_time) { > next->gate_close_time[tc] = KTIME_MAX; > - else > + } else if (q->sched_changed && next->gate_duration[tc]) { > + next->gate_close_time[tc] = end_time; > + next->gate_duration[tc] = next->interval; This deserves a comment because, although I understand what it intends to do, I fail to understand if it will work or not :) > + } else { > next->gate_close_time[tc] = ktime_add_ns(entry->end_time, > next->gate_duration[tc]); > - } > - > - if (should_change_schedules(admin, oper, end_time)) { > - /* Set things so the next time this runs, the new > - * schedule runs. > - */ > - end_time = sched_base_time(admin); > - switch_schedules(q, &admin, &oper); I guess this is also a separate issue with the code. switch_schedules() changes q->oper_sched too early, and taprio_skb_exceeds_queue_max_sdu() looks at q->oper_sched. So during an extension period, the queueMaxSDU of the new schedule will be applied instead of the (extended) old one. > + } > } > > next->end_time = end_time; > -- > 2.17.1 > I guess at some point we should also fix up this comment? /* FIXME: the IEEE 802.1Q-2018 Specification isn't clear about * how precisely the extension should be made. So after * conformance testing, this logic may change. */ if (ktime_compare(next_base_time, extension_time) <= 0) return true;
On Tue, May 30, 2023 at 10:47:08PM +0300, Vladimir Oltean wrote: > On Tue, May 30, 2023 at 04:25:41PM +0800, Muhammad Husaini Zulkifli wrote: > > From: Tan Tee Min <tee.min.tan@linux.intel.com> > > > I think that the commit message insufficiently describes the problems > with the existing code. Also, the incorrect Fixes: tag suggests it may > even have been incompletely characterized. > > Here are some additional thoughts of mine (also insufficient, since they > are based on static code analysis, done now) which may nuance things a > bit more, to change the Fixes tag and the shape of the proposed solution. Thanks a lot for the reviews! > > Now, taprio_dequeue_from_txq() uses the following "entry" fields: > gate_close_time[tc] and budget[tc]. They are both recalculated > incorrectly by advance_sched(), which your change addresses. That is one > issue which should be described properly, and a patch limited to that. Do you mean a separate patch to fix the recalculation issue for gate_close_time[tc] and budget[tc] in advance_sched()? > > so this is why your choice is to also update the cycle_time. An unfortunate > consequence of that choice is that this might also become transiently visible > to taprio_dump(), which I'm not totally convinced is desirable - because > effectively, the cycle time shouldn't have changed. Although, true, the > old oper schedule is going away soon, we don't really know how soon. The > cycle times can be arbitrarily large. It would be great if we could save > the correction into a dedicated "s64 correction" variable (ranging from > -cycle_time+1 to +cycle_time_extension), and leave the cycle_time alone. That makes sense. I will save the correction into a dedicated "correction" variable, and make it use with the cycle time at other places, and not update the oper cycle_time. So that the taprio_dump() still shows the original oper cycle_time during the schedule change pending period. > > But taprio_enqueue() is also a problem, and that isn't addressed. It can be, > that during a corrected cycle, frames which were oversized due to the > queueMaxSDU restriction are transiently not oversized anymore, and > should be allowed to pass - or the other way around (and this is worse): > a frame which would have passed through a full-size window will not pass > through a truncated last cycle (negative correction), and taprio_enqueue() > will not detect this and will not drop the skb. > > taprio_update_queue_max_sdu() would need to be called, and that depends > on an up-to-date sched->max_open_gate_duration[tc] which isn't currently > recalculated. > > So, one way or another, taprio_calculate_gate_durations() also needs to > be called again after a change to the schedule's correction. Yape, the queueMaxSDU and max_open_gate_durations() both should need to be updated too. I will correct it in v2 patch by adding in the taprio_update_queue_max_sdu() and taprio_calculate_gate_durations() after the schedule change. > > > net/sched/sch_taprio.c | 32 +++++++++++++++++++++----------- > > 1 file changed, 21 insertions(+), 11 deletions(-) > > > > diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c > > index 76db9a10ef504..ef487fef83fce 100644 > > --- a/net/sched/sch_taprio.c > > +++ b/net/sched/sch_taprio.c > > @@ -99,6 +99,7 @@ struct taprio_sched { > > u32 max_sdu[TC_MAX_QUEUE]; /* save info from the user */ > > u32 fp[TC_QOPT_MAX_QUEUE]; /* only for dump and offloading */ > > u32 txtime_delay; > > + bool sched_changed; > > nitpick: sched_change_pending? Noted. Will change it to sched_change_pending. > > The choice of operations seems unintuitive, when you could have simply > done: > > ktime_t correction = ktime_sub(sched_base_time(admin), end_time); > > next->interval += correction; > oper->cycle_time += correction; > > (or possibly save the correction instead, see the feedback above) Thanks. Feedback taken. > I see an issue here: you need to set q->sched_changed = true whenever > should_change_schedules() returned true and not just for the last entry, > correct? Because there could be a schedule change which needs to happens > during entry 2 out of 4 of the current one (truncation case - negative > correction). In that case, I believe that should_change_schedules() > keeps shouting at you "change! change! change!" but you only call > switch_schedules() when you've reached entry 4 with the "next" pointer, > which is not what the standard suggests should be done. > > IIUC, the standard says that when an admin and an oper sched meet, the > decision of what to do near the AdminBaseTime - whether to elongate the > next-to-last cycle of the oper sched or to let the last cycle run but to > cut it short - depends on the OperCycleTimeExtension. In a nutshell, > that variable says what is the maximum positive correction applicable to > the last sched entry and to the cycle time. If a positive correction > larger than that would be necessary (relative to the next-to-last cycle), > the decision is to just let the last cycle run for how long it can. Based on my understanding, here are the two cases when the Oper and Admin cycle boundaries don’t line up perfectly:- 1/ The final Oper cycle before first Admin cycle is smaller than OperCycleTimeExtension: - Then extend the final oper cycle rather than restart a very short final oper cycle. 2/ The final Oper cycle before first Admin cycle is greater than OperCycleTimeExtension: - Then it won't extend the final Oper cycle and restart the final Oper cycle instead, then it will be truncated at Admin base time. I think you are saying the scenario 2/ above, right? Let me rework the solution and come back with the proper fixes. > I guess at some point we should also fix up this comment? > > /* FIXME: the IEEE 802.1Q-2018 Specification isn't clear about > * how precisely the extension should be made. So after > * conformance testing, this logic may change. > */ > if (ktime_compare(next_base_time, extension_time) <= 0) > return true; Agree. Let me fix up this comment in next patch. Thanks, Tee Min
On Fri, Jun 02, 2023 at 03:14:06PM +0800, Tan Tee Min wrote: > Do you mean a separate patch to fix the recalculation issue for > gate_close_time[tc] and budget[tc] in advance_sched()? Yes. You might be asking: "separate from what"? I notice one other unrelated change in your patch is to delay the advance_sched() -> switch_schedules() call via this new sched_change_pending variable, and that is probably a good change. But it needs its own patch with its own context, problem description and solution description. > > I see an issue here: you need to set q->sched_changed = true whenever > > should_change_schedules() returned true and not just for the last entry, > > correct? Because there could be a schedule change which needs to happens > > during entry 2 out of 4 of the current one (truncation case - negative > > correction). In that case, I believe that should_change_schedules() > > keeps shouting at you "change! change! change!" but you only call > > switch_schedules() when you've reached entry 4 with the "next" pointer, > > which is not what the standard suggests should be done. > > > > IIUC, the standard says that when an admin and an oper sched meet, the > > decision of what to do near the AdminBaseTime - whether to elongate the > > next-to-last cycle of the oper sched or to let the last cycle run but to > > cut it short - depends on the OperCycleTimeExtension. In a nutshell, > > that variable says what is the maximum positive correction applicable to > > the last sched entry and to the cycle time. If a positive correction > > larger than that would be necessary (relative to the next-to-last cycle), > > the decision is to just let the last cycle run for how long it can. > > Based on my understanding, here are the two cases when the Oper and Admin > cycle boundaries don’t line up perfectly:- > 1/ The final Oper cycle before first Admin cycle is smaller than > OperCycleTimeExtension: > - Then extend the final oper cycle rather than restart a very short final > oper cycle. Yes, this should result in a positive correction - a new cycle is not begun, but the last entry of the next-to-last cycle just lasts longer. > 2/ The final Oper cycle before first Admin cycle is greater than > OperCycleTimeExtension: > - Then it won't extend the final Oper cycle and restart the final Oper > cycle instead, then it will be truncated at Admin base time. Yes, this should result in a negative correction - a new cycle is begun, but whose effective length will be shorter because it will be truncated as you say. > I think you are saying the scenario 2/ above, right? > Let me rework the solution and come back with the proper fixes. Yes, I'm saying that in situation 2, you're not allowing the schedule to be truncated where it should be, I believe.
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c index 76db9a10ef504..ef487fef83fce 100644 --- a/net/sched/sch_taprio.c +++ b/net/sched/sch_taprio.c @@ -99,6 +99,7 @@ struct taprio_sched { u32 max_sdu[TC_MAX_QUEUE]; /* save info from the user */ u32 fp[TC_QOPT_MAX_QUEUE]; /* only for dump and offloading */ u32 txtime_delay; + bool sched_changed; }; struct __tc_taprio_qopt_offload { @@ -934,8 +935,10 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer) admin = rcu_dereference_protected(q->admin_sched, lockdep_is_held(&q->current_entry_lock)); - if (!oper) + if (!oper || q->sched_changed) { + q->sched_changed = false; switch_schedules(q, &admin, &oper); + } /* This can happen in two cases: 1. this is the very first run * of this function (i.e. we weren't running any schedule @@ -962,20 +965,27 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer) end_time = ktime_add_ns(entry->end_time, next->interval); end_time = min_t(ktime_t, end_time, oper->cycle_end_time); + if (should_change_schedules(admin, oper, oper->cycle_end_time) && + list_is_last(&next->list, &oper->entries)) { + u32 ori_interval = next->interval; + + next->interval += ktime_sub(sched_base_time(admin), end_time); + oper->cycle_time += next->interval - ori_interval; + end_time = sched_base_time(admin); + oper->cycle_end_time = end_time; + q->sched_changed = true; + } + for (tc = 0; tc < num_tc; tc++) { - if (next->gate_duration[tc] == oper->cycle_time) + if (next->gate_duration[tc] == oper->cycle_time) { next->gate_close_time[tc] = KTIME_MAX; - else + } else if (q->sched_changed && next->gate_duration[tc]) { + next->gate_close_time[tc] = end_time; + next->gate_duration[tc] = next->interval; + } else { next->gate_close_time[tc] = ktime_add_ns(entry->end_time, next->gate_duration[tc]); - } - - if (should_change_schedules(admin, oper, end_time)) { - /* Set things so the next time this runs, the new - * schedule runs. - */ - end_time = sched_base_time(admin); - switch_schedules(q, &admin, &oper); + } } next->end_time = end_time;