diff mbox series

[RFC,net] net/sched: taprio: account for L1 overhead when calculating transmit time

Message ID 20220505160357.298794-1-vladimir.oltean@nxp.com (mailing list archive)
State RFC
Delegated to: Netdev Maintainers
Headers show
Series [RFC,net] net/sched: taprio: account for L1 overhead when calculating transmit time | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 32 this patch: 32
netdev/cc_maintainers fail 1 blamed authors not CCed: vedang.patel@intel.com; 4 maintainers not CCed: vedang.patel@intel.com jiri@resnulli.us xiyou.wangcong@gmail.com jhs@mojatatu.com
netdev/build_clang success Errors and warnings before: 12 this patch: 12
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 30 this patch: 30
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 11 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Vladimir Oltean May 5, 2022, 4:03 p.m. UTC
The taprio scheduler underestimates the packet transmission time, which
means that packets can be scheduled for transmission in time slots in
which they are never going to fit.

When this function was added in commit 4cfd5779bd6e ("taprio: Add
support for txtime-assist mode"), the only implication was that time
triggered packets would overrun its time slot and eat from the next one,
because with txtime-assist there isn't really any emulation of a "gate
close" event that would stop a packet from being transmitted.

However, commit b5b73b26b3ca ("taprio: Fix allowing too small
intervals") started using this function too, in all modes of operation
(software, txtime-assist and full offload). So we now accept time slots
which we know we won't be ever able to fulfill.

It's difficult to say which issue is more pressing, I'd say both are
visible with testing, even though the second would be more obvious
because of a black&white result (trying to send small packets in an
insufficiently large window blocks the queue).

Issue found through code inspection, the code was not even compile
tested.

The L1 overhead chosen here is an approximation, because various network
equipment has configurable IFG, however I don't think Linux is aware of
this.

Fixes: 4cfd5779bd6e ("taprio: Add support for txtime-assist mode")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/sched/sch_taprio.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Vinicius Costa Gomes May 5, 2022, 5:25 p.m. UTC | #1
Hi Vladimir,

Vladimir Oltean <vladimir.oltean@nxp.com> writes:

> The taprio scheduler underestimates the packet transmission time, which
> means that packets can be scheduled for transmission in time slots in
> which they are never going to fit.
>
> When this function was added in commit 4cfd5779bd6e ("taprio: Add
> support for txtime-assist mode"), the only implication was that time
> triggered packets would overrun its time slot and eat from the next one,
> because with txtime-assist there isn't really any emulation of a "gate
> close" event that would stop a packet from being transmitted.
>
> However, commit b5b73b26b3ca ("taprio: Fix allowing too small
> intervals") started using this function too, in all modes of operation
> (software, txtime-assist and full offload). So we now accept time slots
> which we know we won't be ever able to fulfill.
>
> It's difficult to say which issue is more pressing, I'd say both are
> visible with testing, even though the second would be more obvious
> because of a black&white result (trying to send small packets in an
> insufficiently large window blocks the queue).
>
> Issue found through code inspection, the code was not even compile
> tested.
>
> The L1 overhead chosen here is an approximation, because various network
> equipment has configurable IFG, however I don't think Linux is aware of
> this.

When testing CBS, I remember using tc-stab: 

https://man7.org/linux/man-pages/man8/tc-stab.8.html

To set the 'overhead' to some value.

That value should be used in the calculation.

I agree that it's not ideal, in the ideal world we would have a way to
retrieve the link overhead from the netdevice. But I would think that it
gets complicated really quickly when using netdevices that are not
Ethernet-based.

>
> Fixes: 4cfd5779bd6e ("taprio: Add support for txtime-assist mode")
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
>  net/sched/sch_taprio.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
> index b9c71a304d39..8c8681c37d4f 100644
> --- a/net/sched/sch_taprio.c
> +++ b/net/sched/sch_taprio.c
> @@ -176,7 +176,10 @@ static ktime_t get_interval_end_time(struct sched_gate_list *sched,
>  
>  static int length_to_duration(struct taprio_sched *q, int len)
>  {
> -	return div_u64(len * atomic64_read(&q->picos_per_byte), 1000);
> +	/* The duration of frame transmission should account for L1 overhead
> +	 * (12 octets IFG, 7 octets of preamble, 1 octet SFD, 4 octets FCS)
> +	 */
> +	return div_u64((24 + len) * atomic64_read(&q->picos_per_byte), 1000);
>  }
>  
>  /* Returns the entry corresponding to next available interval. If
> -- 
> 2.25.1
>
Vladimir Oltean May 5, 2022, 7:22 p.m. UTC | #2
On Thu, May 05, 2022 at 10:25:44AM -0700, Vinicius Costa Gomes wrote:
> Hi Vladimir,
> 
> Vladimir Oltean <vladimir.oltean@nxp.com> writes:
> 
> > The taprio scheduler underestimates the packet transmission time, which
> > means that packets can be scheduled for transmission in time slots in
> > which they are never going to fit.
> >
> > When this function was added in commit 4cfd5779bd6e ("taprio: Add
> > support for txtime-assist mode"), the only implication was that time
> > triggered packets would overrun its time slot and eat from the next one,
> > because with txtime-assist there isn't really any emulation of a "gate
> > close" event that would stop a packet from being transmitted.
> >
> > However, commit b5b73b26b3ca ("taprio: Fix allowing too small
> > intervals") started using this function too, in all modes of operation
> > (software, txtime-assist and full offload). So we now accept time slots
> > which we know we won't be ever able to fulfill.
> >
> > It's difficult to say which issue is more pressing, I'd say both are
> > visible with testing, even though the second would be more obvious
> > because of a black&white result (trying to send small packets in an
> > insufficiently large window blocks the queue).
> >
> > Issue found through code inspection, the code was not even compile
> > tested.
> >
> > The L1 overhead chosen here is an approximation, because various network
> > equipment has configurable IFG, however I don't think Linux is aware of
> > this.
> 
> When testing CBS, I remember using tc-stab: 
> 
> https://man7.org/linux/man-pages/man8/tc-stab.8.html
> 
> To set the 'overhead' to some value.
> 
> That value should be used in the calculation.
> 
> I agree that it's not ideal, in the ideal world we would have a way to
> retrieve the link overhead from the netdevice. But I would think that it
> gets complicated really quickly when using netdevices that are not
> Ethernet-based.

Interesting. So because the majority of length_to_duration() calls take
qdisc_pkt_len(skb) as argument, a user-supplied overhead is taken into
account. The exception is the bare ETH_ZLEN. For that, we'd have to
change the prototype of __qdisc_calculate_pkt_len to return an int, and
change qdisc_calculate_pkt_len like this:

static inline void qdisc_calculate_pkt_len(struct sk_buff *skb,
					   const struct Qdisc *sch)
{
#ifdef CONFIG_NET_SCHED
	struct qdisc_size_table *stab = rcu_dereference_bh(sch->stab);

	if (stab)
		qdisc_skb_cb(skb)->pkt_len = __qdisc_calculate_pkt_len(skb->len, stab);
#endif
}

then we would use __qdisc_calculate_pkt_len(ETH_ZLEN, rtnl_dereference(q->root->stab)).
Again completely untested.

Also, maybe the dependency on tc-stab for correct operation at least in
txtime assist mode should be mentioned in the man page, maybe? I don't
think it's completely obvious.
diff mbox series

Patch

diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index b9c71a304d39..8c8681c37d4f 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -176,7 +176,10 @@  static ktime_t get_interval_end_time(struct sched_gate_list *sched,
 
 static int length_to_duration(struct taprio_sched *q, int len)
 {
-	return div_u64(len * atomic64_read(&q->picos_per_byte), 1000);
+	/* The duration of frame transmission should account for L1 overhead
+	 * (12 octets IFG, 7 octets of preamble, 1 octet SFD, 4 octets FCS)
+	 */
+	return div_u64((24 + len) * atomic64_read(&q->picos_per_byte), 1000);
 }
 
 /* Returns the entry corresponding to next available interval. If