Message ID | 20220923163310.3192733-2-vladimir.oltean@nxp.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | Add tc-taprio support for queueMaxSDU | expand |
On Fri, 23 Sep 2022 19:32:59 +0300 Vladimir Oltean wrote: > + if (!tb[TCA_TAPRIO_TC_ENTRY_INDEX]) { > + NL_SET_ERR_MSG_MOD(extack, "TC entry index missing"); NL_SET_ERR_ATTR_MISS() ? > + return -EINVAL; > + } > + > + tc = nla_get_u32(tb[TCA_TAPRIO_TC_ENTRY_INDEX]); > + if (tc >= TC_QOPT_MAX_QUEUE) { > + NL_SET_ERR_MSG_MOD(extack, "TC entry index out of range"); NLA_POLICY_MAX() Are you not using those on purpose? :(
On Mon, Sep 26, 2022 at 01:38:29PM -0700, Jakub Kicinski wrote: > On Fri, 23 Sep 2022 19:32:59 +0300 Vladimir Oltean wrote: > > + if (!tb[TCA_TAPRIO_TC_ENTRY_INDEX]) { > > + NL_SET_ERR_MSG_MOD(extack, "TC entry index missing"); > > NL_SET_ERR_ATTR_MISS() ? > > > + return -EINVAL; > > + } > > + > > + tc = nla_get_u32(tb[TCA_TAPRIO_TC_ENTRY_INDEX]); > > + if (tc >= TC_QOPT_MAX_QUEUE) { > > + NL_SET_ERR_MSG_MOD(extack, "TC entry index out of range"); > > NLA_POLICY_MAX() > > Are you not using those on purpose? :( I don't exactly see it as being super user friendly to leave it to the policy validator (or to use NL_SET_ERR_ATTR_MISS()) because all that will be reported back to user space will be the offset to the original attribute in the nlmsghdr, which is pretty hard to retrieve and re-interpret (at least in the iproute2 tc source code, I can't seem to find a way to stringify it or something like that). For the NLA_POLICY_MAX(), all I'll get now is an uninformative "Error: integer out of range." What integer? What range? I don't understand what is the gain of removing extack message strings and just pointing to the netlink attribute via NLMSGERR_ATTR_OFFS? Could I at least use the NL_SET_ERR_ATTR_MISS() helper *and* set a custom message? That's for the missing nlattr. Regarding the range checking in the policy, I'd like a custom message there as well, but the NLA_POLICY_MAX() doesn't provide one. However, I see that struct nla_policy has a const char *reject_message for NLA_REJECT types. Would it be an abuse to move this outside of the union and allow U32 policies and such to also provide it?
On Tue, 27 Sep 2022 18:09:21 +0300 Vladimir Oltean wrote: > On Mon, Sep 26, 2022 at 01:38:29PM -0700, Jakub Kicinski wrote: > > On Fri, 23 Sep 2022 19:32:59 +0300 Vladimir Oltean wrote: > > > + if (!tb[TCA_TAPRIO_TC_ENTRY_INDEX]) { > > > + NL_SET_ERR_MSG_MOD(extack, "TC entry index missing"); > > > > NL_SET_ERR_ATTR_MISS() ? > > > > > + return -EINVAL; > > > + } > > > + > > > + tc = nla_get_u32(tb[TCA_TAPRIO_TC_ENTRY_INDEX]); > > > + if (tc >= TC_QOPT_MAX_QUEUE) { > > > + NL_SET_ERR_MSG_MOD(extack, "TC entry index out of range"); > > > > NLA_POLICY_MAX() > > > > Are you not using those on purpose? :( > > I don't exactly see it as being super user friendly to leave it to the > policy validator (or to use NL_SET_ERR_ATTR_MISS()) because all that > will be reported back to user space will be the offset to the original > attribute in the nlmsghdr, which is pretty hard to retrieve and > re-interpret (at least in the iproute2 tc source code, I can't seem to > find a way to stringify it or something like that). For the NLA_POLICY_MAX(), > all I'll get now is an uninformative "Error: integer out of range." > What integer? What range? I know, that's what I expected you'd say :( You'd need a reverse parser which is a PITA to do unless you have clearly specified bindings. > I don't understand what is the gain of removing extack message strings > and just pointing to the netlink attribute via NLMSGERR_ATTR_OFFS? Could > I at least use the NL_SET_ERR_ATTR_MISS() helper *and* set a custom message? > That's for the missing nlattr. Regarding the range checking in the > policy, I'd like a custom message there as well, but the NLA_POLICY_MAX() > doesn't provide one. However, I see that struct nla_policy has a const > char *reject_message for NLA_REJECT types. Would it be an abuse to move > this outside of the union and allow U32 policies and such to also > provide it? I'd rather you kept the code as is than make precedent for adding both string and machine readable. If we do that people will try to stay on the safe side and always add both. The machine readable format is carries all the information you need. It's just the user space is not clever enough to read it which is, well, solvable.
On Tue, Sep 27, 2022 at 11:27:10AM -0700, Jakub Kicinski wrote: > I know, that's what I expected you'd say :( > You'd need a reverse parser which is a PITA to do unless you have > clearly specified bindings. I think you're underestimating the problem, it's worse than PITA. My A still hurts and yet I couldn't find any way in which reverse parsing the bad netlink attribute is in any way practical in iproute2, other than doing it to prove a point that it's possible. > I'd rather you kept the code as is than make precedent for adding both > string and machine readable. If we do that people will try to stay on > the safe side and always add both. > > The machine readable format is carries all the information you need. Nope, the question "What range?" still isn't answered via the machine readable format. Just "What integer?". > It's just the user space is not clever enough to read it which is, > well, solvable. You should come work at NXP, we love people who keep a healthy dose of unnecessary complexity in things :) Sometimes, "not clever enough" is just fine.
On Tue, 27 Sep 2022 21:23:19 +0000 Vladimir Oltean wrote: > On Tue, Sep 27, 2022 at 11:27:10AM -0700, Jakub Kicinski wrote: > > I know, that's what I expected you'd say :( > > You'd need a reverse parser which is a PITA to do unless you have > > clearly specified bindings. > > I think you're underestimating the problem, it's worse than PITA. My A > still hurts and yet I couldn't find any way in which reverse parsing the > bad netlink attribute is in any way practical in iproute2, other than > doing it to prove a point that it's possible. Yup, iproute2 does not have policy tables, so it's hard. Once you have tables like this: https://github.com/kuba-moo/ynl/blob/main/tools/net/ynl/generated/ethtool-user.c#L22 all linking up the types, it's fairly easy to reverse parse: https://github.com/kuba-moo/ynl/blob/main/tools/net/ynl/lib/ynl.c#L75 Admittedly I haven't added parsing of the bounds yet so it'd just say "invalid argument .bla.something" not "argument out of range .bla.something (is: X, range N-M)" but that's just typing. > > I'd rather you kept the code as is than make precedent for adding both > > string and machine readable. If we do that people will try to stay on > > the safe side and always add both. > > > > The machine readable format is carries all the information you need. > > Nope, the question "What range?" still isn't answered via the machine > readable format. Just "What integer?". Hm, doesn't NLMSGERR_ATTR_POLICY contain the bounds? > > It's just the user space is not clever enough to read it which is, > > well, solvable. > > You should come work at NXP, we love people who keep a healthy dose of > unnecessary complexity in things :) Ha! :D > Sometimes, "not clever enough" is just fine. Yup, go ahead with just the strings. "We'll get there" for the machine readable parsing, hopefully, once my YAML descriptions come...
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h index 2ff80cd04c5c..3c65417bea94 100644 --- a/include/net/pkt_sched.h +++ b/include/net/pkt_sched.h @@ -168,6 +168,7 @@ struct tc_taprio_qopt_offload { ktime_t base_time; u64 cycle_time; u64 cycle_time_extension; + u32 max_sdu[TC_MAX_QUEUE]; size_t num_entries; struct tc_taprio_sched_entry entries[]; diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h index f292b467b27f..000eec106856 100644 --- a/include/uapi/linux/pkt_sched.h +++ b/include/uapi/linux/pkt_sched.h @@ -1232,6 +1232,16 @@ enum { #define TCA_TAPRIO_ATTR_FLAG_TXTIME_ASSIST _BITUL(0) #define TCA_TAPRIO_ATTR_FLAG_FULL_OFFLOAD _BITUL(1) +enum { + TCA_TAPRIO_TC_ENTRY_UNSPEC, + TCA_TAPRIO_TC_ENTRY_INDEX, /* u32 */ + TCA_TAPRIO_TC_ENTRY_MAX_SDU, /* u32 */ + + /* add new constants above here */ + __TCA_TAPRIO_TC_ENTRY_CNT, + TCA_TAPRIO_TC_ENTRY_MAX = (__TCA_TAPRIO_TC_ENTRY_CNT - 1) +}; + enum { TCA_TAPRIO_ATTR_UNSPEC, TCA_TAPRIO_ATTR_PRIOMAP, /* struct tc_mqprio_qopt */ @@ -1245,6 +1255,7 @@ enum { TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME_EXTENSION, /* s64 */ TCA_TAPRIO_ATTR_FLAGS, /* u32 */ TCA_TAPRIO_ATTR_TXTIME_DELAY, /* u32 */ + TCA_TAPRIO_ATTR_TC_ENTRY, /* nest */ __TCA_TAPRIO_ATTR_MAX, }; diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c index 0bc6d90e1e51..c38ed1861ee7 100644 --- a/net/sched/sch_taprio.c +++ b/net/sched/sch_taprio.c @@ -78,6 +78,8 @@ struct taprio_sched { struct sched_gate_list __rcu *admin_sched; struct hrtimer advance_timer; struct list_head taprio_list; + u32 max_frm_len[TC_MAX_QUEUE]; /* for the fast path */ + u32 max_sdu[TC_MAX_QUEUE]; /* for dump and offloading */ u32 txtime_delay; }; @@ -415,6 +417,9 @@ static int taprio_enqueue_one(struct sk_buff *skb, struct Qdisc *sch, struct Qdisc *child, struct sk_buff **to_free) { struct taprio_sched *q = qdisc_priv(sch); + struct net_device *dev = qdisc_dev(sch); + int prio = skb->priority; + u8 tc; /* sk_flags are only safe to use on full sockets. */ if (skb->sk && sk_fullsock(skb->sk) && sock_flag(skb->sk, SOCK_TXTIME)) { @@ -426,6 +431,11 @@ static int taprio_enqueue_one(struct sk_buff *skb, struct Qdisc *sch, return qdisc_drop(skb, sch, to_free); } + /* Devices with full offload are expected to honor this in hardware */ + tc = netdev_get_prio_tc_map(dev, prio); + if (skb->len > q->max_frm_len[tc]) + return qdisc_drop(skb, sch, to_free); + qdisc_qstats_backlog_inc(sch, skb); sch->q.qlen++; @@ -754,6 +764,11 @@ static const struct nla_policy entry_policy[TCA_TAPRIO_SCHED_ENTRY_MAX + 1] = { [TCA_TAPRIO_SCHED_ENTRY_INTERVAL] = { .type = NLA_U32 }, }; +static const struct nla_policy taprio_tc_policy[TCA_TAPRIO_TC_ENTRY_MAX + 1] = { + [TCA_TAPRIO_TC_ENTRY_INDEX] = { .type = NLA_U32 }, + [TCA_TAPRIO_TC_ENTRY_MAX_SDU] = { .type = NLA_U32 }, +}; + static const struct nla_policy taprio_policy[TCA_TAPRIO_ATTR_MAX + 1] = { [TCA_TAPRIO_ATTR_PRIOMAP] = { .len = sizeof(struct tc_mqprio_qopt) @@ -766,6 +781,7 @@ static const struct nla_policy taprio_policy[TCA_TAPRIO_ATTR_MAX + 1] = { [TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME_EXTENSION] = { .type = NLA_S64 }, [TCA_TAPRIO_ATTR_FLAGS] = { .type = NLA_U32 }, [TCA_TAPRIO_ATTR_TXTIME_DELAY] = { .type = NLA_U32 }, + [TCA_TAPRIO_ATTR_TC_ENTRY] = { .type = NLA_NESTED }, }; static int fill_sched_entry(struct taprio_sched *q, struct nlattr **tb, @@ -1216,7 +1232,7 @@ static int taprio_enable_offload(struct net_device *dev, { const struct net_device_ops *ops = dev->netdev_ops; struct tc_taprio_qopt_offload *offload; - int err = 0; + int tc, err = 0; if (!ops->ndo_setup_tc) { NL_SET_ERR_MSG(extack, @@ -1233,6 +1249,9 @@ static int taprio_enable_offload(struct net_device *dev, offload->enable = 1; taprio_sched_to_offload(dev, sched, offload); + for (tc = 0; tc < TC_MAX_QUEUE; tc++) + offload->max_sdu[tc] = q->max_sdu[tc]; + err = ops->ndo_setup_tc(dev, TC_SETUP_QDISC_TAPRIO, offload); if (err < 0) { NL_SET_ERR_MSG(extack, @@ -1367,6 +1386,89 @@ static int taprio_parse_clockid(struct Qdisc *sch, struct nlattr **tb, return err; } +static int taprio_parse_tc_entry(struct Qdisc *sch, + struct nlattr *opt, + u32 max_sdu[TC_QOPT_MAX_QUEUE], + unsigned long *seen_tcs, + struct netlink_ext_ack *extack) +{ + struct nlattr *tb[TCA_TAPRIO_TC_ENTRY_MAX + 1] = { }; + struct net_device *dev = qdisc_dev(sch); + u32 val = 0; + int err, tc; + + err = nla_parse_nested(tb, TCA_TAPRIO_TC_ENTRY_MAX, opt, + taprio_tc_policy, extack); + if (err < 0) + return err; + + if (!tb[TCA_TAPRIO_TC_ENTRY_INDEX]) { + NL_SET_ERR_MSG_MOD(extack, "TC entry index missing"); + return -EINVAL; + } + + tc = nla_get_u32(tb[TCA_TAPRIO_TC_ENTRY_INDEX]); + if (tc >= TC_QOPT_MAX_QUEUE) { + NL_SET_ERR_MSG_MOD(extack, "TC entry index out of range"); + return -ERANGE; + } + + if (*seen_tcs & BIT(tc)) { + NL_SET_ERR_MSG_MOD(extack, "Duplicate TC entry"); + return -EINVAL; + } + + *seen_tcs |= BIT(tc); + + if (tb[TCA_TAPRIO_TC_ENTRY_MAX_SDU]) + val = nla_get_u32(tb[TCA_TAPRIO_TC_ENTRY_MAX_SDU]); + + if (val > dev->max_mtu) { + NL_SET_ERR_MSG_MOD(extack, "TC max SDU exceeds device max MTU"); + return -ERANGE; + } + + max_sdu[tc] = val; + + return 0; +} + +static int taprio_parse_tc_entries(struct Qdisc *sch, + struct nlattr *opt, + struct netlink_ext_ack *extack) +{ + struct taprio_sched *q = qdisc_priv(sch); + struct net_device *dev = qdisc_dev(sch); + u32 max_sdu[TC_QOPT_MAX_QUEUE]; + unsigned long seen_tcs = 0; + struct nlattr *n; + int tc, rem; + int err = 0; + + for (tc = 0; tc < TC_QOPT_MAX_QUEUE; tc++) + max_sdu[tc] = q->max_sdu[tc]; + + nla_for_each_nested(n, opt, rem) { + if (nla_type(n) != TCA_TAPRIO_ATTR_TC_ENTRY) + continue; + + err = taprio_parse_tc_entry(sch, n, max_sdu, &seen_tcs, extack); + if (err) + goto out; + } + + for (tc = 0; tc < TC_QOPT_MAX_QUEUE; tc++) { + q->max_sdu[tc] = max_sdu[tc]; + if (max_sdu[tc]) + q->max_frm_len[tc] = max_sdu[tc] + dev->hard_header_len; + else + q->max_frm_len[tc] = U32_MAX; /* never oversized */ + } + +out: + return err; +} + static int taprio_mqprio_cmp(const struct net_device *dev, const struct tc_mqprio_qopt *mqprio) { @@ -1445,6 +1547,10 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt, if (err < 0) return err; + err = taprio_parse_tc_entries(sch, opt, extack); + if (err) + return err; + new_admin = kzalloc(sizeof(*new_admin), GFP_KERNEL); if (!new_admin) { NL_SET_ERR_MSG(extack, "Not enough memory for a new schedule"); @@ -1825,6 +1931,33 @@ static int dump_schedule(struct sk_buff *msg, return -1; } +static int taprio_dump_tc_entries(struct taprio_sched *q, struct sk_buff *skb) +{ + struct nlattr *n; + int tc; + + for (tc = 0; tc < TC_MAX_QUEUE; tc++) { + n = nla_nest_start(skb, TCA_TAPRIO_ATTR_TC_ENTRY); + if (!n) + return -EMSGSIZE; + + if (nla_put_u32(skb, TCA_TAPRIO_TC_ENTRY_INDEX, tc)) + goto nla_put_failure; + + if (nla_put_u32(skb, TCA_TAPRIO_TC_ENTRY_MAX_SDU, + q->max_sdu[tc])) + goto nla_put_failure; + + nla_nest_end(skb, n); + } + + return 0; + +nla_put_failure: + nla_nest_cancel(skb, n); + return -EMSGSIZE; +} + static int taprio_dump(struct Qdisc *sch, struct sk_buff *skb) { struct taprio_sched *q = qdisc_priv(sch); @@ -1863,6 +1996,9 @@ static int taprio_dump(struct Qdisc *sch, struct sk_buff *skb) nla_put_u32(skb, TCA_TAPRIO_ATTR_TXTIME_DELAY, q->txtime_delay)) goto options_error; + if (taprio_dump_tc_entries(q, skb)) + goto options_error; + if (oper && dump_schedule(skb, oper)) goto options_error;
IEEE 802.1Q clause 12.29.1.1 "The queueMaxSDUTable structure and data types" and 8.6.8.4 "Enhancements for scheduled traffic" talk about the existence of a per traffic class limitation of maximum frame sizes, with a fallback on the port-based MTU. As far as I am able to understand, the 802.1Q Service Data Unit (SDU) represents the MAC Service Data Unit (MSDU, i.e. L2 payload), excluding any number of prepended VLAN headers which may be otherwise present in the MSDU. Therefore, the queueMaxSDU is directly comparable to the device MTU (1500 means L2 payload sizes are accepted, or frame sizes of 1518 octets, or 1522 plus one VLAN header). Drivers which offload this are directly responsible of translating into other units of measurement. To keep the fast path checks optimized, we keep 2 arrays in the qdisc, one for max_sdu translated into frame length (so that it's comparable to skb->len), and another for offloading and for dumping back to the user. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> --- v1->v2: precompute the max_frm_len using dev->hard_header_len, so that the fast path can directly check against skb->len include/net/pkt_sched.h | 1 + include/uapi/linux/pkt_sched.h | 11 +++ net/sched/sch_taprio.c | 138 ++++++++++++++++++++++++++++++++- 3 files changed, 149 insertions(+), 1 deletion(-)