Message ID | 20210122224453.4161729-3-vinicius.gomes@intel.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | ethtool: Add support for frame preemption | expand |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | success | Link |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Clearly marked for net-next |
netdev/subject_prefix | success | Link |
netdev/cc_maintainers | warning | 1 maintainers not CCed: davem@davemloft.net |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 7153 this patch: 7153 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | warning | WARNING: line length of 96 exceeds 80 columns |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 7553 this patch: 7553 |
netdev/header_inline | success | Link |
netdev/stable | success | Stable not CCed |
On Fri, Jan 22, 2021 at 02:44:47PM -0800, Vinicius Costa Gomes wrote: > + /* It's valid to enable frame preemption without any kind of > + * offloading being enabled, so keep it separated. > + */ > + if (tb[TCA_TAPRIO_ATTR_PREEMPT_TCS]) { > + u32 preempt = nla_get_u32(tb[TCA_TAPRIO_ATTR_PREEMPT_TCS]); > + struct tc_preempt_qopt_offload qopt = { }; > + > + if (preempt == U32_MAX) { > + NL_SET_ERR_MSG(extack, "At least one queue must be not be preemptible"); > + err = -EINVAL; > + goto free_sched; > + } > + > + qopt.preemptible_queues = tc_map_to_queue_mask(dev, preempt); > + > + err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_PREEMPT, > + &qopt); > + if (err) > + goto free_sched; > + > + q->preemptible_tcs = preempt; > + } > + First I'm interested in the means: why check for preempt == U32_MAX when you determine that all traffic classes are preemptible? What if less than 32 traffic classes are used by the netdev? The check will be bypassed, won't it? Secondly, why should at least one queue be preemptible? What's wrong with frame preemption being triggered by a tc-taprio window smaller than the packet size? This can happen regardless of traffic class.
Vladimir Oltean <vladimir.oltean@nxp.com> writes: > On Fri, Jan 22, 2021 at 02:44:47PM -0800, Vinicius Costa Gomes wrote: >> + /* It's valid to enable frame preemption without any kind of >> + * offloading being enabled, so keep it separated. >> + */ >> + if (tb[TCA_TAPRIO_ATTR_PREEMPT_TCS]) { >> + u32 preempt = nla_get_u32(tb[TCA_TAPRIO_ATTR_PREEMPT_TCS]); >> + struct tc_preempt_qopt_offload qopt = { }; >> + >> + if (preempt == U32_MAX) { >> + NL_SET_ERR_MSG(extack, "At least one queue must be not be preemptible"); >> + err = -EINVAL; >> + goto free_sched; >> + } >> + >> + qopt.preemptible_queues = tc_map_to_queue_mask(dev, preempt); >> + >> + err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_PREEMPT, >> + &qopt); >> + if (err) >> + goto free_sched; >> + >> + q->preemptible_tcs = preempt; >> + } >> + > > First I'm interested in the means: why check for preempt == U32_MAX when > you determine that all traffic classes are preemptible? What if less > than 32 traffic classes are used by the netdev? The check will be > bypassed, won't it? Good catch :-) I wanted to have this (at least one express queue) handled in a centralized way, but perhaps this should be handled best per driver. > > Secondly, why should at least one queue be preemptible? What's wrong > with frame preemption being triggered by a tc-taprio window smaller than > the packet size? This can happen regardless of traffic class. It's the opposite, at least one queue needs to be marked express/non-preemptible. But as I said above, perhaps this should be handled in a per-driver way. I will remove this from taprio. I think removing this check/limitation from taprio should solve the second part of your question, right? Cheers,
On Fri, 29 Jan 2021 13:13:24 -0800 Vinicius Costa Gomes wrote: > Vladimir Oltean <vladimir.oltean@nxp.com> writes: > > > On Fri, Jan 22, 2021 at 02:44:47PM -0800, Vinicius Costa Gomes wrote: > >> + /* It's valid to enable frame preemption without any kind of > >> + * offloading being enabled, so keep it separated. > >> + */ > >> + if (tb[TCA_TAPRIO_ATTR_PREEMPT_TCS]) { > >> + u32 preempt = nla_get_u32(tb[TCA_TAPRIO_ATTR_PREEMPT_TCS]); > >> + struct tc_preempt_qopt_offload qopt = { }; > >> + > >> + if (preempt == U32_MAX) { > >> + NL_SET_ERR_MSG(extack, "At least one queue must be not be preemptible"); > >> + err = -EINVAL; > >> + goto free_sched; > >> + } > >> + > >> + qopt.preemptible_queues = tc_map_to_queue_mask(dev, preempt); > >> + > >> + err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_PREEMPT, > >> + &qopt); > >> + if (err) > >> + goto free_sched; > >> + > >> + q->preemptible_tcs = preempt; > >> + } > >> + > > > > First I'm interested in the means: why check for preempt == U32_MAX when > > you determine that all traffic classes are preemptible? What if less > > than 32 traffic classes are used by the netdev? The check will be > > bypassed, won't it? > > Good catch :-) > > I wanted to have this (at least one express queue) handled in a > centralized way, but perhaps this should be handled best per driver. Centralized is good. Much easier than reviewing N drivers to make sure they all behave the same, and right.
Jakub Kicinski <kuba@kernel.org> writes: >> > First I'm interested in the means: why check for preempt == U32_MAX when >> > you determine that all traffic classes are preemptible? What if less >> > than 32 traffic classes are used by the netdev? The check will be >> > bypassed, won't it? >> >> Good catch :-) >> >> I wanted to have this (at least one express queue) handled in a >> centralized way, but perhaps this should be handled best per driver. > > Centralized is good. Much easier than reviewing N drivers to make sure > they all behave the same, and right. The issue is that it seems that not all drivers/hw have the same limitation: that at least one queue needs to be configured as express/not preemptible. That's the point I was trying to make when I suggested for the check to be done per-driver, different limitations. Cheers,
On Fri, Jan 29, 2021 at 01:13:24PM -0800, Vinicius Costa Gomes wrote: > > Secondly, why should at least one queue be preemptible? What's wrong > > with frame preemption being triggered by a tc-taprio window smaller than > > the packet size? This can happen regardless of traffic class. > > It's the opposite, at least one queue needs to be marked > express/non-preemptible. I meant to ask why should at least one queue be express. The second part of the question remains valid. > But as I said above, perhaps this should be handled in a per-driver > way. I will remove this from taprio. > > I think removing this check/limitation from taprio should solve the > second part of your question, right? Nope. Can you point me to either 802.1Q or 802.3 saying that at least one priority should go to the express MAC?
Vladimir Oltean <vladimir.oltean@nxp.com> writes: > On Fri, Jan 29, 2021 at 01:13:24PM -0800, Vinicius Costa Gomes wrote: >> > Secondly, why should at least one queue be preemptible? What's wrong >> > with frame preemption being triggered by a tc-taprio window smaller than >> > the packet size? This can happen regardless of traffic class. >> >> It's the opposite, at least one queue needs to be marked >> express/non-preemptible. > > I meant to ask why should at least one queue be express. The second part > of the question remains valid. > >> But as I said above, perhaps this should be handled in a per-driver >> way. I will remove this from taprio. >> >> I think removing this check/limitation from taprio should solve the >> second part of your question, right? > > Nope. Can you point me to either 802.1Q or 802.3 saying that at least > one priority should go to the express MAC? After re-reading Anex Q, I know it's informative, and thinking/remembering things a bit better, it seems that the standard only defines preemption of express queues/priorities over preemptible traffic. The standard doesn't talk about preemptible pririoties preempting other preemptible priorities. So, if there's no express queue, no preemption is going to happen, so it shouldn't be enabled, to avoid like an invalid/useless state. So I am going to take back my previous email: this seems like it's better to be kept in a centralized place. Cheers,
On Fri, Jan 29, 2021 at 03:42:05PM -0800, Vinicius Costa Gomes wrote: > >> But as I said above, perhaps this should be handled in a per-driver > >> way. I will remove this from taprio. > >> > >> I think removing this check/limitation from taprio should solve the > >> second part of your question, right? > > > > Nope. Can you point me to either 802.1Q or 802.3 saying that at least > > one priority should go to the express MAC? > > After re-reading Anex Q, I know it's informative, and > thinking/remembering things a bit better, it seems that the standard > only defines preemption of express queues/priorities over preemptible > traffic. The standard doesn't talk about preemptible pririoties > preempting other preemptible priorities. > > So, if there's no express queue, no preemption is going to happen, so it > shouldn't be enabled, to avoid like an invalid/useless state. > > So I am going to take back my previous email: this seems like it's > better to be kept in a centralized place. Sorry, what? If you go to the Figure 99-5 Transmit Processing state diagram in IEEE 802.3 (which is what a hardware designer will actually implement), the pMAC's transition from the PREEMPTABLE_TX state to TX_MCRC is gated by the "preempt" variable. Whose definition is: Boolean that is TRUE when a preemptable packet is to be preempted. The value of preempt is: pAllow * (eTx + hold) * preemptableFragSize * MIN_REMAIN (where * is logical AND, + is logical OR - editor's note). The important part is (eTx + hold). It means that either the eMAC wants to transmit, or a HOLD request has been emitted. Otherwise said, HOLD requests should always trigger a preemption regardless of whether the eMAC has packets to send or not. I believe that Michael Teener's peristaltic shaper which is described in Annex T "Cyclic queuing and forwarding" (right below what you linked me) makes use of frame preemption triggered by HOLD requests, in order to reduce the interference time as much as possible. That's the use case I was actually thinking of when I asked. See slide 11 here: https://www.ieee802.org/1/files/public/docs2012/new-avb-mjt-back-to-the-future-1112-v01.pdf
On Fri, 29 Jan 2021 15:12:58 -0800 Vinicius Costa Gomes wrote: > Jakub Kicinski <kuba@kernel.org> writes: > >> Good catch :-) > >> > >> I wanted to have this (at least one express queue) handled in a > >> centralized way, but perhaps this should be handled best per driver. > > > > Centralized is good. Much easier than reviewing N drivers to make sure > > they all behave the same, and right. > > The issue is that it seems that not all drivers/hw have the same > limitation: that at least one queue needs to be configured as > express/not preemptible. Oh, I thought that was something driven by the standard. For HW specific checks driver doing it is obviously fine. > That's the point I was trying to make when I suggested for the check to > be done per-driver, different limitations.
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index ef517254367d..b9ec0a2007b3 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -858,6 +858,7 @@ enum tc_setup_type { TC_SETUP_QDISC_ETS, TC_SETUP_QDISC_TBF, TC_SETUP_QDISC_FIFO, + TC_SETUP_PREEMPT, }; /* These structures hold the attributes of bpf state that are being passed diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h index 15b1b30f454e..be5ff1535332 100644 --- a/include/net/pkt_sched.h +++ b/include/net/pkt_sched.h @@ -183,6 +183,10 @@ struct tc_taprio_qopt_offload { struct tc_taprio_sched_entry entries[]; }; +struct tc_preempt_qopt_offload { + u32 preemptible_queues; +}; + /* Reference counting */ struct tc_taprio_qopt_offload *taprio_offload_get(struct tc_taprio_qopt_offload *offload); diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h index 9e7c2c607845..9ca9d2e55557 100644 --- a/include/uapi/linux/pkt_sched.h +++ b/include/uapi/linux/pkt_sched.h @@ -1240,6 +1240,7 @@ enum { TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME_EXTENSION, /* s64 */ TCA_TAPRIO_ATTR_FLAGS, /* u32 */ TCA_TAPRIO_ATTR_TXTIME_DELAY, /* u32 */ + TCA_TAPRIO_ATTR_PREEMPT_TCS, /* u32 */ __TCA_TAPRIO_ATTR_MAX, }; diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c index 8287894541e3..41503aff6572 100644 --- a/net/sched/sch_taprio.c +++ b/net/sched/sch_taprio.c @@ -64,6 +64,7 @@ struct taprio_sched { struct Qdisc **qdiscs; struct Qdisc *root; u32 flags; + u32 preemptible_tcs; enum tk_offsets tk_offset; int clockid; atomic64_t picos_per_byte; /* Using picoseconds because for 10Gbps+ @@ -776,6 +777,7 @@ static const struct nla_policy taprio_policy[TCA_TAPRIO_ATTR_MAX + 1] = { [TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME_EXTENSION] = { .type = NLA_S64 }, [TCA_TAPRIO_ATTR_FLAGS] = { .type = NLA_U32 }, [TCA_TAPRIO_ATTR_TXTIME_DELAY] = { .type = NLA_U32 }, + [TCA_TAPRIO_ATTR_PREEMPT_TCS] = { .type = NLA_U32 }, }; static int fill_sched_entry(struct taprio_sched *q, struct nlattr **tb, @@ -1268,6 +1270,7 @@ static int taprio_disable_offload(struct net_device *dev, struct netlink_ext_ack *extack) { const struct net_device_ops *ops = dev->netdev_ops; + struct tc_preempt_qopt_offload preempt = { }; struct tc_taprio_qopt_offload *offload; int err; @@ -1286,13 +1289,15 @@ static int taprio_disable_offload(struct net_device *dev, offload->enable = 0; err = ops->ndo_setup_tc(dev, TC_SETUP_QDISC_TAPRIO, offload); - if (err < 0) { + if (err < 0) NL_SET_ERR_MSG(extack, - "Device failed to disable offload"); - goto out; - } + "Device failed to disable taprio offload"); + + err = ops->ndo_setup_tc(dev, TC_SETUP_PREEMPT, &preempt); + if (err < 0) + NL_SET_ERR_MSG(extack, + "Device failed to disable frame preemption offload"); -out: taprio_offload_free(offload); return err; @@ -1509,6 +1514,29 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt, mqprio->prio_tc_map[i]); } + /* It's valid to enable frame preemption without any kind of + * offloading being enabled, so keep it separated. + */ + if (tb[TCA_TAPRIO_ATTR_PREEMPT_TCS]) { + u32 preempt = nla_get_u32(tb[TCA_TAPRIO_ATTR_PREEMPT_TCS]); + struct tc_preempt_qopt_offload qopt = { }; + + if (preempt == U32_MAX) { + NL_SET_ERR_MSG(extack, "At least one queue must be not be preemptible"); + err = -EINVAL; + goto free_sched; + } + + qopt.preemptible_queues = tc_map_to_queue_mask(dev, preempt); + + err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_PREEMPT, + &qopt); + if (err) + goto free_sched; + + q->preemptible_tcs = preempt; + } + if (FULL_OFFLOAD_IS_ENABLED(q->flags)) err = taprio_enable_offload(dev, q, new_admin, extack); else @@ -1665,6 +1693,7 @@ static int taprio_init(struct Qdisc *sch, struct nlattr *opt, */ q->clockid = -1; q->flags = TAPRIO_FLAGS_INVALID; + q->preemptible_tcs = U32_MAX; spin_lock(&taprio_list_lock); list_add(&q->taprio_list, &taprio_list); @@ -1848,6 +1877,10 @@ static int taprio_dump(struct Qdisc *sch, struct sk_buff *skb) if (q->flags && nla_put_u32(skb, TCA_TAPRIO_ATTR_FLAGS, q->flags)) goto options_error; + if (q->preemptible_tcs != U32_MAX && + nla_put_u32(skb, TCA_TAPRIO_ATTR_PREEMPT_TCS, q->preemptible_tcs)) + goto options_error; + if (q->txtime_delay && nla_put_u32(skb, TCA_TAPRIO_ATTR_TXTIME_DELAY, q->txtime_delay)) goto options_error;
Adds a way to configure which traffic classes are marked as preemptible and which are marked as express. Even if frame preemption is not a "real" offload, because it can't be executed purely in software, having this information near where the mapping of traffic classes to queues is specified, makes it, hopefully, easier to use. taprio will receive the information of which traffic classes are marked as express/preemptible, and when offloading frame preemption to the driver will convert the information, so the driver receives which queues are marked as express/preemptible. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> --- include/linux/netdevice.h | 1 + include/net/pkt_sched.h | 4 ++++ include/uapi/linux/pkt_sched.h | 1 + net/sched/sch_taprio.c | 43 ++++++++++++++++++++++++++++++---- 4 files changed, 44 insertions(+), 5 deletions(-)