Message ID | 20211028110646.13791-1-simon.horman@corigine.com (mailing list archive)
---|---
Series | allow user to offload tc action to net device
On 2021-10-28 07:06, Simon Horman wrote: > Baowen Zheng says: > > Allow use of flow_indr_dev_register/flow_indr_dev_setup_offload to offload > tc actions independent of flows. > > The motivation for this work is to prepare for using TC police action > instances to provide hardware offload of OVS metering feature - which calls > for policers that may be used by multiple flows and whose lifecycle is > independent of any flows that use them. > > This patch includes basic changes to offload drivers to return EOPNOTSUPP > if this feature is used - it is not yet supported by any driver. > > Tc cli command to offload and quote an action: > > tc qdisc del dev $DEV ingress && sleep 1 || true > tc actions delete action police index 99 || true > > tc qdisc add dev $DEV ingress > tc qdisc show dev $DEV ingress > > tc actions add action police index 99 rate 1mbit burst 100k skip_sw > tc actions list action police > > tc filter add dev $DEV protocol ip parent ffff: > flower ip_proto tcp action police index 99 > tc -s -d filter show dev $DEV protocol ip parent ffff: > tc filter add dev $DEV protocol ipv6 parent ffff: > flower skip_sw ip_proto tcp action police index 99 > tc -s -d filter show dev $DEV protocol ipv6 parent ffff: > tc actions list action police > > tc qdisc del dev $DEV ingress && sleep 1 > tc actions delete action police index 99 > tc actions list action police It will be helpful to display the output of the show commands in the cover letter.... cheers, jamal
On 2021-10-28 10:23, Jamal Hadi Salim wrote: [..] > > It will be helpful to display the output of the show commands in the > cover letter.... Also some tdc tests please... cheers, jamal
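For reference, tdc test cases for the police action already live under tools/testing/selftests/tc-testing, so new skip_sw/skip_hw cases for this series would presumably be added alongside them. A rough sketch of how they might be exercised (the category name and the command under test are illustrative, the latter taken from the cover letter):

    # Run the existing police action tests from the tc-testing suite
    # (requires root and a kernel with the relevant tc modules)
    cd tools/testing/selftests/tc-testing
    ./tdc.py -c police

    # A new case for this series would wrap a command such as:
    tc actions add action police index 99 rate 1mbit burst 100k skip_sw
    tc actions list action police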
On 10/28/2021 2:06 PM, Simon Horman wrote: > Baowen Zheng says: > > Allow use of flow_indr_dev_register/flow_indr_dev_setup_offload to offload > tc actions independent of flows. > > The motivation for this work is to prepare for using TC police action > instances to provide hardware offload of OVS metering feature - which calls > for policers that may be used by multiple flows and whose lifecycle is > independent of any flows that use them. > > This patch includes basic changes to offload drivers to return EOPNOTSUPP > if this feature is used - it is not yet supported by any driver. > > Tc cli command to offload and quote an action: > > tc qdisc del dev $DEV ingress && sleep 1 || true > tc actions delete action police index 99 || true > > tc qdisc add dev $DEV ingress > tc qdisc show dev $DEV ingress > > tc actions add action police index 99 rate 1mbit burst 100k skip_sw > tc actions list action police > > tc filter add dev $DEV protocol ip parent ffff: > flower ip_proto tcp action police index 99 > tc -s -d filter show dev $DEV protocol ip parent ffff: > tc filter add dev $DEV protocol ipv6 parent ffff: > flower skip_sw ip_proto tcp action police index 99 > tc -s -d filter show dev $DEV protocol ipv6 parent ffff: > tc actions list action police > > tc qdisc del dev $DEV ingress && sleep 1 > tc actions delete action police index 99 > tc actions list action police > Actions are also (implicitly) instantiated when filters are created. In the following example the mirred action instance (created by the first filter) is shared by the second filter: tc filter add dev $DEV1 proto ip parent ffff: flower \ ip_proto tcp action mirred egress redirect dev $DEV3 tc filter add dev $DEV2 proto ip parent ffff: flower \ ip_proto tcp action mirred index 1 > Changes compared to v2 patches: > > * Made changes according to the review comments. > * Delete in_hw and not_in_hw flag and user can judge if the action is > offloaded to any hardware by in_hw_count. > * Split the main patch of the action offload to three single patch to > facilitate code review. > > Posting this revision of the patchset as an RFC as while we feel it is > ready for review we would like an opportunity to conduct further testing > before acceptance into upstream. 
> > Baowen Zheng (8): > flow_offload: fill flags to action structure > flow_offload: reject to offload tc actions in offload drivers > flow_offload: allow user to offload tc action to net device > flow_offload: add skip_hw and skip_sw to control if offload the action > flow_offload: add process to update action stats from hardware > net: sched: save full flags for tc action > flow_offload: add reoffload process to update hw_count > flow_offload: validate flags of filter and actions > > drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c | 2 +- > .../ethernet/mellanox/mlx5/core/en/rep/tc.c | 3 + > .../ethernet/netronome/nfp/flower/offload.c | 3 + > include/linux/netdevice.h | 1 + > include/net/act_api.h | 34 +- > include/net/flow_offload.h | 17 + > include/net/pkt_cls.h | 61 ++- > include/uapi/linux/pkt_cls.h | 9 +- > net/core/flow_offload.c | 48 +- > net/sched/act_api.c | 440 +++++++++++++++++- > net/sched/act_bpf.c | 2 +- > net/sched/act_connmark.c | 2 +- > net/sched/act_ctinfo.c | 2 +- > net/sched/act_gate.c | 2 +- > net/sched/act_ife.c | 2 +- > net/sched/act_ipt.c | 2 +- > net/sched/act_mpls.c | 2 +- > net/sched/act_nat.c | 2 +- > net/sched/act_pedit.c | 2 +- > net/sched/act_police.c | 2 +- > net/sched/act_sample.c | 2 +- > net/sched/act_simple.c | 2 +- > net/sched/act_skbedit.c | 2 +- > net/sched/act_skbmod.c | 2 +- > net/sched/cls_api.c | 55 ++- > net/sched/cls_flower.c | 3 +- > net/sched/cls_matchall.c | 4 +- > net/sched/cls_u32.c | 7 +- > 28 files changed, 661 insertions(+), 54 deletions(-) >
On Sun, Oct 31, 2021 at 2:51 AM Oz Shlomo <ozsh@nvidia.com> wrote: > > > > On 10/28/2021 2:06 PM, Simon Horman wrote: > > Baowen Zheng says: > > > > Allow use of flow_indr_dev_register/flow_indr_dev_setup_offload to offload > > tc actions independent of flows. > > > > The motivation for this work is to prepare for using TC police action > > instances to provide hardware offload of OVS metering feature - which calls > > for policers that may be used by multiple flows and whose lifecycle is > > independent of any flows that use them. > > > > This patch includes basic changes to offload drivers to return EOPNOTSUPP > > if this feature is used - it is not yet supported by any driver. > > > > Tc cli command to offload and quote an action: > > > > tc qdisc del dev $DEV ingress && sleep 1 || true > > tc actions delete action police index 99 || true > > > > tc qdisc add dev $DEV ingress > > tc qdisc show dev $DEV ingress > > > > tc actions add action police index 99 rate 1mbit burst 100k skip_sw > > tc actions list action police > > > > tc filter add dev $DEV protocol ip parent ffff: > > flower ip_proto tcp action police index 99 > > tc -s -d filter show dev $DEV protocol ip parent ffff: > > tc filter add dev $DEV protocol ipv6 parent ffff: > > flower skip_sw ip_proto tcp action police index 99 > > tc -s -d filter show dev $DEV protocol ipv6 parent ffff: > > tc actions list action police > > > > tc qdisc del dev $DEV ingress && sleep 1 > > tc actions delete action police index 99 > > tc actions list action police > > > > Actions are also (implicitly) instantiated when filters are created. > In the following example the mirred action instance (created by the first filter) is shared by the > second filter: > > tc filter add dev $DEV1 proto ip parent ffff: flower \ > ip_proto tcp action mirred egress redirect dev $DEV3 > > tc filter add dev $DEV2 proto ip parent ffff: flower \ > ip_proto tcp action mirred index 1 > > > > Changes compared to v2 patches: > > > > * Made changes according to the review comments. > > * Delete in_hw and not_in_hw flag and user can judge if the action is > > offloaded to any hardware by in_hw_count. > > * Split the main patch of the action offload to three single patch to > > facilitate code review. > > > > Posting this revision of the patchset as an RFC as while we feel it is > > ready for review we would like an opportunity to conduct further testing > > before acceptance into upstream. 
> > > > Baowen Zheng (8): > > flow_offload: fill flags to action structure > > flow_offload: reject to offload tc actions in offload drivers > > flow_offload: allow user to offload tc action to net device > > flow_offload: add skip_hw and skip_sw to control if offload the action > > flow_offload: add process to update action stats from hardware > > net: sched: save full flags for tc action > > flow_offload: add reoffload process to update hw_count > > flow_offload: validate flags of filter and actions > > > > drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c | 2 +- > > .../ethernet/mellanox/mlx5/core/en/rep/tc.c | 3 + > > .../ethernet/netronome/nfp/flower/offload.c | 3 + > > include/linux/netdevice.h | 1 + > > include/net/act_api.h | 34 +- > > include/net/flow_offload.h | 17 + > > include/net/pkt_cls.h | 61 ++- > > include/uapi/linux/pkt_cls.h | 9 +- > > net/core/flow_offload.c | 48 +- > > net/sched/act_api.c | 440 +++++++++++++++++- > > net/sched/act_bpf.c | 2 +- > > net/sched/act_connmark.c | 2 +- > > net/sched/act_ctinfo.c | 2 +- > > net/sched/act_gate.c | 2 +- > > net/sched/act_ife.c | 2 +- > > net/sched/act_ipt.c | 2 +- > > net/sched/act_mpls.c | 2 +- > > net/sched/act_nat.c | 2 +- > > net/sched/act_pedit.c | 2 +- > > net/sched/act_police.c | 2 +- > > net/sched/act_sample.c | 2 +- > > net/sched/act_simple.c | 2 +- > > net/sched/act_skbedit.c | 2 +- > > net/sched/act_skbmod.c | 2 +- > > net/sched/cls_api.c | 55 ++- > > net/sched/cls_flower.c | 3 +- > > net/sched/cls_matchall.c | 4 +- > > net/sched/cls_u32.c | 7 +- > > 28 files changed, 661 insertions(+), 54 deletions(-) > > Just as an on-going grump: It has been my hope that policing as a technique would have died a horrible death by now. Seeing it come back as an "easy to offload" operation here - fresh from the 1990s! does not mean it's a good idea, and I'd rather like it if we were finding ways to offload newer things that work better, such as modern aqm, fair queuing, and shaping technologies that are in pie, fq_codel, and cake. policing leads to bursty loss, especially at higher rates, BBR has a specific mode designed to defeat it, and I ripped it out of wondershaper long ago for very good reasons: https://www.bufferbloat.net/projects/bloat/wiki/Wondershaper_Must_Die/ I did a long time ago start working on a better policing idea based on some good aqm ideas like AFD, but dropped it figuring that policing was going to vanish from the planet. It's baaaaaack.
On 2021-10-31 05:50, Oz Shlomo wrote: > > > On 10/28/2021 2:06 PM, Simon Horman wrote: >> Baowen Zheng says: >> >> Allow use of flow_indr_dev_register/flow_indr_dev_setup_offload to >> offload >> tc actions independent of flows. >> >> The motivation for this work is to prepare for using TC police action >> instances to provide hardware offload of OVS metering feature - which >> calls >> for policers that may be used by multiple flows and whose lifecycle is >> independent of any flows that use them. >> >> This patch includes basic changes to offload drivers to return EOPNOTSUPP >> if this feature is used - it is not yet supported by any driver. >> >> Tc cli command to offload and quote an action: >> >> tc qdisc del dev $DEV ingress && sleep 1 || true >> tc actions delete action police index 99 || true >> >> tc qdisc add dev $DEV ingress >> tc qdisc show dev $DEV ingress >> >> tc actions add action police index 99 rate 1mbit burst 100k skip_sw >> tc actions list action police >> >> tc filter add dev $DEV protocol ip parent ffff: >> flower ip_proto tcp action police index 99 >> tc -s -d filter show dev $DEV protocol ip parent ffff: >> tc filter add dev $DEV protocol ipv6 parent ffff: >> flower skip_sw ip_proto tcp action police index 99 >> tc -s -d filter show dev $DEV protocol ipv6 parent ffff: >> tc actions list action police >> >> tc qdisc del dev $DEV ingress && sleep 1 >> tc actions delete action police index 99 >> tc actions list action police >> > > Actions are also (implicitly) instantiated when filters are created. > In the following example the mirred action instance (created by the > first filter) is shared by the second filter: > > tc filter add dev $DEV1 proto ip parent ffff: flower \ > ip_proto tcp action mirred egress redirect dev $DEV3 > > tc filter add dev $DEV2 proto ip parent ffff: flower \ > ip_proto tcp action mirred index 1 > > I sure hope this is supported. At least the discussions so far are a nod in that direction... I know there is hardware that is not capable of achieving this (little CPE type devices) but lets not make that the common case. cheers, jamal
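For contrast with Oz's implicit example, the path this series exercises pre-creates the action and then binds it from filters by index; a sketch reusing the same hypothetical device names:

    # Pre-create the shared mirred action, then reference it from two filters
    tc actions add action mirred egress redirect dev $DEV3 index 1
    tc filter add dev $DEV1 proto ip parent ffff: flower \
        ip_proto tcp action mirred index 1
    tc filter add dev $DEV2 proto ip parent ffff: flower \
        ip_proto tcp action mirred index 1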
On 2021-10-31 08:03, Dave Taht wrote: [..] > > Just as an on-going grump: It has been my hope that policing as a > technique would have died a horrible death by now. Seeing it come back > as an "easy to offload" operation here - fresh from the 1990s! does > not mean it's a good idea, and I'd rather like it if we were finding > ways to > offload newer things that work better, such as modern aqm, fair > queuing, and shaping technologies that are in pie, fq_codel, and cake. > > policing leads to bursty loss, especially at higher rates, BBR has a > specific mode designed to defeat it, and I ripped it out of > wondershaper > long ago for very good reasons: > https://www.bufferbloat.net/projects/bloat/wiki/Wondershaper_Must_Die/ > > I did a long time ago start working on a better policing idea based on > some good aqm ideas like AFD, but dropped it figuring that policing > was going to vanish > from the planet. It's baaaaaack. A lot of enthusiasm for fq_codel in that link ;-> Root cause for burstiness is typically due to large transient queues (which are sometimes not under your admin control) and of course if you use a policer and dont have your double leaky buckets set properly to compensate for both short and long term rates you will have bursts of drops with the policer. It would be the same with shaper as well if the packet burst shows up when the queue is full. Intuitively it would feel, for non-work conserving approaches, delaying a packet (as in shaping) is better than dropping (as in policing) - but i have not a study which scientifically proves it. Any pointers in that regard? TCP would recover either way (either detecting sequence gaps or RTO). In Linux kernel level i am not sure i see much difference in either since we actually feedback an indicator to TCP to indicate a local drop (as opposed to guessing when it is dropped in the network) and the TCP code is smart enough to utilize that knowledge. For hardware offload there is no such feedback for either of those two approaches (so no difference with drop in the blackhole). As to "policer must die" - not possible i am afraid;-> I mean there has to be strong evidence that it is a bad idea and besides that _a lot of hardware_ supports it;-> Ergo, we have to support it as well. Note: RED for example has been proven almost impossible to configure properly but we still support it and there's a good set of hardware offload support for it. For RED - and i should say the policer as well - if you configure properly, _it works_. BTW, Some mellanox NICs offload HTB. See for example: https://legacy.netdevconf.info/0x14/session.html?talk-hierarchical-QoS-hardware-offload cheers, jamal
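As an illustration of the two-rate configuration Jamal refers to, a policer can be given a long-term rate plus a peak rate to bound short-term bursts; a sketch with illustrative values, assuming the ingress qdisc from the cover-letter example is in place:

    # Long-term rate plus peak (short-term) rate; values are illustrative
    tc filter add dev $DEV parent ffff: protocol ip flower ip_proto tcp \
        action police rate 1mbit burst 100k peakrate 2mbit mtu 1500 \
        conform-exceed drop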
On 2021-10-31 10:14, Jamal Hadi Salim wrote:
> BTW, Some mellanox NICs offload HTB. See for example:
^^^^^^^^
Sorry, to be politically correct s/mellanox/Nvidia ;->
cheers,
jamal
On Sun 31 Oct 2021 at 15:40, Jamal Hadi Salim <jhs@mojatatu.com> wrote: > On 2021-10-31 05:50, Oz Shlomo wrote: >> >> On 10/28/2021 2:06 PM, Simon Horman wrote: >>> Baowen Zheng says: >>> >>> Allow use of flow_indr_dev_register/flow_indr_dev_setup_offload to offload >>> tc actions independent of flows. >>> >>> The motivation for this work is to prepare for using TC police action >>> instances to provide hardware offload of OVS metering feature - which calls >>> for policers that may be used by multiple flows and whose lifecycle is >>> independent of any flows that use them. >>> >>> This patch includes basic changes to offload drivers to return EOPNOTSUPP >>> if this feature is used - it is not yet supported by any driver. >>> >>> Tc cli command to offload and quote an action: >>> >>> tc qdisc del dev $DEV ingress && sleep 1 || true >>> tc actions delete action police index 99 || true >>> >>> tc qdisc add dev $DEV ingress >>> tc qdisc show dev $DEV ingress >>> >>> tc actions add action police index 99 rate 1mbit burst 100k skip_sw >>> tc actions list action police >>> >>> tc filter add dev $DEV protocol ip parent ffff: >>> flower ip_proto tcp action police index 99 >>> tc -s -d filter show dev $DEV protocol ip parent ffff: >>> tc filter add dev $DEV protocol ipv6 parent ffff: >>> flower skip_sw ip_proto tcp action police index 99 >>> tc -s -d filter show dev $DEV protocol ipv6 parent ffff: >>> tc actions list action police >>> >>> tc qdisc del dev $DEV ingress && sleep 1 >>> tc actions delete action police index 99 >>> tc actions list action police >>> >> Actions are also (implicitly) instantiated when filters are created. >> In the following example the mirred action instance (created by the first >> filter) is shared by the second filter: >> tc filter add dev $DEV1 proto ip parent ffff: flower \ >> ip_proto tcp action mirred egress redirect dev $DEV3 >> tc filter add dev $DEV2 proto ip parent ffff: flower \ >> ip_proto tcp action mirred index 1 >> > > I sure hope this is supported. At least the discussions so far > are a nod in that direction... > I know there is hardware that is not capable of achieving this > (little CPE type devices) but lets not make that the common case. Looks like it isn't supported in this change since tcf_action_offload_add() is only called by tcf_action_init() when BIND flag is not set (the flag is always set when called from cls code). Moreover, I don't think it is good idea to support such use-case because that would require to increase number of calls to driver offload infrastructure from 1 per filter to 1+number_of_actions, which would significantly impact insertion rate.
On Sun, Oct 31, 2021 at 7:14 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote: > > On 2021-10-31 08:03, Dave Taht wrote: > [..] > > > > > Just as an on-going grump: It has been my hope that policing as a > > technique would have died a horrible death by now. Seeing it come back > > as an "easy to offload" operation here - fresh from the 1990s! does > > not mean it's a good idea, and I'd rather like it if we were finding > > ways to > > offload newer things that work better, such as modern aqm, fair > > queuing, and shaping technologies that are in pie, fq_codel, and cake. > > > > policing leads to bursty loss, especially at higher rates, BBR has a > > specific mode designed to defeat it, and I ripped it out of > > wondershaper > > long ago for very good reasons: > > https://www.bufferbloat.net/projects/bloat/wiki/Wondershaper_Must_Die/ > > > > I did a long time ago start working on a better policing idea based on > > some good aqm ideas like AFD, but dropped it figuring that policing > > was going to vanish > > from the planet. It's baaaaaack. > > A lot of enthusiasm for fq_codel in that link ;-> Wrote that in 2013. It's not every day you solve tcp global synchronization, achieve a queue depth of 5ms no matter the rate, develop something that has zero latency for sparse packets, only shoots at the fat flows, drops from head so there's always an immediate signal of congestion from the packet just behind, makes opus's PLC and simpler forms of FEC "just work", and requires near zero configuration. The plots at the end made a very convincing case for abandoning policing. > Root cause for burstiness is typically due to large transient queues > (which are sometimes not under your admin control) and of course if > you use a policer and dont have your double leaky buckets set properly > to compensate for both short and long term rates you will have bursts > of drops with the policer. I would really like to see a good configuration guide for policing at multiple real-world bandwidths and at real-world workloads. > It would be the same with shaper as well > if the packet burst shows up when the queue is full. Queues are shock absorbers as Van always says. We do drop packets still, on the rx ring. The default queue depth of codel is 32MB. It takes a really really really large burst to overwhelm that. I wonder where all the userspace wireguard vpns are dropping packets nowdays. > Intuitively it would feel, for non-work conserving approaches, > delaying a packet (as in shaping) is better than dropping (as in It's shaping + flow queueing that's the win, if you are going to queue. It gets all the flows statistically multiplexed and in flow balance orders of magnitude faster than a policer could. (flow queuing is different from classic fair queuing) The tiny flows pass through untouched at zero delay also. At the time, I was considering applying a codel-like technique to policing - I'd called it "bobbie", where once you exceed the rate, a virtual clock moves forward as to how long you would have delayed packet delivery if you were queueing and then starts shooting at packets once your burst tolerance is exceeded until But inbound fq+shaping did wonders faster, and selfishly I didn't feel like abandoning floating point to work with in the kernel. That said, it's taken breaking the qdisc lock and xpf to make inbound shaping scale decently (see: https://github.com/rchac/LibreQoS#how-do-cake-and-fq_codel-work ) > policing) - but i have not a study which scientifically proves it. > Any pointers in that regard? 
Neither do I. Matt Mathis has ranted about it, and certainly the workarounds in BBRv1 to defeat others desperate attempts to control their bandwidth with a policer is obvious from their data. If there really is a resurgence of interest in policing, a good paper would compare a classic 3 color policer to bobbie, and to shaping vs a vs BBR and cubic. I'm low on students at the moment... > TCP would recover either way (either detecting sequence gaps or RTO). Yes, it does. But policing is often devastating to voip and videoconferencing traffic. > In Linux kernel level i am not sure i see much difference in either > since we actually feedback an indicator to TCP to indicate a local > drop (as opposed to guessing when it is dropped in the network) > and the TCP code is smart enough to utilize that knowledge. There is an extremely long and difficult conversation I'd had over the differences between sch_fq and fq_codel and the differences between a server and a router, for real world applications over here: https://github.com/systemd/systemd/issues/9725#issuecomment-413369212 > For hardware offload there is no such feedback for either of those > two approaches (so no difference with drop in the blackhole). Yes, now you've built a *router* and lost the local control loop. TSQ, sch_fq's pacing, and other host optimizations no longer work. I encourage more folk to regularly take packet captures of the end results of offloads vs a vs network latency. Look! MORE BANDWIDTH for a single flow! Wait! There's 600ms of latency and new flows can't even get started! > > As to "policer must die" - not possible i am afraid;-> I mean there > has to be strong evidence that it is a bad idea and besides that > _a lot of hardware_ supports it;-> Ergo, we have to support it as well. I agree that supporting hardware features is good. I merely wish that certain other software features were making it into modern hardware. I'm encouraged by this work in p4, at least. https://arxiv.org/pdf/2010.04528.pdf > Note: RED for example has been proven almost impossible to configure > properly but we still support it and there's a good set of hardware > offload support for it. For RED - and i should say the policer as well - > if you configure properly, _it works_. I have no idea how often RED is used nowadays. The *only* requests for offloading it Ive heard is for configuring it as a brick wall ecn marking tool, which does indeed work for dctcp. The hope was with pie, being similar in construction, would end up implemented in hardware, however it's so far turned out that codel was easier to implement in hw and more effective. > > > BTW, Some mellanox NICs offload HTB. See for example: > https://legacy.netdevconf.info/0x14/session.html?talk-hierarchical-QoS-hardware-offload Yes. Too bad they then attach it to fifos. I'd so love to help a hardware company willing to do the work to put modern algorithms in hw... > cheers, > jamal I note that although I've enjoyed ranting, I don't actually have any objections to this particular patch. -- I tried to build a better future, a few times: https://wayforward.archive.org/?site=https%3A%2F%2Fwww.icei.org Dave Täht CEO, TekLibre, LLC
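For completeness, the inbound fq+shaping alternative Dave describes (and which LibreQoS builds on) is usually set up by redirecting ingress traffic to an ifb device running cake; a rough sketch with illustrative device names and rates:

    # Shape ingress by redirecting it to an ifb device running cake
    ip link add ifb0 type ifb
    ip link set ifb0 up
    tc qdisc add dev $DEV handle ffff: ingress
    tc qdisc add dev ifb0 root cake bandwidth 900mbit besteffort
    tc filter add dev $DEV parent ffff: matchall \
        action mirred egress redirect dev ifb0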
On Mon, Nov 01, 2021 at 10:01:28AM +0200, Vlad Buslov wrote: > On Sun 31 Oct 2021 at 15:40, Jamal Hadi Salim <jhs@mojatatu.com> wrote: > > On 2021-10-31 05:50, Oz Shlomo wrote: > >> > >> On 10/28/2021 2:06 PM, Simon Horman wrote: ... > >> Actions are also (implicitly) instantiated when filters are created. > >> In the following example the mirred action instance (created by the first > >> filter) is shared by the second filter: > >> tc filter add dev $DEV1 proto ip parent ffff: flower \ > >> ip_proto tcp action mirred egress redirect dev $DEV3 > >> tc filter add dev $DEV2 proto ip parent ffff: flower \ > >> ip_proto tcp action mirred index 1 > >> > > > > I sure hope this is supported. At least the discussions so far > > are a nod in that direction... > > I know there is hardware that is not capable of achieving this > > (little CPE type devices) but lets not make that the common case. > > Looks like it isn't supported in this change since > tcf_action_offload_add() is only called by tcf_action_init() when BIND > flag is not set (the flag is always set when called from cls code). > Moreover, I don't think it is good idea to support such use-case because > that would require to increase number of calls to driver offload > infrastructure from 1 per filter to 1+number_of_actions, which would > significantly impact insertion rate. Hi, I feel that I am missing some very obvious point here. But from my perspective the use case described by Oz is supported by existing offload of the flower classifier (since ~4.13 IIRC).
On Tue 02 Nov 2021 at 14:51, Simon Horman <simon.horman@corigine.com> wrote: > On Mon, Nov 01, 2021 at 10:01:28AM +0200, Vlad Buslov wrote: >> On Sun 31 Oct 2021 at 15:40, Jamal Hadi Salim <jhs@mojatatu.com> wrote: >> > On 2021-10-31 05:50, Oz Shlomo wrote: >> >> >> >> On 10/28/2021 2:06 PM, Simon Horman wrote: > > ... > >> >> Actions are also (implicitly) instantiated when filters are created. >> >> In the following example the mirred action instance (created by the first >> >> filter) is shared by the second filter: >> >> tc filter add dev $DEV1 proto ip parent ffff: flower \ >> >> ip_proto tcp action mirred egress redirect dev $DEV3 >> >> tc filter add dev $DEV2 proto ip parent ffff: flower \ >> >> ip_proto tcp action mirred index 1 >> >> >> > >> > I sure hope this is supported. At least the discussions so far >> > are a nod in that direction... >> > I know there is hardware that is not capable of achieving this >> > (little CPE type devices) but lets not make that the common case. >> >> Looks like it isn't supported in this change since >> tcf_action_offload_add() is only called by tcf_action_init() when BIND >> flag is not set (the flag is always set when called from cls code). >> Moreover, I don't think it is good idea to support such use-case because >> that would require to increase number of calls to driver offload >> infrastructure from 1 per filter to 1+number_of_actions, which would >> significantly impact insertion rate. > > Hi, > > I feel that I am missing some very obvious point here. > > But from my perspective the use case described by Oz is supported > by existing offload of the flower classifier (since ~4.13 IIRC). Mlx5 driver can't support such case without infrastructure change in kernel for following reasons: - Action index is not provided by flow_action offload infrastructure for most of the actions, so there is no way for driver to determine whether the action is shared. - If we extend the infrastructure to always provide tcfa_index (a trivial change), there would be not much use for it because there is no way to properly update shared action counters without infrastructure code similar to what you implemented as part of this series. How do you support shared actions created through cls_api in your driver, considering described limitations?
On Tue, Nov 02, 2021 at 05:33:14PM +0200, Vlad Buslov wrote: > On Tue 02 Nov 2021 at 14:51, Simon Horman <simon.horman@corigine.com> wrote: > > On Mon, Nov 01, 2021 at 10:01:28AM +0200, Vlad Buslov wrote: > >> On Sun 31 Oct 2021 at 15:40, Jamal Hadi Salim <jhs@mojatatu.com> wrote: > >> > On 2021-10-31 05:50, Oz Shlomo wrote: > >> >> > >> >> On 10/28/2021 2:06 PM, Simon Horman wrote: > > > > ... > > > >> >> Actions are also (implicitly) instantiated when filters are created. > >> >> In the following example the mirred action instance (created by the first > >> >> filter) is shared by the second filter: > >> >> tc filter add dev $DEV1 proto ip parent ffff: flower \ > >> >> ip_proto tcp action mirred egress redirect dev $DEV3 > >> >> tc filter add dev $DEV2 proto ip parent ffff: flower \ > >> >> ip_proto tcp action mirred index 1 > >> >> > >> > > >> > I sure hope this is supported. At least the discussions so far > >> > are a nod in that direction... > >> > I know there is hardware that is not capable of achieving this > >> > (little CPE type devices) but lets not make that the common case. > >> > >> Looks like it isn't supported in this change since > >> tcf_action_offload_add() is only called by tcf_action_init() when BIND > >> flag is not set (the flag is always set when called from cls code). > >> Moreover, I don't think it is good idea to support such use-case because > >> that would require to increase number of calls to driver offload > >> infrastructure from 1 per filter to 1+number_of_actions, which would > >> significantly impact insertion rate. > > > > Hi, > > > > I feel that I am missing some very obvious point here. > > > > But from my perspective the use case described by Oz is supported > > by existing offload of the flower classifier (since ~4.13 IIRC). > > Mlx5 driver can't support such case without infrastructure change in > kernel for following reasons: > > - Action index is not provided by flow_action offload infrastructure for > most of the actions, so there is no way for driver to determine > whether the action is shared. > > - If we extend the infrastructure to always provide tcfa_index (a > trivial change), there would be not much use for it because there is > no way to properly update shared action counters without > infrastructure code similar to what you implemented as part of this > series. > > How do you support shared actions created through cls_api in your > driver, considering described limitations? Thanks, I misread the use case described by Oz, but I believe I understand it now. I agree that the case described is neither currently supported, nor supported by this patchset (to be honest I for one had not considered it). So, I think the question is: does supporting this use-case make sense - from implementation, use-case, and consistency perspectives - in the context of this patchset? Am I on the right track?
On 11/2/2021 6:15 PM, Simon Horman wrote: > On Tue, Nov 02, 2021 at 05:33:14PM +0200, Vlad Buslov wrote: >> On Tue 02 Nov 2021 at 14:51, Simon Horman <simon.horman@corigine.com> wrote: >>> On Mon, Nov 01, 2021 at 10:01:28AM +0200, Vlad Buslov wrote: >>>> On Sun 31 Oct 2021 at 15:40, Jamal Hadi Salim <jhs@mojatatu.com> wrote: >>>>> On 2021-10-31 05:50, Oz Shlomo wrote: >>>>>> >>>>>> On 10/28/2021 2:06 PM, Simon Horman wrote: >>> >>> ... >>> >>>>>> Actions are also (implicitly) instantiated when filters are created. >>>>>> In the following example the mirred action instance (created by the first >>>>>> filter) is shared by the second filter: >>>>>> tc filter add dev $DEV1 proto ip parent ffff: flower \ >>>>>> ip_proto tcp action mirred egress redirect dev $DEV3 >>>>>> tc filter add dev $DEV2 proto ip parent ffff: flower \ >>>>>> ip_proto tcp action mirred index 1 >>>>>> >>>>> >>>>> I sure hope this is supported. At least the discussions so far >>>>> are a nod in that direction... >>>>> I know there is hardware that is not capable of achieving this >>>>> (little CPE type devices) but lets not make that the common case. >>>> >>>> Looks like it isn't supported in this change since >>>> tcf_action_offload_add() is only called by tcf_action_init() when BIND >>>> flag is not set (the flag is always set when called from cls code). >>>> Moreover, I don't think it is good idea to support such use-case because >>>> that would require to increase number of calls to driver offload >>>> infrastructure from 1 per filter to 1+number_of_actions, which would >>>> significantly impact insertion rate. >>> >>> Hi, >>> >>> I feel that I am missing some very obvious point here. >>> >>> But from my perspective the use case described by Oz is supported >>> by existing offload of the flower classifier (since ~4.13 IIRC). >> >> Mlx5 driver can't support such case without infrastructure change in >> kernel for following reasons: >> >> - Action index is not provided by flow_action offload infrastructure for >> most of the actions, so there is no way for driver to determine >> whether the action is shared. >> >> - If we extend the infrastructure to always provide tcfa_index (a >> trivial change), there would be not much use for it because there is >> no way to properly update shared action counters without >> infrastructure code similar to what you implemented as part of this >> series. >> >> How do you support shared actions created through cls_api in your >> driver, considering described limitations? > > Thanks, > > I misread the use case described by Oz, but I believe I understand it now. > > I agree that the case described is neither currently supported, nor > supported by this patchset (to be honest I for one had not considered it). > > So, I think the question is: does upporting this use-case make sense - from > implementation, use-case, and consistency perspectives - in the context of > this patchset? > > Am I on the right track? > Currently we don't have a specific application use case for sharing actions that were created by tc filters. However, we do have future use case in mind. We could add such functionality on top of this series when a use case will materialize. Perhaps, at that point, we can also introduce a control flag in order to avoid unnecessary insertion rate performance degradation.