diff mbox series

[net-next,1/7] net: sched: Add a trap-and-forward action

Message ID 20210408133829.2135103-2-petrm@nvidia.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series tc: Introduce a trap-and-forward action | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for net-next
netdev/subject_prefix success Link
netdev/cc_maintainers warning 14 maintainers not CCed: jiri@resnulli.us yhs@fb.com kpsingh@kernel.org bpf@vger.kernel.org andrii@kernel.org daniel@iogearbox.net weiwan@google.com kafai@fb.com ast@kernel.org ap420073@gmail.com john.fastabend@gmail.com songliubraving@fb.com atenart@kernel.org edumazet@google.com
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 6680 this patch: 6680
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 77 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 6893 this patch: 6893
netdev/header_inline success Link

Commit Message

Petr Machata April 8, 2021, 1:38 p.m. UTC
The TC action "trap" is used to instruct the HW datapath to drop the
matched packet and transfer it for processing in the SW pipeline. If
instead it is desirable to forward the packet and transferring a _copy_ to
the SW pipeline, there is no practical way to achieve that.

To that end add a new generic action, trap_fwd. In the software pipeline,
it is equivalent to an OK. When offloading, it should forward the packet to
the host, but unlike trap it should not drop the packet.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
---
 include/uapi/linux/pkt_cls.h       |  6 +++++-
 net/core/dev.c                     |  2 ++
 net/sched/act_bpf.c                | 13 +++++++++++--
 net/sched/cls_bpf.c                |  1 +
 net/sched/sch_dsmark.c             |  1 +
 tools/include/uapi/linux/pkt_cls.h |  6 +++++-
 6 files changed, 25 insertions(+), 4 deletions(-)

Comments

Jamal Hadi Salim April 8, 2021, 2:05 p.m. UTC | #1
Hi Petr,

On 2021-04-08 9:38 a.m., Petr Machata wrote:
> The TC action "trap" is used to instruct the HW datapath to drop the
> matched packet and transfer it for processing in the SW pipeline. If
> instead it is desirable to forward the packet and transferring a _copy_ to
> the SW pipeline, there is no practical way to achieve that.
> 
> To that end add a new generic action, trap_fwd. In the software pipeline,
> it is equivalent to an OK. When offloading, it should forward the packet to
> the host, but unlike trap it should not drop the packet.
> 

I am concerned about adding new opcodes which only make sense if you
offload (or make sense only if you are running in s/w).

Those opcodes are intended to be generic abstractions so the dispatcher
can decide what to do next. Adding things that are specific only
to scenarios of hardware offload removes that opaqueness.
I must have missed the discussion on ACT_TRAP because it is the
same issue there i.e shouldnt be an opcode. For details see:
https://people.netfilter.org/pablo/netdev0.1/papers/Linux-Traffic-Control-Classifier-Action-Subsystem-Architecture.pdf

IMO:
It seems to me there are two actions here encapsulated in one.
The first is to "trap" and the second is to "drop".
This is no different semantically than say "mirror and drop"
offload being enunciated by "skip_sw".

Does the spectrum not support multiple actions?
e.g with a policy like:
  match blah action trap action drop skip_sw

cheers,
jamal
Jakub Kicinski April 8, 2021, 9:25 p.m. UTC | #2
On Thu, 8 Apr 2021 10:05:07 -0400 Jamal Hadi Salim wrote:
> On 2021-04-08 9:38 a.m., Petr Machata wrote:
> > The TC action "trap" is used to instruct the HW datapath to drop the
> > matched packet and transfer it for processing in the SW pipeline. If
> > instead it is desirable to forward the packet and transferring a _copy_ to
> > the SW pipeline, there is no practical way to achieve that.
> > 
> > To that end add a new generic action, trap_fwd. In the software pipeline,
> > it is equivalent to an OK. When offloading, it should forward the packet to
> > the host, but unlike trap it should not drop the packet.
> 
> I am concerned about adding new opcodes which only make sense if you
> offload (or make sense only if you are running in s/w).
> 
> Those opcodes are intended to be generic abstractions so the dispatcher
> can decide what to do next. Adding things that are specific only
> to scenarios of hardware offload removes that opaqueness.
> I must have missed the discussion on ACT_TRAP because it is the
> same issue there i.e shouldnt be an opcode. For details see:
> https://people.netfilter.org/pablo/netdev0.1/papers/Linux-Traffic-Control-Classifier-Action-Subsystem-Architecture.pdf
> 
> IMO:
> It seems to me there are two actions here encapsulated in one.
> The first is to "trap" and the second is to "drop".
> This is no different semantically than say "mirror and drop"
> offload being enunciated by "skip_sw".
> 
> Does the spectrum not support multiple actions?
> e.g with a policy like:
>   match blah action trap action drop skip_sw

To make sure I understand - are you saying that trap should become 
more general and support both "and then drop" as well as "and then 
pass" semantics?

Seems like that ship has sailed, but also - how does it make it any
better WRT not having HW only opcodes? Or are you saying one is better
than two?
Petr Machata April 9, 2021, 11:03 a.m. UTC | #3
Jamal Hadi Salim <jhs@mojatatu.com> writes:

> I am concerned about adding new opcodes which only make sense if you
> offload (or make sense only if you are running in s/w).
>
> Those opcodes are intended to be generic abstractions so the dispatcher
> can decide what to do next. Adding things that are specific only
> to scenarios of hardware offload removes that opaqueness.
> I must have missed the discussion on ACT_TRAP because it is the
> same issue there i.e shouldnt be an opcode. For details see:
> https://people.netfilter.org/pablo/netdev0.1/papers/Linux-Traffic-Control-Classifier-Action-Subsystem-Architecture.pdf

Trap has been in since 4.13, so 2017ish. It's done and dusted at this
point.

> IMO:
> It seems to me there are two actions here encapsulated in one.
> The first is to "trap" and the second is to "drop".
>
> This is no different semantically than say "mirror and drop"
> offload being enunciated by "skip_sw".
>
> Does the spectrum not support multiple actions?
> e.g with a policy like:
>  match blah action trap action drop skip_sw

Trap drops implicitly. We need a "trap, but don't drop". Expressed in
terms of existing actions it would be "mirred egress redirect dev
$cpu_port". But how to express $cpu_port except again by a HW-specific
magic token I don't know.
Jamal Hadi Salim April 9, 2021, 11:13 a.m. UTC | #4
On 2021-04-08 5:25 p.m., Jakub Kicinski wrote:
> On Thu, 8 Apr 2021 10:05:07 -0400 Jamal Hadi Salim wrote:
>> On 2021-04-08 9:38 a.m., Petr Machata wrote:
>>> The TC action "trap" is used to instruct the HW datapath to drop the
>>> matched packet and transfer it for processing in the SW pipeline. If
>>> instead it is desirable to forward the packet and transferring a _copy_ to
>>> the SW pipeline, there is no practical way to achieve that.
>>>
>>> To that end add a new generic action, trap_fwd. In the software pipeline,
>>> it is equivalent to an OK. When offloading, it should forward the packet to
>>> the host, but unlike trap it should not drop the packet.
>>
>> I am concerned about adding new opcodes which only make sense if you
>> offload (or make sense only if you are running in s/w).
>>
>> Those opcodes are intended to be generic abstractions so the dispatcher
>> can decide what to do next. Adding things that are specific only
>> to scenarios of hardware offload removes that opaqueness.
>> I must have missed the discussion on ACT_TRAP because it is the
>> same issue there i.e shouldnt be an opcode. For details see:
>> https://people.netfilter.org/pablo/netdev0.1/papers/Linux-Traffic-Control-Classifier-Action-Subsystem-Architecture.pdf
>>
>> IMO:
>> It seems to me there are two actions here encapsulated in one.
>> The first is to "trap" and the second is to "drop".
>> This is no different semantically than say "mirror and drop"
>> offload being enunciated by "skip_sw".
>>
>> Does the spectrum not support multiple actions?
>> e.g with a policy like:
>>    match blah action trap action drop skip_sw
> 
> To make sure I understand - are you saying that trap should become
> more general and support both "and then drop" as well as "and then
> pass" semantics?
> 

No.
Main issue is the pollution of the opcodes - whether it is
one or multiple actions is less of a concern.
Those opcodes are intended to be for the core action dispatcher's
consumption. See figure 6 and table 1 of the document i referred to.

Basically:
You dont an action then add an opcode for it even if it is hardware
offloaded (otherwise that opcode space would have grown a lot more by
now for all those actions that are offloaded).
Trap, for example, could have been a dummy action that just returns
the STOLEN/DROP/PASS opcode and does nothing else.
Typically we expect things that are offloaded to have a software
equivalent. It makes for good control consistency etc clean.

> Seems like that ship has sailed, but also - how does it make it any
> better WRT not having HW only opcodes? Or are you saying one is better
> than two?

The opcodes are not tied to whether an action is offloaded or not.
That role belongs to the "skip_sw" axes - which works well
today since we dont offload actions on their own without some
filter rule which specifies the offload option.

I will barf if someone implements 3 actions: "trap", "trap and forward",
"trap and drop" - but that is not messing up with the core architecture
so the barfing is more due to the bad taste of that approach.
A cleaner approach is to code one and change the return code for those
3 to "STOLEN", "PIPE", and "DROP"

cheers,
jamal
Jamal Hadi Salim April 9, 2021, 11:44 a.m. UTC | #5
On 2021-04-09 7:03 a.m., Petr Machata wrote:
> 
> Jamal Hadi Salim <jhs@mojatatu.com> writes:
> 
>> I am concerned about adding new opcodes which only make sense if you
>> offload (or make sense only if you are running in s/w).
>>
>> Those opcodes are intended to be generic abstractions so the dispatcher
>> can decide what to do next. Adding things that are specific only
>> to scenarios of hardware offload removes that opaqueness.
>> I must have missed the discussion on ACT_TRAP because it is the
>> same issue there i.e shouldnt be an opcode. For details see:
>> https://people.netfilter.org/pablo/netdev0.1/papers/Linux-Traffic-Control-Classifier-Action-Subsystem-Architecture.pdf
> 
> Trap has been in since 4.13, so 2017ish. It's done and dusted at this
> point.
> 

I am afraid that is not a good arguement. With all due respect,
here's how it translates:
"We already made a mistake, therefore, its ok to build on it and
make more mistakes". Touching those opcodes is really dirty; at
least i have seen no convincing arguement _at all_ for it. And,
it is not too late not to make more mistakes.
I dont remember, I may have spoken against TRAP; what i know is had
i seen the patch i would have said something - maybe i did and should
have been louder. Mea culpa.

>> IMO:
>> It seems to me there are two actions here encapsulated in one.
>> The first is to "trap" and the second is to "drop".
>>
>> This is no different semantically than say "mirror and drop"
>> offload being enunciated by "skip_sw".
>>
>> Does the spectrum not support multiple actions?
>> e.g with a policy like:
>>   match blah action trap action drop skip_sw
> 
> Trap drops implicitly. We need a "trap, but don't drop". Expressed in
> terms of existing actions it would be "mirred egress redirect dev
> $cpu_port". But how to express $cpu_port except again by a HW-specific
> magic token I don't know.


Note: mirred was originally intended to send redirect/mirror
packets to user space (the comment is still there in the code).
Infact there is a patch lying around somewhere that does that with
packet sockets (the author hasnt been serious about pushing it
upstream). In that case the semantics are redirecting to a file
descriptor. Could we have something like that here which points
to whatever representation $cpu_port has? Sounds like semantics
for "trap and forward" are just "mirror and forward".

I think there is value in having something like trap action
which generalizes the combinations only to the fact that
it will make it easier to relay the info to the offload without
much transformation.
If i was to do it i would write one action configured by user space:
- to return DROP if you want action trap-and-drop semantics.
- to return STOLEN if you want trap
- to return PIPE if you want trap and forward. You will need a second
action composed to forward.

I said dummy because this action has no value in s/w. Someone could
use it in s/w but it would be no different than gact.
Maybe it could be extended to work also in s/w by adding the "trap to
fd" in userspace.

cheers,
jamal
Petr Machata April 9, 2021, 1:43 p.m. UTC | #6
Jamal Hadi Salim <jhs@mojatatu.com> writes:

> On 2021-04-09 7:03 a.m., Petr Machata wrote:
>> Jamal Hadi Salim <jhs@mojatatu.com> writes:
>> 
>>> I am concerned about adding new opcodes which only make sense if you
>>> offload (or make sense only if you are running in s/w).
>>>
>>> Those opcodes are intended to be generic abstractions so the dispatcher
>>> can decide what to do next.
>>> [...]
>>> For details see:
>>> https://people.netfilter.org/pablo/netdev0.1/papers/Linux-Traffic-Control-Classifier-Action-Subsystem-Architecture.pdf
>>
>> Trap has been in since 4.13, so 2017ish. It's done and dusted at this
>> point.
>
> here's how it translates:
> "We already made a mistake, therefore, its ok to build on it and
> make more mistakes".

I can see how it reads that way, but that was not the intention. I was
actually thinking about whether there might be a way to gradually
migrate all this stuff over to mirred, but at this point, trap is very
much baked in.

>>> IMO:
>>> It seems to me there are two actions here encapsulated in one.
>>> The first is to "trap" and the second is to "drop".
>>>
>>> This is no different semantically than say "mirror and drop"
>>> offload being enunciated by "skip_sw".
>>>
>>> Does the spectrum not support multiple actions?
>>> e.g with a policy like:
>>>   match blah action trap action drop skip_sw
>> Trap drops implicitly. We need a "trap, but don't drop". Expressed in
>> terms of existing actions it would be "mirred egress redirect dev
>> $cpu_port". But how to express $cpu_port except again by a HW-specific
>> magic token I don't know.

(I meant mirred egress mirror, not redirect.)

> Note: mirred was originally intended to send redirect/mirror
> packets to user space (the comment is still there in the code).
> Infact there is a patch lying around somewhere that does that with
> packet sockets (the author hasnt been serious about pushing it
> upstream). In that case the semantics are redirecting to a file
> descriptor. Could we have something like that here which points
> to whatever representation $cpu_port has? Sounds like semantics
> for "trap and forward" are just "mirror and forward".

Hmm, we have devlink ports, the CPU port is exposed there. But that's
the only thing that comes to mind. Those are specific for the given
device though, it doesn't look suitable...

> I think there is value in having something like trap action
> which generalizes the combinations only to the fact that
> it will make it easier to relay the info to the offload without
> much transformation.
> If i was to do it i would write one action configured by user space:
> - to return DROP if you want action trap-and-drop semantics.
> - to return STOLEN if you want trap
> - to return PIPE if you want trap and forward. You will need a second
> action composed to forward.

I think your STOLEN and PIPE are the same behavior. Both are "transfer
the packet to the SW datapath, but keep it in the HW datapath".

In general I have no issue expressing this stuff as a new action,
instead of an opcode. I'll take a look at this.
Jamal Hadi Salim April 11, 2021, 7:23 p.m. UTC | #7
On 2021-04-09 9:43 a.m., Petr Machata wrote:
> 
> Jamal Hadi Salim <jhs@mojatatu.com> writes:
> 

>>>> Does the spectrum not support multiple actions?
>>>> e.g with a policy like:
>>>>    match blah action trap action drop skip_sw
>>> Trap drops implicitly. We need a "trap, but don't drop". Expressed in
>>> terms of existing actions it would be "mirred egress redirect dev
>>> $cpu_port". But how to express $cpu_port except again by a HW-specific
>>> magic token I don't know.
> 
> (I meant mirred egress mirror, not redirect.)
>

Ok.

>> Note: mirred was originally intended to send redirect/mirror
>> packets to user space (the comment is still there in the code).
>> Infact there is a patch lying around somewhere that does that with
>> packet sockets (the author hasnt been serious about pushing it
>> upstream). In that case the semantics are redirecting to a file
>> descriptor. Could we have something like that here which points
>> to whatever representation $cpu_port has? Sounds like semantics
>> for "trap and forward" are just "mirror and forward".
> 
> Hmm, we have devlink ports, the CPU port is exposed there. But that's
> the only thing that comes to mind. Those are specific for the given
> device though, it doesn't look suitable...
> 

If it has an ifindex should be good enough for abstraction
purposes.

>> I think there is value in having something like trap action
>> which generalizes the combinations only to the fact that
>> it will make it easier to relay the info to the offload without
>> much transformation.
>> If i was to do it i would write one action configured by user space:
>> - to return DROP if you want action trap-and-drop semantics.
>> - to return STOLEN if you want trap
>> - to return PIPE if you want trap and forward. You will need a second
>> action composed to forward.
> 
> I think your STOLEN and PIPE are the same behavior. Both are "transfer
> the packet to the SW datapath, but keep it in the HW datapath".
> 
> In general I have no issue expressing this stuff as a new action,
> instead of an opcode. I'll take a look at this.
> 

Ok, thanks.

cheers,
jamal
diff mbox series

Patch

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 025c40fef93d..a1bbccb88e67 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -72,7 +72,11 @@  enum {
 				   * the skb and act like everything
 				   * is alright.
 				   */
-#define TC_ACT_VALUE_MAX	TC_ACT_TRAP
+#define TC_ACT_TRAP_FWD		9 /* For hw path, this means "send a copy
+				   * of the packet to the cpu". For sw
+				   * datapath, this is like TC_ACT_OK.
+				   */
+#define TC_ACT_VALUE_MAX	TC_ACT_TRAP_FWD
 
 /* There is a special kind of actions called "extended actions",
  * which need a value parameter. These have a local opcode located in
diff --git a/net/core/dev.c b/net/core/dev.c
index 9d1a8fac793f..f0b8c16dbf12 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3975,6 +3975,7 @@  sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
 	switch (tcf_classify(skb, miniq->filter_list, &cl_res, false)) {
 	case TC_ACT_OK:
 	case TC_ACT_RECLASSIFY:
+	case TC_ACT_TRAP_FWD:
 		skb->tc_index = TC_H_MIN(cl_res.classid);
 		break;
 	case TC_ACT_SHOT:
@@ -5083,6 +5084,7 @@  sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
 				     &cl_res, false)) {
 	case TC_ACT_OK:
 	case TC_ACT_RECLASSIFY:
+	case TC_ACT_TRAP_FWD:
 		skb->tc_index = TC_H_MIN(cl_res.classid);
 		break;
 	case TC_ACT_SHOT:
diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
index e48e980c3b93..be2a51c6f84e 100644
--- a/net/sched/act_bpf.c
+++ b/net/sched/act_bpf.c
@@ -54,8 +54,16 @@  static int tcf_bpf_act(struct sk_buff *skb, const struct tc_action *act,
 		bpf_compute_data_pointers(skb);
 		filter_res = BPF_PROG_RUN(filter, skb);
 	}
-	if (skb_sk_is_prefetched(skb) && filter_res != TC_ACT_OK)
-		skb_orphan(skb);
+	if (skb_sk_is_prefetched(skb)) {
+		switch (filter_res) {
+		case TC_ACT_OK:
+		case TC_ACT_TRAP_FWD:
+			break;
+		default:
+			skb_orphan(skb);
+			break;
+		}
+	}
 	rcu_read_unlock();
 
 	/* A BPF program may overwrite the default action opcode.
@@ -72,6 +80,7 @@  static int tcf_bpf_act(struct sk_buff *skb, const struct tc_action *act,
 	case TC_ACT_PIPE:
 	case TC_ACT_RECLASSIFY:
 	case TC_ACT_OK:
+	case TC_ACT_TRAP_FWD:
 	case TC_ACT_REDIRECT:
 		action = filter_res;
 		break;
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index 6e3e63db0e01..5fd96cf2dca7 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -69,6 +69,7 @@  static int cls_bpf_exec_opcode(int code)
 	case TC_ACT_SHOT:
 	case TC_ACT_STOLEN:
 	case TC_ACT_TRAP:
+	case TC_ACT_TRAP_FWD:
 	case TC_ACT_REDIRECT:
 	case TC_ACT_UNSPEC:
 		return code;
diff --git a/net/sched/sch_dsmark.c b/net/sched/sch_dsmark.c
index cd2748e2d4a2..054a06bd9dc8 100644
--- a/net/sched/sch_dsmark.c
+++ b/net/sched/sch_dsmark.c
@@ -258,6 +258,7 @@  static int dsmark_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 			goto drop;
 #endif
 		case TC_ACT_OK:
+		case TC_ACT_TRAP_FWD:
 			skb->tc_index = TC_H_MIN(res.classid);
 			break;
 
diff --git a/tools/include/uapi/linux/pkt_cls.h b/tools/include/uapi/linux/pkt_cls.h
index 12153771396a..ccfa424dfeaf 100644
--- a/tools/include/uapi/linux/pkt_cls.h
+++ b/tools/include/uapi/linux/pkt_cls.h
@@ -45,7 +45,11 @@  enum {
 				   * the skb and act like everything
 				   * is alright.
 				   */
-#define TC_ACT_VALUE_MAX	TC_ACT_TRAP
+#define TC_ACT_TRAP_FWD		9 /* For hw path, this means "send a copy
+				   * of the packet to the cpu". For sw
+				   * datapath, this is like TC_ACT_OK.
+				   */
+#define TC_ACT_VALUE_MAX	TC_ACT_TRAP_FWD
 
 /* There is a special kind of actions called "extended actions",
  * which need a value parameter. These have a local opcode located in