diff mbox series

[RFC,net-next] net: sched: act_mirred: Extend the cpu mirred nest guard with an explicit loop ttl

Message ID 20231215180827.3638838-1-victor@mojatatu.com (mailing list archive)
State RFC
Delegated to: Netdev Maintainers
Headers show
Series [RFC,net-next] net: sched: act_mirred: Extend the cpu mirred nest guard with an explicit loop ttl | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1181 this patch: 1181
netdev/cc_maintainers success CCed 7 of 7 maintainers
netdev/build_clang success Errors and warnings before: 1145 this patch: 1145
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1237 this patch: 1237
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 59 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Victor Nogueira Dec. 15, 2023, 6:08 p.m. UTC
As pointed out by Jamal in:
https://lore.kernel.org/netdev/CAM0EoMn4C-zwrTCGzKzuRYukxoqBa8tyHyFDwUSZYwkMOUJ4Lw@mail.gmail.com/

Mirred is allowing for infinite loops in certain use cases, such as the
following:

----
sudo ip netns add p4node
sudo ip link add p4port0 address 10:00:00:01:AA:BB type veth peer \
   port0 address 10:00:00:02:AA:BB

sudo ip link set dev port0 netns p4node
sudo ip a add 10.0.0.1/24 dev p4port0
sudo ip neigh add 10.0.0.2 dev p4port0 lladdr 10:00:00:02:aa:bb
sudo ip netns exec p4node ip a add 10.0.0.2/24 dev port0
sudo ip netns exec p4node ip l set dev port0 up
sudo ip l set dev p4port0 up
sudo ip netns exec p4node tc qdisc add dev port0 clsact
sudo ip netns exec p4node tc filter add dev port0 ingress protocol ip \
   prio 10 matchall action mirred ingress redirect dev port0

ping -I p4port0 10.0.0.2 -c 1
-----

To solve this, we reintroduced a ttl variable attached to the skb (in
struct tc_skb_cb) which will prevent infinite loops for use cases such as
the one described above.

The nest per cpu variable (tcf_mirred_nest_level) is now only used for
detecting whether we should call netif_rx or netif_receive_skb when
sending the packet to ingress.

Note that we do increment the ttl in every redirect/mirror so if you
have policies that redirect or mirror between devices of up to
MAX_REC_LOOP (4) with this patch that will be considered to be a loop.

Signed-off-by: Victor Nogueira <victor@mojatatu.com>
---
 include/net/pkt_sched.h | 11 +++++++++++
 net/sched/act_mirred.c  | 11 +++++++----
 2 files changed, 18 insertions(+), 4 deletions(-)

Comments

Davide Caratti Dec. 18, 2023, 8:49 a.m. UTC | #1
hello Victor, thanks for the patch!

On Fri, Dec 15, 2023 at 03:08:27PM -0300, Victor Nogueira wrote:
> As pointed out by Jamal in:
> https://lore.kernel.org/netdev/CAM0EoMn4C-zwrTCGzKzuRYukxoqBa8tyHyFDwUSZYwkMOUJ4Lw@mail.gmail.com/
> 
> Mirred is allowing for infinite loops in certain use cases, such as the
> following:
> 
> ----
> sudo ip netns add p4node
> sudo ip link add p4port0 address 10:00:00:01:AA:BB type veth peer \
>    port0 address 10:00:00:02:AA:BB
> 
> sudo ip link set dev port0 netns p4node
> sudo ip a add 10.0.0.1/24 dev p4port0
> sudo ip neigh add 10.0.0.2 dev p4port0 lladdr 10:00:00:02:aa:bb
> sudo ip netns exec p4node ip a add 10.0.0.2/24 dev port0
> sudo ip netns exec p4node ip l set dev port0 up
> sudo ip l set dev p4port0 up
> sudo ip netns exec p4node tc qdisc add dev port0 clsact
> sudo ip netns exec p4node tc filter add dev port0 ingress protocol ip \
>    prio 10 matchall action mirred ingress redirect dev port0
> 
> ping -I p4port0 10.0.0.2 -c 1
> -----
> 
> To solve this, we reintroduced a ttl variable attached to the skb (in
> struct tc_skb_cb) which will prevent infinite loops for use cases such as
> the one described above.
> 
> The nest per cpu variable (tcf_mirred_nest_level) is now only used for
> detecting whether we should call netif_rx or netif_receive_skb when
> sending the packet to ingress.

looks good to me. Do you think it's worth setting an initial value (0, AFAIU)
for tc_skb_cb(skb)->ttl inside tc_run() ?

other than this,

Acked-by: Davide Caratti <dcaratti@redhat.com>
Jamal Hadi Salim Dec. 18, 2023, 3:35 p.m. UTC | #2
On Mon, Dec 18, 2023 at 3:49 AM Davide Caratti <dcaratti@redhat.com> wrote:
>
> hello Victor, thanks for the patch!
>
> On Fri, Dec 15, 2023 at 03:08:27PM -0300, Victor Nogueira wrote:
> > As pointed out by Jamal in:
> > https://lore.kernel.org/netdev/CAM0EoMn4C-zwrTCGzKzuRYukxoqBa8tyHyFDwUSZYwkMOUJ4Lw@mail.gmail.com/
> >
> > Mirred is allowing for infinite loops in certain use cases, such as the
> > following:
> >
> > ----
> > sudo ip netns add p4node
> > sudo ip link add p4port0 address 10:00:00:01:AA:BB type veth peer \
> >    port0 address 10:00:00:02:AA:BB
> >
> > sudo ip link set dev port0 netns p4node
> > sudo ip a add 10.0.0.1/24 dev p4port0
> > sudo ip neigh add 10.0.0.2 dev p4port0 lladdr 10:00:00:02:aa:bb
> > sudo ip netns exec p4node ip a add 10.0.0.2/24 dev port0
> > sudo ip netns exec p4node ip l set dev port0 up
> > sudo ip l set dev p4port0 up
> > sudo ip netns exec p4node tc qdisc add dev port0 clsact
> > sudo ip netns exec p4node tc filter add dev port0 ingress protocol ip \
> >    prio 10 matchall action mirred ingress redirect dev port0
> >
> > ping -I p4port0 10.0.0.2 -c 1
> > -----
> >
> > To solve this, we reintroduced a ttl variable attached to the skb (in
> > struct tc_skb_cb) which will prevent infinite loops for use cases such as
> > the one described above.
> >
> > The nest per cpu variable (tcf_mirred_nest_level) is now only used for
> > detecting whether we should call netif_rx or netif_receive_skb when
> > sending the packet to ingress.
>
> looks good to me. Do you think it's worth setting an initial value (0, AFAIU)
> for tc_skb_cb(skb)->ttl inside tc_run() ?
>

Good point but I am afraid that will reset the loop counter (imagine
ingress->ingress, egress->ingress etc). So it wont work. Unfortunately
we've hit a snag with cb because it is shared across multiple layers.
I am afraid we cant ignore it.
If the packet came downward from some upper layer (or driver, buggy
mostly) using the same ttl spot in the cb, then the ttl field will be
either 0 or > 0.
1) 0 < ttl < 4 then we will interpret it as "packet has looped before"
and we wont drop it, so we are good here and we will end up dropping
it later in one or more loop.
2) If the retrieved ttl is >=4 we will immediately drop it. This is
catastrophic because we cant stop ipv4 or esp from using this field to
their pleasure and they certainly dont reset these fields.

I dont see a way out unless we extend the skb->tc_at_ingress to be to
2 bits as it was originally. In the original code it was called we had
SET_TC_AT() which set two bits to in the skb->verdict to say "this
packet was last seen at ingress/egress/elsewhere". Elsewhere was a 0
and this got changed to a boolean which translates to "this packet was
last seen at ingress/egress". So if it came from a freshly allocated
skb eg from driver or if it came from the stack it will be 0 (since
the field is reserved for tc).

+Cc Florian.

cheers,
jamal

> other than this,
>
> Acked-by: Davide Caratti <dcaratti@redhat.com>
>
diff mbox series

Patch

diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 9fa1d0794dfa..fb8234fd5324 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -282,6 +282,7 @@  struct tc_skb_cb {
 	u8 post_ct:1;
 	u8 post_ct_snat:1;
 	u8 post_ct_dnat:1;
+	u8 ttl:3;
 	u16 zone; /* Only valid if post_ct = true */
 };
 
@@ -293,6 +294,16 @@  static inline struct tc_skb_cb *tc_skb_cb(const struct sk_buff *skb)
 	return cb;
 }
 
+static inline void tcf_ttl_set(struct sk_buff *skb, const u8 ttl)
+{
+	tc_skb_cb(skb)->ttl = ttl;
+}
+
+static inline u8 tcf_ttl_get(struct sk_buff *skb)
+{
+	return tc_skb_cb(skb)->ttl;
+}
+
 static inline bool tc_qdisc_stats_dump(struct Qdisc *sch,
 				       unsigned long cl,
 				       struct qdisc_walker *arg)
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 0a711c184c29..42b267817f3c 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -29,7 +29,7 @@ 
 static LIST_HEAD(mirred_list);
 static DEFINE_SPINLOCK(mirred_list_lock);
 
-#define MIRRED_NEST_LIMIT    4
+#define MAX_REC_LOOP    4
 static DEFINE_PER_CPU(unsigned int, mirred_nest_level);
 
 static bool tcf_mirred_is_act_redirect(int action)
@@ -233,7 +233,6 @@  TC_INDIRECT_SCOPE int tcf_mirred_act(struct sk_buff *skb,
 	struct sk_buff *skb2 = skb;
 	bool m_mac_header_xmit;
 	struct net_device *dev;
-	unsigned int nest_level;
 	int retval, err = 0;
 	bool use_reinsert;
 	bool want_ingress;
@@ -243,9 +242,12 @@  TC_INDIRECT_SCOPE int tcf_mirred_act(struct sk_buff *skb,
 	int m_eaction;
 	int mac_len;
 	bool at_nh;
+	u8 ttl;
 
-	nest_level = __this_cpu_inc_return(mirred_nest_level);
-	if (unlikely(nest_level > MIRRED_NEST_LIMIT)) {
+	__this_cpu_inc(mirred_nest_level);
+
+	ttl = tcf_ttl_get(skb);
+	if (unlikely(ttl + 1 > MAX_REC_LOOP)) {
 		net_warn_ratelimited("Packet exceeded mirred recursion limit on dev %s\n",
 				     netdev_name(skb->dev));
 		__this_cpu_dec(mirred_nest_level);
@@ -307,6 +309,7 @@  TC_INDIRECT_SCOPE int tcf_mirred_act(struct sk_buff *skb,
 
 	skb2->skb_iif = skb->dev->ifindex;
 	skb2->dev = dev;
+	tcf_ttl_set(skb2, ttl + 1);
 
 	/* mirror is always swallowed */
 	if (is_redirect) {