Message ID | 20241107231123.298299-1-alexandre.ferrieux@orange.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net,v5] net: sched: cls_u32: Fix u32's systematic failure to free IDR entries for hnodes. | expand |
On Fri, Nov 08, 2024 at 12:11:23AM +0100, Alexandre Ferrieux wrote: > To generate hnode handles (in gen_new_htid()), u32 uses IDR and > encodes the returned small integer into a structured 32-bit > word. Unfortunately, at disposal time, the needed decoding > is not done. As a result, idr_remove() fails, and the IDR > fills up. Since its size is 2048, the following script ends up > with "Filter already exists": > > tc filter add dev myve $FILTER1 > tc filter add dev myve $FILTER2 > for i in {1..2048} > do > echo $i > tc filter del dev myve $FILTER2 > tc filter add dev myve $FILTER2 > done > > This patch adds the missing decoding logic for handles that > deserve it, along with a corresponding tdc test. Good catch! I have a few comments below. > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") > Reviewed-by: Eric Dumazet <edumazet@google.com> > Signed-off-by: Alexandre Ferrieux <alexandre.ferrieux@orange.com> > --- > v5: fix title - again > v4: add tdc test > v3: prepend title with subsystem ident > v2: use u32 type in handle encoder/decoder > > > net/sched/cls_u32.c | 18 ++++++++++---- > .../tc-testing/tc-tests/filters/u32.json | 24 +++++++++++++++++++ > 2 files changed, 38 insertions(+), 4 deletions(-) > > diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c > index 9412d88a99bc..6da94b809926 100644 > --- a/net/sched/cls_u32.c > +++ b/net/sched/cls_u32.c > @@ -41,6 +41,16 @@ > #include <linux/idr.h> > #include <net/tc_wrapper.h> > > +static inline u32 handle2id(u32 h) > +{ > + return ((h & 0x80000000) ? ((h >> 20) & 0x7FF) : h); > +} > + > +static inline u32 id2handle(u32 id) > +{ > + return (id | 0x800U) << 20; > +} > + > struct tc_u_knode { > struct tc_u_knode __rcu *next; > u32 handle; > @@ -310,7 +320,7 @@ static u32 gen_new_htid(struct tc_u_common *tp_c, struct tc_u_hnode *ptr) > int id = idr_alloc_cyclic(&tp_c->handle_idr, ptr, 1, 0x7FF, GFP_KERNEL); > if (id < 0) > return 0; > - return (id | 0x800U) << 20; > + return id2handle(id); > } > > static struct hlist_head *tc_u_common_hash; > @@ -360,7 +370,7 @@ static int u32_init(struct tcf_proto *tp) > return -ENOBUFS; > > refcount_set(&root_ht->refcnt, 1); > - root_ht->handle = tp_c ? gen_new_htid(tp_c, root_ht) : 0x80000000; > + root_ht->handle = tp_c ? gen_new_htid(tp_c, root_ht) : id2handle(0); > root_ht->prio = tp->prio; > root_ht->is_root = true; > idr_init(&root_ht->handle_idr); > @@ -612,7 +622,7 @@ static int u32_destroy_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht, > if (phn == ht) { > u32_clear_hw_hnode(tp, ht, extack); > idr_destroy(&ht->handle_idr); > - idr_remove(&tp_c->handle_idr, ht->handle); > + idr_remove(&tp_c->handle_idr, handle2id(ht->handle)); > RCU_INIT_POINTER(*hn, ht->next); > kfree_rcu(ht, rcu); > return 0; > @@ -989,7 +999,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb, > > err = u32_replace_hw_hnode(tp, ht, userflags, extack); > if (err) { > - idr_remove(&tp_c->handle_idr, handle); > + idr_remove(&tp_c->handle_idr, handle2id(handle)); > kfree(ht); > return err; > } It seems you missed the idr_replace() case? - idr_replace(&ht->handle_idr, n, n->handle); + idr_replace(&ht->handle_idr, n, handle2id(n->handle)); > diff --git a/tools/testing/selftests/tc-testing/tc-tests/filters/u32.json b/tools/testing/selftests/tc-testing/tc-tests/filters/u32.json > index 24bd0c2a3014..2095baa19c6a 100644 > --- a/tools/testing/selftests/tc-testing/tc-tests/filters/u32.json > +++ b/tools/testing/selftests/tc-testing/tc-tests/filters/u32.json > @@ -329,5 +329,29 @@ > "teardown": [ > "$TC qdisc del dev $DEV1 parent root drr" > ] > + }, > + { > + "id": "1234", > + "name": "Exercise IDR leaks by creating/deleting a filter many (2048) times", > + "category": [ > + "filter", > + "u32" > + ], > + "plugins": { > + "requires": "nsPlugin" > + }, > + "setup": [ > + "$TC qdisc add dev $DEV1 parent root handle 10: drr", > + "$TC filter add dev $DEV1 parent 10:0 protocol ip prio 2 u32 match ip src 0.0.0.2/32 action drop", > + "$TC filter add dev $DEV1 parent 10:0 protocol ip prio 3 u32 match ip src 0.0.0.3/32 action drop" > + ], > + "cmdUnderTest": "bash -c 'for i in {1..2048} ;do $TC filter delete dev $DEV1 pref 3;$TC filter add dev $DEV1 parent 10:0 protocol ip prio 3 u32 match ip src 0.0.0.3/32 action drop || exit 1;i=`expr $i + 1`;done'", Any reason using this for loop instead of tdc_multibatch.py? Thanks.
On 08/11/2024 02:34, Cong Wang wrote: > On Fri, Nov 08, 2024 at 12:11:23AM +0100, Alexandre Ferrieux wrote: >> To generate hnode handles (in gen_new_htid()), u32 uses IDR and >> encodes the returned small integer into a structured 32-bit >> word. Unfortunately, at disposal time, the needed decoding >> is not done. As a result, idr_remove() fails, and the IDR >> fills up. [...] >> This patch adds the missing decoding logic for handles that >> deserve it, along with a corresponding tdc test. > > Good catch! I have a few comments below. > [...] > It seems you missed the idr_replace() case? > > - idr_replace(&ht->handle_idr, n, n->handle); > + idr_replace(&ht->handle_idr, n, handle2id(n->handle)); Unless I'm mistaken, there are two different kinds of IDR: - the "toplevel" IDR tp_c->handle_idr used to generate the IDs of hnodes (hashtables) - the individual IDR ht->handle_idr holding IDs of knodes As it turns out, hnodes use small integers in [1..0x7FF] in the IDR space, and carry them along "encoded" (with id2handle() now), while knodes are directly allocated in a higher range of the IDR space. As a consequence, it does not make sense to encode/decode them with the hnode-oriented scheme. It looks like idr_replace() is used on knodes, so it should not make sense to add the decoding here. Now, this is all based on reverse engineering, as I'm not aware of detailed documentation (beyond code comments). Please correct me if I'm wrong. >> + "cmdUnderTest": "bash -c 'for i in {1..2048} ;do $TC filter delete dev $DEV1 pref 3;$TC filter add dev $DEV1 parent 10:0 protocol ip prio 3 u32 match ip src 0.0.0.3/32 action drop || exit 1;i=`expr $i + 1`;done'", > > Any reason using this for loop instead of tdc_multibatch.py? I have zero experience with tdc, so I'm just working on the README and user-other oriented documentation, suggesting to extend the provided JSON test specification. So quite possibly I missed some "other ways". However, I have just posted a v6 of the patch that uses tc in batch mode (tc -b) and reduces the execution time of this test from over 10 secs to a split second, and to me the resulting line remains readable: for i in {1..2048} ;do echo filter delete dev $DEV1 pref 3;echo filter add dev $DEV1 parent 10:0 protocol ip prio 3 u32 match ip src 0.0.0.3/32 action drop;done | $TC -b - Please tell me if you deem necessary to switch to another method and/or syntax.
diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c index 9412d88a99bc..6da94b809926 100644 --- a/net/sched/cls_u32.c +++ b/net/sched/cls_u32.c @@ -41,6 +41,16 @@ #include <linux/idr.h> #include <net/tc_wrapper.h> +static inline u32 handle2id(u32 h) +{ + return ((h & 0x80000000) ? ((h >> 20) & 0x7FF) : h); +} + +static inline u32 id2handle(u32 id) +{ + return (id | 0x800U) << 20; +} + struct tc_u_knode { struct tc_u_knode __rcu *next; u32 handle; @@ -310,7 +320,7 @@ static u32 gen_new_htid(struct tc_u_common *tp_c, struct tc_u_hnode *ptr) int id = idr_alloc_cyclic(&tp_c->handle_idr, ptr, 1, 0x7FF, GFP_KERNEL); if (id < 0) return 0; - return (id | 0x800U) << 20; + return id2handle(id); } static struct hlist_head *tc_u_common_hash; @@ -360,7 +370,7 @@ static int u32_init(struct tcf_proto *tp) return -ENOBUFS; refcount_set(&root_ht->refcnt, 1); - root_ht->handle = tp_c ? gen_new_htid(tp_c, root_ht) : 0x80000000; + root_ht->handle = tp_c ? gen_new_htid(tp_c, root_ht) : id2handle(0); root_ht->prio = tp->prio; root_ht->is_root = true; idr_init(&root_ht->handle_idr); @@ -612,7 +622,7 @@ static int u32_destroy_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht, if (phn == ht) { u32_clear_hw_hnode(tp, ht, extack); idr_destroy(&ht->handle_idr); - idr_remove(&tp_c->handle_idr, ht->handle); + idr_remove(&tp_c->handle_idr, handle2id(ht->handle)); RCU_INIT_POINTER(*hn, ht->next); kfree_rcu(ht, rcu); return 0; @@ -989,7 +999,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb, err = u32_replace_hw_hnode(tp, ht, userflags, extack); if (err) { - idr_remove(&tp_c->handle_idr, handle); + idr_remove(&tp_c->handle_idr, handle2id(handle)); kfree(ht); return err; } diff --git a/tools/testing/selftests/tc-testing/tc-tests/filters/u32.json b/tools/testing/selftests/tc-testing/tc-tests/filters/u32.json index 24bd0c2a3014..2095baa19c6a 100644 --- a/tools/testing/selftests/tc-testing/tc-tests/filters/u32.json +++ b/tools/testing/selftests/tc-testing/tc-tests/filters/u32.json @@ -329,5 +329,29 @@ "teardown": [ "$TC qdisc del dev $DEV1 parent root drr" ] + }, + { + "id": "1234", + "name": "Exercise IDR leaks by creating/deleting a filter many (2048) times", + "category": [ + "filter", + "u32" + ], + "plugins": { + "requires": "nsPlugin" + }, + "setup": [ + "$TC qdisc add dev $DEV1 parent root handle 10: drr", + "$TC filter add dev $DEV1 parent 10:0 protocol ip prio 2 u32 match ip src 0.0.0.2/32 action drop", + "$TC filter add dev $DEV1 parent 10:0 protocol ip prio 3 u32 match ip src 0.0.0.3/32 action drop" + ], + "cmdUnderTest": "bash -c 'for i in {1..2048} ;do $TC filter delete dev $DEV1 pref 3;$TC filter add dev $DEV1 parent 10:0 protocol ip prio 3 u32 match ip src 0.0.0.3/32 action drop || exit 1;i=`expr $i + 1`;done'", + "expExitCode": "0", + "verifyCmd": "$TC filter show dev $DEV1", + "matchPattern": "protocol ip pref 3 u32", + "matchCount": "3", + "teardown": [ + "$TC qdisc del dev $DEV1 parent root drr" + ] } ]