diff mbox series

[net,v5] net: sched: cls_u32: Fix u32's systematic failure to free IDR entries for hnodes.

Message ID 20241107231123.298299-1-alexandre.ferrieux@orange.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [net,v5] net: sched: cls_u32: Fix u32's systematic failure to free IDR entries for hnodes. | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 3 this patch: 3
netdev/build_tools success Errors and warnings before: 2 (+0) this patch: 2 (+0)
netdev/cc_maintainers warning 6 maintainers not CCed: horms@kernel.org pctammela@mojatatu.com shuah@kernel.org pabeni@redhat.com linux-kselftest@vger.kernel.org kuba@kernel.org
netdev/build_clang success Errors and warnings before: 3 this patch: 3
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 4 this patch: 4
netdev/checkpatch warning WARNING: From:/Signed-off-by: email address mismatch: 'From: Alexandre Ferrieux <alexandre.ferrieux@gmail.com>' != 'Signed-off-by: Alexandre Ferrieux <alexandre.ferrieux@orange.com>'
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline fail Was 0 now: 2

Commit Message

Alexandre Ferrieux Nov. 7, 2024, 11:11 p.m. UTC
To generate hnode handles (in gen_new_htid()), u32 uses IDR and
encodes the returned small integer into a structured 32-bit
word. Unfortunately, at disposal time, the needed decoding
is not done. As a result, idr_remove() fails, and the IDR
fills up. Since its size is 2048, the following script ends up
with "Filter already exists":

  tc filter add dev myve $FILTER1
  tc filter add dev myve $FILTER2
  for i in {1..2048}
  do
    echo $i
    tc filter del dev myve $FILTER2
    tc filter add dev myve $FILTER2
  done

This patch adds the missing decoding logic for handles that
deserve it, along with a corresponding tdc test.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Alexandre Ferrieux <alexandre.ferrieux@orange.com>
---
v5: fix title - again
v4: add tdc test
v3: prepend title with subsystem ident
v2: use u32 type in handle encoder/decoder


 net/sched/cls_u32.c                           | 18 ++++++++++----
 .../tc-testing/tc-tests/filters/u32.json      | 24 +++++++++++++++++++
 2 files changed, 38 insertions(+), 4 deletions(-)

Comments

Cong Wang Nov. 8, 2024, 1:34 a.m. UTC | #1
On Fri, Nov 08, 2024 at 12:11:23AM +0100, Alexandre Ferrieux wrote:
> To generate hnode handles (in gen_new_htid()), u32 uses IDR and
> encodes the returned small integer into a structured 32-bit
> word. Unfortunately, at disposal time, the needed decoding
> is not done. As a result, idr_remove() fails, and the IDR
> fills up. Since its size is 2048, the following script ends up
> with "Filter already exists":
> 
>   tc filter add dev myve $FILTER1
>   tc filter add dev myve $FILTER2
>   for i in {1..2048}
>   do
>     echo $i
>     tc filter del dev myve $FILTER2
>     tc filter add dev myve $FILTER2
>   done
> 
> This patch adds the missing decoding logic for handles that
> deserve it, along with a corresponding tdc test.

Good catch! I have a few comments below.

> 
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Reviewed-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Alexandre Ferrieux <alexandre.ferrieux@orange.com>
> ---
> v5: fix title - again
> v4: add tdc test
> v3: prepend title with subsystem ident
> v2: use u32 type in handle encoder/decoder
> 
> 
>  net/sched/cls_u32.c                           | 18 ++++++++++----
>  .../tc-testing/tc-tests/filters/u32.json      | 24 +++++++++++++++++++
>  2 files changed, 38 insertions(+), 4 deletions(-)
> 
> diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
> index 9412d88a99bc..6da94b809926 100644
> --- a/net/sched/cls_u32.c
> +++ b/net/sched/cls_u32.c
> @@ -41,6 +41,16 @@
>  #include <linux/idr.h>
>  #include <net/tc_wrapper.h>
>  
> +static inline u32 handle2id(u32 h)
> +{
> +	return ((h & 0x80000000) ? ((h >> 20) & 0x7FF) : h);
> +}
> +
> +static inline u32 id2handle(u32 id)
> +{
> +	return (id | 0x800U) << 20;
> +}
> +
>  struct tc_u_knode {
>  	struct tc_u_knode __rcu	*next;
>  	u32			handle;
> @@ -310,7 +320,7 @@ static u32 gen_new_htid(struct tc_u_common *tp_c, struct tc_u_hnode *ptr)
>  	int id = idr_alloc_cyclic(&tp_c->handle_idr, ptr, 1, 0x7FF, GFP_KERNEL);
>  	if (id < 0)
>  		return 0;
> -	return (id | 0x800U) << 20;
> +	return id2handle(id);
>  }
>  
>  static struct hlist_head *tc_u_common_hash;
> @@ -360,7 +370,7 @@ static int u32_init(struct tcf_proto *tp)
>  		return -ENOBUFS;
>  
>  	refcount_set(&root_ht->refcnt, 1);
> -	root_ht->handle = tp_c ? gen_new_htid(tp_c, root_ht) : 0x80000000;
> +	root_ht->handle = tp_c ? gen_new_htid(tp_c, root_ht) : id2handle(0);
>  	root_ht->prio = tp->prio;
>  	root_ht->is_root = true;
>  	idr_init(&root_ht->handle_idr);
> @@ -612,7 +622,7 @@ static int u32_destroy_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht,
>  		if (phn == ht) {
>  			u32_clear_hw_hnode(tp, ht, extack);
>  			idr_destroy(&ht->handle_idr);
> -			idr_remove(&tp_c->handle_idr, ht->handle);
> +			idr_remove(&tp_c->handle_idr, handle2id(ht->handle));
>  			RCU_INIT_POINTER(*hn, ht->next);
>  			kfree_rcu(ht, rcu);
>  			return 0;
> @@ -989,7 +999,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
>  
>  		err = u32_replace_hw_hnode(tp, ht, userflags, extack);
>  		if (err) {
> -			idr_remove(&tp_c->handle_idr, handle);
> +			idr_remove(&tp_c->handle_idr, handle2id(handle));
>  			kfree(ht);
>  			return err;
>  		}

It seems you missed the idr_replace() case?

 - idr_replace(&ht->handle_idr, n, n->handle);
 + idr_replace(&ht->handle_idr, n, handle2id(n->handle));


> diff --git a/tools/testing/selftests/tc-testing/tc-tests/filters/u32.json b/tools/testing/selftests/tc-testing/tc-tests/filters/u32.json
> index 24bd0c2a3014..2095baa19c6a 100644
> --- a/tools/testing/selftests/tc-testing/tc-tests/filters/u32.json
> +++ b/tools/testing/selftests/tc-testing/tc-tests/filters/u32.json
> @@ -329,5 +329,29 @@
>          "teardown": [
>              "$TC qdisc del dev $DEV1 parent root drr"
>          ]
> +    },
> +    {
> +        "id": "1234",
> +        "name": "Exercise IDR leaks by creating/deleting a filter many (2048) times",
> +        "category": [
> +            "filter",
> +            "u32"
> +        ],
> +        "plugins": {
> +            "requires": "nsPlugin"
> +        },
> +        "setup": [
> +            "$TC qdisc add dev $DEV1 parent root handle 10: drr",
> +            "$TC filter add dev $DEV1 parent 10:0 protocol ip prio 2 u32 match ip src 0.0.0.2/32 action drop",
> +            "$TC filter add dev $DEV1 parent 10:0 protocol ip prio 3 u32 match ip src 0.0.0.3/32 action drop"
> +        ],
> +        "cmdUnderTest": "bash -c 'for i in {1..2048} ;do $TC filter delete dev $DEV1 pref 3;$TC filter add dev $DEV1 parent 10:0 protocol ip prio 3 u32 match ip src 0.0.0.3/32 action drop || exit 1;i=`expr $i + 1`;done'",

Any reason using this for loop instead of tdc_multibatch.py?

Thanks.
Alexandre Ferrieux Nov. 8, 2024, 2:46 p.m. UTC | #2
On 08/11/2024 02:34, Cong Wang wrote:
> On Fri, Nov 08, 2024 at 12:11:23AM +0100, Alexandre Ferrieux wrote:
>> To generate hnode handles (in gen_new_htid()), u32 uses IDR and
>> encodes the returned small integer into a structured 32-bit
>> word. Unfortunately, at disposal time, the needed decoding
>> is not done. As a result, idr_remove() fails, and the IDR
>> fills up. [...]
>> This patch adds the missing decoding logic for handles that
>> deserve it, along with a corresponding tdc test.
> 
> Good catch! I have a few comments below.
> [...]
> It seems you missed the idr_replace() case?
> 
>  - idr_replace(&ht->handle_idr, n, n->handle);
>  + idr_replace(&ht->handle_idr, n, handle2id(n->handle));

Unless I'm mistaken, there are two different kinds of IDR:
 - the "toplevel" IDR tp_c->handle_idr used to generate the IDs of hnodes
(hashtables)
 - the individual IDR ht->handle_idr holding IDs of knodes
As it turns out, hnodes use small integers in [1..0x7FF] in the IDR space, and
carry them along "encoded" (with id2handle() now), while knodes are directly
allocated in a higher range of the IDR space. As a consequence, it does not make
sense to encode/decode them with the hnode-oriented scheme.
It looks like idr_replace() is used on knodes, so it should not make sense to
add the decoding here.

Now, this is all based on reverse engineering, as I'm not aware of detailed
documentation (beyond code comments). Please correct me if I'm wrong.

>> +        "cmdUnderTest": "bash -c 'for i in {1..2048} ;do $TC filter delete dev $DEV1 pref 3;$TC filter add dev $DEV1 parent 10:0 protocol ip prio 3 u32 match ip src 0.0.0.3/32 action drop || exit 1;i=`expr $i + 1`;done'",
> 
> Any reason using this for loop instead of tdc_multibatch.py?

I have zero experience with tdc, so I'm just working on the README and
user-other oriented documentation, suggesting to extend the provided JSON test
specification. So quite possibly I missed some "other ways". However, I have
just posted a v6 of the patch that uses tc in batch mode (tc -b) and reduces the
execution time of this test from over 10 secs to a split second, and to me the
resulting line remains readable:

	for i in {1..2048} ;do echo filter delete dev $DEV1 pref 3;echo filter add dev
$DEV1 parent 10:0 protocol ip prio 3 u32 match ip src 0.0.0.3/32 action
drop;done | $TC -b -

Please tell me if you deem necessary to switch to another method and/or syntax.
diff mbox series

Patch

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 9412d88a99bc..6da94b809926 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -41,6 +41,16 @@ 
 #include <linux/idr.h>
 #include <net/tc_wrapper.h>
 
+static inline u32 handle2id(u32 h)
+{
+	return ((h & 0x80000000) ? ((h >> 20) & 0x7FF) : h);
+}
+
+static inline u32 id2handle(u32 id)
+{
+	return (id | 0x800U) << 20;
+}
+
 struct tc_u_knode {
 	struct tc_u_knode __rcu	*next;
 	u32			handle;
@@ -310,7 +320,7 @@  static u32 gen_new_htid(struct tc_u_common *tp_c, struct tc_u_hnode *ptr)
 	int id = idr_alloc_cyclic(&tp_c->handle_idr, ptr, 1, 0x7FF, GFP_KERNEL);
 	if (id < 0)
 		return 0;
-	return (id | 0x800U) << 20;
+	return id2handle(id);
 }
 
 static struct hlist_head *tc_u_common_hash;
@@ -360,7 +370,7 @@  static int u32_init(struct tcf_proto *tp)
 		return -ENOBUFS;
 
 	refcount_set(&root_ht->refcnt, 1);
-	root_ht->handle = tp_c ? gen_new_htid(tp_c, root_ht) : 0x80000000;
+	root_ht->handle = tp_c ? gen_new_htid(tp_c, root_ht) : id2handle(0);
 	root_ht->prio = tp->prio;
 	root_ht->is_root = true;
 	idr_init(&root_ht->handle_idr);
@@ -612,7 +622,7 @@  static int u32_destroy_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht,
 		if (phn == ht) {
 			u32_clear_hw_hnode(tp, ht, extack);
 			idr_destroy(&ht->handle_idr);
-			idr_remove(&tp_c->handle_idr, ht->handle);
+			idr_remove(&tp_c->handle_idr, handle2id(ht->handle));
 			RCU_INIT_POINTER(*hn, ht->next);
 			kfree_rcu(ht, rcu);
 			return 0;
@@ -989,7 +999,7 @@  static int u32_change(struct net *net, struct sk_buff *in_skb,
 
 		err = u32_replace_hw_hnode(tp, ht, userflags, extack);
 		if (err) {
-			idr_remove(&tp_c->handle_idr, handle);
+			idr_remove(&tp_c->handle_idr, handle2id(handle));
 			kfree(ht);
 			return err;
 		}
diff --git a/tools/testing/selftests/tc-testing/tc-tests/filters/u32.json b/tools/testing/selftests/tc-testing/tc-tests/filters/u32.json
index 24bd0c2a3014..2095baa19c6a 100644
--- a/tools/testing/selftests/tc-testing/tc-tests/filters/u32.json
+++ b/tools/testing/selftests/tc-testing/tc-tests/filters/u32.json
@@ -329,5 +329,29 @@ 
         "teardown": [
             "$TC qdisc del dev $DEV1 parent root drr"
         ]
+    },
+    {
+        "id": "1234",
+        "name": "Exercise IDR leaks by creating/deleting a filter many (2048) times",
+        "category": [
+            "filter",
+            "u32"
+        ],
+        "plugins": {
+            "requires": "nsPlugin"
+        },
+        "setup": [
+            "$TC qdisc add dev $DEV1 parent root handle 10: drr",
+            "$TC filter add dev $DEV1 parent 10:0 protocol ip prio 2 u32 match ip src 0.0.0.2/32 action drop",
+            "$TC filter add dev $DEV1 parent 10:0 protocol ip prio 3 u32 match ip src 0.0.0.3/32 action drop"
+        ],
+        "cmdUnderTest": "bash -c 'for i in {1..2048} ;do $TC filter delete dev $DEV1 pref 3;$TC filter add dev $DEV1 parent 10:0 protocol ip prio 3 u32 match ip src 0.0.0.3/32 action drop || exit 1;i=`expr $i + 1`;done'",
+        "expExitCode": "0",
+        "verifyCmd": "$TC filter show dev $DEV1",
+        "matchPattern": "protocol ip pref 3 u32",
+        "matchCount": "3",
+        "teardown": [
+            "$TC qdisc del dev $DEV1 parent root drr"
+        ]
     }
 ]