diff mbox series

[bpf-next] bpf, lpm: fix check prefixlen before walking trie

Message ID 20231105085801.3742-1-dev@der-flo.net (mailing list archive)
State Accepted
Commit 856624f12b04a3f51094fa277a31a333ee81cb3f
Delegated to: BPF
Headers show
Series [bpf-next] bpf, lpm: fix check prefixlen before walking trie | expand

Checks

Context Check Description
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next-VM_Test-2 success Logs for Validate matrix.py
bpf/vmtest-bpf-next-VM_Test-7 success Logs for aarch64-gcc / test (test_verifier, false, 360) / test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-8 success Logs for aarch64-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-3 success Logs for aarch64-gcc / build / build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-0 success Logs for Lint
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-4 success Logs for aarch64-gcc / test (test_maps, false, 360) / test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-20 success Logs for x86_64-gcc / test (test_progs_no_alu32_parallel, true, 30) / test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-27 success Logs for x86_64-llvm-16 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-21 success Logs for x86_64-gcc / test (test_progs_parallel, true, 30) / test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-25 success Logs for x86_64-llvm-16 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-28 success Logs for x86_64-llvm-16 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-19 success Logs for x86_64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-14 success Logs for s390x-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-6 success Logs for aarch64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-17 success Logs for x86_64-gcc / test (test_maps, false, 360) / test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-23 success Logs for x86_64-gcc / veristat / veristat on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-5 success Logs for aarch64-gcc / test (test_progs, false, 360) / test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-9 success Logs for s390x-gcc / build / build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-18 success Logs for x86_64-gcc / test (test_progs, false, 360) / test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-24 success Logs for x86_64-llvm-16 / build / build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-16 success Logs for x86_64-gcc / build / build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-15 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-13 success Logs for s390x-gcc / test (test_verifier, false, 360) / test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-26 success Logs for x86_64-llvm-16 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-29 success Logs for x86_64-llvm-16 / veristat
bpf/vmtest-bpf-next-VM_Test-22 success Logs for x86_64-gcc / test (test_verifier, false, 360) / test_verifier on x86_64 with gcc
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for bpf-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1352 this patch: 1352
netdev/cc_maintainers success CCed 15 of 15 maintainers
netdev/build_clang success Errors and warnings before: 1379 this patch: 1379
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 1380 this patch: 1380
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 9 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-VM_Test-12 success Logs for s390x-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-11 success Logs for s390x-gcc / test (test_progs, false, 360) / test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-10 success Logs for s390x-gcc / test (test_maps, false, 360) / test_maps on s390x with gcc

Commit Message

Florian Lehner Nov. 5, 2023, 8:58 a.m. UTC
When looking up an element in LPM trie, the condition 'matchlen ==
trie->max_prefixlen' will never return true, if key->prefixlen is larger
than trie->max_prefixlen. Consequently all elements in the LPM trie will
be visited and no element is returned in the end.

To resolve this, check key->prefixlen first before walking the LPM trie.

Fixes: b95a5c4db09b ("bpf: add a longest prefix match trie map implementation")
Signed-off-by: Florian Lehner <dev@der-flo.net>
---
 kernel/bpf/lpm_trie.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

David Rheinsberg Nov. 5, 2023, 7:08 p.m. UTC | #1
Hi

On Sun, Nov 5, 2023, at 9:58 AM, Florian Lehner wrote:
> When looking up an element in LPM trie, the condition 'matchlen ==
> trie->max_prefixlen' will never return true, if key->prefixlen is larger
> than trie->max_prefixlen. Consequently all elements in the LPM trie will
> be visited and no element is returned in the end.
>
> To resolve this, check key->prefixlen first before walking the LPM trie.

Am I understanding you right that this is an optimization to avoid walking the entire trie? Because the way I read your commit-message I assume the output has always been NULL? Or am I missing something.

Do you have a specific use-case where such lookups are common? Can you explain why it is important to optimize this case? Because you now add a condition for every lookup just to optimize for the lookup-miss of a special case. I don't think I understand your reasoning here, but I might be missing some context.

Thanks!
David

> Fixes: b95a5c4db09b ("bpf: add a longest prefix match trie map implementation")
> Signed-off-by: Florian Lehner <dev@der-flo.net>
> ---
>  kernel/bpf/lpm_trie.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
> index 17c7e7782a1f..b32be680da6c 100644
> --- a/kernel/bpf/lpm_trie.c
> +++ b/kernel/bpf/lpm_trie.c
> @@ -231,6 +231,9 @@ static void *trie_lookup_elem(struct bpf_map *map, 
> void *_key)
>  	struct lpm_trie_node *node, *found = NULL;
>  	struct bpf_lpm_trie_key *key = _key;
> 
> +	if (key->prefixlen > trie->max_prefixlen)
> +		return NULL;
> +
>  	/* Start walking the trie from the root node ... */
> 
>  	for (node = rcu_dereference_check(trie->root, rcu_read_lock_bh_held());
> -- 
> 2.39.2
Florian Lehner Nov. 5, 2023, 8:33 p.m. UTC | #2
On Sun, Nov 05, 2023 at 08:08:43PM +0100, David Rheinsberg wrote:
> Hi
> 
> On Sun, Nov 5, 2023, at 9:58 AM, Florian Lehner wrote:
> > When looking up an element in LPM trie, the condition 'matchlen ==
> > trie->max_prefixlen' will never return true, if key->prefixlen is larger
> > than trie->max_prefixlen. Consequently all elements in the LPM trie will
> > be visited and no element is returned in the end.
> >
> 
> Am I understanding you right that this is an optimization to avoid walking the entire trie? Because the way I read your commit-message I assume the output has always been NULL? Or am I missing something.
> 
> Do you have a specific use-case where such lookups are common? Can you explain why it is important to optimize this case? Because you now add a condition for every lookup just to optimize for the lookup-miss of a special case. I don't think I understand your reasoning here, but I might be missing some context.
> 
> Thanks!
> David

Hi David,

Your understanding is correct. The return value currently and with this patch is
in both cases the same for the case where key->prefixlen > trie->max_prefixlen.

The optimization is to avoid the locking mechanism, walking the trie and
checking its elements. It might not be the most common use case, so I see your
point.

> 
> > Fixes: b95a5c4db09b ("bpf: add a longest prefix match trie map implementation")
> > Signed-off-by: Florian Lehner <dev@der-flo.net>
> > ---
> >  kernel/bpf/lpm_trie.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
> > index 17c7e7782a1f..b32be680da6c 100644
> > --- a/kernel/bpf/lpm_trie.c
> > +++ b/kernel/bpf/lpm_trie.c
> > @@ -231,6 +231,9 @@ static void *trie_lookup_elem(struct bpf_map *map, 
> > void *_key)
> >  	struct lpm_trie_node *node, *found = NULL;
> >  	struct bpf_lpm_trie_key *key = _key;
> > 
> > +	if (key->prefixlen > trie->max_prefixlen)
> > +		return NULL;
> > +
> >  	/* Start walking the trie from the root node ... */
> > 
> >  	for (node = rcu_dereference_check(trie->root, rcu_read_lock_bh_held());
> > -- 
> > 2.39.2
David Rheinsberg Nov. 6, 2023, 8 a.m. UTC | #3
Hi

On Sun, Nov 5, 2023, at 9:33 PM, Florian Lehner wrote:
> On Sun, Nov 05, 2023 at 08:08:43PM +0100, David Rheinsberg wrote:
>> On Sun, Nov 5, 2023, at 9:58 AM, Florian Lehner wrote:
>> > When looking up an element in LPM trie, the condition 'matchlen ==
>> > trie->max_prefixlen' will never return true, if key->prefixlen is larger
>> > than trie->max_prefixlen. Consequently all elements in the LPM trie will
>> > be visited and no element is returned in the end.
>> 
>> Am I understanding you right that this is an optimization to avoid walking the entire trie? Because the way I read your commit-message I assume the output has always been NULL? Or am I missing something.
>> 
>> Do you have a specific use-case where such lookups are common? Can you explain why it is important to optimize this case? Because you now add a condition for every lookup just to optimize for the lookup-miss of a special case. I don't think I understand your reasoning here, but I might be missing some context.
>
> Your understanding is correct. The return value currently and with this patch is
> in both cases the same for the case where key->prefixlen > trie->max_prefixlen.

Thanks for clarifying! I think using "fix" to describe the patch is misleading and confused me. Similarly, your "Fixes:" tag implies you repaired something that was broken.

> The optimization is to avoid the locking mechanism, walking the trie and
> checking its elements. It might not be the most common use case, so I see your
> point.

Can you elaborate on how you encountered this? Do you have an actual use-case where such lookups better be fast? Is it worth it slowing down every other lookup just to make this one faster?

The patch looks good, but I also don't see the benefit. I am not against it, though, if you insist you need it.

Thanks
David
patchwork-bot+netdevbpf@kernel.org Nov. 6, 2023, 8 p.m. UTC | #4
Hello:

This patch was applied to bpf/bpf-next.git (master)
by Andrii Nakryiko <andrii@kernel.org>:

On Sun,  5 Nov 2023 09:58:01 +0100 you wrote:
> When looking up an element in LPM trie, the condition 'matchlen ==
> trie->max_prefixlen' will never return true, if key->prefixlen is larger
> than trie->max_prefixlen. Consequently all elements in the LPM trie will
> be visited and no element is returned in the end.
> 
> To resolve this, check key->prefixlen first before walking the LPM trie.
> 
> [...]

Here is the summary with links:
  - [bpf-next] bpf, lpm: fix check prefixlen before walking trie
    https://git.kernel.org/bpf/bpf-next/c/856624f12b04

You are awesome, thank you!
diff mbox series

Patch

diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index 17c7e7782a1f..b32be680da6c 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -231,6 +231,9 @@  static void *trie_lookup_elem(struct bpf_map *map, void *_key)
 	struct lpm_trie_node *node, *found = NULL;
 	struct bpf_lpm_trie_key *key = _key;
 
+	if (key->prefixlen > trie->max_prefixlen)
+		return NULL;
+
 	/* Start walking the trie from the root node ... */
 
 	for (node = rcu_dereference_check(trie->root, rcu_read_lock_bh_held());