
bpf: Fix out-of-bounds write in trie_get_next_key()

Message ID ZxcDzT/iv/f0Gyz0@localhost.localdomain (mailing list archive)
State Changes Requested
Delegated to: BPF
Series bpf: Fix out-of-bounds write in trie_get_next_key()

Checks

Context Check Description
bpf/vmtest-bpf-next-PR success PR summary
netdev/series_format warning Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 5 this patch: 5
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers warning 10 maintainers not CCed: song@kernel.org haoluo@google.com ast@kernel.org andrii@kernel.org john.fastabend@gmail.com sdf@fomichev.me martin.lau@linux.dev kpsingh@kernel.org eddyz87@gmail.com jolsa@kernel.org
netdev/build_clang success Errors and warnings before: 3 this patch: 3
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 4 this patch: 4
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 8 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 1 this patch: 1
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-VM_Test-3 success Logs for Validate matrix.py
bpf/vmtest-bpf-next-VM_Test-5 success Logs for aarch64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-2 success Logs for Unittests
bpf/vmtest-bpf-next-VM_Test-0 success Logs for Lint
bpf/vmtest-bpf-next-VM_Test-4 success Logs for aarch64-gcc / build / build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-25 success Logs for x86_64-gcc / test (test_verifier, false, 360) / test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-15 success Logs for s390x-gcc / test (test_verifier, false, 360) / test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-17 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-9 success Logs for aarch64-gcc / test (test_verifier, false, 360) / test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-32 success Logs for x86_64-llvm-17 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-34 success Logs for x86_64-llvm-18 / build / build for x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-33 success Logs for x86_64-llvm-17 / veristat
bpf/vmtest-bpf-next-VM_Test-11 success Logs for s390x-gcc / build / build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-20 success Logs for x86_64-gcc / test (test_maps, false, 360) / test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-40 success Logs for x86_64-llvm-18 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-27 success Logs for x86_64-llvm-17 / build / build for x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-6 success Logs for aarch64-gcc / test (test_maps, false, 360) / test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-29 success Logs for x86_64-llvm-17 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-24 success Logs for x86_64-gcc / test (test_progs_parallel, true, 30) / test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-28 success Logs for x86_64-llvm-17 / build-release / build for x86_64 with llvm-17-O2
bpf/vmtest-bpf-next-VM_Test-41 success Logs for x86_64-llvm-18 / veristat
bpf/vmtest-bpf-next-VM_Test-18 success Logs for x86_64-gcc / build / build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-36 success Logs for x86_64-llvm-18 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-10 success Logs for aarch64-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-16 success Logs for s390x-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-23 success Logs for x86_64-gcc / test (test_progs_no_alu32_parallel, true, 30) / test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-26 success Logs for x86_64-gcc / veristat / veristat on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-12 success Logs for s390x-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-35 success Logs for x86_64-llvm-18 / build-release / build for x86_64 with llvm-18-O2
bpf/vmtest-bpf-next-VM_Test-19 success Logs for x86_64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-22 success Logs for x86_64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-39 success Logs for x86_64-llvm-18 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-8 success Logs for aarch64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-30 success Logs for x86_64-llvm-17 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-14 success Logs for s390x-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-38 success Logs for x86_64-llvm-18 / test (test_progs_cpuv4, false, 360) / test_progs_cpuv4 on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-7 success Logs for aarch64-gcc / test (test_progs, false, 360) / test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-21 success Logs for x86_64-gcc / test (test_progs, false, 360) / test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-31 success Logs for x86_64-llvm-17 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-37 success Logs for x86_64-llvm-18 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-13 success Logs for s390x-gcc / test (test_progs, false, 360) / test_progs on s390x with gcc

Commit Message

Byeonguk Jeong Oct. 22, 2024, 1:45 a.m. UTC
trie_get_next_key() allocates a node stack of size trie->max_prefixlen,
but writes (trie->max_prefixlen + 1) nodes to it when the trie contains
a full path from the root to a leaf. For example, consider a trie with
max_prefixlen of 8 and nodes with keys 0x00/0, 0x00/1, 0x00/2, ...,
0x00/8 inserted. A subsequent call to trie_get_next_key() with a _key
whose .prefixlen is 8 writes 9 nodes onto a node stack of size 8.

Fixes: b471f2f1de8b ("bpf: implement MAP_GET_NEXT_KEY command for LPM_TRIE map")
Signed-off-by: Byeonguk Jeong <jungbu2855@gmail.com>
---
 kernel/bpf/lpm_trie.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
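
To see where the extra slot comes from: the descent pushes one node per
prefix length, and a full root-to-leaf chain covers prefix lengths 0
through max_prefixlen, i.e. max_prefixlen + 1 nodes. Below is a minimal
userspace simulation of that arithmetic; it is a sketch, not the kernel
code, and the node layout and names are illustrative only.

	#include <stdio.h>
	#include <stdlib.h>

	struct node {
		unsigned int prefixlen;
		struct node *child;
	};

	int main(void)
	{
		const unsigned int max_prefixlen = 8;
		struct node *root = NULL, **link = &root;
		struct node **stack;
		unsigned int plen, pushed = 0;

		/* Build the chain from the reproducer: one node per prefix
		 * length, 0x00/0 -> 0x00/1 -> ... -> 0x00/8. */
		for (plen = 0; plen <= max_prefixlen; plen++) {
			struct node *n = calloc(1, sizeof(*n));
			n->prefixlen = plen;
			*link = n;
			link = &n->child;
		}

		/* The buggy sizing: max_prefixlen slots only. */
		stack = calloc(max_prefixlen, sizeof(*stack));

		for (struct node *n = root; n; n = n->child) {
			if (pushed >= max_prefixlen) {
				/* The 9th push would land out of bounds. */
				printf("push %u overflows an %u-slot stack\n",
				       pushed + 1, max_prefixlen);
				break;
			}
			stack[pushed++] = n;
		}
		/* Node chain intentionally leaked; the demo exits here. */
		free(stack);
		return 0;
	}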

Comments

Toke Høiland-Jørgensen Oct. 22, 2024, 9:43 a.m. UTC | #1
Byeonguk Jeong <jungbu2855@gmail.com> writes:

> trie_get_next_key() allocates a node stack of size trie->max_prefixlen,
> but writes (trie->max_prefixlen + 1) nodes to it when the trie contains
> a full path from the root to a leaf. For example, consider a trie with
> max_prefixlen of 8 and nodes with keys 0x00/0, 0x00/1, 0x00/2, ...,
> 0x00/8 inserted. A subsequent call to trie_get_next_key() with a _key
> whose .prefixlen is 8 writes 9 nodes onto a node stack of size 8.
>
> Fixes: b471f2f1de8b ("bpf: implement MAP_GET_NEXT_KEY command for LPM_TRIE map")
> Signed-off-by: Byeonguk Jeong <jungbu2855@gmail.com>

Makes sense!

Reviewed-by: Toke Høiland-Jørgensen <toke@kernel.org>
Alexei Starovoitov Oct. 22, 2024, 7:51 p.m. UTC | #2
On Mon, Oct 21, 2024 at 6:49 PM Byeonguk Jeong <jungbu2855@gmail.com> wrote:
>
> trie_get_next_key() allocates a node stack of size trie->max_prefixlen,
> but writes (trie->max_prefixlen + 1) nodes to it when the trie contains
> a full path from the root to a leaf. For example, consider a trie with
> max_prefixlen of 8 and nodes with keys 0x00/0, 0x00/1, 0x00/2, ...,
> 0x00/8 inserted. A subsequent call to trie_get_next_key() with a _key
> whose .prefixlen is 8 writes 9 nodes onto a node stack of size 8.

Hmm. It sounds possible, but pls demonstrate it with a selftest.
With the amount of fuzzing, I'm surprised it was not discovered earlier.

pw-bot: cr
Byeonguk Jeong Oct. 23, 2024, 1:29 a.m. UTC | #3
On Tue, Oct 22, 2024 at 12:51:05PM -0700, Alexei Starovoitov wrote:
> On Mon, Oct 21, 2024 at 6:49 PM Byeonguk Jeong <jungbu2855@gmail.com> wrote:
> >
> > trie_get_next_key() allocates a node stack of size trie->max_prefixlen,
> > but writes (trie->max_prefixlen + 1) nodes to it when the trie contains
> > a full path from the root to a leaf. For example, consider a trie with
> > max_prefixlen of 8 and nodes with keys 0x00/0, 0x00/1, 0x00/2, ...,
> > 0x00/8 inserted. A subsequent call to trie_get_next_key() with a _key
> > whose .prefixlen is 8 writes 9 nodes onto a node stack of size 8.
> 
> Hmm. It sounds possible, but pls demonstrate it with a selftest.
> With the amount of fuzzing I'm surprised it was not discovered earlier.
> 
> pw-bot: cr

With the simple test below, the kernel crashes within a minute, or the
bug can easily be caught on a KFENCE-enabled kernel.

#!/bin/bash
bpftool map create /sys/fs/bpf/lpm type lpm_trie key 5 value 1 \
	entries 16 flags 0x1 name lpm

for i in {0..8}; do
	bpftool map update pinned /sys/fs/bpf/lpm \
	key hex 0$i 00 00 00 00 \
	value hex 00 any
done

while true; do
	bpftool map dump pinned /sys/fs/bpf/lpm
done

In my environment (6.12-rc4, with CONFIG_KFENCE enabled), dmesg gave me
the expected report:

[  463.141394] BUG: KFENCE: out-of-bounds write in trie_get_next_key+0x2f2/0x670

[  463.143422] Out-of-bounds write at 0x0000000095bc45ea (256B right of kfence-#156):
[  463.144438]  trie_get_next_key+0x2f2/0x670
[  463.145439]  map_get_next_key+0x261/0x410
[  463.146444]  __sys_bpf+0xad4/0x1170
[  463.147438]  __x64_sys_bpf+0x74/0xc0
[  463.148431]  do_syscall_64+0x79/0x150
[  463.149425]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

[  463.151436] kfence-#156: 0x00000000279749c1-0x0000000034dc4abb, size=256, cache=kmalloc-256

[  463.153414] allocated by task 2021 on cpu 2 at 463.140440s (0.012974s ago):
[  463.154413]  trie_get_next_key+0x252/0x670
[  463.155411]  map_get_next_key+0x261/0x410
[  463.156402]  __sys_bpf+0xad4/0x1170
[  463.157390]  __x64_sys_bpf+0x74/0xc0
[  463.158386]  do_syscall_64+0x79/0x150
[  463.159372]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
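
For reference, a rough sketch of the kind of selftest requested above,
driven from userspace with libbpf. It mirrors the bpftool reproducer
(5-byte key: 4-byte prefixlen + 1 data byte); the struct and variable
names are hypothetical, and this is an illustration only, not the
selftest that was eventually merged:

	#include <bpf/bpf.h>
	#include <linux/bpf.h>

	struct lpm_key {
		__u32 prefixlen;
		__u8 data;
	} __attribute__((packed));

	int main(void)
	{
		LIBBPF_OPTS(bpf_map_create_opts, opts,
			    .map_flags = BPF_F_NO_PREALLOC);
		struct lpm_key key = {}, next = {};
		__u8 value = 0;
		__u32 plen;
		int fd, err;

		fd = bpf_map_create(BPF_MAP_TYPE_LPM_TRIE, "lpm",
				    sizeof(key), sizeof(value), 16, &opts);
		if (fd < 0)
			return 1;

		/* Insert 0x00/0 .. 0x00/8: a full root-to-leaf chain. */
		for (plen = 0; plen <= 8; plen++) {
			key.prefixlen = plen;
			key.data = 0;
			if (bpf_map_update_elem(fd, &key, &value, BPF_ANY))
				return 1;
		}

		/* Walk all keys; on unfixed kernels the walk that starts
		 * from the /8 leaf overflows the node stack. */
		err = bpf_map_get_next_key(fd, NULL, &next);
		while (!err) {
			key = next;
			err = bpf_map_get_next_key(fd, &key, &next);
		}
		return 0;
	}
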
Hou Tao Oct. 23, 2024, 2:03 a.m. UTC | #4
On 10/22/2024 9:45 AM, Byeonguk Jeong wrote:
> trie_get_next_key() allocates a node stack of size trie->max_prefixlen,
> but writes (trie->max_prefixlen + 1) nodes to it when the trie contains
> a full path from the root to a leaf. For example, consider a trie with
> max_prefixlen of 8 and nodes with keys 0x00/0, 0x00/1, 0x00/2, ...,
> 0x00/8 inserted. A subsequent call to trie_get_next_key() with a _key
> whose .prefixlen is 8 writes 9 nodes onto a node stack of size 8.
>
> Fixes: b471f2f1de8b ("bpf: implement MAP_GET_NEXT_KEY command for LPM_TRIE map")
> Signed-off-by: Byeonguk Jeong <jungbu2855@gmail.com>
> ---

Tested-by: Hou Tao <houtao1@huawei.com>

Without the fix, there will be a KASAN report, as shown below, when
dumping all keys in the LPM trie through bpf_map_get_next_key().

However, I have a dumb question: does it make sense to reject elements
with prefixlen = 0? I can't think of a use case where a zero-length
prefix would be useful.
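
(For context on the prefixlen = 0 question: a zero-length prefix acts as
a catch-all default entry, the LPM analogue of a 0.0.0.0/0 route, and is
returned by lookups whenever no longer prefix matches. A hedged sketch of
that behaviour, reusing the same hypothetical lpm_key layout as the
sketch above:)

	#include <stdio.h>
	#include <bpf/bpf.h>
	#include <linux/bpf.h>

	struct lpm_key {
		__u32 prefixlen;
		__u8 data;
	} __attribute__((packed));

	int main(void)
	{
		LIBBPF_OPTS(bpf_map_create_opts, opts,
			    .map_flags = BPF_F_NO_PREALLOC);
		struct lpm_key dflt = { .prefixlen = 0, .data = 0 };
		struct lpm_key probe = { .prefixlen = 8, .data = 0xab };
		__u8 set = 0xff, got = 0;
		int fd;

		fd = bpf_map_create(BPF_MAP_TYPE_LPM_TRIE, "lpm",
				    sizeof(dflt), sizeof(set), 16, &opts);
		if (fd < 0 || bpf_map_update_elem(fd, &dflt, &set, BPF_ANY))
			return 1;

		/* With no more specific entry, any key falls back to /0. */
		if (!bpf_map_lookup_elem(fd, &probe, &got))
			printf("0xab matched the /0 default, value 0x%x\n",
			       got);
		return 0;
	}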


 ==================================================================
 BUG: KASAN: slab-out-of-bounds in trie_get_next_key+0x133/0x530
 Write of size 8 at addr ffff8881076c2fc0 by task test_lpm_trie.b/446

 CPU: 0 UID: 0 PID: 446 Comm: test_lpm_trie.b Not tainted 6.11.0+ #52
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ...
 Call Trace:
  <TASK>
  dump_stack_lvl+0x6e/0xb0
  print_report+0xce/0x610
  ? trie_get_next_key+0x133/0x530
  ? kasan_complete_mode_report_info+0x3c/0x200
  ? trie_get_next_key+0x133/0x530
  kasan_report+0x9c/0xd0
  ? trie_get_next_key+0x133/0x530
  __asan_store8+0x81/0xb0
  trie_get_next_key+0x133/0x530
  __sys_bpf+0x1b03/0x3140
  ? __pfx___sys_bpf+0x10/0x10
  ? __pfx_vfs_write+0x10/0x10
  ? find_held_lock+0x8e/0xb0
  ? ksys_write+0xee/0x180
  ? syscall_exit_to_user_mode+0xb3/0x220
  ? mark_held_locks+0x28/0x90
  ? mark_held_locks+0x28/0x90
  __x64_sys_bpf+0x45/0x60
  x64_sys_call+0x1b2a/0x20d0
  do_syscall_64+0x5d/0x100
  entry_SYSCALL_64_after_hwframe+0x76/0x7e
 RIP: 0033:0x7f9c5e9c9c5d
  ......
  </TASK>
 Allocated by task 446:
  kasan_save_stack+0x28/0x50
  kasan_save_track+0x14/0x30
  kasan_save_alloc_info+0x36/0x40
  __kasan_kmalloc+0x84/0xa0
  __kmalloc_noprof+0x214/0x540
  trie_get_next_key+0xa7/0x530
  __sys_bpf+0x1b03/0x3140
  __x64_sys_bpf+0x45/0x60
  x64_sys_call+0x1b2a/0x20d0
  do_syscall_64+0x5d/0x100
  entry_SYSCALL_64_after_hwframe+0x76/0x7e

 The buggy address belongs to the object at ffff8881076c2f80
  which belongs to the cache kmalloc-rnd-09-64 of size 64
 The buggy address is located 0 bytes to the right of
  allocated 64-byte region [ffff8881076c2f80, ffff8881076c2fc0)

>  kernel/bpf/lpm_trie.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
> index 0218a5132ab5..9b60eda0f727 100644
> --- a/kernel/bpf/lpm_trie.c
> +++ b/kernel/bpf/lpm_trie.c
> @@ -655,7 +655,7 @@ static int trie_get_next_key(struct bpf_map *map, void *_key, void *_next_key)
>  	if (!key || key->prefixlen > trie->max_prefixlen)
>  		goto find_leftmost;
>  
> -	node_stack = kmalloc_array(trie->max_prefixlen,
> +	node_stack = kmalloc_array(trie->max_prefixlen + 1,
>  				   sizeof(struct lpm_trie_node *),
>  				   GFP_ATOMIC | __GFP_NOWARN);
>  	if (!node_stack)

Patch

diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index 0218a5132ab5..9b60eda0f727 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -655,7 +655,7 @@ static int trie_get_next_key(struct bpf_map *map, void *_key, void *_next_key)
 	if (!key || key->prefixlen > trie->max_prefixlen)
 		goto find_leftmost;
 
-	node_stack = kmalloc_array(trie->max_prefixlen,
+	node_stack = kmalloc_array(trie->max_prefixlen + 1,
 				   sizeof(struct lpm_trie_node *),
 				   GFP_ATOMIC | __GFP_NOWARN);
 	if (!node_stack)