diff mbox series

[bpf] bpf/arena: fix softlockup in arena_map_free on 64k page kernel

Message ID 20250204092455.3693003-1-alan.maguire@oracle.com (mailing list archive)
State Changes Requested
Delegated to: BPF
Headers show
Series [bpf] bpf/arena: fix softlockup in arena_map_free on 64k page kernel | expand

Checks

Context Check Description
bpf/vmtest-bpf-PR success PR summary
bpf/vmtest-bpf-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-VM_Test-2 success Logs for Unittests
bpf/vmtest-bpf-VM_Test-0 success Logs for Lint
bpf/vmtest-bpf-VM_Test-4 success Logs for aarch64-gcc / build / build for aarch64 with gcc
bpf/vmtest-bpf-VM_Test-3 success Logs for Validate matrix.py
bpf/vmtest-bpf-VM_Test-5 success Logs for aarch64-gcc / build-release
bpf/vmtest-bpf-VM_Test-10 success Logs for aarch64-gcc / veristat-kernel
bpf/vmtest-bpf-VM_Test-11 success Logs for aarch64-gcc / veristat-meta
bpf/vmtest-bpf-VM_Test-12 success Logs for s390x-gcc / build / build for s390x with gcc
bpf/vmtest-bpf-VM_Test-13 success Logs for s390x-gcc / build-release
bpf/vmtest-bpf-VM_Test-16 success Logs for s390x-gcc / test (test_verifier, false, 360) / test_verifier on s390x with gcc
bpf/vmtest-bpf-VM_Test-18 success Logs for s390x-gcc / veristat-meta
bpf/vmtest-bpf-VM_Test-19 success Logs for set-matrix
bpf/vmtest-bpf-VM_Test-17 success Logs for s390x-gcc / veristat-kernel
bpf/vmtest-bpf-VM_Test-21 success Logs for x86_64-gcc / build-release
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for bpf
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present fail Series targets non-next tree, but doesn't contain any Fixes tags
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 13 of 13 maintainers
netdev/build_clang success Errors and warnings before: 2 this patch: 2
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 2 this patch: 2
netdev/checkpatch warning CHECK: No space is necessary after a cast WARNING: line length of 97 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-VM_Test-6 success Logs for aarch64-gcc / test (test_maps, false, 360) / test_maps on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-9 success Logs for aarch64-gcc / test (test_verifier, false, 360) / test_verifier on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-15 success Logs for s390x-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-VM_Test-14 success Logs for s390x-gcc / test (test_progs, false, 360) / test_progs on s390x with gcc
bpf/vmtest-bpf-VM_Test-20 success Logs for x86_64-gcc / build / build for x86_64 with gcc
bpf/vmtest-bpf-VM_Test-23 success Logs for x86_64-gcc / test (test_progs, false, 360) / test_progs on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-22 success Logs for x86_64-gcc / test (test_maps, false, 360) / test_maps on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-26 success Logs for x86_64-gcc / test (test_progs_parallel, true, 30) / test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-25 success Logs for x86_64-gcc / test (test_progs_no_alu32_parallel, true, 30) / test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-24 success Logs for x86_64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-28 success Logs for x86_64-gcc / veristat-kernel / x86_64-gcc veristat_kernel
bpf/vmtest-bpf-VM_Test-27 success Logs for x86_64-gcc / test (test_verifier, false, 360) / test_verifier on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-29 success Logs for x86_64-gcc / veristat-meta / x86_64-gcc veristat_meta
bpf/vmtest-bpf-VM_Test-31 success Logs for x86_64-llvm-17 / build-release / build for x86_64 with llvm-17-O2
bpf/vmtest-bpf-VM_Test-30 success Logs for x86_64-llvm-17 / build / build for x86_64 with llvm-17
bpf/vmtest-bpf-VM_Test-32 success Logs for x86_64-llvm-17 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-17
bpf/vmtest-bpf-VM_Test-33 success Logs for x86_64-llvm-17 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-17
bpf/vmtest-bpf-VM_Test-35 success Logs for x86_64-llvm-17 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-17
bpf/vmtest-bpf-VM_Test-34 success Logs for x86_64-llvm-17 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-17
bpf/vmtest-bpf-VM_Test-36 success Logs for x86_64-llvm-17 / veristat-kernel
bpf/vmtest-bpf-VM_Test-37 success Logs for x86_64-llvm-17 / veristat-meta
bpf/vmtest-bpf-VM_Test-39 success Logs for x86_64-llvm-18 / build-release / build for x86_64 with llvm-18-O2
bpf/vmtest-bpf-VM_Test-38 success Logs for x86_64-llvm-18 / build / build for x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-40 success Logs for x86_64-llvm-18 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-41 success Logs for x86_64-llvm-18 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-42 success Logs for x86_64-llvm-18 / test (test_progs_cpuv4, false, 360) / test_progs_cpuv4 on x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-44 success Logs for x86_64-llvm-18 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-46 success Logs for x86_64-llvm-18 / veristat-meta
bpf/vmtest-bpf-VM_Test-45 success Logs for x86_64-llvm-18 / veristat-kernel
bpf/vmtest-bpf-VM_Test-43 success Logs for x86_64-llvm-18 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-8 success Logs for aarch64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-7 success Logs for aarch64-gcc / test (test_progs, false, 360) / test_progs on aarch64 with gcc

Commit Message

Alan Maguire Feb. 4, 2025, 9:24 a.m. UTC
On an aarch64 kernel with CONFIG_PAGE_SIZE_64KB=y (64k pages),
arena_htab tests cause a segmentation fault and soft lockup.

$ sudo ./test_progs -t arena_htab
Caught signal #11!
Stack trace:
./test_progs(crash_handler+0x1c)[0x7bd4d8]
linux-vdso.so.1(__kernel_rt_sigreturn+0x0)[0xffffb34a0968]
./test_progs[0x420f74]
./test_progs(htab_lookup_elem+0x3c)[0x421090]
./test_progs[0x421320]
./test_progs[0x421bb8]
./test_progs(test_arena_htab+0x40)[0x421c14]
./test_progs[0x7bda84]
./test_progs(main+0x65c)[0x7bf670]
/usr/lib64/libc.so.6(+0x2caa0)[0xffffb31ecaa0]
/usr/lib64/libc.so.6(__libc_start_main+0x98)[0xffffb31ecb78]
./test_progs(_start+0x30)[0x41b4f0]

Message from syslogd@bpfol9aarch64 at Feb  4 08:50:09 ...
 kernel:watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [kworker/u8:4:7589]

The same failure is not observed with 4k pages on aarch64.

Investigating further, it turns out arena_map_free() was calling
apply_to_existing_page_range() with the address returned by
bpf_arena_get_kern_vm_start().  If this address is not page-aligned -
as is the case for a 64k page kernel - we wind up calling apply_to_pte_range()
with that unaligned address.  The problem is apply_to_pte_range() implicitly
assumes that the addr passed in is page-aligned, specifically in this loop:

		do {
                        if (create || !pte_none(ptep_get(pte))) {
                                err = fn(pte++, addr, data);
                                if (err)
                                        break;
                        }
                } while (addr += PAGE_SIZE, addr != end);

If addr is _not_ page-aligned, it will never equal end exactly.

One solution is to round up the address returned by bpf_arena_get_kern_vm_start()
to a page-aligned value.  With that change in place the test passes:

$ sudo ./test_progs -t arena_htab
Summary: 1/1 PASSED, 1 SKIPPED, 0 FAILED

Reported-by: Colm Harrington <colm.harrington@oracle.com>
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 kernel/bpf/arena.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Alexei Starovoitov Feb. 4, 2025, 10:10 a.m. UTC | #1
On Tue, Feb 4, 2025 at 9:25 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> On an aarch64 kernel with CONFIG_PAGE_SIZE_64KB=y (64k pages),
> arena_htab tests cause a segmentation fault and soft lockup.
>
> $ sudo ./test_progs -t arena_htab
> Caught signal #11!
> Stack trace:
> ./test_progs(crash_handler+0x1c)[0x7bd4d8]
> linux-vdso.so.1(__kernel_rt_sigreturn+0x0)[0xffffb34a0968]
> ./test_progs[0x420f74]
> ./test_progs(htab_lookup_elem+0x3c)[0x421090]
> ./test_progs[0x421320]
> ./test_progs[0x421bb8]
> ./test_progs(test_arena_htab+0x40)[0x421c14]
> ./test_progs[0x7bda84]
> ./test_progs(main+0x65c)[0x7bf670]
> /usr/lib64/libc.so.6(+0x2caa0)[0xffffb31ecaa0]
> /usr/lib64/libc.so.6(__libc_start_main+0x98)[0xffffb31ecb78]
> ./test_progs(_start+0x30)[0x41b4f0]
>
> Message from syslogd@bpfol9aarch64 at Feb  4 08:50:09 ...
>  kernel:watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [kworker/u8:4:7589]
>
> The same failure is not observed with 4k pages on aarch64.
>
> Investigating further, it turns out arena_map_free() was calling
> apply_to_existing_page_range() with the address returned by
> bpf_arena_get_kern_vm_start().  If this address is not page-aligned -
> as is the case for a 64k page kernel - we wind up calling apply_to_pte_range()
> with that unaligned address.  The problem is apply_to_pte_range() implicitly
> assumes that the addr passed in is page-aligned, specifically in this loop:
>
>                 do {
>                         if (create || !pte_none(ptep_get(pte))) {
>                                 err = fn(pte++, addr, data);
>                                 if (err)
>                                         break;
>                         }
>                 } while (addr += PAGE_SIZE, addr != end);
>
> If addr is _not_ page-aligned, it will never equal end exactly.
>
> One solution is to round up the address returned by bpf_arena_get_kern_vm_start()
> to a page-aligned value.  With that change in place the test passes:
>
> $ sudo ./test_progs -t arena_htab
> Summary: 1/1 PASSED, 1 SKIPPED, 0 FAILED
>
> Reported-by: Colm Harrington <colm.harrington@oracle.com>
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> ---
>  kernel/bpf/arena.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/bpf/arena.c b/kernel/bpf/arena.c
> index 870aeb51d70a..07395c55833e 100644
> --- a/kernel/bpf/arena.c
> +++ b/kernel/bpf/arena.c
> @@ -54,7 +54,7 @@ struct bpf_arena {
>
>  u64 bpf_arena_get_kern_vm_start(struct bpf_arena *arena)
>  {
> -       return arena ? (u64) (long) arena->kern_vm->addr + GUARD_SZ / 2 : 0;
> +       return arena ? (u64) round_up((long) arena->kern_vm->addr + GUARD_SZ / 2, PAGE_SIZE) : 0;

Thanks for the report. The fix is incorrect though.
GUARD_SZ/2 is 32k,
so with roundup the upper guard is gone.
We probably need to:
-#define GUARD_SZ (1ull << sizeof_field(struct bpf_insn, off) * 8)
+#define GUARD_SZ round_up(1ull << sizeof_field(struct bpf_insn, off)
* 8, PAGE_SIZE << 1)

Better ideas?

pw-bot: cr
Alan Maguire Feb. 4, 2025, 3:53 p.m. UTC | #2
On 04/02/2025 10:10, Alexei Starovoitov wrote:
> On Tue, Feb 4, 2025 at 9:25 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>>
>> On an aarch64 kernel with CONFIG_PAGE_SIZE_64KB=y (64k pages),
>> arena_htab tests cause a segmentation fault and soft lockup.
>>
>> $ sudo ./test_progs -t arena_htab
>> Caught signal #11!
>> Stack trace:
>> ./test_progs(crash_handler+0x1c)[0x7bd4d8]
>> linux-vdso.so.1(__kernel_rt_sigreturn+0x0)[0xffffb34a0968]
>> ./test_progs[0x420f74]
>> ./test_progs(htab_lookup_elem+0x3c)[0x421090]
>> ./test_progs[0x421320]
>> ./test_progs[0x421bb8]
>> ./test_progs(test_arena_htab+0x40)[0x421c14]
>> ./test_progs[0x7bda84]
>> ./test_progs(main+0x65c)[0x7bf670]
>> /usr/lib64/libc.so.6(+0x2caa0)[0xffffb31ecaa0]
>> /usr/lib64/libc.so.6(__libc_start_main+0x98)[0xffffb31ecb78]
>> ./test_progs(_start+0x30)[0x41b4f0]
>>
>> Message from syslogd@bpfol9aarch64 at Feb  4 08:50:09 ...
>>  kernel:watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [kworker/u8:4:7589]
>>
>> The same failure is not observed with 4k pages on aarch64.
>>
>> Investigating further, it turns out arena_map_free() was calling
>> apply_to_existing_page_range() with the address returned by
>> bpf_arena_get_kern_vm_start().  If this address is not page-aligned -
>> as is the case for a 64k page kernel - we wind up calling apply_to_pte_range()
>> with that unaligned address.  The problem is apply_to_pte_range() implicitly
>> assumes that the addr passed in is page-aligned, specifically in this loop:
>>
>>                 do {
>>                         if (create || !pte_none(ptep_get(pte))) {
>>                                 err = fn(pte++, addr, data);
>>                                 if (err)
>>                                         break;
>>                         }
>>                 } while (addr += PAGE_SIZE, addr != end);
>>
>> If addr is _not_ page-aligned, it will never equal end exactly.
>>
>> One solution is to round up the address returned by bpf_arena_get_kern_vm_start()
>> to a page-aligned value.  With that change in place the test passes:
>>
>> $ sudo ./test_progs -t arena_htab
>> Summary: 1/1 PASSED, 1 SKIPPED, 0 FAILED
>>
>> Reported-by: Colm Harrington <colm.harrington@oracle.com>
>> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
>> ---
>>  kernel/bpf/arena.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/bpf/arena.c b/kernel/bpf/arena.c
>> index 870aeb51d70a..07395c55833e 100644
>> --- a/kernel/bpf/arena.c
>> +++ b/kernel/bpf/arena.c
>> @@ -54,7 +54,7 @@ struct bpf_arena {
>>
>>  u64 bpf_arena_get_kern_vm_start(struct bpf_arena *arena)
>>  {
>> -       return arena ? (u64) (long) arena->kern_vm->addr + GUARD_SZ / 2 : 0;
>> +       return arena ? (u64) round_up((long) arena->kern_vm->addr + GUARD_SZ / 2, PAGE_SIZE) : 0;
> 
> Thanks for the report. The fix is incorrect though.
> GUARD_SZ/2 is 32k,
> so with roundup the upper guard is gone.
> We probably need to:
> -#define GUARD_SZ (1ull << sizeof_field(struct bpf_insn, off) * 8)
> +#define GUARD_SZ round_up(1ull << sizeof_field(struct bpf_insn, off)
> * 8, PAGE_SIZE << 1)
>

I tested this and it also resolves the test failure/softlockup. I'll
wait for a bit but can follow up with v2 incorporating your fix if there
are no further suggestions for refinements. Thanks!

Alan

> Better ideas?
> 
> pw-bot: cr
>
diff mbox series

Patch

diff --git a/kernel/bpf/arena.c b/kernel/bpf/arena.c
index 870aeb51d70a..07395c55833e 100644
--- a/kernel/bpf/arena.c
+++ b/kernel/bpf/arena.c
@@ -54,7 +54,7 @@  struct bpf_arena {
 
 u64 bpf_arena_get_kern_vm_start(struct bpf_arena *arena)
 {
-	return arena ? (u64) (long) arena->kern_vm->addr + GUARD_SZ / 2 : 0;
+	return arena ? (u64) round_up((long) arena->kern_vm->addr + GUARD_SZ / 2, PAGE_SIZE) : 0;
 }
 
 u64 bpf_arena_get_user_vm_start(struct bpf_arena *arena)