[-next] bpf, test_run: fix alignment problem in bpf_prog_test_run_skb()

Message ID 20221101040440.3637007-1-zhongbaisong@huawei.com (mailing list archive)
State Superseded
Delegated to: BPF
Series [-next] bpf, test_run: fix alignment problem in bpf_prog_test_run_skb()

Checks

Context Check Description
netdev/tree_selection success Guessed tree name to be net-next, async
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix warning Target tree name not specified in the subject
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 35 this patch: 35
netdev/cc_maintainers fail 2 blamed authors not CCed: daniel@iogearbox.net martin.lau@linux.dev; 7 maintainers not CCed: sdf@google.com john.fastabend@gmail.com andrii@kernel.org jolsa@kernel.org kpsingh@kernel.org daniel@iogearbox.net martin.lau@linux.dev
netdev/build_clang success Errors and warnings before: 5 this patch: 5
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 35 this patch: 35
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 7 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-PR fail PR summary
bpf/vmtest-bpf-next-VM_Test-1 pending Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-5 success Logs for llvm-toolchain
bpf/vmtest-bpf-next-VM_Test-6 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-3 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-4 success Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-2 success Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-23 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-24 success Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-8 success Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-9 success Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-11 success Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-12 success Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-14 success Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-15 success Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-16 success Logs for test_progs_no_alu32_parallel on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-17 success Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-18 success Logs for test_progs_no_alu32_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-20 success Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-21 success Logs for test_progs_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-22 success Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-7 success Logs for test_maps on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-10 success Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-13 fail Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-19 success Logs for test_progs_parallel on s390x with gcc

Commit Message

Baisong Zhong Nov. 1, 2022, 4:04 a.m. UTC
Recently, we got a syzkaller report of an aarch64 alignment
fault when KFENCE is enabled.

When the size from the user bpf program is an odd number, like
399, 407, etc., it causes unaligned access to the skb's shared
info, as seen below:

BUG: KFENCE: use-after-free read in __skb_clone+0x23c/0x2a0 net/core/skbuff.c:1032

Use-after-free read at 0xffff6254fffac077 (in kfence-#213):
 __lse_atomic_add arch/arm64/include/asm/atomic_lse.h:26 [inline]
 arch_atomic_add arch/arm64/include/asm/atomic.h:28 [inline]
 arch_atomic_inc include/linux/atomic-arch-fallback.h:270 [inline]
 atomic_inc include/asm-generic/atomic-instrumented.h:241 [inline]
 __skb_clone+0x23c/0x2a0 net/core/skbuff.c:1032
 skb_clone+0xf4/0x214 net/core/skbuff.c:1481
 ____bpf_clone_redirect net/core/filter.c:2433 [inline]
 bpf_clone_redirect+0x78/0x1c0 net/core/filter.c:2420
 bpf_prog_d3839dd9068ceb51+0x80/0x330
 bpf_dispatcher_nop_func include/linux/bpf.h:728 [inline]
 bpf_test_run+0x3c0/0x6c0 net/bpf/test_run.c:53
 bpf_prog_test_run_skb+0x638/0xa7c net/bpf/test_run.c:594
 bpf_prog_test_run kernel/bpf/syscall.c:3148 [inline]
 __do_sys_bpf kernel/bpf/syscall.c:4441 [inline]
 __se_sys_bpf+0xad0/0x1634 kernel/bpf/syscall.c:4381

kfence-#213: 0xffff6254fffac000-0xffff6254fffac196, size=407, cache=kmalloc-512

allocated by task 15074 on cpu 0 at 1342.585390s:
 kmalloc include/linux/slab.h:568 [inline]
 kzalloc include/linux/slab.h:675 [inline]
 bpf_test_init.isra.0+0xac/0x290 net/bpf/test_run.c:191
 bpf_prog_test_run_skb+0x11c/0xa7c net/bpf/test_run.c:512
 bpf_prog_test_run kernel/bpf/syscall.c:3148 [inline]
 __do_sys_bpf kernel/bpf/syscall.c:4441 [inline]
 __se_sys_bpf+0xad0/0x1634 kernel/bpf/syscall.c:4381
 __arm64_sys_bpf+0x50/0x60 kernel/bpf/syscall.c:4381

To fix the problem, we round up allocations with kmalloc_size_roundup()
so that build_skb()'s use of ksize() always yields an aligned size and
no special handling of the memory is needed by KFENCE.

Fixes: 1cf1cae963c2 ("bpf: introduce BPF_PROG_TEST_RUN command")
Signed-off-by: Baisong Zhong <zhongbaisong@huawei.com>
---
 net/bpf/test_run.c | 1 +
 1 file changed, 1 insertion(+)
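
For context on the failure mode: build_skb() sizes the skb's data buffer
via ksize() and places struct skb_shared_info at the end of that buffer.
Under KFENCE, ksize() reports the exact requested size (407 here) instead
of the kmalloc bucket size (512), so an odd size puts the shared info's
atomic_t dataref at an odd address, and the LSE atomic_inc() seen in the
__skb_clone() trace faults on arm64. A simplified sketch of the arithmetic
(illustrative only, not the kernel's exact code):

	unsigned int size = ksize(data); /* 512 from the slab, 407 under KFENCE */
	struct skb_shared_info *shinfo;

	/* the shared info sits at the end of the data buffer */
	shinfo = (struct skb_shared_info *)(data + size);

	/* with size == 407 this atomic hits an odd address: alignment fault */
	atomic_inc(&shinfo->dataref);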

Comments

Daniel Borkmann Nov. 1, 2022, 4:45 p.m. UTC | #1
[ +kfence folks ]

On 11/1/22 5:04 AM, Baisong Zhong wrote:
> Recently, we got a syzkaller report of an aarch64 alignment
> fault when KFENCE is enabled.
> 
> When the size from the user bpf program is an odd number, like
> 399, 407, etc., it causes unaligned access to the skb's shared
> info, as seen below:
> 
> BUG: KFENCE: use-after-free read in __skb_clone+0x23c/0x2a0 net/core/skbuff.c:1032
> 
> Use-after-free read at 0xffff6254fffac077 (in kfence-#213):
>   __lse_atomic_add arch/arm64/include/asm/atomic_lse.h:26 [inline]
>   arch_atomic_add arch/arm64/include/asm/atomic.h:28 [inline]
>   arch_atomic_inc include/linux/atomic-arch-fallback.h:270 [inline]
>   atomic_inc include/asm-generic/atomic-instrumented.h:241 [inline]
>   __skb_clone+0x23c/0x2a0 net/core/skbuff.c:1032
>   skb_clone+0xf4/0x214 net/core/skbuff.c:1481
>   ____bpf_clone_redirect net/core/filter.c:2433 [inline]
>   bpf_clone_redirect+0x78/0x1c0 net/core/filter.c:2420
>   bpf_prog_d3839dd9068ceb51+0x80/0x330
>   bpf_dispatcher_nop_func include/linux/bpf.h:728 [inline]
>   bpf_test_run+0x3c0/0x6c0 net/bpf/test_run.c:53
>   bpf_prog_test_run_skb+0x638/0xa7c net/bpf/test_run.c:594
>   bpf_prog_test_run kernel/bpf/syscall.c:3148 [inline]
>   __do_sys_bpf kernel/bpf/syscall.c:4441 [inline]
>   __se_sys_bpf+0xad0/0x1634 kernel/bpf/syscall.c:4381
> 
> kfence-#213: 0xffff6254fffac000-0xffff6254fffac196, size=407, cache=kmalloc-512
> 
> allocated by task 15074 on cpu 0 at 1342.585390s:
>   kmalloc include/linux/slab.h:568 [inline]
>   kzalloc include/linux/slab.h:675 [inline]
>   bpf_test_init.isra.0+0xac/0x290 net/bpf/test_run.c:191
>   bpf_prog_test_run_skb+0x11c/0xa7c net/bpf/test_run.c:512
>   bpf_prog_test_run kernel/bpf/syscall.c:3148 [inline]
>   __do_sys_bpf kernel/bpf/syscall.c:4441 [inline]
>   __se_sys_bpf+0xad0/0x1634 kernel/bpf/syscall.c:4381
>   __arm64_sys_bpf+0x50/0x60 kernel/bpf/syscall.c:4381
> 
> To fix the problem, we round up allocations with kmalloc_size_roundup()
> so that build_skb()'s use of ksize() always yields an aligned size and
> no special handling of the memory is needed by KFENCE.
> 
> Fixes: 1cf1cae963c2 ("bpf: introduce BPF_PROG_TEST_RUN command")
> Signed-off-by: Baisong Zhong <zhongbaisong@huawei.com>
> ---
>   net/bpf/test_run.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> index 13d578ce2a09..058b67108873 100644
> --- a/net/bpf/test_run.c
> +++ b/net/bpf/test_run.c
> @@ -774,6 +774,7 @@ static void *bpf_test_init(const union bpf_attr *kattr, u32 user_size,
>   	if (user_size > size)
>   		return ERR_PTR(-EMSGSIZE);
>   
> +	size = kmalloc_size_roundup(size);
>   	data = kzalloc(size + headroom + tailroom, GFP_USER);

The fact that you need to do this roundup on call sites feels broken, no?
Was there some discussion / consensus that now all k*alloc() call sites
would need to be fixed up? Couldn't this be done transparently in k*alloc()
when KFENCE is enabled? I presume there may be lots of other such occasions
in the kernel where a similar issue triggers; fixing up all call sites feels
like a ton of churn compared to an api-internal, generic fix.

>   	if (!data)
>   		return ERR_PTR(-ENOMEM);
> 

Thanks,
Daniel
Baisong Zhong Nov. 2, 2022, 2:59 a.m. UTC | #2
On 2022/11/2 0:45, Daniel Borkmann wrote:
> [ +kfence folks ]

+ cc: Alexander Potapenko, Marco Elver, Dmitry Vyukov

Do you have any suggestions about this problem?

Thanks,


> 
> On 11/1/22 5:04 AM, Baisong Zhong wrote:
>> Recently, we got a syzkaller report of an aarch64 alignment
>> fault when KFENCE is enabled.
>>
>> When the size from the user bpf program is an odd number, like
>> 399, 407, etc., it causes unaligned access to the skb's shared
>> info, as seen below:
>>
>> BUG: KFENCE: use-after-free read in __skb_clone+0x23c/0x2a0 
>> net/core/skbuff.c:1032
>>
>> Use-after-free read at 0xffff6254fffac077 (in kfence-#213):
>>   __lse_atomic_add arch/arm64/include/asm/atomic_lse.h:26 [inline]
>>   arch_atomic_add arch/arm64/include/asm/atomic.h:28 [inline]
>>   arch_atomic_inc include/linux/atomic-arch-fallback.h:270 [inline]
>>   atomic_inc include/asm-generic/atomic-instrumented.h:241 [inline]
>>   __skb_clone+0x23c/0x2a0 net/core/skbuff.c:1032
>>   skb_clone+0xf4/0x214 net/core/skbuff.c:1481
>>   ____bpf_clone_redirect net/core/filter.c:2433 [inline]
>>   bpf_clone_redirect+0x78/0x1c0 net/core/filter.c:2420
>>   bpf_prog_d3839dd9068ceb51+0x80/0x330
>>   bpf_dispatcher_nop_func include/linux/bpf.h:728 [inline]
>>   bpf_test_run+0x3c0/0x6c0 net/bpf/test_run.c:53
>>   bpf_prog_test_run_skb+0x638/0xa7c net/bpf/test_run.c:594
>>   bpf_prog_test_run kernel/bpf/syscall.c:3148 [inline]
>>   __do_sys_bpf kernel/bpf/syscall.c:4441 [inline]
>>   __se_sys_bpf+0xad0/0x1634 kernel/bpf/syscall.c:4381
>>
>> kfence-#213: 0xffff6254fffac000-0xffff6254fffac196, size=407, 
>> cache=kmalloc-512
>>
>> allocated by task 15074 on cpu 0 at 1342.585390s:
>>   kmalloc include/linux/slab.h:568 [inline]
>>   kzalloc include/linux/slab.h:675 [inline]
>>   bpf_test_init.isra.0+0xac/0x290 net/bpf/test_run.c:191
>>   bpf_prog_test_run_skb+0x11c/0xa7c net/bpf/test_run.c:512
>>   bpf_prog_test_run kernel/bpf/syscall.c:3148 [inline]
>>   __do_sys_bpf kernel/bpf/syscall.c:4441 [inline]
>>   __se_sys_bpf+0xad0/0x1634 kernel/bpf/syscall.c:4381
>>   __arm64_sys_bpf+0x50/0x60 kernel/bpf/syscall.c:4381
>>
>> To fix the problem, we round up allocations with kmalloc_size_roundup()
>> so that build_skb()'s use of ksize() always yields an aligned size and
>> no special handling of the memory is needed by KFENCE.
>>
>> Fixes: 1cf1cae963c2 ("bpf: introduce BPF_PROG_TEST_RUN command")
>> Signed-off-by: Baisong Zhong <zhongbaisong@huawei.com>
>> ---
>>   net/bpf/test_run.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
>> index 13d578ce2a09..058b67108873 100644
>> --- a/net/bpf/test_run.c
>> +++ b/net/bpf/test_run.c
>> @@ -774,6 +774,7 @@ static void *bpf_test_init(const union bpf_attr 
>> *kattr, u32 user_size,
>>       if (user_size > size)
>>           return ERR_PTR(-EMSGSIZE);
>> +    size = kmalloc_size_roundup(size);
>>       data = kzalloc(size + headroom + tailroom, GFP_USER);
> 
> The fact that you need to do this roundup on call sites feels broken, no?
> Was there some discussion / consensus that now all k*alloc() call sites
> would need to be fixed up? Couldn't this be done transparently in k*alloc()
> when KFENCE is enabled? I presume there may be lots of other such occasions
> in the kernel where a similar issue triggers; fixing up all call sites feels
> like a ton of churn compared to an api-internal, generic fix.
> 
>>       if (!data)
>>           return ERR_PTR(-ENOMEM);
>>
> 
> Thanks,
> Daniel
>
Jakub Kicinski Nov. 2, 2022, 4:05 a.m. UTC | #3
On Wed, 2 Nov 2022 10:59:44 +0800 zhongbaisong wrote:
> On 2022/11/2 0:45, Daniel Borkmann wrote:
> > [ +kfence folks ]  
> 
> + cc: Alexander Potapenko, Marco Elver, Dmitry Vyukov
> 
> Do you have any suggestions about this problem?

+ Kees who has been sending similar patches for drivers

> > On 11/1/22 5:04 AM, Baisong Zhong wrote:  
> >> Recently, we got a syzkaller report of an aarch64 alignment
> >> fault when KFENCE is enabled.
> >>
> >> When the size from the user bpf program is an odd number, like
> >> 399, 407, etc., it causes unaligned access to the skb's shared
> >> info, as seen below:
> >>
> >> BUG: KFENCE: use-after-free read in __skb_clone+0x23c/0x2a0 
> >> net/core/skbuff.c:1032
> >>
> >> Use-after-free read at 0xffff6254fffac077 (in kfence-#213):
> >>   __lse_atomic_add arch/arm64/include/asm/atomic_lse.h:26 [inline]
> >>   arch_atomic_add arch/arm64/include/asm/atomic.h:28 [inline]
> >>   arch_atomic_inc include/linux/atomic-arch-fallback.h:270 [inline]
> >>   atomic_inc include/asm-generic/atomic-instrumented.h:241 [inline]
> >>   __skb_clone+0x23c/0x2a0 net/core/skbuff.c:1032
> >>   skb_clone+0xf4/0x214 net/core/skbuff.c:1481
> >>   ____bpf_clone_redirect net/core/filter.c:2433 [inline]
> >>   bpf_clone_redirect+0x78/0x1c0 net/core/filter.c:2420
> >>   bpf_prog_d3839dd9068ceb51+0x80/0x330
> >>   bpf_dispatcher_nop_func include/linux/bpf.h:728 [inline]
> >>   bpf_test_run+0x3c0/0x6c0 net/bpf/test_run.c:53
> >>   bpf_prog_test_run_skb+0x638/0xa7c net/bpf/test_run.c:594
> >>   bpf_prog_test_run kernel/bpf/syscall.c:3148 [inline]
> >>   __do_sys_bpf kernel/bpf/syscall.c:4441 [inline]
> >>   __se_sys_bpf+0xad0/0x1634 kernel/bpf/syscall.c:4381
> >>
> >> kfence-#213: 0xffff6254fffac000-0xffff6254fffac196, size=407, 
> >> cache=kmalloc-512
> >>
> >> allocated by task 15074 on cpu 0 at 1342.585390s:
> >>   kmalloc include/linux/slab.h:568 [inline]
> >>   kzalloc include/linux/slab.h:675 [inline]
> >>   bpf_test_init.isra.0+0xac/0x290 net/bpf/test_run.c:191
> >>   bpf_prog_test_run_skb+0x11c/0xa7c net/bpf/test_run.c:512
> >>   bpf_prog_test_run kernel/bpf/syscall.c:3148 [inline]
> >>   __do_sys_bpf kernel/bpf/syscall.c:4441 [inline]
> >>   __se_sys_bpf+0xad0/0x1634 kernel/bpf/syscall.c:4381
> >>   __arm64_sys_bpf+0x50/0x60 kernel/bpf/syscall.c:4381
> >>
> >> To fix the problem, we round up allocations with kmalloc_size_roundup()
> >> so that build_skb()'s use of ksize() always yields an aligned size and
> >> no special handling of the memory is needed by KFENCE.
> >>
> >> Fixes: 1cf1cae963c2 ("bpf: introduce BPF_PROG_TEST_RUN command")
> >> Signed-off-by: Baisong Zhong <zhongbaisong@huawei.com>
> >> ---
> >>   net/bpf/test_run.c | 1 +
> >>   1 file changed, 1 insertion(+)
> >>
> >> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> >> index 13d578ce2a09..058b67108873 100644
> >> --- a/net/bpf/test_run.c
> >> +++ b/net/bpf/test_run.c
> >> @@ -774,6 +774,7 @@ static void *bpf_test_init(const union bpf_attr 
> >> *kattr, u32 user_size,
> >>       if (user_size > size)
> >>           return ERR_PTR(-EMSGSIZE);
> >> +    size = kmalloc_size_roundup(size);
> >>       data = kzalloc(size + headroom + tailroom, GFP_USER);  
> > 
> > The fact that you need to do this roundup on call sites feels broken, no?
> > Was there some discussion / consensus that now all k*alloc() call sites
> > would need to be fixed up? Couldn't this be done transparently in k*alloc()
> > when KFENCE is enabled? I presume there may be lots of other such occasions
> > in the kernel where a similar issue triggers; fixing up all call sites feels
> > like a ton of churn compared to an api-internal, generic fix.
> >   
> >>       if (!data)
> >>           return ERR_PTR(-ENOMEM);
> >>  
> > 
> > Thanks,
> > Daniel
> >  
> 
>
Kees Cook Nov. 2, 2022, 4:27 a.m. UTC | #4
On Tue, Nov 01, 2022 at 09:05:42PM -0700, Jakub Kicinski wrote:
> On Wed, 2 Nov 2022 10:59:44 +0800 zhongbaisong wrote:
> > On 2022/11/2 0:45, Daniel Borkmann wrote:
> > > [ +kfence folks ]  
> > 
> > + cc: Alexander Potapenko, Marco Elver, Dmitry Vyukov
> > 
> > Do you have any suggestions about this problem?
> 
> + Kees who has been sending similar patches for drivers
> 
> > > On 11/1/22 5:04 AM, Baisong Zhong wrote:  
> > >> Recently, we got a syzkaller report of an aarch64 alignment
> > >> fault when KFENCE is enabled.
> > >>
> > >> When the size from the user bpf program is an odd number, like
> > >> 399, 407, etc., it causes unaligned access to the skb's shared
> > >> info, as seen below:
> > >>
> > >> BUG: KFENCE: use-after-free read in __skb_clone+0x23c/0x2a0 
> > >> net/core/skbuff.c:1032
> > >>
> > >> Use-after-free read at 0xffff6254fffac077 (in kfence-#213):
> > >>   __lse_atomic_add arch/arm64/include/asm/atomic_lse.h:26 [inline]
> > >>   arch_atomic_add arch/arm64/include/asm/atomic.h:28 [inline]
> > >>   arch_atomic_inc include/linux/atomic-arch-fallback.h:270 [inline]
> > >>   atomic_inc include/asm-generic/atomic-instrumented.h:241 [inline]
> > >>   __skb_clone+0x23c/0x2a0 net/core/skbuff.c:1032
> > >>   skb_clone+0xf4/0x214 net/core/skbuff.c:1481
> > >>   ____bpf_clone_redirect net/core/filter.c:2433 [inline]
> > >>   bpf_clone_redirect+0x78/0x1c0 net/core/filter.c:2420
> > >>   bpf_prog_d3839dd9068ceb51+0x80/0x330
> > >>   bpf_dispatcher_nop_func include/linux/bpf.h:728 [inline]
> > >>   bpf_test_run+0x3c0/0x6c0 net/bpf/test_run.c:53
> > >>   bpf_prog_test_run_skb+0x638/0xa7c net/bpf/test_run.c:594
> > >>   bpf_prog_test_run kernel/bpf/syscall.c:3148 [inline]
> > >>   __do_sys_bpf kernel/bpf/syscall.c:4441 [inline]
> > >>   __se_sys_bpf+0xad0/0x1634 kernel/bpf/syscall.c:4381
> > >>
> > >> kfence-#213: 0xffff6254fffac000-0xffff6254fffac196, size=407, 
> > >> cache=kmalloc-512
> > >>
> > >> allocated by task 15074 on cpu 0 at 1342.585390s:
> > >>   kmalloc include/linux/slab.h:568 [inline]
> > >>   kzalloc include/linux/slab.h:675 [inline]
> > >>   bpf_test_init.isra.0+0xac/0x290 net/bpf/test_run.c:191
> > >>   bpf_prog_test_run_skb+0x11c/0xa7c net/bpf/test_run.c:512
> > >>   bpf_prog_test_run kernel/bpf/syscall.c:3148 [inline]
> > >>   __do_sys_bpf kernel/bpf/syscall.c:4441 [inline]
> > >>   __se_sys_bpf+0xad0/0x1634 kernel/bpf/syscall.c:4381
> > >>   __arm64_sys_bpf+0x50/0x60 kernel/bpf/syscall.c:4381
> > >>
> > >> To fix the problem, we round up allocations with kmalloc_size_roundup()
> > >> so that build_skb()'s use of ksize() always yields an aligned size and
> > >> no special handling of the memory is needed by KFENCE.
> > >>
> > >> Fixes: 1cf1cae963c2 ("bpf: introduce BPF_PROG_TEST_RUN command")
> > >> Signed-off-by: Baisong Zhong <zhongbaisong@huawei.com>
> > >> ---
> > >>   net/bpf/test_run.c | 1 +
> > >>   1 file changed, 1 insertion(+)
> > >>
> > >> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> > >> index 13d578ce2a09..058b67108873 100644
> > >> --- a/net/bpf/test_run.c
> > >> +++ b/net/bpf/test_run.c
> > >> @@ -774,6 +774,7 @@ static void *bpf_test_init(const union bpf_attr 
> > >> *kattr, u32 user_size,
> > >>       if (user_size > size)
> > >>           return ERR_PTR(-EMSGSIZE);
> > >> +    size = kmalloc_size_roundup(size);
> > >>       data = kzalloc(size + headroom + tailroom, GFP_USER);  
> > > 
> > > The fact that you need to do this roundup on call sites feels broken, no?
> > > Was there some discussion / consensus that now all k*alloc() call sites
> > > would need to be fixed up? Couldn't this be done transparently in k*alloc()
> > > when KFENCE is enabled? I presume there may be lots of other such occasions
> > > in the kernel where a similar issue triggers; fixing up all call sites feels
> > > like a ton of churn compared to an api-internal, generic fix.

I hope I answer this in more detail here:
https://lore.kernel.org/lkml/202211010937.4631CB1B0E@keescook/

The problem is that ksize() should never have existed in the first
place. :P Every runtime bounds checker has tripped over it, and with
the addition of the __alloc_size attribute, I had to start ripping
ksize() out: it can't be used to pretend an allocation grew in size.
Things need to either preallocate more or go through *realloc() like
everything else. Luckily, ksize() is rare.

FWIW, the above fix doesn't look correct to me -- I would expect this to
be:

	size_t alloc_size;
	...
	alloc_size = kmalloc_size_roundup(size + headroom + tailroom);
	data = kzalloc(alloc_size, GFP_USER);
Eric Dumazet Nov. 2, 2022, 4:37 a.m. UTC | #5
On Tue, Nov 1, 2022 at 9:27 PM Kees Cook <keescook@chromium.org> wrote:
>
> On Tue, Nov 01, 2022 at 09:05:42PM -0700, Jakub Kicinski wrote:
> > On Wed, 2 Nov 2022 10:59:44 +0800 zhongbaisong wrote:
> > > On 2022/11/2 0:45, Daniel Borkmann wrote:
> > > > [ +kfence folks ]
> > >
> > > + cc: Alexander Potapenko, Marco Elver, Dmitry Vyukov
> > >
> > > Do you have any suggestions about this problem?
> >
> > + Kees who has been sending similar patches for drivers
> >
> > > > On 11/1/22 5:04 AM, Baisong Zhong wrote:
> > > >> Recently, we got a syzkaller report of an aarch64 alignment
> > > >> fault when KFENCE is enabled.
> > > >>
> > > >> When the size from the user bpf program is an odd number, like
> > > >> 399, 407, etc., it causes unaligned access to the skb's shared
> > > >> info, as seen below:
> > > >>
> > > >> BUG: KFENCE: use-after-free read in __skb_clone+0x23c/0x2a0
> > > >> net/core/skbuff.c:1032
> > > >>
> > > >> Use-after-free read at 0xffff6254fffac077 (in kfence-#213):
> > > >>   __lse_atomic_add arch/arm64/include/asm/atomic_lse.h:26 [inline]
> > > >>   arch_atomic_add arch/arm64/include/asm/atomic.h:28 [inline]
> > > >>   arch_atomic_inc include/linux/atomic-arch-fallback.h:270 [inline]
> > > >>   atomic_inc include/asm-generic/atomic-instrumented.h:241 [inline]
> > > >>   __skb_clone+0x23c/0x2a0 net/core/skbuff.c:1032
> > > >>   skb_clone+0xf4/0x214 net/core/skbuff.c:1481
> > > >>   ____bpf_clone_redirect net/core/filter.c:2433 [inline]
> > > >>   bpf_clone_redirect+0x78/0x1c0 net/core/filter.c:2420
> > > >>   bpf_prog_d3839dd9068ceb51+0x80/0x330
> > > >>   bpf_dispatcher_nop_func include/linux/bpf.h:728 [inline]
> > > >>   bpf_test_run+0x3c0/0x6c0 net/bpf/test_run.c:53
> > > >>   bpf_prog_test_run_skb+0x638/0xa7c net/bpf/test_run.c:594
> > > >>   bpf_prog_test_run kernel/bpf/syscall.c:3148 [inline]
> > > >>   __do_sys_bpf kernel/bpf/syscall.c:4441 [inline]
> > > >>   __se_sys_bpf+0xad0/0x1634 kernel/bpf/syscall.c:4381
> > > >>
> > > >> kfence-#213: 0xffff6254fffac000-0xffff6254fffac196, size=407,
> > > >> cache=kmalloc-512
> > > >>
> > > >> allocated by task 15074 on cpu 0 at 1342.585390s:
> > > >>   kmalloc include/linux/slab.h:568 [inline]
> > > >>   kzalloc include/linux/slab.h:675 [inline]
> > > >>   bpf_test_init.isra.0+0xac/0x290 net/bpf/test_run.c:191
> > > >>   bpf_prog_test_run_skb+0x11c/0xa7c net/bpf/test_run.c:512
> > > >>   bpf_prog_test_run kernel/bpf/syscall.c:3148 [inline]
> > > >>   __do_sys_bpf kernel/bpf/syscall.c:4441 [inline]
> > > >>   __se_sys_bpf+0xad0/0x1634 kernel/bpf/syscall.c:4381
> > > >>   __arm64_sys_bpf+0x50/0x60 kernel/bpf/syscall.c:4381
> > > >>
> > > >> To fix the problem, we round up allocations with kmalloc_size_roundup()
> > > >> so that build_skb()'s use of ksize() always yields an aligned size and
> > > >> no special handling of the memory is needed by KFENCE.
> > > >>
> > > >> Fixes: 1cf1cae963c2 ("bpf: introduce BPF_PROG_TEST_RUN command")
> > > >> Signed-off-by: Baisong Zhong <zhongbaisong@huawei.com>
> > > >> ---
> > > >>   net/bpf/test_run.c | 1 +
> > > >>   1 file changed, 1 insertion(+)
> > > >>
> > > >> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> > > >> index 13d578ce2a09..058b67108873 100644
> > > >> --- a/net/bpf/test_run.c
> > > >> +++ b/net/bpf/test_run.c
> > > >> @@ -774,6 +774,7 @@ static void *bpf_test_init(const union bpf_attr
> > > >> *kattr, u32 user_size,
> > > >>       if (user_size > size)
> > > >>           return ERR_PTR(-EMSGSIZE);
> > > >> +    size = kmalloc_size_roundup(size);
> > > >>       data = kzalloc(size + headroom + tailroom, GFP_USER);
> > > >
> > > > The fact that you need to do this roundup on call sites feels broken, no?
> > > > Was there some discussion / consensus that now all k*alloc() call sites
> > > > would need to be fixed up? Couldn't this be done transparently in k*alloc()
> > > > when KFENCE is enabled? I presume there may be lots of other such occasions
> > > > in the kernel where a similar issue triggers; fixing up all call sites feels
> > > > like a ton of churn compared to an api-internal, generic fix.
>
> I hope I answer this in more detail here:
> https://lore.kernel.org/lkml/202211010937.4631CB1B0E@keescook/
>
> The problem is that ksize() should never have existed in the first
> place. :P Every runtime bounds checker has tripped over it, and with
> the addition of the __alloc_size attribute, I had to start ripping
> ksize() out: it can't be used to pretend an allocation grew in size.
> Things need to either preallocate more or go through *realloc() like
> everything else. Luckily, ksize() is rare.
>
> FWIW, the above fix doesn't look correct to me -- I would expect this to
> be:
>
>         size_t alloc_size;
>         ...
>         alloc_size = kmalloc_size_roundup(size + headroom + tailroom);
>         data = kzalloc(alloc_size, GFP_USER);

Making sure the struct skb_shared_info is aligned to a cache line does
not need kmalloc_size_roundup().

What is needed is to adjust @size so that (@size + @headroom) is a
multiple of SMP_CACHE_BYTES
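
Sketched under the assumption that v2 follows Eric's suggestion, using the
existing SKB_DATA_ALIGN() helper (which rounds up to SMP_CACHE_BYTES); this
is not the posted patch:

	/*
	 * Pad @size so that (@size + @headroom) is a multiple of
	 * SMP_CACHE_BYTES, keeping struct skb_shared_info aligned.
	 */
	size = SKB_DATA_ALIGN(size + headroom) - headroom;
	data = kzalloc(size + headroom + tailroom, GFP_USER);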
Baisong Zhong Nov. 2, 2022, 7:19 a.m. UTC | #6
On 2022/11/2 12:37, Eric Dumazet wrote:
> On Tue, Nov 1, 2022 at 9:27 PM Kees Cook <keescook@chromium.org> wrote:
>>
>> On Tue, Nov 01, 2022 at 09:05:42PM -0700, Jakub Kicinski wrote:
>>> On Wed, 2 Nov 2022 10:59:44 +0800 zhongbaisong wrote:
>>>> On 2022/11/2 0:45, Daniel Borkmann wrote:
>>>>> [ +kfence folks ]
>>>>
>>>> + cc: Alexander Potapenko, Marco Elver, Dmitry Vyukov
>>>>
>>>> Do you have any suggestions about this problem?
>>>
>>> + Kees who has been sending similar patches for drivers
>>>
>>>>> On 11/1/22 5:04 AM, Baisong Zhong wrote:
>>>>>> Recently, we got a syzkaller report of an aarch64 alignment
>>>>>> fault when KFENCE is enabled.
>>>>>>
>>>>>> When the size from the user bpf program is an odd number, like
>>>>>> 399, 407, etc., it causes unaligned access to the skb's shared
>>>>>> info, as seen below:
>>>>>>
>>>>>> BUG: KFENCE: use-after-free read in __skb_clone+0x23c/0x2a0
>>>>>> net/core/skbuff.c:1032
>>>>>>
>>>>>> Use-after-free read at 0xffff6254fffac077 (in kfence-#213):
>>>>>>    __lse_atomic_add arch/arm64/include/asm/atomic_lse.h:26 [inline]
>>>>>>    arch_atomic_add arch/arm64/include/asm/atomic.h:28 [inline]
>>>>>>    arch_atomic_inc include/linux/atomic-arch-fallback.h:270 [inline]
>>>>>>    atomic_inc include/asm-generic/atomic-instrumented.h:241 [inline]
>>>>>>    __skb_clone+0x23c/0x2a0 net/core/skbuff.c:1032
>>>>>>    skb_clone+0xf4/0x214 net/core/skbuff.c:1481
>>>>>>    ____bpf_clone_redirect net/core/filter.c:2433 [inline]
>>>>>>    bpf_clone_redirect+0x78/0x1c0 net/core/filter.c:2420
>>>>>>    bpf_prog_d3839dd9068ceb51+0x80/0x330
>>>>>>    bpf_dispatcher_nop_func include/linux/bpf.h:728 [inline]
>>>>>>    bpf_test_run+0x3c0/0x6c0 net/bpf/test_run.c:53
>>>>>>    bpf_prog_test_run_skb+0x638/0xa7c net/bpf/test_run.c:594
>>>>>>    bpf_prog_test_run kernel/bpf/syscall.c:3148 [inline]
>>>>>>    __do_sys_bpf kernel/bpf/syscall.c:4441 [inline]
>>>>>>    __se_sys_bpf+0xad0/0x1634 kernel/bpf/syscall.c:4381
>>>>>>
>>>>>> kfence-#213: 0xffff6254fffac000-0xffff6254fffac196, size=407,
>>>>>> cache=kmalloc-512
>>>>>>
>>>>>> allocated by task 15074 on cpu 0 at 1342.585390s:
>>>>>>    kmalloc include/linux/slab.h:568 [inline]
>>>>>>    kzalloc include/linux/slab.h:675 [inline]
>>>>>>    bpf_test_init.isra.0+0xac/0x290 net/bpf/test_run.c:191
>>>>>>    bpf_prog_test_run_skb+0x11c/0xa7c net/bpf/test_run.c:512
>>>>>>    bpf_prog_test_run kernel/bpf/syscall.c:3148 [inline]
>>>>>>    __do_sys_bpf kernel/bpf/syscall.c:4441 [inline]
>>>>>>    __se_sys_bpf+0xad0/0x1634 kernel/bpf/syscall.c:4381
>>>>>>    __arm64_sys_bpf+0x50/0x60 kernel/bpf/syscall.c:4381
>>>>>>
>>>>>> To fix the problem, we round up allocations with kmalloc_size_roundup()
>>>>>> so that build_skb()'s use of ksize() always yields an aligned size and
>>>>>> no special handling of the memory is needed by KFENCE.
>>>>>>
>>>>>> Fixes: 1cf1cae963c2 ("bpf: introduce BPF_PROG_TEST_RUN command")
>>>>>> Signed-off-by: Baisong Zhong <zhongbaisong@huawei.com>
>>>>>> ---
>>>>>>    net/bpf/test_run.c | 1 +
>>>>>>    1 file changed, 1 insertion(+)
>>>>>>
>>>>>> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
>>>>>> index 13d578ce2a09..058b67108873 100644
>>>>>> --- a/net/bpf/test_run.c
>>>>>> +++ b/net/bpf/test_run.c
>>>>>> @@ -774,6 +774,7 @@ static void *bpf_test_init(const union bpf_attr
>>>>>> *kattr, u32 user_size,
>>>>>>        if (user_size > size)
>>>>>>            return ERR_PTR(-EMSGSIZE);
>>>>>> +    size = kmalloc_size_roundup(size);
>>>>>>        data = kzalloc(size + headroom + tailroom, GFP_USER);
>>>>>
>>>>> The fact that you need to do this roundup on call sites feels broken, no?
>>>>> Was there some discussion / consensus that now all k*alloc() call sites
>>>>> would need to be fixed up? Couldn't this be done transparently in k*alloc()
>>>>> when KFENCE is enabled? I presume there may be lots of other such occasions
>>>>> in the kernel where a similar issue triggers; fixing up all call sites feels
>>>>> like a ton of churn compared to an api-internal, generic fix.
>>
>> I hope I answer this in more detail here:
>> https://lore.kernel.org/lkml/202211010937.4631CB1B0E@keescook/
>>
>> The problem is that ksize() should never have existed in the first
>> place. :P Every runtime bounds checker has tripped over it, and with
>> the addition of the __alloc_size attribute, I had to start ripping
>> ksize() out: it can't be used to pretend an allocation grew in size.
>> Things need to either preallocate more or go through *realloc() like
>> everything else. Luckily, ksize() is rare.
>>
>> FWIW, the above fix doesn't look correct to me -- I would expect this to
>> be:
>>
>>          size_t alloc_size;
>>          ...
>>          alloc_size = kmalloc_size_roundup(size + headroom + tailroom);
>>          data = kzalloc(alloc_size, GFP_USER);
> 
> Making sure the struct skb_shared_info is aligned to a cache line does
> not need kmalloc_size_roundup().
> 
> What is needed is to adjust @size so that (@size + @headroom) is a
> multiple of SMP_CACHE_BYTES

ok, I'll fix it and send v2.

Thanks

Patch

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 13d578ce2a09..058b67108873 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -774,6 +774,7 @@  static void *bpf_test_init(const union bpf_attr *kattr, u32 user_size,
 	if (user_size > size)
 		return ERR_PTR(-EMSGSIZE);
 
+	size = kmalloc_size_roundup(size);
 	data = kzalloc(size + headroom + tailroom, GFP_USER);
 	if (!data)
 		return ERR_PTR(-ENOMEM);