[-next,v2] bpf, test_run: fix alignment problem in bpf_prog_test_run_skb()

Message ID 20221102081620.1465154-1-zhongbaisong@huawei.com (mailing list archive)
State Accepted
Commit d3fd203f36d46aa29600a72d57a1b61af80e4a25
Delegated to: BPF

Checks

Context Check Description
bpf/vmtest-bpf-next-VM_Test-26 success Logs for test_verifier on x86_64 with llvm-16
netdev/tree_selection success Guessed tree name to be net-next, async
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix warning Target tree name not specified in the subject
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 35 this patch: 35
netdev/cc_maintainers fail 1 blamed authors not CCed: martin.lau@linux.dev; 9 maintainers not CCed: sdf@google.com john.fastabend@gmail.com andrii@kernel.org yhs@fb.com haoluo@google.com jolsa@kernel.org kpsingh@kernel.org song@kernel.org martin.lau@linux.dev
netdev/build_clang success Errors and warnings before: 5 this patch: 5
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 35 this patch: 35
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 7 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-VM_Test-2 success Logs for build for s390x with gcc
bpf/vmtest-bpf-next-PR fail PR summary
bpf/vmtest-bpf-next-VM_Test-1 pending Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-6 success Logs for llvm-toolchain
bpf/vmtest-bpf-next-VM_Test-7 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-3 success Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-4 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-5 success Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-9 success Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-15 success Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-18 success Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-21 success Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-24 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-12 success Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-19 success Logs for test_progs_no_alu32_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-22 success Logs for test_progs_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-25 success Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-10 success Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-13 success Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-16 success Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-8 success Logs for test_maps on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-17 fail Logs for test_progs_no_alu32_parallel on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-20 success Logs for test_progs_parallel on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-23 success Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-14 success Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-11 success Logs for test_progs on s390x with gcc

Commit Message

Baisong Zhong Nov. 2, 2022, 8:16 a.m. UTC
We got a syzkaller report caused by an aarch64 alignment fault
when KFENCE is enabled.

When the size passed in by the user BPF program is an odd number,
such as 399 or 407, struct skb_shared_info is placed at an unaligned
address and accesses to it are unaligned, as seen below:

BUG: KFENCE: use-after-free read in __skb_clone+0x23c/0x2a0 net/core/skbuff.c:1032

Use-after-free read at 0xffff6254fffac077 (in kfence-#213):
 __lse_atomic_add arch/arm64/include/asm/atomic_lse.h:26 [inline]
 arch_atomic_add arch/arm64/include/asm/atomic.h:28 [inline]
 arch_atomic_inc include/linux/atomic-arch-fallback.h:270 [inline]
 atomic_inc include/asm-generic/atomic-instrumented.h:241 [inline]
 __skb_clone+0x23c/0x2a0 net/core/skbuff.c:1032
 skb_clone+0xf4/0x214 net/core/skbuff.c:1481
 ____bpf_clone_redirect net/core/filter.c:2433 [inline]
 bpf_clone_redirect+0x78/0x1c0 net/core/filter.c:2420
 bpf_prog_d3839dd9068ceb51+0x80/0x330
 bpf_dispatcher_nop_func include/linux/bpf.h:728 [inline]
 bpf_test_run+0x3c0/0x6c0 net/bpf/test_run.c:53
 bpf_prog_test_run_skb+0x638/0xa7c net/bpf/test_run.c:594
 bpf_prog_test_run kernel/bpf/syscall.c:3148 [inline]
 __do_sys_bpf kernel/bpf/syscall.c:4441 [inline]
 __se_sys_bpf+0xad0/0x1634 kernel/bpf/syscall.c:4381

kfence-#213: 0xffff6254fffac000-0xffff6254fffac196, size=407, cache=kmalloc-512

allocated by task 15074 on cpu 0 at 1342.585390s:
 kmalloc include/linux/slab.h:568 [inline]
 kzalloc include/linux/slab.h:675 [inline]
 bpf_test_init.isra.0+0xac/0x290 net/bpf/test_run.c:191
 bpf_prog_test_run_skb+0x11c/0xa7c net/bpf/test_run.c:512
 bpf_prog_test_run kernel/bpf/syscall.c:3148 [inline]
 __do_sys_bpf kernel/bpf/syscall.c:4441 [inline]
 __se_sys_bpf+0xad0/0x1634 kernel/bpf/syscall.c:4381
 __arm64_sys_bpf+0x50/0x60 kernel/bpf/syscall.c:4381

To fix the problem, we adjust @size so that (@size + @headroom) is a
multiple of SMP_CACHE_BYTES. This ensures that struct skb_shared_info
is aligned to a cache line.
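
The effect of the rounding can be checked with a small userspace sketch.
ALIGN_UP(), data_align() and shinfo_offset() below are hypothetical
stand-ins, not kernel code: data_align() mimics SKB_DATA_ALIGN(), and the
64-byte cache line and 64-byte headroom are assumptions chosen for
illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cache line size; the kernel uses SMP_CACHE_BYTES. */
#define CACHE_BYTES 64u

/* Round x up to the next multiple of a (a must be a power of two). */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((size_t)(a) - 1))

/* Userspace stand-in for SKB_DATA_ALIGN(). */
static size_t data_align(size_t size)
{
	return ALIGN_UP(size, CACHE_BYTES);
}

/*
 * Offset of the shared info from the start of the buffer: it is placed
 * right after the headroom and the data area.
 */
static size_t shinfo_offset(size_t headroom, size_t size)
{
	return headroom + size;
}

/* Compare the layout with and without rounding the data size. */
static void alignment_demo(void)
{
	const size_t headroom = 64;	/* assumed, one cache line */

	/* Without rounding, an odd size leaves the shared info misaligned. */
	assert(shinfo_offset(headroom, 23) % CACHE_BYTES != 0);

	/* With rounding, the shared info starts on a cache line boundary. */
	assert(shinfo_offset(headroom, data_align(23)) % CACHE_BYTES == 0);
}
```

Note that the result is only cache-line aligned if the headroom itself is
a multiple of the cache line size, which this sketch assumes.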

Fixes: 1cf1cae963c2 ("bpf: introduce BPF_PROG_TEST_RUN command")
Signed-off-by: Baisong Zhong <zhongbaisong@huawei.com>
---
v2: use SKB_DATA_ALIGN instead of kmalloc_size_roundup
---
 net/bpf/test_run.c | 1 +
 1 file changed, 1 insertion(+)

Comments

patchwork-bot+netdevbpf@kernel.org Nov. 4, 2022, 3:30 p.m. UTC | #1
Hello:

This patch was applied to bpf/bpf.git (master)
by Daniel Borkmann <daniel@iogearbox.net>:

On Wed, 2 Nov 2022 16:16:20 +0800 you wrote:
> we got a syzkaller problem because of aarch64 alignment fault
> if KFENCE enabled.
> 
> When the size from user bpf program is an odd number, like
> 399, 407, etc, it will cause the struct skb_shared_info's
> unaligned access. As seen below:
> 
> [...]

Here is the summary with links:
  - [-next,v2] bpf, test_run: fix alignment problem in bpf_prog_test_run_skb()
    https://git.kernel.org/bpf/bpf/c/d3fd203f36d4

You are awesome, thank you!
Alexander Potapenko Nov. 4, 2022, 5:06 p.m. UTC | #2
On Wed, Nov 2, 2022 at 9:16 AM Baisong Zhong <zhongbaisong@huawei.com> wrote:
>
> we got a syzkaller problem because of aarch64 alignment fault
> if KFENCE enabled.
>
> When the size from user bpf program is an odd number, like
> 399, 407, etc, it will cause the struct skb_shared_info's
> unaligned access. As seen below:
>
> BUG: KFENCE: use-after-free read in __skb_clone+0x23c/0x2a0 net/core/skbuff.c:1032

It's interesting that KFENCE is reporting a UAF without a deallocation
stack here.

Looks like an unaligned access to 0xffff6254fffac077 causes the ARM
CPU to throw a fault handled by __do_kernel_fault().
This isn't technically a page fault, but anyway the access address
gets passed to kfence_handle_page_fault(), which defaults to a
use-after-free, because the address belongs to the object page, not
the redzone page.
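
A simplified model of that classification, as a sketch only: classify(),
enum fault_kind and the alternating object/guard page layout are
assumptions made for illustration, and the real
kfence_handle_page_fault() does considerably more (metadata lookup,
reporting, and so on). The point is that the decision depends only on
which page the address falls in, not on why the CPU faulted.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SZ 4096u	/* assumed page size */

enum fault_kind { FAULT_OOB, FAULT_UAF };

/*
 * Classify a faulting address inside the pool purely by the page it
 * hits: odd pages are modelled as guard (redzone) pages, even pages as
 * object pages. The cause of the fault is never consulted.
 */
static enum fault_kind classify(uintptr_t pool_base, uintptr_t addr)
{
	uintptr_t page_index = (addr - pool_base) / PAGE_SZ;

	if (page_index % 2)
		return FAULT_OOB;	/* guard page: out-of-bounds */
	return FAULT_UAF;		/* object page: treated as freed */
}

static void classify_demo(void)
{
	const uintptr_t pool = 0x100000;	/* made-up pool base */

	/* A misaligned access inside a live object hits an object page. */
	assert(classify(pool, pool + 2 * PAGE_SZ + 0x77) == FAULT_UAF);

	/* An access to the neighbouring guard page is out-of-bounds. */
	assert(classify(pool, pool + 3 * PAGE_SZ + 8) == FAULT_OOB);
}
```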

Catalin, Mark, what is the right way to only handle traps caused by
reading/writing to a page for which `set_memory_valid(addr, 1, 0)` was
called?

> Use-after-free read at 0xffff6254fffac077 (in kfence-#213):
>  __lse_atomic_add arch/arm64/include/asm/atomic_lse.h:26 [inline]
>  arch_atomic_add arch/arm64/include/asm/atomic.h:28 [inline]
>  arch_atomic_inc include/linux/atomic-arch-fallback.h:270 [inline]
>  atomic_inc include/asm-generic/atomic-instrumented.h:241 [inline]
>  __skb_clone+0x23c/0x2a0 net/core/skbuff.c:1032
>  skb_clone+0xf4/0x214 net/core/skbuff.c:1481
>  ____bpf_clone_redirect net/core/filter.c:2433 [inline]
>  bpf_clone_redirect+0x78/0x1c0 net/core/filter.c:2420
>  bpf_prog_d3839dd9068ceb51+0x80/0x330
>  bpf_dispatcher_nop_func include/linux/bpf.h:728 [inline]
>  bpf_test_run+0x3c0/0x6c0 net/bpf/test_run.c:53
>  bpf_prog_test_run_skb+0x638/0xa7c net/bpf/test_run.c:594
>  bpf_prog_test_run kernel/bpf/syscall.c:3148 [inline]
>  __do_sys_bpf kernel/bpf/syscall.c:4441 [inline]
>  __se_sys_bpf+0xad0/0x1634 kernel/bpf/syscall.c:4381
>
> kfence-#213: 0xffff6254fffac000-0xffff6254fffac196, size=407, cache=kmalloc-512
>
> allocated by task 15074 on cpu 0 at 1342.585390s:
>  kmalloc include/linux/slab.h:568 [inline]
>  kzalloc include/linux/slab.h:675 [inline]
>  bpf_test_init.isra.0+0xac/0x290 net/bpf/test_run.c:191
>  bpf_prog_test_run_skb+0x11c/0xa7c net/bpf/test_run.c:512
>  bpf_prog_test_run kernel/bpf/syscall.c:3148 [inline]
>  __do_sys_bpf kernel/bpf/syscall.c:4441 [inline]
>  __se_sys_bpf+0xad0/0x1634 kernel/bpf/syscall.c:4381
>  __arm64_sys_bpf+0x50/0x60 kernel/bpf/syscall.c:4381
>
> To fix the problem, we adjust @size so that (@size + @headroom) is a
> multiple of SMP_CACHE_BYTES. So we make sure the struct skb_shared_info
> is aligned to a cache line.
>
> Fixes: 1cf1cae963c2 ("bpf: introduce BPF_PROG_TEST_RUN command")
> Signed-off-by: Baisong Zhong <zhongbaisong@huawei.com>
> ---
> v2: use SKB_DATA_ALIGN instead of kmalloc_size_roundup
> ---
>  net/bpf/test_run.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> index 4b855af267b1..bfdd7484b93f 100644
> --- a/net/bpf/test_run.c
> +++ b/net/bpf/test_run.c
> @@ -259,6 +259,7 @@ static void *bpf_test_init(const union bpf_attr *kattr, u32 size,
>         if (user_size > size)
>                 return ERR_PTR(-EMSGSIZE);
>
> +       size = SKB_DATA_ALIGN(size);
>         data = kzalloc(size + headroom + tailroom, GFP_USER);
>         if (!data)
>                 return ERR_PTR(-ENOMEM);
> --
> 2.25.1
>


--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Mark Rutland Nov. 7, 2022, 10:33 a.m. UTC | #3
On Fri, Nov 04, 2022 at 06:06:05PM +0100, Alexander Potapenko wrote:
> On Wed, Nov 2, 2022 at 9:16 AM Baisong Zhong <zhongbaisong@huawei.com> wrote:
> >
> > we got a syzkaller problem because of aarch64 alignment fault
> > if KFENCE enabled.
> >
> > When the size from user bpf program is an odd number, like
> > 399, 407, etc, it will cause the struct skb_shared_info's
> > unaligned access. As seen below:
> >
> > BUG: KFENCE: use-after-free read in __skb_clone+0x23c/0x2a0 net/core/skbuff.c:1032
> 
> It's interesting that KFENCE is reporting a UAF without a deallocation
> stack here.
> 
> Looks like an unaligned access to 0xffff6254fffac077 causes the ARM
> CPU to throw a fault handled by __do_kernel_fault()

Importantly, an unaligned *atomic*, which is a bug regardless of KFENCE.

> This isn't technically a page fault, but anyway the access address
> gets passed to kfence_handle_page_fault(), which defaults to a
> use-after-free, because the address belongs to the object page, not
> the redzone page.
> 
> Catalin, Mark, what is the right way to only handle traps caused by
> reading/writing to a page for which `set_memory_valid(addr, 1, 0)` was
> called?

That should appear as a translation fault, so we could add an
is_el1_translation_fault() helper for that. I can't immediately recall how
misaligned atomics are presented, but I presume as something other than a
translation fault.

If the below works for you, I can go spin that as a real patch.

Mark.

---->8----
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 5b391490e045b..1de4b6afa8515 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -239,6 +239,11 @@ static bool is_el1_data_abort(unsigned long esr)
        return ESR_ELx_EC(esr) == ESR_ELx_EC_DABT_CUR;
 }
 
+static bool is_el1_translation_fault(unsigned long esr)
+{
+       return (esr & ESR_ELx_FSC_TYPE) == ESR_ELx_FSC_FAULT;
+}
+
 static inline bool is_el1_permission_fault(unsigned long addr, unsigned long esr,
                                           struct pt_regs *regs)
 {
@@ -385,7 +390,8 @@ static void __do_kernel_fault(unsigned long addr, unsigned long esr,
        } else if (addr < PAGE_SIZE) {
                msg = "NULL pointer dereference";
        } else {
-               if (kfence_handle_page_fault(addr, esr & ESR_ELx_WNR, regs))
+               if (is_el1_translation_fault(esr) &&
+                   kfence_handle_page_fault(addr, esr & ESR_ELx_WNR, regs))
                        return;
 
                msg = "paging request";
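
For reference, the mask-and-compare in the helper above can be tried out
in userspace. This is a sketch under assumptions: the FSC_* names are
hypothetical, and the numeric values (0x3C for the type mask, 0x04 for a
translation fault, 0x21 for an alignment fault) are assumed to match the
kernel's ESR definitions and the ESR_EL1 documentation rather than taken
from this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values; see the lead-in above. */
#define FSC_TYPE_MASK	0x3Cu	/* fault status code with the level bits masked off */
#define FSC_FAULT	0x04u	/* translation fault, any level */
#define FSC_PERM	0x0Cu	/* permission fault, any level */
#define FSC_ALIGNMENT	0x21u	/* alignment fault */

/* True only when the fault status code denotes a translation fault. */
static bool is_translation_fault(uint64_t esr)
{
	return (esr & FSC_TYPE_MASK) == FSC_FAULT;
}

static void esr_demo(void)
{
	/* Translation faults at levels 0..3 all match. */
	assert(is_translation_fault(FSC_FAULT | 0));
	assert(is_translation_fault(FSC_FAULT | 3));

	/* Alignment and permission faults do not. */
	assert(!is_translation_fault(FSC_ALIGNMENT));
	assert(!is_translation_fault(FSC_PERM | 3));
}
```

With a check of this shape in front of kfence_handle_page_fault(), a
misaligned atomic would fall through to the generic fault message
instead of being reported as a use-after-free.
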
Alexander Potapenko Nov. 7, 2022, 1:17 p.m. UTC | #4
On Mon, Nov 7, 2022 at 11:33 AM Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Fri, Nov 04, 2022 at 06:06:05PM +0100, Alexander Potapenko wrote:
> > On Wed, Nov 2, 2022 at 9:16 AM Baisong Zhong <zhongbaisong@huawei.com> wrote:
> > >
> > > we got a syzkaller problem because of aarch64 alignment fault
> > > if KFENCE enabled.
> > >
> > > When the size from user bpf program is an odd number, like
> > > 399, 407, etc, it will cause the struct skb_shared_info's
> > > unaligned access. As seen below:
> > >
> > > BUG: KFENCE: use-after-free read in __skb_clone+0x23c/0x2a0 net/core/skbuff.c:1032
> >
> > It's interesting that KFENCE is reporting a UAF without a deallocation
> > stack here.
> >
> > Looks like an unaligned access to 0xffff6254fffac077 causes the ARM
> > CPU to throw a fault handled by __do_kernel_fault()
>
> Importantly, an unaligned *atomic*, which is a bug regardless of KFENCE.
>
> > This isn't technically a page fault, but anyway the access address
> > gets passed to kfence_handle_page_fault(), which defaults to a
> > use-after-free, because the address belongs to the object page, not
> > the redzone page.
> >
> > Catalin, Mark, what is the right way to only handle traps caused by
> > reading/writing to a page for which `set_memory_valid(addr, 1, 0)` was
> > called?
>
> That should appear as a translation fault, so we could add an
> is_el1_translation_fault() helper for that. I can't immediately recall how
> misaligned atomics are presented, but I presume as something other than a
> translation fault.
>
> If the below works for you, I can go spin that as a real patch.

Thanks!
It works for me in QEMU (doesn't report UAF for an unaligned atomic
access and doesn't break the original KFENCE tests), and matches my
reading of https://developer.arm.com/documentation/ddi0595/2020-12/AArch64-Registers/ESR-EL1--Exception-Syndrome-Register--EL1-

Feel free to add:
  Reviewed-by: Alexander Potapenko <glider@google.com>
  Tested-by: Alexander Potapenko <glider@google.com>

> Mark.
>
> ---->8----
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index 5b391490e045b..1de4b6afa8515 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -239,6 +239,11 @@ static bool is_el1_data_abort(unsigned long esr)
>         return ESR_ELx_EC(esr) == ESR_ELx_EC_DABT_CUR;
>  }
>
> +static bool is_el1_translation_fault(unsigned long esr)
> +{
> +       return (esr & ESR_ELx_FSC_TYPE) == ESR_ELx_FSC_FAULT;

Should we also introduce ESR_ELx_FSC(esr) for this?

> +}
> +
>  static inline bool is_el1_permission_fault(unsigned long addr, unsigned long esr,
>                                            struct pt_regs *regs)
>  {
> @@ -385,7 +390,8 @@ static void __do_kernel_fault(unsigned long addr, unsigned long esr,
>         } else if (addr < PAGE_SIZE) {
>                 msg = "NULL pointer dereference";
>         } else {
> -               if (kfence_handle_page_fault(addr, esr & ESR_ELx_WNR, regs))
> +               if (is_el1_translation_fault(esr) &&
> +                   kfence_handle_page_fault(addr, esr & ESR_ELx_WNR, regs))
>                         return;
>
>                 msg = "paging request";
Patch

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 4b855af267b1..bfdd7484b93f 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -259,6 +259,7 @@  static void *bpf_test_init(const union bpf_attr *kattr, u32 size,
 	if (user_size > size)
 		return ERR_PTR(-EMSGSIZE);
 
+	size = SKB_DATA_ALIGN(size);
 	data = kzalloc(size + headroom + tailroom, GFP_USER);
 	if (!data)
 		return ERR_PTR(-ENOMEM);