[bpf] bpf: refcount task stack in bpf_get_task_stack

Message ID	20210401000747.3648767-1-davemarchevsky@fb.com (mailing list archive)
State	Accepted
Delegated to:	BPF
Headers	Return-Path: <bpf-owner@kernel.org> From: Dave Marchevsky <davemarchevsky@fb.com> To: <bpf@vger.kernel.org> CC: <kernel-team@fb.com>, Alexei Starovoitov <ast@kernel.org>, Daniel Borkmann <daniel@iogearbox.net>, Song Liu <songliubraving@fb.com>, Dave Marchevsky <davemarchevsky@fb.com> Subject: [PATCH bpf] bpf: refcount task stack in bpf_get_task_stack Date: Wed, 31 Mar 2021 17:07:47 -0700 Message-ID: <20210401000747.3648767-1-davemarchevsky@fb.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain Precedence: bulk
Series	[bpf] bpf: refcount task stack in bpf_get_task_stack

Checks

Context	Check	Description
netdev/cover_letter	success	Link
netdev/fixes_present	success	Link
netdev/patch_count	success	Link
netdev/tree_selection	success	Clearly marked for bpf
netdev/subject_prefix	success	Link
netdev/cc_maintainers	fail	1 blamed authors not CCed: andrii@kernel.org; 6 maintainers not CCed: netdev@vger.kernel.org yhs@fb.com kpsingh@kernel.org andrii@kernel.org kafai@fb.com john.fastabend@gmail.com
netdev/source_inline	success	Was 0 now: 0
netdev/verify_signedoff	success	Link
netdev/module_param	success	Was 0 now: 0
netdev/build_32bit	success	Errors and warnings before: 1 this patch: 1
netdev/kdoc	success	Errors and warnings before: 0 this patch: 0
netdev/verify_fixes	success	Link
netdev/checkpatch	success	total: 0 errors, 0 warnings, 0 checks, 19 lines checked
netdev/build_allmodconfig_warn	success	Errors and warnings before: 1 this patch: 1
netdev/header_inline	success	Link

Commit Message

Dave Marchevsky April 1, 2021, 12:07 a.m. UTC

On x86 the struct pt_regs * grabbed by task_pt_regs() points to an
offset of task->stack. The pt_regs are later dereferenced in
__bpf_get_stack (e.g. by user_mode() check). This can cause a fault if
the task in question exits while bpf_get_task_stack is executing, as
warned by task_stack_page's comment:

* When accessing the stack of a non-current task that might exit, use
* try_get_task_stack() instead.  task_stack_page will return a pointer
* that could get freed out from under you.

Take the comment's advice and use try_get_task_stack() and
put_task_stack() to hold a reference on task->stack, or bail early if the
refcount is already 0. Incrementing stack_refcount ensures the task's stack
sticks around while we're using its data.

I noticed this bug while testing a bpf task iter similar to
bpf_iter_task_stack in selftests, except that mine grabbed the user stack.
It crashed intermittently, producing dumps like:

  BUG: unable to handle page fault for address: 0000000000003fe0
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  RIP: 0010:__bpf_get_stack+0xd0/0x230
  <snip...>
  Call Trace:
  bpf_prog_0a2be35c092cb190_get_task_stacks+0x5d/0x3ec
  bpf_iter_run_prog+0x24/0x81
  __task_seq_show+0x58/0x80
  bpf_seq_read+0xf7/0x3d0
  vfs_read+0x91/0x140
  ksys_read+0x59/0xd0
  do_syscall_64+0x48/0x120
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: fa28dcb82a38 ("bpf: Introduce helper bpf_get_task_stack()")
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 kernel/bpf/stackmap.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

Comments

Song Liu April 1, 2021, 6:48 a.m. UTC | #1

> On Mar 31, 2021, at 5:07 PM, Dave Marchevsky <davemarchevsky@fb.com> wrote:
> 
> On x86 the struct pt_regs * grabbed by task_pt_regs() points to an
> offset of task->stack. The pt_regs are later dereferenced in
> __bpf_get_stack (e.g. by user_mode() check). This can cause a fault if
> the task in question exits while bpf_get_task_stack is executing, as
> warned by task_stack_page's comment:
> 
> * When accessing the stack of a non-current task that might exit, use
> * try_get_task_stack() instead.  task_stack_page will return a pointer
> * that could get freed out from under you.
> 
> Taking the comment's advice and using try_get_task_stack() and
> put_task_stack() to hold task->stack refcount, or bail early if it's
> already 0. Incrementing stack_refcount will ensure the task's stack
> sticks around while we're using its data.
> 
> I noticed this bug while testing a bpf task iter similar to
> bpf_iter_task_stack in selftests, except mine grabbed user stack, and
> getting intermittent crashes, which resulted in dumps like:
> 
>  BUG: unable to handle page fault for address: 0000000000003fe0
>  #PF: supervisor read access in kernel mode
>  #PF: error_code(0x0000) - not-present page
>  RIP: 0010:__bpf_get_stack+0xd0/0x230
>  <snip...>
>  Call Trace:
>  bpf_prog_0a2be35c092cb190_get_task_stacks+0x5d/0x3ec
>  bpf_iter_run_prog+0x24/0x81
>  __task_seq_show+0x58/0x80
>  bpf_seq_read+0xf7/0x3d0
>  vfs_read+0x91/0x140
>  ksys_read+0x59/0xd0
>  do_syscall_64+0x48/0x120
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> Fixes: fa28dcb82a38 ("bpf: Introduce helper bpf_get_task_stack()")
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>

Thanks for the fix!

Acked-by: Song Liu <songliubraving@fb.com>

Could you please extend bpf_iter_task_stack to also grab user stack? 

Thanks,
Song

[...]

Song Liu April 1, 2021, 5:47 p.m. UTC | #2

> On Mar 31, 2021, at 11:48 PM, Song Liu <songliubraving@fb.com> wrote:
> 
> 
> 
>> On Mar 31, 2021, at 5:07 PM, Dave Marchevsky <davemarchevsky@fb.com> wrote:
>> 
>> On x86 the struct pt_regs * grabbed by task_pt_regs() points to an
>> offset of task->stack. The pt_regs are later dereferenced in
>> __bpf_get_stack (e.g. by user_mode() check). This can cause a fault if
>> the task in question exits while bpf_get_task_stack is executing, as
>> warned by task_stack_page's comment:
>> 
>> * When accessing the stack of a non-current task that might exit, use
>> * try_get_task_stack() instead.  task_stack_page will return a pointer
>> * that could get freed out from under you.
>> 
>> Taking the comment's advice and using try_get_task_stack() and
>> put_task_stack() to hold task->stack refcount, or bail early if it's
>> already 0. Incrementing stack_refcount will ensure the task's stack
>> sticks around while we're using its data.
>> 
>> I noticed this bug while testing a bpf task iter similar to
>> bpf_iter_task_stack in selftests, except mine grabbed user stack, and
>> getting intermittent crashes, which resulted in dumps like:
>> 
>> BUG: unable to handle page fault for address: 0000000000003fe0
>> #PF: supervisor read access in kernel mode
>> #PF: error_code(0x0000) - not-present page
>> RIP: 0010:__bpf_get_stack+0xd0/0x230
>> <snip...>
>> Call Trace:
>> bpf_prog_0a2be35c092cb190_get_task_stacks+0x5d/0x3ec
>> bpf_iter_run_prog+0x24/0x81
>> __task_seq_show+0x58/0x80
>> bpf_seq_read+0xf7/0x3d0
>> vfs_read+0x91/0x140
>> ksys_read+0x59/0xd0
>> do_syscall_64+0x48/0x120
>> entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> 
>> Fixes: fa28dcb82a38 ("bpf: Introduce helper bpf_get_task_stack()")
>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> 
> Thanks for the fix!
> 
> Acked-by: Song Liu <songliubraving@fb.com>
> 
> Could you please extend bpf_iter_task_stack to also grab user stack? 

I think we can extend bpf_iter_task_stack in a follow up patch. It is
not necessary to bundle these two patches in the same set. 

Thanks,
Song

Alexei Starovoitov April 1, 2021, 9 p.m. UTC | #3

On Wed, Mar 31, 2021 at 5:08 PM Dave Marchevsky <davemarchevsky@fb.com> wrote:
>
> On x86 the struct pt_regs * grabbed by task_pt_regs() points to an
> offset of task->stack. The pt_regs are later dereferenced in
> __bpf_get_stack (e.g. by user_mode() check). This can cause a fault if
> the task in question exits while bpf_get_task_stack is executing, as
> warned by task_stack_page's comment:
>
> * When accessing the stack of a non-current task that might exit, use
> * try_get_task_stack() instead.  task_stack_page will return a pointer
> * that could get freed out from under you.
>
> Taking the comment's advice and using try_get_task_stack() and
> put_task_stack() to hold task->stack refcount, or bail early if it's
> already 0. Incrementing stack_refcount will ensure the task's stack
> sticks around while we're using its data.
>
> I noticed this bug while testing a bpf task iter similar to
> bpf_iter_task_stack in selftests, except mine grabbed user stack, and
> getting intermittent crashes, which resulted in dumps like:
>
>   BUG: unable to handle page fault for address: 0000000000003fe0
>   #PF: supervisor read access in kernel mode
>   #PF: error_code(0x0000) - not-present page
>   RIP: 0010:__bpf_get_stack+0xd0/0x230
>   <snip...>
>   Call Trace:
>   bpf_prog_0a2be35c092cb190_get_task_stacks+0x5d/0x3ec
>   bpf_iter_run_prog+0x24/0x81
>   __task_seq_show+0x58/0x80
>   bpf_seq_read+0xf7/0x3d0
>   vfs_read+0x91/0x140
>   ksys_read+0x59/0xd0
>   do_syscall_64+0x48/0x120
>   entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> Fixes: fa28dcb82a38 ("bpf: Introduce helper bpf_get_task_stack()")
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>

Applied. Thanks

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index be35bfb7fb13..6fbc2abe9c91 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -517,9 +517,17 @@  const struct bpf_func_proto bpf_get_stack_proto = {
 BPF_CALL_4(bpf_get_task_stack, struct task_struct *, task, void *, buf,
 	   u32, size, u64, flags)
 {
-	struct pt_regs *regs = task_pt_regs(task);
+	struct pt_regs *regs;
+	long res;
 
-	return __bpf_get_stack(regs, task, NULL, buf, size, flags);
+	if (!try_get_task_stack(task))
+		return -EFAULT;
+
+	regs = task_pt_regs(task);
+	res = __bpf_get_stack(regs, task, NULL, buf, size, flags);
+	put_task_stack(task);
+
+	return res;
 }
 
 BTF_ID_LIST_SINGLE(bpf_get_task_stack_btf_ids, struct, task_struct)