
[bpf] bpf: refcount task stack in bpf_get_task_stack

Message ID: 20210401000747.3648767-1-davemarchevsky@fb.com
State: Accepted
Delegated to: BPF
Series: [bpf] bpf: refcount task stack in bpf_get_task_stack

Checks

Context                         Check    Description
netdev/cover_letter             success  Link
netdev/fixes_present            success  Link
netdev/patch_count              success  Link
netdev/tree_selection           success  Clearly marked for bpf
netdev/subject_prefix           success  Link
netdev/cc_maintainers           fail     1 blamed authors not CCed: andrii@kernel.org; 6 maintainers not CCed: netdev@vger.kernel.org yhs@fb.com kpsingh@kernel.org andrii@kernel.org kafai@fb.com john.fastabend@gmail.com
netdev/source_inline            success  Was 0 now: 0
netdev/verify_signedoff         success  Link
netdev/module_param             success  Was 0 now: 0
netdev/build_32bit              success  Errors and warnings before: 1 this patch: 1
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/verify_fixes             success  Link
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 19 lines checked
netdev/build_allmodconfig_warn  success  Errors and warnings before: 1 this patch: 1
netdev/header_inline            success  Link

Commit Message

Dave Marchevsky April 1, 2021, 12:07 a.m. UTC
On x86 the struct pt_regs * grabbed by task_pt_regs() points to an
offset of task->stack. The pt_regs are later dereferenced in
__bpf_get_stack (e.g. by user_mode() check). This can cause a fault if
the task in question exits while bpf_get_task_stack is executing, as
warned by task_stack_page's comment:

* When accessing the stack of a non-current task that might exit, use
* try_get_task_stack() instead.  task_stack_page will return a pointer
* that could get freed out from under you.

Take the comment's advice and use try_get_task_stack() and
put_task_stack() to hold a reference on task->stack, bailing out early
if the refcount is already 0. Incrementing stack_refcount ensures the
task's stack sticks around while we're using its data.
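
The try-get/put pattern described above can be sketched in userspace C.
This is only a model under stated assumptions, not the kernel
implementation: struct model_task and the model_* functions are
hypothetical names, and C11 atomics stand in for the kernel's refcount
primitives. It shows a reader taking a reference only while the count
is nonzero, and failing with -EFAULT (as the patched helper does) once
the owner has dropped the last reference.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for a task with a refcounted stack. */
struct model_task {
	atomic_int stack_refcount;	/* 0: stack is gone or going away */
	char *stack;
};

/* Take a reference only if the count is still nonzero. */
static bool model_try_get_stack(struct model_task *t)
{
	int old = atomic_load(&t->stack_refcount);

	while (old != 0) {
		/* On failure, old is reloaded and the zero check repeats. */
		if (atomic_compare_exchange_weak(&t->stack_refcount,
						 &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; the last put frees the stack. */
static void model_put_stack(struct model_task *t)
{
	if (atomic_fetch_sub(&t->stack_refcount, 1) == 1) {
		free(t->stack);
		t->stack = NULL;
	}
}

/* Same shape as the patched helper: get, use, put, return result. */
static long model_read_stack(struct model_task *t)
{
	long res;

	if (!model_try_get_stack(t))
		return -EFAULT;

	res = t->stack[0];	/* safe: we hold a reference */
	model_put_stack(t);

	return res;
}
```

The point of the model is that the read of t->stack happens only
between a successful get and the matching put, so a concurrent final
put cannot free the memory underneath the reader.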

I noticed this bug while testing a bpf task iter similar to
bpf_iter_task_stack in selftests, except that mine grabbed the user
stack. It crashed intermittently, producing dumps like:

  BUG: unable to handle page fault for address: 0000000000003fe0
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  RIP: 0010:__bpf_get_stack+0xd0/0x230
  <snip...>
  Call Trace:
  bpf_prog_0a2be35c092cb190_get_task_stacks+0x5d/0x3ec
  bpf_iter_run_prog+0x24/0x81
  __task_seq_show+0x58/0x80
  bpf_seq_read+0xf7/0x3d0
  vfs_read+0x91/0x140
  ksys_read+0x59/0xd0
  do_syscall_64+0x48/0x120
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
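
The faulting address in the dump above is consistent with the pt_regs
pointer having been computed from a NULL task->stack, although the
thread does not state this. The sketch below works through the
arithmetic under explicit assumptions that are not in the original
message: an x86-64 register layout modeled on the kernel's struct
pt_regs, a 16 KiB thread stack, no padding at the top of the stack, and
user_mode() reading the cs field. All model_* and MODEL_* names are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed x86-64 layout, modeled on the kernel's struct pt_regs. */
struct model_pt_regs {
	uint64_t r15, r14, r13, r12, bp, bx;
	uint64_t r11, r10, r9, r8;
	uint64_t ax, cx, dx, si, di;
	uint64_t orig_ax;
	uint64_t ip, cs, flags, sp, ss;
};

#define MODEL_THREAD_SIZE	0x4000UL	/* assumption: 16 KiB stack */
#define MODEL_TOP_PADDING	0UL		/* assumption: no top padding */

/* pt_regs sits at the very top of the stack area. */
static uintptr_t model_task_pt_regs(uintptr_t stack_base)
{
	return stack_base + MODEL_THREAD_SIZE - MODEL_TOP_PADDING -
	       sizeof(struct model_pt_regs);
}

/* Address that a read of regs->cs would touch. */
static uintptr_t model_cs_address(uintptr_t stack_base)
{
	return model_task_pt_regs(stack_base) +
	       offsetof(struct model_pt_regs, cs);
}
```

Under these assumptions, model_cs_address(0) evaluates to 0x3fe0, the
address reported in the BUG line: 0x4000 minus the 168-byte register
block, plus the 136-byte offset of cs.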

Fixes: fa28dcb82a38 ("bpf: Introduce helper bpf_get_task_stack()")
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 kernel/bpf/stackmap.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

Comments

Song Liu April 1, 2021, 6:48 a.m. UTC | #1
> On Mar 31, 2021, at 5:07 PM, Dave Marchevsky <davemarchevsky@fb.com> wrote:
> [...]

Thanks for the fix!

Acked-by: Song Liu <songliubraving@fb.com>

Could you please extend bpf_iter_task_stack to also grab user stack? 

Thanks,
Song

[...]
Song Liu April 1, 2021, 5:47 p.m. UTC | #2
> On Mar 31, 2021, at 11:48 PM, Song Liu <songliubraving@fb.com> wrote:
> 
> Thanks for the fix!
> 
> Acked-by: Song Liu <songliubraving@fb.com>
> 
> Could you please extend bpf_iter_task_stack to also grab user stack? 

I think we can extend bpf_iter_task_stack in a follow up patch. It is
not necessary to bundle these two patches in the same set. 

Thanks,
Song
Alexei Starovoitov April 1, 2021, 9 p.m. UTC | #3
On Wed, Mar 31, 2021 at 5:08 PM Dave Marchevsky <davemarchevsky@fb.com> wrote:
> [...]

Applied. Thanks

Patch

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index be35bfb7fb13..6fbc2abe9c91 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -517,9 +517,17 @@  const struct bpf_func_proto bpf_get_stack_proto = {
 BPF_CALL_4(bpf_get_task_stack, struct task_struct *, task, void *, buf,
 	   u32, size, u64, flags)
 {
-	struct pt_regs *regs = task_pt_regs(task);
+	struct pt_regs *regs;
+	long res;
 
-	return __bpf_get_stack(regs, task, NULL, buf, size, flags);
+	if (!try_get_task_stack(task))
+		return -EFAULT;
+
+	regs = task_pt_regs(task);
+	res = __bpf_get_stack(regs, task, NULL, buf, size, flags);
+	put_task_stack(task);
+
+	return res;
 }
 
 BTF_ID_LIST_SINGLE(bpf_get_task_stack_btf_ids, struct, task_struct)