[bpf-next] bpf: clear per_cpu pointers in bpf_prog_clone_create()

Message ID: 20210217035844.53746-1-xiyou.wangcong@gmail.com
State: Changes Requested
Delegated to: BPF

Checks

Context                         Check    Description
netdev/cover_letter             success  Link
netdev/fixes_present            success  Link
netdev/patch_count              success  Link
netdev/tree_selection           success  Clearly marked for bpf-next
netdev/subject_prefix           success  Link
netdev/cc_maintainers           fail     2 blamed authors not CCed: daniel@iogearbox.net andrii@kernel.org; 7 maintainers not CCed: daniel@iogearbox.net andrii@kernel.org yhs@fb.com john.fastabend@gmail.com kpsingh@kernel.org songliubraving@fb.com kafai@fb.com
netdev/source_inline            success  Was 0 now: 0
netdev/verify_signedoff         success  Link
netdev/module_param             success  Was 0 now: 0
netdev/build_32bit              success  Errors and warnings before: 8 this patch: 8
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/verify_fixes             success  Link
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 8 lines checked
netdev/build_allmodconfig_warn  success  Errors and warnings before: 8 this patch: 8
netdev/header_inline            success  Link
netdev/stable                   success  Stable not CCed

Commit Message

Cong Wang Feb. 17, 2021, 3:58 a.m. UTC
From: Cong Wang <cong.wang@bytedance.com>

Much like commit 1336c662474e
("bpf: Clear per_cpu pointers during bpf_prog_realloc"), we also need to
clear these two percpu pointers in bpf_prog_clone_create(); otherwise
we would get a double free:

 BUG: kernel NULL pointer dereference, address: 0000000000000000
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [#1] SMP PTI
 CPU: 13 PID: 8140 Comm: kworker/13:247 Kdump: loaded Tainted: G                W    OE
  5.11.0-rc4.bm.1-amd64+ #1
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
 test_bpf: #1 TXA
 Workqueue: events bpf_prog_free_deferred
 RIP: 0010:percpu_ref_get_many.constprop.97+0x42/0xf0
 Code: [...]
 RSP: 0018:ffffa6bce1f9bda0 EFLAGS: 00010002
 RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000021dfc7b
 RDX: ffffffffae2eeb90 RSI: 867f92637e338da5 RDI: 0000000000000046
 RBP: ffffa6bce1f9bda8 R08: 0000000000000000 R09: 0000000000000001
 R10: 0000000000000046 R11: 0000000000000000 R12: 0000000000000280
 R13: 0000000000000000 R14: 0000000000000000 R15: ffff9b5f3ffdedc0
 FS:    0000000000000000(0000) GS:ffff9b5f2fb40000(0000) knlGS:0000000000000000
 CS:    0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 000000027c36c002 CR4: 00000000003706e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
    refill_obj_stock+0x5e/0xd0
    free_percpu+0xee/0x550
    __bpf_prog_free+0x4d/0x60
    process_one_work+0x26a/0x590
    worker_thread+0x3c/0x390
    ? process_one_work+0x590/0x590
    kthread+0x130/0x150
    ? kthread_park+0x80/0x80
    ret_from_fork+0x1f/0x30

This bug is 100% reproducible with test_kmod.sh.

Reported-by: Jiang Wang <jiang.wang@bytedance.com>
Fixes: 700d4796ef59 ("bpf: Optimize program stats")
Fixes: ca06f55b9002 ("bpf: Add per-program recursion prevention mechanism")
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 kernel/bpf/core.c | 2 ++
 1 file changed, 2 insertions(+)
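
For readers following the thread, the failure mode described in the commit message can be modelled in userspace. The sketch below is illustrative only and is not kernel code: `model_prog`, `model_percpu` and the helper functions are hypothetical stand-ins for `struct bpf_prog`, its percpu `stats`/`active` allocations, `free_percpu()` and `__bpf_prog_free()`. It counts releases instead of actually freeing memory, so the double free shows up as a counter value rather than a crash.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a percpu allocation: it only counts releases. */
struct model_percpu {
	int free_count;
};

/* Hypothetical, heavily reduced stand-in for struct bpf_prog. */
struct model_prog {
	struct model_percpu *stats;
	struct model_percpu *active;
};

/* Models free_percpu(): NULL is ignored, anything else counts as freed. */
static void model_free_percpu(struct model_percpu *p)
{
	if (p)
		p->free_count++;
}

/* Models __bpf_prog_free(): releases both percpu members. */
static void model_prog_free(struct model_prog *fp)
{
	model_free_percpu(fp->stats);
	model_free_percpu(fp->active);
}

/* Models the memcpy() in bpf_prog_clone_create(): the pointers are
 * copied, not the objects they point to, so both programs now refer
 * to the same allocations.
 */
static void model_clone_create(struct model_prog *clone,
			       const struct model_prog *orig)
{
	memcpy(clone, orig, sizeof(*clone));
}

/* Releases the original and the clone without clearing any pointer and
 * returns how many times the shared stats allocation was released.
 */
static int model_frees_without_fix(void)
{
	struct model_percpu stats = {0}, active = {0};
	struct model_prog orig = { .stats = &stats, .active = &active };
	struct model_prog clone;

	model_clone_create(&clone, &orig);
	model_prog_free(&orig);
	model_prog_free(&clone);
	return stats.free_count;
}
```

In this model `model_frees_without_fix()` returns 2: the same allocation is released once per copy, which is the double free the commit message refers to.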

Comments

John Fastabend Feb. 17, 2021, 9:33 p.m. UTC | #1
Cong Wang wrote:
> From: Cong Wang <cong.wang@bytedance.com>
> 
> Pretty much similar to commit 1336c662474e
> ("bpf: Clear per_cpu pointers during bpf_prog_realloc") we also need to
> clear these two percpu pointers in bpf_prog_clone_create(), otherwise
> would get a double free:
> 
>  BUG: kernel NULL pointer dereference, address: 0000000000000000
>  #PF: supervisor read access in kernel mode
>  #PF: error_code(0x0000) - not-present page
>  PGD 0 P4D 0
>  Oops: 0000 [#1] SMP PTI
>  CPU: 13 PID: 8140 Comm: kworker/13:247 Kdump: loaded Tainted: G                W    OE
>   5.11.0-rc4.bm.1-amd64+ #1
>  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
>  test_bpf: #1 TXA
>  Workqueue: events bpf_prog_free_deferred
>  RIP: 0010:percpu_ref_get_many.constprop.97+0x42/0xf0
>  Code: [...]
>  RSP: 0018:ffffa6bce1f9bda0 EFLAGS: 00010002
>  RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000021dfc7b
>  RDX: ffffffffae2eeb90 RSI: 867f92637e338da5 RDI: 0000000000000046
>  RBP: ffffa6bce1f9bda8 R08: 0000000000000000 R09: 0000000000000001
>  R10: 0000000000000046 R11: 0000000000000000 R12: 0000000000000280
>  R13: 0000000000000000 R14: 0000000000000000 R15: ffff9b5f3ffdedc0
>  FS:    0000000000000000(0000) GS:ffff9b5f2fb40000(0000) knlGS:0000000000000000
>  CS:    0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  CR2: 0000000000000000 CR3: 000000027c36c002 CR4: 00000000003706e0
>  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>  Call Trace:
>     refill_obj_stock+0x5e/0xd0
>     free_percpu+0xee/0x550
>     __bpf_prog_free+0x4d/0x60
>     process_one_work+0x26a/0x590
>     worker_thread+0x3c/0x390
>     ? process_one_work+0x590/0x590
>     kthread+0x130/0x150
>     ? kthread_park+0x80/0x80
>     ret_from_fork+0x1f/0x30
> 
> This bug is 100% reproducible with test_kmod.sh.
> 
> Reported-by: Jiang Wang <jiang.wang@bytedance.com>
> Fixes: 700d4796ef59 ("bpf: Optimize program stats")
> Fixes: ca06f55b9002 ("bpf: Add per-program recursion prevention mechanism")
> Cc: Alexei Starovoitov <ast@kernel.org>
> Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> ---

Acked-by: John Fastabend <john.fastabend@gmail.com>

Daniel Borkmann Feb. 17, 2021, 10:01 p.m. UTC | #2
On 2/17/21 4:58 AM, Cong Wang wrote:
> From: Cong Wang <cong.wang@bytedance.com>
> 
> Pretty much similar to commit 1336c662474e
> ("bpf: Clear per_cpu pointers during bpf_prog_realloc") we also need to
> clear these two percpu pointers in bpf_prog_clone_create(), otherwise
> would get a double free:
> 
>   BUG: kernel NULL pointer dereference, address: 0000000000000000
>   #PF: supervisor read access in kernel mode
>   #PF: error_code(0x0000) - not-present page
>   PGD 0 P4D 0
>   Oops: 0000 [#1] SMP PTI
>   CPU: 13 PID: 8140 Comm: kworker/13:247 Kdump: loaded Tainted: G                W    OE
>   5.11.0-rc4.bm.1-amd64+ #1
>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
>   test_bpf: #1 TXA
>   Workqueue: events bpf_prog_free_deferred
>   RIP: 0010:percpu_ref_get_many.constprop.97+0x42/0xf0
>   Code: [...]
>   RSP: 0018:ffffa6bce1f9bda0 EFLAGS: 00010002
>   RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000021dfc7b
>   RDX: ffffffffae2eeb90 RSI: 867f92637e338da5 RDI: 0000000000000046
>   RBP: ffffa6bce1f9bda8 R08: 0000000000000000 R09: 0000000000000001
>   R10: 0000000000000046 R11: 0000000000000000 R12: 0000000000000280
>   R13: 0000000000000000 R14: 0000000000000000 R15: ffff9b5f3ffdedc0
>   FS:    0000000000000000(0000) GS:ffff9b5f2fb40000(0000) knlGS:0000000000000000
>   CS:    0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   CR2: 0000000000000000 CR3: 000000027c36c002 CR4: 00000000003706e0
>   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>   DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>   Call Trace:
>     refill_obj_stock+0x5e/0xd0
>     free_percpu+0xee/0x550
>     __bpf_prog_free+0x4d/0x60
>     process_one_work+0x26a/0x590
>     worker_thread+0x3c/0x390
>     ? process_one_work+0x590/0x590
>     kthread+0x130/0x150
>     ? kthread_park+0x80/0x80
>     ret_from_fork+0x1f/0x30
> 
> This bug is 100% reproducible with test_kmod.sh.
> 
> Reported-by: Jiang Wang <jiang.wang@bytedance.com>
> Fixes: 700d4796ef59 ("bpf: Optimize program stats")
> Fixes: ca06f55b9002 ("bpf: Add per-program recursion prevention mechanism")
> Cc: Alexei Starovoitov <ast@kernel.org>
> Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> ---
>   kernel/bpf/core.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 0ae015ad1e05..b0c11532e535 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -1103,6 +1103,8 @@ static struct bpf_prog *bpf_prog_clone_create(struct bpf_prog *fp_other,
>   		 * this still needs to be adapted.
>   		 */
>   		memcpy(fp, fp_other, fp_other->pages * PAGE_SIZE);
> +		fp_other->stats = NULL;
> +		fp_other->active = NULL;
>   	}
>   
>   	return fp;
> 

This is not correct. I presume if you enable blinding and stats, then this will still
crash. The proper way to fix it is to NULL these pointers in bpf_prog_clone_free()
since the clone can be promoted as the actual prog and the prog ptr released instead.

Thanks,
Daniel

Cong Wang Feb. 17, 2021, 10:46 p.m. UTC | #3
On Wed, Feb 17, 2021 at 2:01 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 2/17/21 4:58 AM, Cong Wang wrote:
> > From: Cong Wang <cong.wang@bytedance.com>
> >
> > Pretty much similar to commit 1336c662474e
> > ("bpf: Clear per_cpu pointers during bpf_prog_realloc") we also need to
> > clear these two percpu pointers in bpf_prog_clone_create(), otherwise
> > would get a double free:
> >
> >   BUG: kernel NULL pointer dereference, address: 0000000000000000
> >   #PF: supervisor read access in kernel mode
> >   #PF: error_code(0x0000) - not-present page
> >   PGD 0 P4D 0
> >   Oops: 0000 [#1] SMP PTI
> >   CPU: 13 PID: 8140 Comm: kworker/13:247 Kdump: loaded Tainted: G         W   OE
> >  5.11.0-rc4.bm.1-amd64+ #1
> >   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> >   test_bpf: #1 TXA
> >   Workqueue: events bpf_prog_free_deferred
> >   RIP: 0010:percpu_ref_get_many.constprop.97+0x42/0xf0
> >   Code: [...]
> >   RSP: 0018:ffffa6bce1f9bda0 EFLAGS: 00010002
> >   RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000021dfc7b
> >   RDX: ffffffffae2eeb90 RSI: 867f92637e338da5 RDI: 0000000000000046
> >   RBP: ffffa6bce1f9bda8 R08: 0000000000000000 R09: 0000000000000001
> >   R10: 0000000000000046 R11: 0000000000000000 R12: 0000000000000280
> >   R13: 0000000000000000 R14: 0000000000000000 R15: ffff9b5f3ffdedc0
> >   FS:   0000000000000000(0000) GS:ffff9b5f2fb40000(0000) knlGS:0000000000000000
> >   CS:   0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >   CR2: 0000000000000000 CR3: 000000027c36c002 CR4: 00000000003706e0
> >   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >   DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >   Call Trace:
> >   refill_obj_stock+0x5e/0xd0
> >   free_percpu+0xee/0x550
> >   __bpf_prog_free+0x4d/0x60
> >   process_one_work+0x26a/0x590
> >   worker_thread+0x3c/0x390
> >   ? process_one_work+0x590/0x590
> >   kthread+0x130/0x150
> >   ? kthread_park+0x80/0x80
> >   ret_from_fork+0x1f/0x30
> >
> > This bug is 100% reproducible with test_kmod.sh.
> >
> > Reported-by: Jiang Wang <jiang.wang@bytedance.com>
> > Fixes: 700d4796ef59 ("bpf: Optimize program stats")
> > Fixes: ca06f55b9002 ("bpf: Add per-program recursion prevention mechanism")
> > Cc: Alexei Starovoitov <ast@kernel.org>
> > Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> > ---
> >   kernel/bpf/core.c | 2 ++
> >   1 file changed, 2 insertions(+)
> >
> > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > index 0ae015ad1e05..b0c11532e535 100644
> > --- a/kernel/bpf/core.c
> > +++ b/kernel/bpf/core.c
> > @@ -1103,6 +1103,8 @@ static struct bpf_prog *bpf_prog_clone_create(struct bpf_prog *fp_other,
> >                * this still needs to be adapted.
> >                */
> >               memcpy(fp, fp_other, fp_other->pages * PAGE_SIZE);
> > +             fp_other->stats = NULL;
> > +             fp_other->active = NULL;
> >       }
> >
> >       return fp;
> >
>
> This is not correct. I presume if you enable blinding and stats, then this will still

Well, at least I ran all BPF selftests and found no crash. (Before my patch, the
crash happened 100% of the time.)

> crash. The proper way to fix it is to NULL these pointers in bpf_prog_clone_free()
> since the clone can be promoted as the actual prog and the prog ptr released instead.
>

Not sure if I understand your point, but what I cleared is fp_other,
which is the original, not the clone. And of course, the original would
be overridden:

        tmp = bpf_jit_blind_constants(prog);
        if (IS_ERR(tmp))
                return orig_prog;
        if (tmp != prog) {
                tmp_blinded = true;
                prog = tmp;  // <=== HERE
        }

I think this is precisely why the crash does not happen after my patch.

However, it does seem to me patching bpf_prog_clone_free() is better,
as there would be no assumption on using the original. All I want to
say here is that both ways could fix the crash; which one is better is
arguable.

Thanks.

Daniel Borkmann Feb. 17, 2021, 11:13 p.m. UTC | #4
On 2/17/21 11:46 PM, Cong Wang wrote:
> On Wed, Feb 17, 2021 at 2:01 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>> On 2/17/21 4:58 AM, Cong Wang wrote:
>>> From: Cong Wang <cong.wang@bytedance.com>
>>>
>>> Pretty much similar to commit 1336c662474e
>>> ("bpf: Clear per_cpu pointers during bpf_prog_realloc") we also need to
>>> clear these two percpu pointers in bpf_prog_clone_create(), otherwise
>>> would get a double free:
>>>
>>>    BUG: kernel NULL pointer dereference, address: 0000000000000000
>>>    #PF: supervisor read access in kernel mode
>>>    #PF: error_code(0x0000) - not-present page
>>>    PGD 0 P4D 0
>>>    Oops: 0000 [#1] SMP PTI
>>>    CPU: 13 PID: 8140 Comm: kworker/13:247 Kdump: loaded Tainted: G         W   OE
>>>   5.11.0-rc4.bm.1-amd64+ #1
>>>    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
>>>    test_bpf: #1 TXA
>>>    Workqueue: events bpf_prog_free_deferred
>>>    RIP: 0010:percpu_ref_get_many.constprop.97+0x42/0xf0
>>>    Code: [...]
>>>    RSP: 0018:ffffa6bce1f9bda0 EFLAGS: 00010002
>>>    RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000021dfc7b
>>>    RDX: ffffffffae2eeb90 RSI: 867f92637e338da5 RDI: 0000000000000046
>>>    RBP: ffffa6bce1f9bda8 R08: 0000000000000000 R09: 0000000000000001
>>>    R10: 0000000000000046 R11: 0000000000000000 R12: 0000000000000280
>>>    R13: 0000000000000000 R14: 0000000000000000 R15: ffff9b5f3ffdedc0
>>>    FS:   0000000000000000(0000) GS:ffff9b5f2fb40000(0000) knlGS:0000000000000000
>>>    CS:   0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>    CR2: 0000000000000000 CR3: 000000027c36c002 CR4: 00000000003706e0
>>>    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>    Call Trace:
>>>    refill_obj_stock+0x5e/0xd0
>>>    free_percpu+0xee/0x550
>>>    __bpf_prog_free+0x4d/0x60
>>>    process_one_work+0x26a/0x590
>>>    worker_thread+0x3c/0x390
>>>    ? process_one_work+0x590/0x590
>>>    kthread+0x130/0x150
>>>    ? kthread_park+0x80/0x80
>>>    ret_from_fork+0x1f/0x30
>>>
>>> This bug is 100% reproducible with test_kmod.sh.
>>>
>>> Reported-by: Jiang Wang <jiang.wang@bytedance.com>
>>> Fixes: 700d4796ef59 ("bpf: Optimize program stats")
>>> Fixes: ca06f55b9002 ("bpf: Add per-program recursion prevention mechanism")
>>> Cc: Alexei Starovoitov <ast@kernel.org>
>>> Signed-off-by: Cong Wang <cong.wang@bytedance.com>
>>> ---
>>>    kernel/bpf/core.c | 2 ++
>>>    1 file changed, 2 insertions(+)
>>>
>>> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
>>> index 0ae015ad1e05..b0c11532e535 100644
>>> --- a/kernel/bpf/core.c
>>> +++ b/kernel/bpf/core.c
>>> @@ -1103,6 +1103,8 @@ static struct bpf_prog *bpf_prog_clone_create(struct bpf_prog *fp_other,
>>>                 * this still needs to be adapted.
>>>                 */
>>>                memcpy(fp, fp_other, fp_other->pages * PAGE_SIZE);
>>> +             fp_other->stats = NULL;
>>> +             fp_other->active = NULL;
>>>        }
>>>
>>>        return fp;
>>
>> This is not correct. I presume if you enable blinding and stats, then this will still
> 
> Well, at least I ran all BPF selftests and found no crash. (Before my patch, the
> crash happened 100%.)
> 
>> crash. The proper way to fix it is to NULL these pointers in bpf_prog_clone_free()
>> since the clone can be promoted as the actual prog and the prog ptr released instead.
> 
> Not sure if I understand your point, but what I cleared is fp_other,
> which is the original, not the clone. And of course, the original would
> be overridden:
> 
>          tmp = bpf_jit_blind_constants(prog);
>          if (IS_ERR(tmp))
>                  return orig_prog;
>          if (tmp != prog) {
>                  tmp_blinded = true;
>                  prog = tmp;  // <=== HERE
>          }
> 
> I think this is precisely why the crash does not happen after my patch.
> 
> However, it does seem to me patching bpf_prog_clone_free() is better,
> as there would be no assumption on using the original. All I want to
> say here is that both ways could fix the crash, which one is better is
> arguable.

The problem is that at the time of bpf_prog_clone_create() we don't know whether
the original prog or the clone will eventually be used. If the original (fp_other)
is in fact used, then its stats/active pointers are NULL. And if the
bpf_stats_enabled_key static key is active, then __BPF_PROG_RUN() will try to update
stats and trigger a NULL ptr deref. That won't happen if the pointers are cleared in
bpf_prog_clone_free() instead, so the latter really is necessary.

Thanks,
Daniel
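
The argument in this reply can be checked with a small userspace model. This is a sketch under stated assumptions, not the kernel implementation: `model_prog`, `model_percpu`, `model_fix`, `model_result` and `model_lifecycle()` are hypothetical names, and the `keep_clone` parameter stands for whether the clone is promoted to the actual prog or the original is kept. The copy that is not kept is released first, as bpf_prog_clone_free() would do, and the survivor is released later. Releases are counted rather than performed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a percpu allocation: it only counts releases. */
struct model_percpu {
	int free_count;
};

/* Hypothetical, heavily reduced stand-in for struct bpf_prog. */
struct model_prog {
	struct model_percpu *stats;
	struct model_percpu *active;
};

enum model_fix {
	CLEAR_IN_CLONE_CREATE,	/* patch as posted: NULL the original's pointers */
	CLEAR_IN_CLONE_FREE,	/* suggestion: NULL the discarded copy's pointers */
};

struct model_result {
	int stats_frees;		/* times the stats allocation was released */
	bool survivor_has_stats;	/* false models the NULL deref with stats on */
};

/* Models free_percpu(): NULL is ignored, anything else counts as freed. */
static void model_free_percpu(struct model_percpu *p)
{
	if (p)
		p->free_count++;
}

/* Models __bpf_prog_free(): releases both percpu members. */
static void model_prog_free(struct model_prog *fp)
{
	model_free_percpu(fp->stats);
	model_free_percpu(fp->active);
}

/* Runs clone creation, discards one copy, then releases the survivor. */
static struct model_result model_lifecycle(enum model_fix fix, bool keep_clone)
{
	struct model_percpu stats = {0}, active = {0};
	struct model_prog orig = { .stats = &stats, .active = &active };
	struct model_prog clone;
	struct model_prog *survivor = keep_clone ? &clone : &orig;
	struct model_prog *discarded = keep_clone ? &orig : &clone;
	struct model_result res;

	/* Clone creation: the pointers are copied along with the rest. */
	memcpy(&clone, &orig, sizeof(clone));
	if (fix == CLEAR_IN_CLONE_CREATE) {
		orig.stats = NULL;
		orig.active = NULL;
	}

	/* Release of whichever copy lost. */
	if (fix == CLEAR_IN_CLONE_FREE) {
		discarded->stats = NULL;
		discarded->active = NULL;
	}
	model_prog_free(discarded);

	/* The survivor runs here; with stats enabled it would use ->stats. */
	res.survivor_has_stats = survivor->stats != NULL;

	model_prog_free(survivor);
	res.stats_frees = stats.free_count;
	return res;
}
```

In this model, `model_lifecycle(CLEAR_IN_CLONE_CREATE, false)` reports `survivor_has_stats == false`, which corresponds to the NULL ptr deref described above. `CLEAR_IN_CLONE_FREE` leaves the survivor's pointers intact and releases the allocation exactly once, whether or not the clone is kept.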

Patch

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 0ae015ad1e05..b0c11532e535 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1103,6 +1103,8 @@  static struct bpf_prog *bpf_prog_clone_create(struct bpf_prog *fp_other,
 		 * this still needs to be adapted.
 		 */
 		memcpy(fp, fp_other, fp_other->pages * PAGE_SIZE);
+		fp_other->stats = NULL;
+		fp_other->active = NULL;
 	}
 
 	return fp;