Message ID | 20210217035844.53746-1-xiyou.wangcong@gmail.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | BPF |
Series | [bpf-next] bpf: clear per_cpu pointers in bpf_prog_clone_create() |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | success | Link |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Clearly marked for bpf-next |
netdev/subject_prefix | success | Link |
netdev/cc_maintainers | fail | 2 blamed authors not CCed: daniel@iogearbox.net andrii@kernel.org; 7 maintainers not CCed: daniel@iogearbox.net andrii@kernel.org yhs@fb.com john.fastabend@gmail.com kpsingh@kernel.org songliubraving@fb.com kafai@fb.com |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 8 this patch: 8 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 8 lines checked |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 8 this patch: 8 |
netdev/header_inline | success | Link |
netdev/stable | success | Stable not CCed |
Cong Wang wrote:
> From: Cong Wang <cong.wang@bytedance.com>
>
> Pretty much similar to commit 1336c662474e
> ("bpf: Clear per_cpu pointers during bpf_prog_realloc") we also need to
> clear these two percpu pointers in bpf_prog_clone_create(), otherwise
> would get a double free:
>
> BUG: kernel NULL pointer dereference, address: 0000000000000000
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0
> Oops: 0000 [#1] SMP PTI
> CPU: 13 PID: 8140 Comm: kworker/13:247 Kdump: loaded Tainted: G W OE
> 5.11.0-rc4.bm.1-amd64+ #1
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> test_bpf: #1 TXA
> Workqueue: events bpf_prog_free_deferred
> RIP: 0010:percpu_ref_get_many.constprop.97+0x42/0xf0
> Code: [...]
> RSP: 0018:ffffa6bce1f9bda0 EFLAGS: 00010002
> RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000021dfc7b
> RDX: ffffffffae2eeb90 RSI: 867f92637e338da5 RDI: 0000000000000046
> RBP: ffffa6bce1f9bda8 R08: 0000000000000000 R09: 0000000000000001
> R10: 0000000000000046 R11: 0000000000000000 R12: 0000000000000280
> R13: 0000000000000000 R14: 0000000000000000 R15: ffff9b5f3ffdedc0
> FS: 0000000000000000(0000) GS:ffff9b5f2fb40000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000000 CR3: 000000027c36c002 CR4: 00000000003706e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> refill_obj_stock+0x5e/0xd0
> free_percpu+0xee/0x550
> __bpf_prog_free+0x4d/0x60
> process_one_work+0x26a/0x590
> worker_thread+0x3c/0x390
> ? process_one_work+0x590/0x590
> kthread+0x130/0x150
> ? kthread_park+0x80/0x80
> ret_from_fork+0x1f/0x30
>
> This bug is 100% reproducible with test_kmod.sh.
>
> Reported-by: Jiang Wang <jiang.wang@bytedance.com>
> Fixes: 700d4796ef59 ("bpf: Optimize program stats")
> Fixes: ca06f55b9002 ("bpf: Add per-program recursion prevention mechanism")
> Cc: Alexei Starovoitov <ast@kernel.org>
> Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> ---

Acked-by: John Fastabend <john.fastabend@gmail.com>
On 2/17/21 4:58 AM, Cong Wang wrote:
> From: Cong Wang <cong.wang@bytedance.com>
>
> Pretty much similar to commit 1336c662474e
> ("bpf: Clear per_cpu pointers during bpf_prog_realloc") we also need to
> clear these two percpu pointers in bpf_prog_clone_create(), otherwise
> would get a double free:
>
> [...]
>
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 0ae015ad1e05..b0c11532e535 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -1103,6 +1103,8 @@ static struct bpf_prog *bpf_prog_clone_create(struct bpf_prog *fp_other,
>  		 * this still needs to be adapted.
>  		 */
>  		memcpy(fp, fp_other, fp_other->pages * PAGE_SIZE);
> +		fp_other->stats = NULL;
> +		fp_other->active = NULL;
>  	}
>
>  	return fp;
>

This is not correct. I presume if you enable blinding and stats, then this will still
crash. The proper way to fix it is to NULL these pointers in bpf_prog_clone_free()
since the clone can be promoted as the actual prog and the prog ptr released instead.

Thanks,
Daniel
On Wed, Feb 17, 2021 at 2:01 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 2/17/21 4:58 AM, Cong Wang wrote:
> > From: Cong Wang <cong.wang@bytedance.com>
> >
> > [...]
> >
> > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > index 0ae015ad1e05..b0c11532e535 100644
> > --- a/kernel/bpf/core.c
> > +++ b/kernel/bpf/core.c
> > @@ -1103,6 +1103,8 @@ static struct bpf_prog *bpf_prog_clone_create(struct bpf_prog *fp_other,
> >  		 * this still needs to be adapted.
> >  		 */
> >  		memcpy(fp, fp_other, fp_other->pages * PAGE_SIZE);
> > +		fp_other->stats = NULL;
> > +		fp_other->active = NULL;
> >  	}
> >
> >  	return fp;
> >
>
> This is not correct. I presume if you enable blinding and stats, then this will still

Well, at least I ran all BPF selftests and found no crash. (Before my patch, the
crash happened 100%.)

> crash. The proper way to fix it is to NULL these pointers in bpf_prog_clone_free()
> since the clone can be promoted as the actual prog and the prog ptr released instead.
>

Not sure if I understand your point, but what I cleared is fp_other,
which is the original, not the clone. And of course, the original would
be overridden:

	tmp = bpf_jit_blind_constants(prog);
	if (IS_ERR(tmp))
		return orig_prog;
	if (tmp != prog) {
		tmp_blinded = true;
		prog = tmp; // <=== HERE
	}

I think this is precisely why the crash does not happen after my patch.

However, it does seem to me patching bpf_prog_clone_free() is better,
as there would be no assumption on using the original. All I want to
say here is that both ways could fix the crash; which one is better is
arguable.

Thanks.
On 2/17/21 11:46 PM, Cong Wang wrote:
> On Wed, Feb 17, 2021 at 2:01 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>> On 2/17/21 4:58 AM, Cong Wang wrote:
>>> From: Cong Wang <cong.wang@bytedance.com>
>>>
>>> [...]
>>>
>>>  		memcpy(fp, fp_other, fp_other->pages * PAGE_SIZE);
>>> +		fp_other->stats = NULL;
>>> +		fp_other->active = NULL;
>>>  	}
>>>
>>>  	return fp;
>>
>> This is not correct. I presume if you enable blinding and stats, then this will still
>
> Well, at least I ran all BPF selftests and found no crash. (Before my patch, the
> crash happened 100%.)
>
>> crash. The proper way to fix it is to NULL these pointers in bpf_prog_clone_free()
>> since the clone can be promoted as the actual prog and the prog ptr released instead.
>
> Not sure if I understand your point, but what I cleared is fp_other,
> which is the original, not the clone. And of course, the original would
> be overridden:
>
> 	tmp = bpf_jit_blind_constants(prog);
> 	if (IS_ERR(tmp))
> 		return orig_prog;
> 	if (tmp != prog) {
> 		tmp_blinded = true;
> 		prog = tmp; // <=== HERE
> 	}
>
> I think this is precisely why the crash does not happen after my patch.
>
> However, it does seem to me patching bpf_prog_clone_free() is better,
> as there would be no assumption on using the original. All I want to
> say here is that both ways could fix the crash; which one is better is
> arguable.

The problem is that at the time of bpf_prog_clone_create() we don't know whether
the original prog or the clone will be used eventually. If the original (fp_other)
will in fact be used, then stats/active there are NULL. And if the bpf_stats_enabled_key
static key is active, then __BPF_PROG_RUN() will just try to update stats and trigger
a NULL ptr deref. It won't if the pointers are cleared in bpf_prog_clone_free()
instead, so the latter really is necessary.

Thanks,
Daniel
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 0ae015ad1e05..b0c11532e535 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1103,6 +1103,8 @@ static struct bpf_prog *bpf_prog_clone_create(struct bpf_prog *fp_other,
 		 * this still needs to be adapted.
 		 */
 		memcpy(fp, fp_other, fp_other->pages * PAGE_SIZE);
+		fp_other->stats = NULL;
+		fp_other->active = NULL;
 	}
 
 	return fp;