Message ID:   20220104013153.97906-4-kuniyu@amazon.co.jp (mailing list archive)
State:        Superseded
Delegated to: BPF
Series:       bpf: Batching iter for AF_UNIX sockets.
On Mon, Jan 3, 2022 at 5:33 PM Kuniyuki Iwashima <kuniyu@amazon.co.jp> wrote:
>
> The commit 04c7820b776f ("bpf: tcp: Bpf iter batching and lock_sock")
> introduces the batching algorithm to iterate TCP sockets with more
> consistency.
>
> This patch uses the same algorithm to iterate AF_UNIX sockets.
>
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>

There is something wrong in this patch:

./test_progs -t bpf_iter_setsockopt_unix
[ 14.993474] bpf_testmod: loading out-of-tree module taints kernel.
[ 15.068986]
[ 15.069203] =====================================
[ 15.069698] WARNING: bad unlock balance detected!
[ 15.070187] 5.16.0-rc7-01992-g15d8ab86952d #3780 Tainted: G O
[ 15.070937] -------------------------------------
[ 15.071441] test_progs/1438 is trying to release lock (&unix_table_locks[i]) at:
[ 15.072209] [<ffffffff831b7ae9>] unix_next_socket+0x169/0x460
[ 15.072825] but there are no more locks to release!
[ 15.073329]
[ 15.073329] other info that might help us debug this:
[ 15.074004] 1 lock held by test_progs/1438:
[ 15.074441] #0: ffff8881072c81c8 (&p->lock){+.+.}-{3:3}, at: bpf_seq_read+0x61/0xfa0
[ 15.075279]
[ 15.075279] stack backtrace:
[ 15.075744] CPU: 0 PID: 1438 Comm: test_progs Tainted: G O 5.16.0-rc7-01992-g15d8ab86952d #3780
[ 15.076792] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
[ 15.077986] Call Trace:
[ 15.078250] <TASK>
[ 15.078476] dump_stack_lvl+0x44/0x57
[ 15.078873] lock_release+0x48e/0x650
[ 15.079262] ? unix_next_socket+0x169/0x460
[ 15.079712] ? lock_downgrade+0x690/0x690
[ 15.080131] ? lock_downgrade+0x690/0x690
[ 15.080559] _raw_spin_unlock+0x17/0x40
[ 15.080979] unix_next_socket+0x169/0x460
[ 15.081402] ? bpf_iter_unix_seq_show+0x20b/0x270
[ 15.081898] bpf_iter_unix_batch+0xf7/0x580
[ 15.082337] ? trace_kmalloc_node+0x29/0xd0
[ 15.082786] bpf_seq_read+0x4a1/0xfa0
[ 15.083176] ? up_read+0x1a1/0x720
[ 15.083538] vfs_read+0x128/0x4e0
[ 15.083902] ksys_read+0xe7/0x1b0
[ 15.084253] ? vfs_write+0x8b0/0x8b0
[ 15.084638] do_syscall_64+0x34/0x80
[ 15.085016] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 15.085545] RIP: 0033:0x7f2c4a5ad8b2
[ 15.085931] Code: 97 20 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b6 0f 1f 80 00 00 00 00 f3 0f 1e fa 8b 05 96 db 20 00 85 c0 75 12 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 41 54 49 89 d4 55 48 89
[ 15.087875] RSP: 002b:00007fff4c8c24b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 15.088658] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f2c4a5ad8b2
[ 15.089396] RDX: 0000000000000001 RSI: 00007fff4c8c24cb RDI: 000000000000000a
[ 15.090132] RBP: 00007fff4c8c2550 R08: 0000000000000000 R09: 00007fff4c8c2397
[ 15.090870] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000040d910
[ 15.091618] R13: 00007fff4c8c2750 R14: 0000000000000000 R15: 0000000000000000
[ 15.092403] </TASK>

I've applied patches 1 and 2 to bpf-next.
From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Date: Wed, 5 Jan 2022 14:22:38 -0800
> On Mon, Jan 3, 2022 at 5:33 PM Kuniyuki Iwashima <kuniyu@amazon.co.jp> wrote:
> >
> > The commit 04c7820b776f ("bpf: tcp: Bpf iter batching and lock_sock")
> > introduces the batching algorithm to iterate TCP sockets with more
> > consistency.
> >
> > This patch uses the same algorithm to iterate AF_UNIX sockets.
> >
> > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
>
> There is something wrong in this patch:
>
> ./test_progs -t bpf_iter_setsockopt_unix
> [ 14.993474] bpf_testmod: loading out-of-tree module taints kernel.
> [ 15.068986]
> [ 15.069203] =====================================
> [ 15.069698] WARNING: bad unlock balance detected!
> [ 15.070187] 5.16.0-rc7-01992-g15d8ab86952d #3780 Tainted: G O
> [ 15.070937] -------------------------------------
> [ 15.071441] test_progs/1438 is trying to release lock
> (&unix_table_locks[i]) at:
> [ 15.072209] [<ffffffff831b7ae9>] unix_next_socket+0x169/0x460
> [ 15.072825] but there are no more locks to release!
> [ 15.073329]
> [ 15.073329] other info that might help us debug this:
> [ 15.074004] 1 lock held by test_progs/1438:
> [ 15.074441] #0: ffff8881072c81c8 (&p->lock){+.+.}-{3:3}, at:
> bpf_seq_read+0x61/0xfa0
> [ 15.075279]
> [ 15.075279] stack backtrace:
> [ 15.075744] CPU: 0 PID: 1438 Comm: test_progs Tainted: G
> O 5.16.0-rc7-01992-g15d8ab86952d #3780
> [ 15.076792] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
> [ 15.077986] Call Trace:
> [ 15.078250] <TASK>
> [ 15.078476] dump_stack_lvl+0x44/0x57
> [ 15.078873] lock_release+0x48e/0x650
> [ 15.079262] ? unix_next_socket+0x169/0x460
> [ 15.079712] ? lock_downgrade+0x690/0x690
> [ 15.080131] ? lock_downgrade+0x690/0x690
> [ 15.080559] _raw_spin_unlock+0x17/0x40
> [ 15.080979] unix_next_socket+0x169/0x460
> [ 15.081402] ? bpf_iter_unix_seq_show+0x20b/0x270
> [ 15.081898] bpf_iter_unix_batch+0xf7/0x580
> [ 15.082337] ? trace_kmalloc_node+0x29/0xd0
> [ 15.082786] bpf_seq_read+0x4a1/0xfa0
> [ 15.083176] ? up_read+0x1a1/0x720
> [ 15.083538] vfs_read+0x128/0x4e0
> [ 15.083902] ksys_read+0xe7/0x1b0
> [ 15.084253] ? vfs_write+0x8b0/0x8b0
> [ 15.084638] do_syscall_64+0x34/0x80
> [ 15.085016] entry_SYSCALL_64_after_hwframe+0x44/0xae
> [ 15.085545] RIP: 0033:0x7f2c4a5ad8b2
> [ 15.085931] Code: 97 20 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb
> b6 0f 1f 80 00 00 00 00 f3 0f 1e fa 8b 05 96 db 20 00 85 c0 75 12 31
> c0 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 41 54 49 89 d4 55
> 48 89
> [ 15.087875] RSP: 002b:00007fff4c8c24b8 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000000
> [ 15.088658] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f2c4a5ad8b2
> [ 15.089396] RDX: 0000000000000001 RSI: 00007fff4c8c24cb RDI: 000000000000000a
> [ 15.090132] RBP: 00007fff4c8c2550 R08: 0000000000000000 R09: 00007fff4c8c2397
> [ 15.090870] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000040d910
> [ 15.091618] R13: 00007fff4c8c2750 R14: 0000000000000000 R15: 0000000000000000
> [ 15.092403] </TASK>
>
>
> I've applied patches 1 and 2 to bpf-next.

Thanks, I will take a look with lockdep enabled.
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index c19569819866..dd6804086372 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -3347,6 +3347,14 @@ static const struct seq_operations unix_seq_ops = {
 };
 
 #if IS_BUILTIN(CONFIG_UNIX) && defined(CONFIG_BPF_SYSCALL)
+struct bpf_unix_iter_state {
+	struct seq_net_private p;
+	unsigned int cur_sk;
+	unsigned int end_sk;
+	unsigned int max_sk;
+	struct sock **batch;
+};
+
 struct bpf_iter__unix {
 	__bpf_md_ptr(struct bpf_iter_meta *, meta);
 	__bpf_md_ptr(struct unix_sock *, unix_sk);
@@ -3365,24 +3373,155 @@ static int unix_prog_seq_show(struct bpf_prog *prog, struct bpf_iter_meta *meta,
 	return bpf_iter_run_prog(prog, &ctx);
 }
 
+static int bpf_iter_unix_hold_batch(struct seq_file *seq, struct sock *start_sk)
+
+{
+	struct bpf_unix_iter_state *iter = seq->private;
+	unsigned int expected = 1;
+	struct sock *sk;
+
+	sock_hold(start_sk);
+	iter->batch[iter->end_sk++] = start_sk;
+
+	for (sk = sk_next(start_sk); sk; sk = sk_next(sk)) {
+		if (sock_net(sk) != seq_file_net(seq))
+			continue;
+
+		if (iter->end_sk < iter->max_sk) {
+			sock_hold(sk);
+			iter->batch[iter->end_sk++] = sk;
+		}
+
+		expected++;
+	}
+
+	spin_unlock(&unix_table_locks[start_sk->sk_hash]);
+
+	return expected;
+}
+
+static void bpf_iter_unix_put_batch(struct bpf_unix_iter_state *iter)
+{
+	while (iter->cur_sk < iter->end_sk)
+		sock_put(iter->batch[iter->cur_sk++]);
+}
+
+static int bpf_iter_unix_realloc_batch(struct bpf_unix_iter_state *iter,
+				       unsigned int new_batch_sz)
+{
+	struct sock **new_batch;
+
+	new_batch = kvmalloc(sizeof(*new_batch) * new_batch_sz,
+			     GFP_USER | __GFP_NOWARN);
+	if (!new_batch)
+		return -ENOMEM;
+
+	bpf_iter_unix_put_batch(iter);
+	kvfree(iter->batch);
+	iter->batch = new_batch;
+	iter->max_sk = new_batch_sz;
+
+	return 0;
+}
+
+static struct sock *bpf_iter_unix_batch(struct seq_file *seq,
+					struct sock *start_sk,
+					loff_t *pos)
+{
+	struct bpf_unix_iter_state *iter = seq->private;
+	unsigned int expected;
+	bool resized = false;
+	struct sock *sk;
+
+again:
+	/* Get a new batch */
+	iter->cur_sk = 0;
+	iter->end_sk = 0;
+
+	sk = unix_next_socket(seq, start_sk, pos);
+	if (!sk)
+		return NULL; /* Done */
+
+	expected = bpf_iter_unix_hold_batch(seq, sk);
+
+	if (iter->end_sk == expected)
+		return sk;
+
+	if (!resized && !bpf_iter_unix_realloc_batch(iter, expected * 3 / 2)) {
+		resized = true;
+		goto again;
+	}
+
+	return sk;
+}
+
+static void *bpf_iter_unix_seq_start(struct seq_file *seq, loff_t *pos)
+{
+	if (!*pos)
+		return SEQ_START_TOKEN;
+
+	if (get_bucket(*pos) >= ARRAY_SIZE(unix_socket_table))
+		return NULL;
+
+	/* bpf iter does not support lseek, so it always
+	 * continue from where it was stop()-ped.
+	 */
+	return bpf_iter_unix_batch(seq, NULL, pos);
+}
+
+static void *bpf_iter_unix_seq_next(struct seq_file *seq, void *v, loff_t *pos)
+{
+	struct bpf_unix_iter_state *iter = seq->private;
+	struct sock *sk;
+
+	/* Whenever seq_next() is called, the iter->cur_sk is
+	 * done with seq_show(), so advance to the next sk in
+	 * the batch.
+	 */
+	if (iter->cur_sk < iter->end_sk)
+		sock_put(iter->batch[iter->cur_sk++]);
+
+	++*pos;
+
+	if (iter->cur_sk < iter->end_sk)
+		sk = iter->batch[iter->cur_sk];
+	else
+		sk = bpf_iter_unix_batch(seq, v, pos);
+
+	return sk;
+}
+
 static int bpf_iter_unix_seq_show(struct seq_file *seq, void *v)
 {
 	struct bpf_iter_meta meta;
 	struct bpf_prog *prog;
 	struct sock *sk = v;
 	uid_t uid;
+	bool slow;
+	int ret;
 
 	if (v == SEQ_START_TOKEN)
 		return 0;
 
+	slow = lock_sock_fast(sk);
+
+	if (unlikely(sk_unhashed(sk))) {
+		ret = SEQ_SKIP;
+		goto unlock;
+	}
+
 	uid = from_kuid_munged(seq_user_ns(seq), sock_i_uid(sk));
 	meta.seq = seq;
 	prog = bpf_iter_get_info(&meta, false);
-	return unix_prog_seq_show(prog, &meta, v, uid);
+	ret = unix_prog_seq_show(prog, &meta, v, uid);
+unlock:
+	unlock_sock_fast(sk, slow);
+	return ret;
 }
 
 static void bpf_iter_unix_seq_stop(struct seq_file *seq, void *v)
 {
+	struct bpf_unix_iter_state *iter = seq->private;
 	struct bpf_iter_meta meta;
 	struct bpf_prog *prog;
 
@@ -3393,12 +3532,13 @@ static void bpf_iter_unix_seq_stop(struct seq_file *seq, void *v)
 		(void)unix_prog_seq_show(prog, &meta, v, 0);
 	}
 
-	unix_seq_stop(seq, v);
+	if (iter->cur_sk < iter->end_sk)
+		bpf_iter_unix_put_batch(iter);
 }
 
 static const struct seq_operations bpf_iter_unix_seq_ops = {
-	.start	= unix_seq_start,
-	.next	= unix_seq_next,
+	.start	= bpf_iter_unix_seq_start,
+	.next	= bpf_iter_unix_seq_next,
 	.stop	= bpf_iter_unix_seq_stop,
 	.show	= bpf_iter_unix_seq_show,
 };
@@ -3447,11 +3587,39 @@ static struct pernet_operations unix_net_ops = {
 DEFINE_BPF_ITER_FUNC(unix, struct bpf_iter_meta *meta,
 		     struct unix_sock *unix_sk, uid_t uid)
 
+#define INIT_BATCH_SZ 16
+
+static int bpf_iter_init_unix(void *priv_data, struct bpf_iter_aux_info *aux)
+{
+	struct bpf_unix_iter_state *iter = priv_data;
+	int err;
+
+	err = bpf_iter_init_seq_net(priv_data, aux);
+	if (err)
+		return err;
+
+	err = bpf_iter_unix_realloc_batch(iter, INIT_BATCH_SZ);
+	if (err) {
+		bpf_iter_fini_seq_net(priv_data);
+		return err;
+	}
+
+	return 0;
+}
+
+static void bpf_iter_fini_unix(void *priv_data)
+{
+	struct bpf_unix_iter_state *iter = priv_data;
+
+	bpf_iter_fini_seq_net(priv_data);
+	kvfree(iter->batch);
+}
+
 static const struct bpf_iter_seq_info unix_seq_info = {
 	.seq_ops		= &bpf_iter_unix_seq_ops,
-	.init_seq_private	= bpf_iter_init_seq_net,
-	.fini_seq_private	= bpf_iter_fini_seq_net,
-	.seq_priv_size		= sizeof(struct seq_net_private),
+	.init_seq_private	= bpf_iter_init_unix,
+	.fini_seq_private	= bpf_iter_fini_unix,
+	.seq_priv_size		= sizeof(struct bpf_unix_iter_state),
 };
 
 static struct bpf_iter_reg unix_reg_info = {
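[Editor's note, not part of the patch: for context, a minimal sketch of an "iter/unix" BPF program that an iterator registered this way would invoke once per batched socket. It assumes the usual selftest-style setup: a vmlinux.h generated from a kernel that exposes struct bpf_iter__unix, and libbpf's BPF_SEQ_PRINTF macro from bpf_helpers.h. The program name dump_unix is an arbitrary choice.]

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

char _license[] SEC("license") = "GPL";

SEC("iter/unix")
int dump_unix(struct bpf_iter__unix *ctx)
{
	struct unix_sock *unix_sk = ctx->unix_sk;
	struct seq_file *seq = ctx->meta->seq;

	/* unix_sk is NULL on the SEQ_START_TOKEN pass and at the end of
	 * the iteration.
	 */
	if (!unix_sk)
		return 0;

	/* Print the socket state and the munged uid the kernel passes in
	 * the iterator context.
	 */
	BPF_SEQ_PRINTF(seq, "state=%u uid=%u\n",
		       unix_sk->sk.__sk_common.skc_state, ctx->uid);
	return 0;
}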
The commit 04c7820b776f ("bpf: tcp: Bpf iter batching and lock_sock")
introduces the batching algorithm to iterate TCP sockets with more
consistency.

This patch uses the same algorithm to iterate AF_UNIX sockets.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
---
 net/unix/af_unix.c | 182 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 175 insertions(+), 7 deletions(-)
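[Editor's note, not part of the series: the batching code above is driven from user space by plain read()s on an iterator fd, which is the bpf_seq_read() path visible in the splat. A minimal sketch of such a driver follows; the skeleton header bpf_iter_unix.skel.h and the program name dump_unix are hypothetical names, not defined by this patch.]

#include <stdio.h>
#include <unistd.h>
#include <bpf/bpf.h>
#include <bpf/libbpf.h>
#include "bpf_iter_unix.skel.h"

int main(void)
{
	struct bpf_iter_unix *skel;
	struct bpf_link *link;
	char buf[4096];
	int iter_fd, err = 1;
	ssize_t n;

	skel = bpf_iter_unix__open_and_load();
	if (!skel)
		return 1;

	link = bpf_program__attach_iter(skel->progs.dump_unix, NULL);
	if (!link)
		goto out;

	iter_fd = bpf_iter_create(bpf_link__fd(link));
	if (iter_fd < 0)
		goto out_link;

	/* Each read() ends up in bpf_seq_read(), which batches sockets from
	 * the current bucket and runs the BPF program on each of them.
	 */
	while ((n = read(iter_fd, buf, sizeof(buf))) > 0)
		fwrite(buf, 1, n, stdout);

	err = n < 0;
	close(iter_fd);
out_link:
	bpf_link__destroy(link);
out:
	bpf_iter_unix__destroy(skel);
	return err;
}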