Message ID | 20230728064411.305576-1-tglozar@redhat.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 13d2618b48f15966d1adfe1ff6a1985f5eef40ba |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] bpf: sockmap: Remove preempt_disable in sock_map_sk_acquire | expand |
On Fri, Jul 28, 2023 at 08:44:11AM +0200, tglozar@redhat.com wrote: > From: Tomas Glozar <tglozar@redhat.com> > > Disabling preemption in sock_map_sk_acquire conflicts with GFP_ATOMIC > allocation later in sk_psock_init_link on PREEMPT_RT kernels, since > GFP_ATOMIC might sleep on RT (see bpf: Make BPF and PREEMPT_RT co-exist > patchset notes for details). > > This causes calling bpf_map_update_elem on BPF_MAP_TYPE_SOCKMAP maps to > BUG (sleeping function called from invalid context) on RT kernels. > > preempt_disable was introduced together with lock_sk and rcu_read_lock > in commit 99ba2b5aba24e ("bpf: sockhash, disallow bpf_tcp_close and update > in parallel"), probably to match disabled migration of BPF programs, and > is no longer necessary. > > Remove preempt_disable to fix BUG in sock_map_update_common on RT. FYI, I'm not sure it's related but I started to see following splat recently: [ 189.360689][ T658] ============================= [ 189.361149][ T658] [ BUG: Invalid wait context ] [ 189.361588][ T658] 6.5.0-rc2+ #589 Tainted: G OE [ 189.362174][ T658] ----------------------------- [ 189.362660][ T658] test_progs/658 is trying to lock: [ 189.363176][ T658] ffff8881702652b8 (&psock->link_lock){....}-{3:3}, at: sock_map_update_common+0x1c4/0x340 [ 189.364152][ T658] other info that might help us debug this: [ 189.364689][ T658] context-{5:5} [ 189.365021][ T658] 3 locks held by test_progs/658: [ 189.365508][ T658] #0: ffff888177611a80 (sk_lock-AF_INET){+.+.}-{0:0}, at: sock_map_update_elem_sys+0x82/0x260 [ 189.366503][ T658] #1: ffffffff835a3180 (rcu_read_lock){....}-{1:3}, at: sock_map_update_elem_sys+0x78/0x260 [ 189.367470][ T658] #2: ffff88816cf19240 (&stab->lock){+...}-{2:2}, at: sock_map_update_common+0x12a/0x340 [ 189.368420][ T658] stack backtrace: [ 189.368806][ T658] CPU: 0 PID: 658 Comm: test_progs Tainted: G OE 6.5.0-rc2+ #589 98af30b3c42d747b51da05f1d0e4899e394be6c9 [ 189.369889][ T658] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014 [ 189.370736][ T658] Call Trace: [ 189.371063][ T658] <TASK> [ 189.371365][ T658] dump_stack_lvl+0xb2/0x120 [ 189.371798][ T658] __lock_acquire+0x9ad/0x2470 [ 189.372243][ T658] ? lock_acquire+0x104/0x350 [ 189.372680][ T658] lock_acquire+0x104/0x350 [ 189.373104][ T658] ? sock_map_update_common+0x1c4/0x340 [ 189.373615][ T658] ? find_held_lock+0x32/0x90 [ 189.374074][ T658] ? sock_map_update_common+0x12a/0x340 [ 189.374587][ T658] _raw_spin_lock_bh+0x38/0x80 [ 189.375060][ T658] ? sock_map_update_common+0x1c4/0x340 [ 189.375571][ T658] sock_map_update_common+0x1c4/0x340 [ 189.376118][ T658] sock_map_update_elem_sys+0x184/0x260 [ 189.376704][ T658] __sys_bpf+0x181f/0x2840 [ 189.377147][ T658] __x64_sys_bpf+0x1a/0x30 [ 189.377556][ T658] do_syscall_64+0x38/0x90 [ 189.377980][ T658] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 189.378473][ T658] RIP: 0033:0x7fe52f47ab5d the patch did not help with that jirka > > Signed-off-by: Tomas Glozar <tglozar@redhat.com> > --- > net/core/sock_map.c | 2 -- > 1 file changed, 2 deletions(-) > > diff --git a/net/core/sock_map.c b/net/core/sock_map.c > index 19538d628714..08ab108206bf 100644 > --- a/net/core/sock_map.c > +++ b/net/core/sock_map.c > @@ -115,7 +115,6 @@ static void sock_map_sk_acquire(struct sock *sk) > __acquires(&sk->sk_lock.slock) > { > lock_sock(sk); > - preempt_disable(); > rcu_read_lock(); > } > > @@ -123,7 +122,6 @@ static void sock_map_sk_release(struct sock *sk) > __releases(&sk->sk_lock.slock) > { > rcu_read_unlock(); > - preempt_enable(); > release_sock(sk); > } > > -- > 2.39.3 > >
On 7/28/23 4:48 AM, Jiri Olsa wrote: > On Fri, Jul 28, 2023 at 08:44:11AM +0200, tglozar@redhat.com wrote: >> From: Tomas Glozar <tglozar@redhat.com> >> >> Disabling preemption in sock_map_sk_acquire conflicts with GFP_ATOMIC >> allocation later in sk_psock_init_link on PREEMPT_RT kernels, since >> GFP_ATOMIC might sleep on RT (see bpf: Make BPF and PREEMPT_RT co-exist >> patchset notes for details). >> >> This causes calling bpf_map_update_elem on BPF_MAP_TYPE_SOCKMAP maps to >> BUG (sleeping function called from invalid context) on RT kernels. >> >> preempt_disable was introduced together with lock_sk and rcu_read_lock >> in commit 99ba2b5aba24e ("bpf: sockhash, disallow bpf_tcp_close and update >> in parallel"), probably to match disabled migration of BPF programs, and >> is no longer necessary. >> >> Remove preempt_disable to fix BUG in sock_map_update_common on RT. > > FYI, I'm not sure it's related but I started to see following splat recently: > > [ 189.360689][ T658] ============================= > [ 189.361149][ T658] [ BUG: Invalid wait context ] > [ 189.361588][ T658] 6.5.0-rc2+ #589 Tainted: G OE > [ 189.362174][ T658] ----------------------------- > [ 189.362660][ T658] test_progs/658 is trying to lock: > [ 189.363176][ T658] ffff8881702652b8 (&psock->link_lock){....}-{3:3}, at: sock_map_update_common+0x1c4/0x340 > [ 189.364152][ T658] other info that might help us debug this: > [ 189.364689][ T658] context-{5:5} > [ 189.365021][ T658] 3 locks held by test_progs/658: > [ 189.365508][ T658] #0: ffff888177611a80 (sk_lock-AF_INET){+.+.}-{0:0}, at: sock_map_update_elem_sys+0x82/0x260 > [ 189.366503][ T658] #1: ffffffff835a3180 (rcu_read_lock){....}-{1:3}, at: sock_map_update_elem_sys+0x78/0x260 > [ 189.367470][ T658] #2: ffff88816cf19240 (&stab->lock){+...}-{2:2}, at: sock_map_update_common+0x12a/0x340 > [ 189.368420][ T658] stack backtrace: > [ 189.368806][ T658] CPU: 0 PID: 658 Comm: test_progs Tainted: G OE 6.5.0-rc2+ #589 98af30b3c42d747b51da05f1d0e4899e394be6c9 > [ 189.369889][ T658] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014 > [ 189.370736][ T658] Call Trace: > [ 189.371063][ T658] <TASK> > [ 189.371365][ T658] dump_stack_lvl+0xb2/0x120 > [ 189.371798][ T658] __lock_acquire+0x9ad/0x2470 > [ 189.372243][ T658] ? lock_acquire+0x104/0x350 > [ 189.372680][ T658] lock_acquire+0x104/0x350 > [ 189.373104][ T658] ? sock_map_update_common+0x1c4/0x340 > [ 189.373615][ T658] ? find_held_lock+0x32/0x90 > [ 189.374074][ T658] ? sock_map_update_common+0x12a/0x340 > [ 189.374587][ T658] _raw_spin_lock_bh+0x38/0x80 > [ 189.375060][ T658] ? sock_map_update_common+0x1c4/0x340 > [ 189.375571][ T658] sock_map_update_common+0x1c4/0x340 > [ 189.376118][ T658] sock_map_update_elem_sys+0x184/0x260 > [ 189.376704][ T658] __sys_bpf+0x181f/0x2840 > [ 189.377147][ T658] __x64_sys_bpf+0x1a/0x30 > [ 189.377556][ T658] do_syscall_64+0x38/0x90 > [ 189.377980][ T658] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 > [ 189.378473][ T658] RIP: 0033:0x7fe52f47ab5d > > the patch did not help with that I think the above splat is not related to this patch. In function sock_map_update_common func we have raw_spin_lock_bh(&stab->lock); sock_map_add_link(psock, link, map, &stab->sks[idx]); spin_lock_bh(&psock->link_lock); ... spin_unlock_bh(&psock->link_lock); raw_spin_unlock_bh(&stab->lock); I think you probably have CONFIG_PROVE_RAW_LOCK_NESTING turned on in your config. In the above case, for RT kernel, spin_lock_bh will become 'mutex' and it is sleepable, while raw_spin_lock_bh remains to be a spin lock. The warning is about potential locking violation with RT kernel. To fix the issue, you can convert spin_lock_bh to raw_spin_lock_bh to silence the warning. > > jirka > >> >> Signed-off-by: Tomas Glozar <tglozar@redhat.com> >> --- >> net/core/sock_map.c | 2 -- >> 1 file changed, 2 deletions(-) >> >> diff --git a/net/core/sock_map.c b/net/core/sock_map.c >> index 19538d628714..08ab108206bf 100644 >> --- a/net/core/sock_map.c >> +++ b/net/core/sock_map.c >> @@ -115,7 +115,6 @@ static void sock_map_sk_acquire(struct sock *sk) >> __acquires(&sk->sk_lock.slock) >> { >> lock_sock(sk); >> - preempt_disable(); >> rcu_read_lock(); >> } >> >> @@ -123,7 +122,6 @@ static void sock_map_sk_release(struct sock *sk) >> __releases(&sk->sk_lock.slock) >> { >> rcu_read_unlock(); >> - preempt_enable(); >> release_sock(sk); >> } >> >> -- >> 2.39.3 >> >> >
On Fri, Jul 28, 2023 at 08:44 AM +02, tglozar@redhat.com wrote: > From: Tomas Glozar <tglozar@redhat.com> > > Disabling preemption in sock_map_sk_acquire conflicts with GFP_ATOMIC > allocation later in sk_psock_init_link on PREEMPT_RT kernels, since > GFP_ATOMIC might sleep on RT (see bpf: Make BPF and PREEMPT_RT co-exist > patchset notes for details). > > This causes calling bpf_map_update_elem on BPF_MAP_TYPE_SOCKMAP maps to > BUG (sleeping function called from invalid context) on RT kernels. > > preempt_disable was introduced together with lock_sk and rcu_read_lock > in commit 99ba2b5aba24e ("bpf: sockhash, disallow bpf_tcp_close and update > in parallel"), probably to match disabled migration of BPF programs, and > is no longer necessary. > > Remove preempt_disable to fix BUG in sock_map_update_common on RT. > > Signed-off-by: Tomas Glozar <tglozar@redhat.com> > --- We disable softirq and hold a spin lock when modifying the map/hash in sock_{map,hash}_update_common so this LGTM: Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> You might want some extra tags: Link: https://lore.kernel.org/all/20200224140131.461979697@linutronix.de/ Fixes: 99ba2b5aba24 ("bpf: sockhash, disallow bpf_tcp_close and update in parallel")
Jakub Sitnicki wrote: > > On Fri, Jul 28, 2023 at 08:44 AM +02, tglozar@redhat.com wrote: > > From: Tomas Glozar <tglozar@redhat.com> > > > > Disabling preemption in sock_map_sk_acquire conflicts with GFP_ATOMIC > > allocation later in sk_psock_init_link on PREEMPT_RT kernels, since > > GFP_ATOMIC might sleep on RT (see bpf: Make BPF and PREEMPT_RT co-exist > > patchset notes for details). > > > > This causes calling bpf_map_update_elem on BPF_MAP_TYPE_SOCKMAP maps to > > BUG (sleeping function called from invalid context) on RT kernels. > > > > preempt_disable was introduced together with lock_sk and rcu_read_lock > > in commit 99ba2b5aba24e ("bpf: sockhash, disallow bpf_tcp_close and update > > in parallel"), probably to match disabled migration of BPF programs, and > > is no longer necessary. > > > > Remove preempt_disable to fix BUG in sock_map_update_common on RT. > > > > Signed-off-by: Tomas Glozar <tglozar@redhat.com> > > --- > > We disable softirq and hold a spin lock when modifying the map/hash in > sock_{map,hash}_update_common so this LGTM: > Agree. > Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Reviewed-by: John Fastabend <john.fastabend@gmail.com> > > You might want some extra tags: > > Link: https://lore.kernel.org/all/20200224140131.461979697@linutronix.de/ > Fixes: 99ba2b5aba24 ("bpf: sockhash, disallow bpf_tcp_close and update in parallel")
Hello: This patch was applied to netdev/net.git (main) by Paolo Abeni <pabeni@redhat.com>: On Fri, 28 Jul 2023 08:44:11 +0200 you wrote: > From: Tomas Glozar <tglozar@redhat.com> > > Disabling preemption in sock_map_sk_acquire conflicts with GFP_ATOMIC > allocation later in sk_psock_init_link on PREEMPT_RT kernels, since > GFP_ATOMIC might sleep on RT (see bpf: Make BPF and PREEMPT_RT co-exist > patchset notes for details). > > [...] Here is the summary with links: - [net] bpf: sockmap: Remove preempt_disable in sock_map_sk_acquire https://git.kernel.org/netdev/net/c/13d2618b48f1 You are awesome, thank you!
On Mon, 2023-07-31 at 20:58 -0700, John Fastabend wrote: > Jakub Sitnicki wrote: > > > > On Fri, Jul 28, 2023 at 08:44 AM +02, tglozar@redhat.com wrote: > > > From: Tomas Glozar <tglozar@redhat.com> > > > > > > Disabling preemption in sock_map_sk_acquire conflicts with GFP_ATOMIC > > > allocation later in sk_psock_init_link on PREEMPT_RT kernels, since > > > GFP_ATOMIC might sleep on RT (see bpf: Make BPF and PREEMPT_RT co-exist > > > patchset notes for details). > > > > > > This causes calling bpf_map_update_elem on BPF_MAP_TYPE_SOCKMAP maps to > > > BUG (sleeping function called from invalid context) on RT kernels. > > > > > > preempt_disable was introduced together with lock_sk and rcu_read_lock > > > in commit 99ba2b5aba24e ("bpf: sockhash, disallow bpf_tcp_close and update > > > in parallel"), probably to match disabled migration of BPF programs, and > > > is no longer necessary. > > > > > > Remove preempt_disable to fix BUG in sock_map_update_common on RT. > > > > > > Signed-off-by: Tomas Glozar <tglozar@redhat.com> > > > --- > > > > We disable softirq and hold a spin lock when modifying the map/hash in > > sock_{map,hash}_update_common so this LGTM: > > > > Agree. > > > Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> > > Reviewed-by: John Fastabend <john.fastabend@gmail.com> > > > > > You might want some extra tags: > > > > Link: https://lore.kernel.org/all/20200224140131.461979697@linutronix.de/ > > Fixes: 99ba2b5aba24 ("bpf: sockhash, disallow bpf_tcp_close and update in parallel") ENOCOFFEE here (which is never an excuse, but at least today is really true), but I considered you may want this patch via the ebpf tree only after applying it to net. Hopefully should not be tragic, but please let me know if you prefer the change reverted from net and going via the other path. Thanks! Paolo
On Tue, Aug 1, 2023 at 12:44 AM Paolo Abeni <pabeni@redhat.com> wrote: > > On Mon, 2023-07-31 at 20:58 -0700, John Fastabend wrote: > > Jakub Sitnicki wrote: > > > > > > On Fri, Jul 28, 2023 at 08:44 AM +02, tglozar@redhat.com wrote: > > > > From: Tomas Glozar <tglozar@redhat.com> > > > > > > > > Disabling preemption in sock_map_sk_acquire conflicts with GFP_ATOMIC > > > > allocation later in sk_psock_init_link on PREEMPT_RT kernels, since > > > > GFP_ATOMIC might sleep on RT (see bpf: Make BPF and PREEMPT_RT co-exist > > > > patchset notes for details). > > > > > > > > This causes calling bpf_map_update_elem on BPF_MAP_TYPE_SOCKMAP maps to > > > > BUG (sleeping function called from invalid context) on RT kernels. > > > > > > > > preempt_disable was introduced together with lock_sk and rcu_read_lock > > > > in commit 99ba2b5aba24e ("bpf: sockhash, disallow bpf_tcp_close and update > > > > in parallel"), probably to match disabled migration of BPF programs, and > > > > is no longer necessary. > > > > > > > > Remove preempt_disable to fix BUG in sock_map_update_common on RT. > > > > > > > > Signed-off-by: Tomas Glozar <tglozar@redhat.com> > > > > --- > > > > > > We disable softirq and hold a spin lock when modifying the map/hash in > > > sock_{map,hash}_update_common so this LGTM: > > > > > > > Agree. > > > > > Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> > > > > Reviewed-by: John Fastabend <john.fastabend@gmail.com> > > > > > > > > You might want some extra tags: > > > > > > Link: https://lore.kernel.org/all/20200224140131.461979697@linutronix.de/ > > > Fixes: 99ba2b5aba24 ("bpf: sockhash, disallow bpf_tcp_close and update in parallel") > > ENOCOFFEE here (which is never an excuse, but at least today is really > true), but I considered you may want this patch via the ebpf tree only > after applying it to net. > > Hopefully should not be tragic, but please let me know if you prefer > the change reverted from net and going via the other path. It's fine. Probably it shouldn't conflict with other sockmap fixes.
diff --git a/net/core/sock_map.c b/net/core/sock_map.c index 19538d628714..08ab108206bf 100644 --- a/net/core/sock_map.c +++ b/net/core/sock_map.c @@ -115,7 +115,6 @@ static void sock_map_sk_acquire(struct sock *sk) __acquires(&sk->sk_lock.slock) { lock_sock(sk); - preempt_disable(); rcu_read_lock(); } @@ -123,7 +122,6 @@ static void sock_map_sk_release(struct sock *sk) __releases(&sk->sk_lock.slock) { rcu_read_unlock(); - preempt_enable(); release_sock(sk); }