diff mbox series

[net] net/tcp: Disable TCP-AO static key after RCU grace period

Message ID 20240725-tcp-ao-static-branch-rcu-v1-1-021d009beebf@gmail.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [net] net/tcp: Disable TCP-AO static key after RCU grace period | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 7 this patch: 7
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 6 of 6 maintainers
netdev/build_clang success Errors and warnings before: 7 this patch: 7
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 7 this patch: 7
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 24 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-07-26--15-00 (tests: 707)

Commit Message

Dmitry Safonov via B4 Relay July 25, 2024, 5 a.m. UTC
From: Dmitry Safonov <0x7f454c46@gmail.com>

The lifetime of TCP-AO static_key is the same as the last
tcp_ao_info. On the socket destruction tcp_ao_info ceases to be
with RCU grace period, while tcp-ao static branch is currently deferred
destructed. The static key definition is
: DEFINE_STATIC_KEY_DEFERRED_FALSE(tcp_ao_needed, HZ);

which means that if RCU grace period is delayed by more than a second
and tcp_ao_needed is in the process of disablement, other CPUs may
yet see tcp_ao_info which atent dead, but soon-to-be.
And that breaks the assumption of static_key_fast_inc_not_disabled().

Happened on netdev test-bot[1], so not a theoretical issue:

[] jump_label: Fatal kernel bug, unexpected op at tcp_inbound_hash+0x1a7/0x870 [ffffffffa8c4e9b7] (eb 50 0f 1f 44 != 66 90 0f 1f 00)) size:2 type:1
[] ------------[ cut here ]------------
[] kernel BUG at arch/x86/kernel/jump_label.c:73!
[] Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
[] CPU: 3 PID: 243 Comm: kworker/3:3 Not tainted 6.10.0-virtme #1
[] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[] Workqueue: events jump_label_update_timeout
[] RIP: 0010:__jump_label_patch+0x2f6/0x350
...
[] Call Trace:
[]  <TASK>
[]  arch_jump_label_transform_queue+0x6c/0x110
[]  __jump_label_update+0xef/0x350
[]  __static_key_slow_dec_cpuslocked.part.0+0x3c/0x60
[]  jump_label_update_timeout+0x2c/0x40
[]  process_one_work+0xe3b/0x1670
[]  worker_thread+0x587/0xce0
[]  kthread+0x28a/0x350
[]  ret_from_fork+0x31/0x70
[]  ret_from_fork_asm+0x1a/0x30
[]  </TASK>
[] Modules linked in: veth
[] ---[ end trace 0000000000000000 ]---
[] RIP: 0010:__jump_label_patch+0x2f6/0x350

[1]: https://netdev-3.bots.linux.dev/vmksft-tcp-ao-dbg/results/696681/5-connect-deny-ipv6/stderr

Cc: stable@kernel.org
Fixes: 67fa83f7c86a ("net/tcp: Add static_key for TCP-AO")
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
---
---
 net/ipv4/tcp_ao.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)


---
base-commit: c33ffdb70cc6df4105160f991288e7d2567d7ffa
change-id: 20240725-tcp-ao-static-branch-rcu-85ede7b3a1a5

Best regards,

Comments

Eric Dumazet July 25, 2024, 7:47 a.m. UTC | #1
On Thu, Jul 25, 2024 at 7:00 AM Dmitry Safonov via B4 Relay
<devnull+0x7f454c46.gmail.com@kernel.org> wrote:
>
> From: Dmitry Safonov <0x7f454c46@gmail.com>
>
> The lifetime of TCP-AO static_key is the same as the last
> tcp_ao_info. On the socket destruction tcp_ao_info ceases to be
> with RCU grace period, while tcp-ao static branch is currently deferred
> destructed. The static key definition is
> : DEFINE_STATIC_KEY_DEFERRED_FALSE(tcp_ao_needed, HZ);
>
> which means that if RCU grace period is delayed by more than a second
> and tcp_ao_needed is in the process of disablement, other CPUs may
> yet see tcp_ao_info which atent dead, but soon-to-be.

> And that breaks the assumption of static_key_fast_inc_not_disabled().

I am afraid I do not understand this changelog at all.

What is "the assumption of static_key_fast_inc_not_disabled()"  you
are referring to ?

I think it would help to provide more details.

>
> Happened on netdev test-bot[1], so not a theoretical issue:
>
> [] jump_label: Fatal kernel bug, unexpected op at tcp_inbound_hash+0x1a7/0x870 [ffffffffa8c4e9b7] (eb 50 0f 1f 44 != 66 90 0f 1f 00)) size:2 type:1
> [] ------------[ cut here ]------------
> [] kernel BUG at arch/x86/kernel/jump_label.c:73!
> [] Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
> [] CPU: 3 PID: 243 Comm: kworker/3:3 Not tainted 6.10.0-virtme #1
> [] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> [] Workqueue: events jump_label_update_timeout
> [] RIP: 0010:__jump_label_patch+0x2f6/0x350
> ...
> [] Call Trace:
> []  <TASK>
> []  arch_jump_label_transform_queue+0x6c/0x110
> []  __jump_label_update+0xef/0x350
> []  __static_key_slow_dec_cpuslocked.part.0+0x3c/0x60
> []  jump_label_update_timeout+0x2c/0x40
> []  process_one_work+0xe3b/0x1670
> []  worker_thread+0x587/0xce0
> []  kthread+0x28a/0x350
> []  ret_from_fork+0x31/0x70
> []  ret_from_fork_asm+0x1a/0x30
> []  </TASK>
> [] Modules linked in: veth
> [] ---[ end trace 0000000000000000 ]---
> [] RIP: 0010:__jump_label_patch+0x2f6/0x350
>
> [1]: https://netdev-3.bots.linux.dev/vmksft-tcp-ao-dbg/results/696681/5-connect-deny-ipv6/stderr
>
> Cc: stable@kernel.org
> Fixes: 67fa83f7c86a ("net/tcp: Add static_key for TCP-AO")
> Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
> ---
> ---
>  net/ipv4/tcp_ao.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/net/ipv4/tcp_ao.c b/net/ipv4/tcp_ao.c
> index 85531437890c..5ce914d3e3db 100644
> --- a/net/ipv4/tcp_ao.c
> +++ b/net/ipv4/tcp_ao.c
> @@ -267,6 +267,14 @@ static void tcp_ao_key_free_rcu(struct rcu_head *head)
>         kfree_sensitive(key);
>  }
>
> +static void tcp_ao_info_free_rcu(struct rcu_head *head)
> +{
> +       struct tcp_ao_info *ao = container_of(head, struct tcp_ao_info, rcu);
> +
> +       kfree(ao);
> +       static_branch_slow_dec_deferred(&tcp_ao_needed);
> +}
> +
>  void tcp_ao_destroy_sock(struct sock *sk, bool twsk)
>  {
>         struct tcp_ao_info *ao;
> @@ -290,9 +298,7 @@ void tcp_ao_destroy_sock(struct sock *sk, bool twsk)
>                         atomic_sub(tcp_ao_sizeof_key(key), &sk->sk_omem_alloc);
>                 call_rcu(&key->rcu, tcp_ao_key_free_rcu);
>         }
> -
> -       kfree_rcu(ao, rcu);
> -       static_branch_slow_dec_deferred(&tcp_ao_needed);
> +       call_rcu(&ao->rcu, tcp_ao_info_free_rcu);
>  }
>
>  void tcp_ao_time_wait(struct tcp_timewait_sock *tcptw, struct tcp_sock *tp)
>
> ---
> base-commit: c33ffdb70cc6df4105160f991288e7d2567d7ffa
> change-id: 20240725-tcp-ao-static-branch-rcu-85ede7b3a1a5
>
> Best regards,
> --
> Dmitry Safonov <0x7f454c46@gmail.com>
>
>
Dmitry Safonov July 25, 2024, 8:50 a.m. UTC | #2
Hi Eric, thanks for looking into this,

On Thu, 25 Jul 2024 at 08:48, Eric Dumazet <edumazet@google.com> wrote:
>
> On Thu, Jul 25, 2024 at 7:00 AM Dmitry Safonov via B4 Relay
> <devnull+0x7f454c46.gmail.com@kernel.org> wrote:
> >
> > From: Dmitry Safonov <0x7f454c46@gmail.com>
> >
> > The lifetime of TCP-AO static_key is the same as the last
> > tcp_ao_info. On the socket destruction tcp_ao_info ceases to be
> > with RCU grace period, while tcp-ao static branch is currently deferred
> > destructed. The static key definition is
> > : DEFINE_STATIC_KEY_DEFERRED_FALSE(tcp_ao_needed, HZ);
> >
> > which means that if RCU grace period is delayed by more than a second
> > and tcp_ao_needed is in the process of disablement, other CPUs may
> > yet see tcp_ao_info which atent dead, but soon-to-be.
>
> > And that breaks the assumption of static_key_fast_inc_not_disabled().
>
> I am afraid I do not understand this changelog at all.
>
> What is "the assumption of static_key_fast_inc_not_disabled()"  you
> are referring to ?
>
> I think it would help to provide more details.

Sorry, my bad, I'm referring here to the comment/description for the function:

 * static_key_fast_inc_not_disabled - adds a user for a static key
 * @key: static key that must be already enabled
 *
 * The caller must make sure that the static key can't get disabled while
 * in this function. It doesn't patch jump labels, only adds a user to
 * an already enabled static key.

Originally it was introduced in commit eb8c507296f6 ("jump_label:
Prevent key->enabled int overflow"), which is needed for the atomic
contexts, one of which would be the creation of a full socket from a
request socket. In that atomic context, we know by the presence of the
key (md5/ao) that the static branch is already enabled. So, we can
just increment the ref counter for that static branch instead of
holding the proper mutex. static_key_fast_inc_not_disabled() is just a
helper for that usage case. But it must not be used if the static
branch could get disabled in parallel as it's not protected by
jump_label_mutex and as a result, races with jump_label_update()
implementation details.

Specifically, from the log in [1], I see that jump_label_type()
wrongly tells arch_jump_label_transform_queue() to enable the
static_brach, when the caller was, in fact,
__static_key_slow_dec_cpuslocked() - requesting to disable that. And
then, the x86-specific code produces:
: jump_label: Fatal kernel bug, unexpected op at
tcp_inbound_hash+0x1a7/0x870 [ffffffffa8c4e9b7] (eb 50 0f 1f 44 != 66
90 0f 1f 00)) size:2 type:1

when it tries to enable the static key; but the op-code is not no-op,
it's 2-byte jump. The reason for that of course is that intended
operation was to disable the branch, but it has raced with this
increment helper.

Hopefully, that clarifies somewhat the situation here.

Thankfully, for TCP-MD5 I did a better job: tcp_md5sig_info_free_rcu()
and tcp_md5_twsk_free_rcu() are RCU callbacks.

Also, please note that I intentionally call
static_branch_slow_dec_deferred() variant in the RCU callback, rather
than synchronous. The reason for that:

 * When the control is directly exposed to userspace, it is prudent to delay the
 * decrement to avoid high frequency code modifications which can (and do)
 * cause significant performance degradation. Struct static_key_deferred and
 * static_key_slow_dec_deferred() provide for this.

>
> >
> > Happened on netdev test-bot[1], so not a theoretical issue:
> >
> > [] jump_label: Fatal kernel bug, unexpected op at tcp_inbound_hash+0x1a7/0x870 [ffffffffa8c4e9b7] (eb 50 0f 1f 44 != 66 90 0f 1f 00)) size:2 type:1
> > [] ------------[ cut here ]------------
> > [] kernel BUG at arch/x86/kernel/jump_label.c:73!
> > [] Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
> > [] CPU: 3 PID: 243 Comm: kworker/3:3 Not tainted 6.10.0-virtme #1
> > [] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> > [] Workqueue: events jump_label_update_timeout
> > [] RIP: 0010:__jump_label_patch+0x2f6/0x350
> > ...
> > [] Call Trace:
> > []  <TASK>
> > []  arch_jump_label_transform_queue+0x6c/0x110
> > []  __jump_label_update+0xef/0x350
> > []  __static_key_slow_dec_cpuslocked.part.0+0x3c/0x60
> > []  jump_label_update_timeout+0x2c/0x40
> > []  process_one_work+0xe3b/0x1670
> > []  worker_thread+0x587/0xce0
> > []  kthread+0x28a/0x350
> > []  ret_from_fork+0x31/0x70
> > []  ret_from_fork_asm+0x1a/0x30
> > []  </TASK>
> > [] Modules linked in: veth
> > [] ---[ end trace 0000000000000000 ]---
> > [] RIP: 0010:__jump_label_patch+0x2f6/0x350
> >
> > [1]: https://netdev-3.bots.linux.dev/vmksft-tcp-ao-dbg/results/696681/5-connect-deny-ipv6/stderr
> >
> > Cc: stable@kernel.org
> > Fixes: 67fa83f7c86a ("net/tcp: Add static_key for TCP-AO")
> > Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
> > ---
> > ---
> >  net/ipv4/tcp_ao.c | 12 +++++++++---
> >  1 file changed, 9 insertions(+), 3 deletions(-)
> >
> > diff --git a/net/ipv4/tcp_ao.c b/net/ipv4/tcp_ao.c
> > index 85531437890c..5ce914d3e3db 100644
> > --- a/net/ipv4/tcp_ao.c
> > +++ b/net/ipv4/tcp_ao.c
> > @@ -267,6 +267,14 @@ static void tcp_ao_key_free_rcu(struct rcu_head *head)
> >         kfree_sensitive(key);
> >  }
> >
> > +static void tcp_ao_info_free_rcu(struct rcu_head *head)
> > +{
> > +       struct tcp_ao_info *ao = container_of(head, struct tcp_ao_info, rcu);
> > +
> > +       kfree(ao);
> > +       static_branch_slow_dec_deferred(&tcp_ao_needed);
> > +}
> > +
> >  void tcp_ao_destroy_sock(struct sock *sk, bool twsk)
> >  {
> >         struct tcp_ao_info *ao;
> > @@ -290,9 +298,7 @@ void tcp_ao_destroy_sock(struct sock *sk, bool twsk)
> >                         atomic_sub(tcp_ao_sizeof_key(key), &sk->sk_omem_alloc);
> >                 call_rcu(&key->rcu, tcp_ao_key_free_rcu);
> >         }
> > -
> > -       kfree_rcu(ao, rcu);
> > -       static_branch_slow_dec_deferred(&tcp_ao_needed);
> > +       call_rcu(&ao->rcu, tcp_ao_info_free_rcu);
> >  }
> >
> >  void tcp_ao_time_wait(struct tcp_timewait_sock *tcptw, struct tcp_sock *tp)
> >
> > ---
> > base-commit: c33ffdb70cc6df4105160f991288e7d2567d7ffa
> > change-id: 20240725-tcp-ao-static-branch-rcu-85ede7b3a1a5
> >
> > Best regards,
> > --
> > Dmitry Safonov <0x7f454c46@gmail.com>
> >
> >


Thanks,
             Dmitry
Jakub Kicinski July 27, 2024, 2:34 a.m. UTC | #3
On Thu, 25 Jul 2024 06:00:02 +0100 Dmitry Safonov via B4 Relay wrote:
> @@ -290,9 +298,7 @@ void tcp_ao_destroy_sock(struct sock *sk, bool twsk)
>  			atomic_sub(tcp_ao_sizeof_key(key), &sk->sk_omem_alloc);
>  		call_rcu(&key->rcu, tcp_ao_key_free_rcu);
>  	}
> -
> -	kfree_rcu(ao, rcu);
> -	static_branch_slow_dec_deferred(&tcp_ao_needed);
> +	call_rcu(&ao->rcu, tcp_ao_info_free_rcu);

Maybe free the keys inside tcp_ao_info_free_rcu, too?
IIUC you're saying that new sock is still looking at this ao under RCU
protection - messing with the key list feels a tiny bit odd since the
object is technically "live" until the end of the RCU grace period.
Dmitry Safonov July 27, 2024, 6:30 p.m. UTC | #4
Hi Jakub,

On Sat, 27 Jul 2024 at 03:34, Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 25 Jul 2024 06:00:02 +0100 Dmitry Safonov via B4 Relay wrote:
> > @@ -290,9 +298,7 @@ void tcp_ao_destroy_sock(struct sock *sk, bool twsk)
> >                       atomic_sub(tcp_ao_sizeof_key(key), &sk->sk_omem_alloc);
> >               call_rcu(&key->rcu, tcp_ao_key_free_rcu);
> >       }
> > -
> > -     kfree_rcu(ao, rcu);
> > -     static_branch_slow_dec_deferred(&tcp_ao_needed);
> > +     call_rcu(&ao->rcu, tcp_ao_info_free_rcu);
>
> Maybe free the keys inside tcp_ao_info_free_rcu, too?
> IIUC you're saying that new sock is still looking at this ao under RCU
> protection - messing with the key list feels a tiny bit odd since the
> object is technically "live" until the end of the RCU grace period.

Yeah, I think that's possible.
Looking at it now, I think it also needs
: rcu_assign_pointer(tp->ao_info, NULL)
above, rather than a plain null-assign.

Will fix and send v2.

Thanks,
             Dmitry
diff mbox series

Patch

diff --git a/net/ipv4/tcp_ao.c b/net/ipv4/tcp_ao.c
index 85531437890c..5ce914d3e3db 100644
--- a/net/ipv4/tcp_ao.c
+++ b/net/ipv4/tcp_ao.c
@@ -267,6 +267,14 @@  static void tcp_ao_key_free_rcu(struct rcu_head *head)
 	kfree_sensitive(key);
 }
 
+static void tcp_ao_info_free_rcu(struct rcu_head *head)
+{
+	struct tcp_ao_info *ao = container_of(head, struct tcp_ao_info, rcu);
+
+	kfree(ao);
+	static_branch_slow_dec_deferred(&tcp_ao_needed);
+}
+
 void tcp_ao_destroy_sock(struct sock *sk, bool twsk)
 {
 	struct tcp_ao_info *ao;
@@ -290,9 +298,7 @@  void tcp_ao_destroy_sock(struct sock *sk, bool twsk)
 			atomic_sub(tcp_ao_sizeof_key(key), &sk->sk_omem_alloc);
 		call_rcu(&key->rcu, tcp_ao_key_free_rcu);
 	}
-
-	kfree_rcu(ao, rcu);
-	static_branch_slow_dec_deferred(&tcp_ao_needed);
+	call_rcu(&ao->rcu, tcp_ao_info_free_rcu);
 }
 
 void tcp_ao_time_wait(struct tcp_timewait_sock *tcptw, struct tcp_sock *tp)