
[v2,net-next] tcp: Warn if sock_owned_by_user() is true in tcp_child_process().

Message ID 20211209013250.44347-1-kuniyu@amazon.co.jp (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Series [v2,net-next] tcp: Warn if sock_owned_by_user() is true in tcp_child_process().

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 2 this patch: 2
netdev/cc_maintainers warning 2 maintainers not CCed: yoshfuji@linux-ipv6.org dsahern@kernel.org
netdev/build_clang success Errors and warnings before: 20 this patch: 20
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 4 this patch: 4
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 27 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Iwashima, Kuniyuki Dec. 9, 2021, 1:32 a.m. UTC
While creating a child socket from an ACK (not the TCP Fast Open case),
before v2.3.41 we used to call bh_lock_sock() later than we do now; it was
called just before tcp_rcv_state_process().  The full socket was put into
the accept queue and exposed to other CPUs before bh_lock_sock(), so a
process context might already have acquired the lock by then.  Thus, we had
to check whether any process context was accessing the socket before
calling tcp_rcv_state_process().

We can see this code in tcp_v4_do_rcv() of v2.3.14. [0]

	if (sk->state == TCP_LISTEN) {
		struct sock *nsk;

		nsk = tcp_v4_hnd_req(sk, skb);
		...
		if (nsk != sk) {
			bh_lock_sock(nsk);
			if (nsk->lock.users != 0) {
				...
				sk_add_backlog(nsk, skb);
				bh_unlock_sock(nsk);
				return 0;
			}
			...
		}
	}

	if (tcp_rcv_state_process(sk, skb, skb->h.th, skb->len))
		goto reset;
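
Such a backlogged skb was replayed once the lock owner called
release_sock().  The modern equivalent is the drain loop in
__release_sock(); condensed from today's net/core/sock.c, it looks roughly
like this (a simplified sketch, not the verbatim source):

	struct sk_buff *skb, *next;

	while ((skb = sk->sk_backlog.head) != NULL) {
		sk->sk_backlog.head = sk->sk_backlog.tail = NULL;
		spin_unlock_bh(&sk->sk_lock.slock);
		do {
			next = skb->next;
			skb_mark_not_on_list(skb);
			sk_backlog_rcv(sk, skb);	/* e.g. tcp_v4_do_rcv() */
			cond_resched();
			skb = next;
		} while (skb);
		spin_lock_bh(&sk->sk_lock.slock);
	}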

However, in v2.3.15, this lock.users test was mistakenly replaced with
BUG_TRAP(). [1]

		if (nsk != sk) {
			...
			BUG_TRAP(nsk->lock.users == 0);
			...
			ret = tcp_rcv_state_process(nsk, skb, skb->h.th, skb->len);
			...
			bh_unlock_sock(nsk);
			...
			return 0;
		}

Fortunately, the test was brought back in v2.3.41. [2]  The related code
was then consolidated into tcp_child_process() with comments, where it
remains to this day.

What is interesting in v2.3.41 is that bh_lock_sock() was moved into
tcp_create_openreq_child() and placed just after sock_lock_init().  Thus,
no process context can acquire the lock before tcp_rcv_state_process()
runs.  The bh_lock_sock() call now lives in sk_clone_lock(), and the rule
has not changed.  As a result, alas, the commented path has been
unreachable ever since that change.
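
For reference, the ordering in today's sk_clone_lock() (net/core/sock.c)
is, heavily condensed (a sketch, not the verbatim source):

	struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
	{
		struct sock *newsk = sk_prot_alloc(READ_ONCE(sk->sk_prot),
						   priority, sk->sk_family);
		...
		sock_copy(newsk, sk);
		...
		sock_lock_init(newsk);
		bh_lock_sock(newsk);	/* BH owns the lock from creation on */
		...
	}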

This patch removes the unreachable path and adds a WARN_ON_ONCE() so that
syzbot can validate whether it really is dead code.  The WARN_ON_ONCE()
can be removed if syzbot stays happy for at least one release. [3]
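
For reference, sock_owned_by_user() boils down to reading the owner flag
that lock_sock() sets from process context; slightly simplified from
include/net/sock.h:

	static inline bool sock_owned_by_user(const struct sock *sk)
	{
		sock_owned_by_me(sk);		/* lockdep-only annotation */
		return sk->sk_lock.owned;	/* set by lock_sock() */
	}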

[0]: https://cdn.kernel.org/pub/linux/kernel/v2.3/linux-2.3.14.tar.gz
[1]: https://cdn.kernel.org/pub/linux/kernel/v2.3/patch-2.3.15.gz
[2]: https://cdn.kernel.org/pub/linux/kernel/v2.3/patch-2.3.41.gz
[3]: https://lore.kernel.org/all/CANn89iL+YWbQDCTQU-D1nU4EePv07EyHvMPjFPkpH1ELyzg5MA@mail.gmail.com/

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
---
I left the Fixes: tag as a reference, but if it is unnecessary, please
remove it.

Changelog:
  v2:
    * Add a WARN_ON_ONCE()
    * Clarify TCP Fast Open is not the case

  v1:
  https://lore.kernel.org/all/20211208051633.49122-1-kuniyu@amazon.co.jp/
---
 net/ipv4/tcp_minisocks.c | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

Comments

Eric Dumazet Dec. 9, 2021, 8 a.m. UTC | #1
On Wed, Dec 8, 2021 at 5:33 PM Kuniyuki Iwashima <kuniyu@amazon.co.jp> wrote:
>
> While creating a child socket from ACK (not TCP Fast Open case), before
> v2.3.41, we used to call bh_lock_sock() later than now; it was called just
> before tcp_rcv_state_process().  The full socket was put into an accept
> queue and exposed to other CPUs before bh_lock_sock() so that process
> context might have acquired the lock by then.  Thus, we had to check if any
> process context was accessing the socket before tcp_rcv_state_process().
>

I think you misunderstood me.

I think this code is not dead yet, so I would:

not include a Fixes: tag, to avoid unnecessary backports (of the patch
and its revert).

If you want to get syzbot coverage for a few releases, especially with
MPTCP and synflood, you can then submit a patch like the following.

diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index cf913a66df17..19da6e442fca 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -843,6 +843,9 @@ int tcp_child_process(struct sock *parent, struct sock *child,
                 * in main socket hash table and lock on listening
                 * socket does not protect us more.
                 */
+
+               /* Check if this code path is obsolete ? */
+               WARN_ON_ONCE(1);
                __sk_add_backlog(child, skb);
        }
Iwashima, Kuniyuki Dec. 9, 2021, 11:07 a.m. UTC | #2
From:   Eric Dumazet <edumazet@google.com>
Date:   Thu, 9 Dec 2021 00:00:35 -0800
> On Wed, Dec 8, 2021 at 5:33 PM Kuniyuki Iwashima <kuniyu@amazon.co.jp> wrote:
>> [...]
> 
> I think you misunderstood me.
> 
> I think this code is not dead yet, so I would:
> 
> not include a Fixes: tag, to avoid unnecessary backports (of the patch
> and its revert).
> 
> If you want to get syzbot coverage for a few releases, especially with
> MPTCP and synflood, you can then submit a patch like the following.

Sorry, now I am on the same page.
Let me take a look at MPTCP; if I still think it is dead code, I will
submit the patch.

Thank you.


Paolo Abeni Dec. 9, 2021, 11:59 a.m. UTC | #3
On Thu, 2021-12-09 at 20:07 +0900, Kuniyuki Iwashima wrote:
> From:   Eric Dumazet <edumazet@google.com>
> Date:   Thu, 9 Dec 2021 00:00:35 -0800
> > On Wed, Dec 8, 2021 at 5:33 PM Kuniyuki Iwashima <kuniyu@amazon.co.jp> wrote:
> > > [...]
> > 
> > I think you misunderstood me.
> > 
> > I think this code is not dead yet, so I would:
> > 
> > not include a Fixes: tag, to avoid unnecessary backports (of the patch
> > and its revert).
> > 
> > If you want to get syzbot coverage for a few releases, especially with
> > MPTCP and synflood, you can then submit a patch like the following.
> 
> Sorry, now I am on the same page.
> Let me take a look at MPTCP; if I still think it is dead code, I will
> submit the patch.

For the record, I think the 'else' branch should be reachable with
MPTCP in some non-trivial scenarios, e.g. an MPJ subflow's 3WHS racing
with setsockopt() on the main MPTCP socket. I'm unsure whether syzbot
could catch that, as it needs MPTCP endpoint configuration.
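
Roughly, an untested user-space sketch of the trigger (assuming MPTCP
endpoints are already configured with 'ip mptcp endpoint add ... subflow';
the helper name and the chosen option are placeholders):

	#include <netinet/in.h>
	#include <sys/socket.h>

	#ifndef IPPROTO_MPTCP
	#define IPPROTO_MPTCP 262
	#endif

	/* msk_fd is a connected MPTCP socket whose MPJ subflows may still
	 * be completing their 3WHS in softirq context.
	 */
	static void poke_sockopt(int msk_fd)
	{
		int val = 1;

		/* Propagating the option to subflows can take a subflow's
		 * lock from process context, so the subflow's final ACK may
		 * observe sock_owned_by_user() == true in
		 * tcp_child_process().
		 */
		setsockopt(msk_fd, SOL_SOCKET, SO_KEEPALIVE, &val, sizeof(val));
	}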

Cheers,

Paolo
Iwashima, Kuniyuki Dec. 9, 2021, 12:16 p.m. UTC | #4
From:   Paolo Abeni <pabeni@redhat.com>
Date:   Thu, 09 Dec 2021 12:59:21 +0100
> On Thu, 2021-12-09 at 20:07 +0900, Kuniyuki Iwashima wrote:
>> From:   Eric Dumazet <edumazet@google.com>
>> Date:   Thu, 9 Dec 2021 00:00:35 -0800
>>> [...]
>> 
>> Sorry, now I am on the same page.
>> Let me take a look at MPTCP; if I still think it is dead code, I will
>> submit the patch.
> 
> For the record, I think the 'else' branch should be reachable with
> MPTCP in some non-trivial scenarios, e.g. an MPJ subflow's 3WHS racing
> with setsockopt() on the main MPTCP socket. I'm unsure whether syzbot
> could catch that, as it needs MPTCP endpoint configuration.

Ah, I was wrong.
Thanks for explaining!



Patch

diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index cf913a66df17..85b1e752da5d 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -833,18 +833,15 @@ int tcp_child_process(struct sock *parent, struct sock *child,
 	sk_mark_napi_id(child, skb);
 
 	tcp_segs_in(tcp_sk(child), skb);
-	if (!sock_owned_by_user(child)) {
-		ret = tcp_rcv_state_process(child, skb);
-		/* Wakeup parent, send SIGIO */
-		if (state == TCP_SYN_RECV && child->sk_state != state)
-			parent->sk_data_ready(parent);
-	} else {
-		/* Alas, it is possible again, because we do lookup
-		 * in main socket hash table and lock on listening
-		 * socket does not protect us more.
-		 */
-		__sk_add_backlog(child, skb);
-	}
+
+	/* The lock is held in sk_clone_lock() */
+	WARN_ON_ONCE(sock_owned_by_user(child));
+
+	ret = tcp_rcv_state_process(child, skb);
+
+	/* Wakeup parent, send SIGIO */
+	if (state == TCP_SYN_RECV && child->sk_state != state)
+		parent->sk_data_ready(parent);
 
 	bh_unlock_sock(child);
 	sock_put(child);