
[net] tcp: Remove sock_owned_by_user() test in tcp_child_process().

Message ID 20211208051633.49122-1-kuniyu@amazon.co.jp (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series [net] tcp: Remove sock_owned_by_user() test in tcp_child_process().

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 2 this patch: 2
netdev/cc_maintainers warning 2 maintainers not CCed: yoshfuji@linux-ipv6.org dsahern@kernel.org
netdev/build_clang success Errors and warnings before: 20 this patch: 20
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 4 this patch: 4
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 24 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Iwashima, Kuniyuki Dec. 8, 2021, 5:16 a.m. UTC
Before v2.3.41, while creating a child socket, we used to call
bh_lock_sock() later than we do now; it was called just before
tcp_rcv_state_process().  The full socket was put into the accept queue
and exposed to other CPUs before bh_lock_sock(), so a process context
might already have acquired the lock by then.  Thus, we had to check
whether any process context was accessing the socket before calling
tcp_rcv_state_process().

We can see this code in tcp_v4_do_rcv() of v2.3.14. [0]

	if (sk->state == TCP_LISTEN) {
		struct sock *nsk;

		nsk = tcp_v4_hnd_req(sk, skb);
		...
		if (nsk != sk) {
			bh_lock_sock(nsk);
			if (nsk->lock.users != 0) {
				...
				sk_add_backlog(nsk, skb);
				bh_unlock_sock(nsk);
				return 0;
			}
			...
		}
	}

	if (tcp_rcv_state_process(sk, skb, skb->h.th, skb->len))
		goto reset;

However, in 2.3.15, this lock.users test was mistakenly replaced with
BUG_TRAP(). [1]

		if (nsk != sk) {
			...
			BUG_TRAP(nsk->lock.users == 0);
			...
			ret = tcp_rcv_state_process(nsk, skb, skb->h.th, skb->len);
			...
			bh_unlock_sock(nsk);
			...
			return 0;
		}

Fortunately, the test was brought back in 2.3.41. [2]  The related code
was then consolidated into tcp_child_process(), along with comments that
remain to this day.

What is interesting in v2.3.41 is that bh_lock_sock() was moved into
tcp_create_openreq_child() and placed just after sock_lock_init().
Thus, a process context can never acquire the lock before
tcp_rcv_state_process() runs.  The bh_lock_sock() call now lives in
sk_clone_lock(), and the rule has not changed.
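
For reference, a condensed sketch of today's sk_clone_lock() in
net/core/sock.c (abridged from memory, not a verbatim quote; elisions
marked with "..."):

	struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
	{
		struct proto *prot = READ_ONCE(sk->sk_prot);
		struct sock *newsk;

		newsk = sk_prot_alloc(prot, priority, sk->sk_family);
		...
		sock_copy(newsk, sk);
		...
		sk_node_init(&newsk->sk_node);
		sock_lock_init(newsk);
		/* Taken before any process context can see the socket. */
		bh_lock_sock(newsk);
		...
	}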

As a result of that change, the commented path is no longer reachable.
Let's remove this remnant of the old days.

[0]: https://cdn.kernel.org/pub/linux/kernel/v2.3/linux-2.3.14.tar.gz
[1]: https://cdn.kernel.org/pub/linux/kernel/v2.3/patch-2.3.15.gz
[2]: https://cdn.kernel.org/pub/linux/kernel/v2.3/patch-2.3.41.gz

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
---
 net/ipv4/tcp_minisocks.c | 18 ++++++------------
 1 file changed, 6 insertions(+), 12 deletions(-)

Comments

Eric Dumazet Dec. 8, 2021, 6:30 p.m. UTC | #1
On Tue, Dec 7, 2021 at 9:16 PM Kuniyuki Iwashima <kuniyu@amazon.co.jp> wrote:
>
> [snip commit message, quoted in full above]
>
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")

I do not think this patch qualifies as a stable candidate.

At best this is a cleanup.

At worst this could add a bug.

I would advise adding a WARN_ON_ONCE() there for at least one release
so that syzbot can validate for you if this is dead code or not.

TCP_SYN_RECV is not TCP_NEW_SYN_RECV

Thanks.

> [snip remainder of patch, quoted in full below]
Iwashima, Kuniyuki Dec. 9, 2021, 12:21 a.m. UTC | #2
From:   Eric Dumazet <edumazet@google.com>
Date:   Wed, 8 Dec 2021 10:30:49 -0800
> On Tue, Dec 7, 2021 at 9:16 PM Kuniyuki Iwashima <kuniyu@amazon.co.jp> wrote:
>> [snip commit message, quoted in full above]
>>
>> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> 
> I do not think this patch qualifies as a stable candidate.
> 
> At best this is a cleanup.
> 
> At worst this could add a bug.
> 
> I would advise adding a WARN_ON_ONCE() there for at least one release
> so that syzbot can validate for you if this is dead code or not.

Thanks for the review.
I will add a WARN_ON_ONCE() and respin for net-next.
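
To make the intent concrete, a rough sketch of what the transitional
hunk in tcp_child_process() might look like (hypothetical, not the
actual v2 patch):

	tcp_segs_in(tcp_sk(child), skb);

	if (!sock_owned_by_user(child)) {
		ret = tcp_rcv_state_process(child, skb);
		/* Wakeup parent, send SIGIO */
		if (state == TCP_SYN_RECV && child->sk_state != state)
			parent->sk_data_ready(parent);
	} else {
		/* Believed unreachable since v2.3.41; let syzbot
		 * confirm before the branch is removed for good.
		 */
		WARN_ON_ONCE(1);
		__sk_add_backlog(child, skb);
	}

If the warning never fires for a release cycle, the else branch can be
dropped with much more confidence.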

> 
> TCP_SYN_RECV is not TCP_NEW_SYN_RECV

Right, the TCP_SYN_RECV case is not affected.
"While creating a child socket," was a bit misleading.
I will clarify that this is about the TCP_NEW_SYN_RECV case and the
SYN cookie case.
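
For context, here is roughly where tcp_child_process() is reached today
(abridged from tcp_v4_rcv() and tcp_v4_do_rcv(); details elided):

	/* TCP_NEW_SYN_RECV path, in tcp_v4_rcv() */
	if (sk->sk_state == TCP_NEW_SYN_RECV) {
		...
		nsk = tcp_check_req(sk, skb, req, false, &req_stolen);
		...
		if (nsk == sk) {
			...
		} else if (tcp_child_process(sk, nsk, skb)) {
			...
		}
	}

	/* SYN cookie path, in tcp_v4_do_rcv() */
	if (sk->sk_state == TCP_LISTEN) {
		struct sock *nsk = tcp_v4_cookie_check(sk, skb);
		...
		if (nsk != sk) {
			if (tcp_child_process(sk, nsk, skb)) {
				rsk = nsk;
				goto reset;
			}
			return 0;
		}
	}

In both paths the child has just come out of the handshake machinery,
so no process context can hold its lock yet.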


> 
> Thanks.
> 
>> [snip remainder of patch, quoted in full below]

Patch

diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 7c2d3ac2363a..b4a1f8728093 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -833,18 +833,12 @@ int tcp_child_process(struct sock *parent, struct sock *child,
 	sk_mark_napi_id_set(child, skb);
 
 	tcp_segs_in(tcp_sk(child), skb);
-	if (!sock_owned_by_user(child)) {
-		ret = tcp_rcv_state_process(child, skb);
-		/* Wakeup parent, send SIGIO */
-		if (state == TCP_SYN_RECV && child->sk_state != state)
-			parent->sk_data_ready(parent);
-	} else {
-		/* Alas, it is possible again, because we do lookup
-		 * in main socket hash table and lock on listening
-		 * socket does not protect us more.
-		 */
-		__sk_add_backlog(child, skb);
-	}
+
+	ret = tcp_rcv_state_process(child, skb);
+
+	/* Wakeup parent, send SIGIO */
+	if (state == TCP_SYN_RECV && child->sk_state != state)
+		parent->sk_data_ready(parent);
 
 	bh_unlock_sock(child);
 	sock_put(child);