
[net,v2] tcp: Defer ts_recent changes until req is owned

Message ID 20250222103928.12104-1-wanghai38@huawei.com
State New
Delegated to: Netdev Maintainers
Series [net,v2] tcp: Defer ts_recent changes until req is owned

Checks

Context                         Check    Description
netdev/series_format            success  Single patches do not need cover letters
netdev/tree_selection           success  Clearly marked for net
netdev/ynl                      success  Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present            success  Fixes tag present in non-next series
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 0 this patch: 0
netdev/build_tools              success  No tools touched, skip
netdev/cc_maintainers           success  CCed 8 of 8 maintainers
netdev/build_clang              success  Errors and warnings before: 0 this patch: 0
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           success  None detected
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  Fixes tag looks correct
netdev/build_allmodconfig_warn  success  Errors and warnings before: 0 this patch: 0
netdev/checkpatch               warning  WARNING: line length of 97 exceeds 80 columns
netdev/build_clang_rust         success  No Rust files in patch. Skipping build
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0
netdev/contest                  success  net-next-2025-02-22--12-00 (tests: 893)

Commit Message

Wang Hai Feb. 22, 2025, 10:39 a.m. UTC
Packets of the same 5-tuple may be processed by different CPUs, so two
CPUs may receive different ACK packets at the same time while the
request is in the TCP_NEW_SYN_RECV state.

In that case, req->ts_recent may be changed concurrently in tcp_check_req(),
which can leave the new socket's ts_recent larger than the TSval of the
skb it is processing, so the PAWS check in tcp_validate_incoming() fails.

cpu1                                    cpu2
tcp_check_req
                                        tcp_check_req
 req->ts_recent = rcv_tsval = t1
                                         req->ts_recent = rcv_tsval = t2

 syn_recv_sock
  newsk->ts_recent = req->ts_recent = t2 // t1 < t2
tcp_child_process
 tcp_rcv_state_process
  tcp_validate_incoming
   tcp_paws_check
    if ((s32)(rx_opt->ts_recent - rx_opt->rcv_tsval) <= paws_win)
	// t2 - t1 > paws_win, failed

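Fix this in tcp_check_req() by deferring the ts_recent update until the
request is owned, and then storing this skb's TSval in the child socket
instead of in the shared request.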

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Wang Hai <wanghai38@huawei.com>
---
v1->v2: Reworked the fix based on Eric's suggestion and updated the commit message.
 net/ipv4/tcp_minisocks.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

Comments

Jason Xing Feb. 22, 2025, 3:06 p.m. UTC | #1
On Sat, Feb 22, 2025 at 6:41 PM Wang Hai <wanghai38@huawei.com> wrote:
>
> Packets of the same 5-tuple may be processed by different CPUs, so two
> CPUs may receive different ACK packets at the same time while the
> request is in the TCP_NEW_SYN_RECV state.
>
> In that case, req->ts_recent may be changed concurrently in tcp_check_req(),
> which can leave the new socket's ts_recent larger than the TSval of the
> skb it is processing, so the PAWS check in tcp_validate_incoming() fails.
>
> cpu1                                    cpu2
> tcp_check_req
>                                         tcp_check_req
>  req->ts_recent = rcv_tsval = t1
>                                          req->ts_recent = rcv_tsval = t2
>
>  syn_recv_sock
>   newsk->ts_recent = req->ts_recent = t2 // t1 < t2
> tcp_child_process
>  tcp_rcv_state_process
>   tcp_validate_incoming
>    tcp_paws_check
>     if ((s32)(rx_opt->ts_recent - rx_opt->rcv_tsval) <= paws_win)
>         // t2 - t1 > paws_win, failed
>
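> Fix this in tcp_check_req() by deferring the ts_recent update until the
> request is owned, and then storing this skb's TSval in the child socket
> instead of in the shared request.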

Honestly, from my perspective, the commit message doesn't reflect the
real problem you encountered or what the bad outcome could be. Your
previous reply was good and detailed: it at least gave readers enough
information to revisit or analyze the issue in the future.

>
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Signed-off-by: Wang Hai <wanghai38@huawei.com>

Otherwise, it looks good to me. Thanks!

Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>

> ---
> v1->v2: Reworked the fix based on Eric's suggestion and updated the commit message.
>  net/ipv4/tcp_minisocks.c | 9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
> index b089b08e9617..53700206f498 100644
> --- a/net/ipv4/tcp_minisocks.c
> +++ b/net/ipv4/tcp_minisocks.c
> @@ -815,12 +815,6 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
>
>         /* In sequence, PAWS is OK. */
>
> -       /* TODO: We probably should defer ts_recent change once
> -        * we take ownership of @req.
> -        */
> -       if (tmp_opt.saw_tstamp && !after(TCP_SKB_CB(skb)->seq, tcp_rsk(req)->rcv_nxt))
> -               WRITE_ONCE(req->ts_recent, tmp_opt.rcv_tsval);
> -
>         if (TCP_SKB_CB(skb)->seq == tcp_rsk(req)->rcv_isn) {
>                 /* Truncate SYN, it is out of window starting
>                    at tcp_rsk(req)->rcv_isn + 1. */
> @@ -869,6 +863,9 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
>         if (!child)
>                 goto listen_overflow;
>
> +       if (own_req && tmp_opt.saw_tstamp && !after(TCP_SKB_CB(skb)->seq, tcp_rsk(req)->rcv_nxt))
> +               tcp_sk(child)->rx_opt.ts_recent = tmp_opt.rcv_tsval;
> +

nit: I would suggest using the following format if a re-spin is necessary:
+       if (own_req && tmp_opt.saw_tstamp &&
+           !after(TCP_SKB_CB(skb)->seq, tcp_rsk(req)->rcv_nxt))
+               tcp_sk(child)->rx_opt.ts_recent = tmp_opt.rcv_tsval;
+

Thanks,
Jason

>         if (own_req && rsk_drop_req(req)) {
>                 reqsk_queue_removed(&inet_csk(req->rsk_listener)->icsk_accept_queue, req);
>                 inet_csk_reqsk_queue_drop_and_put(req->rsk_listener, req);
> --
> 2.17.1
>

Patch

diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index b089b08e9617..53700206f498 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -815,12 +815,6 @@  struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
 
 	/* In sequence, PAWS is OK. */
 
-	/* TODO: We probably should defer ts_recent change once
-	 * we take ownership of @req.
-	 */
-	if (tmp_opt.saw_tstamp && !after(TCP_SKB_CB(skb)->seq, tcp_rsk(req)->rcv_nxt))
-		WRITE_ONCE(req->ts_recent, tmp_opt.rcv_tsval);
-
 	if (TCP_SKB_CB(skb)->seq == tcp_rsk(req)->rcv_isn) {
 		/* Truncate SYN, it is out of window starting
 		   at tcp_rsk(req)->rcv_isn + 1. */
@@ -869,6 +863,9 @@  struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
 	if (!child)
 		goto listen_overflow;
 
+	if (own_req && tmp_opt.saw_tstamp && !after(TCP_SKB_CB(skb)->seq, tcp_rsk(req)->rcv_nxt))
+		tcp_sk(child)->rx_opt.ts_recent = tmp_opt.rcv_tsval;
+
 	if (own_req && rsk_drop_req(req)) {
 		reqsk_queue_removed(&inet_csk(req->rsk_listener)->icsk_accept_queue, req);
 		inet_csk_reqsk_queue_drop_and_put(req->rsk_listener, req);