Message ID | 20250224090047.50748-1-wanghai38@huawei.com (mailing list archive) |
---|---|
State | New |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [v3,net] tcp: Defer ts_recent changes until req is owned | expand |
On Mon, Feb 24, 2025 at 10:03 AM Wang Hai <wanghai38@huawei.com> wrote: > > Recently a bug was discovered where the server had entered TCP_ESTABLISHED > state, but the upper layers were not notified. > > The same 5-tuple packet may be processed by different CPUSs, so two > CPUs may receive different ack packets at the same time when the > state is TCP_NEW_SYN_RECV. > > In that case, req->ts_recent in tcp_check_req may be changed concurrently, > which will probably cause the newsk's ts_recent to be incorrectly large. > So that tcp_validate_incoming will fail. At this point, newsk will not be > able to enter the TCP_ESTABLISHED. > > cpu1 cpu2 > tcp_check_req > tcp_check_req > req->ts_recent = rcv_tsval = t1 > req->ts_recent = rcv_tsval = t2 > > syn_recv_sock > tcp_sk(child)->rx_opt.ts_recent = req->ts_recent = t2 // t1 < t2 > tcp_child_process > tcp_rcv_state_process > tcp_validate_incoming > tcp_paws_check > if ((s32)(rx_opt->ts_recent - rx_opt->rcv_tsval) <= paws_win) > // t2 - t1 > paws_win, failed > tcp_v4_do_rcv > tcp_rcv_state_process > // TCP_ESTABLISHED > > The cpu2's skb or a newly received skb will call tcp_v4_do_rcv to get > the newsk into the TCP_ESTABLISHED state, but at this point it is no > longer possible to notify the upper layer application. A notification > mechanism could be added here, but the fix is more complex, so the > current fix is used. > > In tcp_check_req, req->ts_recent is used to assign a value to > tcp_sk(child)->rx_opt.ts_recent, so removing the change in req->ts_recent > and changing tcp_sk(child)->rx_opt.ts_recent directly after owning the > req fixes this bug. > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") > Signed-off-by: Wang Hai <wanghai38@huawei.com> > Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> > --- Reviewed-by: Eric Dumazet <edumazet@google.com> Thanks for the fix !
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c index b089b08e9617..dfdb7a4608a8 100644 --- a/net/ipv4/tcp_minisocks.c +++ b/net/ipv4/tcp_minisocks.c @@ -815,12 +815,6 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb, /* In sequence, PAWS is OK. */ - /* TODO: We probably should defer ts_recent change once - * we take ownership of @req. - */ - if (tmp_opt.saw_tstamp && !after(TCP_SKB_CB(skb)->seq, tcp_rsk(req)->rcv_nxt)) - WRITE_ONCE(req->ts_recent, tmp_opt.rcv_tsval); - if (TCP_SKB_CB(skb)->seq == tcp_rsk(req)->rcv_isn) { /* Truncate SYN, it is out of window starting at tcp_rsk(req)->rcv_isn + 1. */ @@ -869,6 +863,10 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb, if (!child) goto listen_overflow; + if (own_req && tmp_opt.saw_tstamp && + !after(TCP_SKB_CB(skb)->seq, tcp_rsk(req)->rcv_nxt)) + tcp_sk(child)->rx_opt.ts_recent = tmp_opt.rcv_tsval; + if (own_req && rsk_drop_req(req)) { reqsk_queue_removed(&inet_csk(req->rsk_listener)->icsk_accept_queue, req); inet_csk_reqsk_queue_drop_and_put(req->rsk_listener, req);