diff mbox series

[net-next,v2] tcp: Support skb PAWS drop reason when TIME-WAIT

Message ID 20250325110325.51958-1-jiayuan.chen@linux.dev (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series [net-next,v2] tcp: Support skb PAWS drop reason when TIME-WAIT | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net-next, async
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 7 this patch: 7
netdev/build_tools success Errors and warnings before: 26 (+0) this patch: 26 (+0)
netdev/cc_maintainers success CCed 8 of 8 maintainers
netdev/build_clang success Errors and warnings before: 10 this patch: 10
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 909 this patch: 909
netdev/checkpatch warning WARNING: line length of 81 exceeds 80 columns WARNING: line length of 91 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 1 this patch: 1
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2025-03-26--00-00 (tests: 896)

Commit Message

Jiayuan Chen March 25, 2025, 11:03 a.m. UTC
PAWS is a long-standing issue, especially when there are upstream network
devices, making it more prone to occur.

Currently, packet loss statistics for PAWS can only be viewed through MIB,
which is a global metric and cannot be precisely obtained through tracing
to get the specific 4-tuple of the dropped packet. In the past, we had to
use kprobe ret to retrieve relevant skb information from
tcp_timewait_state_process().

We add a drop_reason pointer, similar to what previous commit does:
commit e34100c2ecbb ("tcp: add a drop_reason pointer to tcp_check_req()")

This commit addresses the PAWSESTABREJECTED case and also sets the
corresponding drop reason.

We use 'pwru' to test.

Before this commit:
''''
./pwru 'port 9999'
2025/03/24 13:46:03 Listening for events..
TUPLE                                        FUNC
172.31.75.115:12345->172.31.75.114:9999(tcp) sk_skb_reason_drop(SKB_DROP_REASON_NOT_SPECIFIED)
'''

After this commit:
'''
./pwru 'port 9999'
2025/03/24 16:06:59 Listening for events..
TUPLE                                        FUNC
172.31.75.115:12345->172.31.75.114:9999(tcp) sk_skb_reason_drop(SKB_DROP_REASON_TCP_RFC7323_PAWS)
'''

Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
---
My apologize.
I struggled for a long time to get packetdrill to fix the client port, but
ultimately failed to do so, which is why I couldn't provide a packetdrill
script.
Instead, I wrote my own program to trigger PAWS, which can be found at
https://github.com/mrpre/nettrigger/tree/main
---
 include/net/tcp.h        | 3 ++-
 net/ipv4/tcp_ipv4.c      | 2 +-
 net/ipv4/tcp_minisocks.c | 7 +++++--
 net/ipv6/tcp_ipv6.c      | 2 +-
 4 files changed, 9 insertions(+), 5 deletions(-)

Comments

Eric Dumazet March 25, 2025, 11:29 a.m. UTC | #1
On Tue, Mar 25, 2025 at 12:03 PM Jiayuan Chen <jiayuan.chen@linux.dev> wrote:
>
> PAWS is a long-standing issue, especially when there are upstream network
> devices, making it more prone to occur.
>
> Currently, packet loss statistics for PAWS can only be viewed through MIB,
> which is a global metric and cannot be precisely obtained through tracing
> to get the specific 4-tuple of the dropped packet. In the past, we had to
> use kprobe ret to retrieve relevant skb information from
> tcp_timewait_state_process().
>
> We add a drop_reason pointer, similar to what previous commit does:
> commit e34100c2ecbb ("tcp: add a drop_reason pointer to tcp_check_req()")
>
> This commit addresses the PAWSESTABREJECTED case and also sets the
> corresponding drop reason.
>
> We use 'pwru' to test.
>
> Before this commit:
> ''''
> ./pwru 'port 9999'
> 2025/03/24 13:46:03 Listening for events..
> TUPLE                                        FUNC
> 172.31.75.115:12345->172.31.75.114:9999(tcp) sk_skb_reason_drop(SKB_DROP_REASON_NOT_SPECIFIED)
> '''
>
> After this commit:
> '''
> ./pwru 'port 9999'
> 2025/03/24 16:06:59 Listening for events..
> TUPLE                                        FUNC
> 172.31.75.115:12345->172.31.75.114:9999(tcp) sk_skb_reason_drop(SKB_DROP_REASON_TCP_RFC7323_PAWS)
> '''
>
> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
> ---
> My apologize.
> I struggled for a long time to get packetdrill to fix the client port, but
> ultimately failed to do so, which is why I couldn't provide a packetdrill
> script.
> Instead, I wrote my own program to trigger PAWS, which can be found at
> https://github.com/mrpre/nettrigger/tree/main
> ---
>  include/net/tcp.h        | 3 ++-
>  net/ipv4/tcp_ipv4.c      | 2 +-
>  net/ipv4/tcp_minisocks.c | 7 +++++--
>  net/ipv6/tcp_ipv6.c      | 2 +-
>  4 files changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index f8efe56bbccb..e1574e804530 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -427,7 +427,8 @@ enum tcp_tw_status {
>  enum tcp_tw_status tcp_timewait_state_process(struct inet_timewait_sock *tw,
>                                               struct sk_buff *skb,
>                                               const struct tcphdr *th,
> -                                             u32 *tw_isn);
> +                                             u32 *tw_isn,
> +                                             enum skb_drop_reason *drop_reason);
>  struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
>                            struct request_sock *req, bool fastopen,
>                            bool *lost_race, enum skb_drop_reason *drop_reason);
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 1cd0938d47e0..a9dde473a23f 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -2417,7 +2417,7 @@ int tcp_v4_rcv(struct sk_buff *skb)
>                 goto csum_error;
>         }
>
> -       tw_status = tcp_timewait_state_process(inet_twsk(sk), skb, th, &isn);
> +       tw_status = tcp_timewait_state_process(inet_twsk(sk), skb, th, &isn, &drop_reason);
>         switch (tw_status) {
>         case TCP_TW_SYN: {
>                 struct sock *sk2 = inet_lookup_listener(net,
> diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
> index fb9349be36b8..d16dfd41397e 100644
> --- a/net/ipv4/tcp_minisocks.c
> +++ b/net/ipv4/tcp_minisocks.c
> @@ -97,7 +97,8 @@ static void twsk_rcv_nxt_update(struct tcp_timewait_sock *tcptw, u32 seq,
>   */
>  enum tcp_tw_status
>  tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb,
> -                          const struct tcphdr *th, u32 *tw_isn)
> +                          const struct tcphdr *th, u32 *tw_isn,
> +                          enum skb_drop_reason *drop_reason)
>  {
>         struct tcp_timewait_sock *tcptw = tcp_twsk((struct sock *)tw);
>         u32 rcv_nxt = READ_ONCE(tcptw->tw_rcv_nxt);
> @@ -245,8 +246,10 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb,
>                 return TCP_TW_SYN;
>         }
>
> -       if (paws_reject)
> +       if (paws_reject) {
> +               *drop_reason = SKB_DROP_REASON_TCP_RFC7323_PAWS;
>                 __NET_INC_STATS(twsk_net(tw), LINUX_MIB_PAWSESTABREJECTED);

I think we should add a new SNMP value and drop reason for TW sockets.

SNMP_MIB_ITEM("PAWSTimewait", LINUX_MIB_PAWSTIMEWAIT),

and SKB_DROP_REASON_TCP_RFC7323_TW_PAWS ?
Jiayuan Chen March 25, 2025, 12:20 p.m. UTC | #2
March 25, 2025 at 19:29, "Eric Dumazet" <edumazet@google.com> wrote:

> >  ---
> > 
> >  include/net/tcp.h | 3 ++-
> > 
> >  net/ipv4/tcp_ipv4.c | 2 +-
> > 
> >  net/ipv4/tcp_minisocks.c | 7 +++++--
> > 
> >  net/ipv6/tcp_ipv6.c | 2 +-
> > 
> >  4 files changed, 9 insertions(+), 5 deletions(-)
> > 
> >  diff --git a/include/net/tcp.h b/include/net/tcp.h
> > 
> >  index f8efe56bbccb..e1574e804530 100644
> > 
> >  --- a/include/net/tcp.h
> > 
> >  +++ b/include/net/tcp.h
> > 
> >  @@ -427,7 +427,8 @@ enum tcp_tw_status {
> > 
> >  enum tcp_tw_status tcp_timewait_state_process(struct inet_timewait_sock *tw,
> > 
> >  struct sk_buff *skb,
> > 
> >  const struct tcphdr *th,
> > 
> >  - u32 *tw_isn);
> > 
> >  + u32 *tw_isn,
> > 
> >  + enum skb_drop_reason *drop_reason);
> > 
> >  struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
> > 
> >  struct request_sock *req, bool fastopen,
> > 
> >  bool *lost_race, enum skb_drop_reason *drop_reason);
> > 
> >  diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> > 
> >  index 1cd0938d47e0..a9dde473a23f 100644
> > 
> >  --- a/net/ipv4/tcp_ipv4.c
> > 
> >  +++ b/net/ipv4/tcp_ipv4.c
> > 
> >  @@ -2417,7 +2417,7 @@ int tcp_v4_rcv(struct sk_buff *skb)
> > 
> >  goto csum_error;
> > 
> >  }
> > 
> >  - tw_status = tcp_timewait_state_process(inet_twsk(sk), skb, th, &isn);
> > 
> >  + tw_status = tcp_timewait_state_process(inet_twsk(sk), skb, th, &isn, &drop_reason);
> > 
> >  switch (tw_status) {
> > 
> >  case TCP_TW_SYN: {
> > 
> >  struct sock *sk2 = inet_lookup_listener(net,
> > 
> >  diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
> > 
> >  index fb9349be36b8..d16dfd41397e 100644
> > 
> >  --- a/net/ipv4/tcp_minisocks.c
> > 
> >  +++ b/net/ipv4/tcp_minisocks.c
> > 
> >  @@ -97,7 +97,8 @@ static void twsk_rcv_nxt_update(struct tcp_timewait_sock *tcptw, u32 seq,
> > 
> >  */
> > 
> >  enum tcp_tw_status
> > 
> >  tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb,
> > 
> >  - const struct tcphdr *th, u32 *tw_isn)
> > 
> >  + const struct tcphdr *th, u32 *tw_isn,
> > 
> >  + enum skb_drop_reason *drop_reason)
> > 
> >  {
> > 
> >  struct tcp_timewait_sock *tcptw = tcp_twsk((struct sock *)tw);
> > 
> >  u32 rcv_nxt = READ_ONCE(tcptw->tw_rcv_nxt);
> > 
> >  @@ -245,8 +246,10 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb,
> > 
> >  return TCP_TW_SYN;
> > 
> >  }
> > 
> >  - if (paws_reject)
> > 
> >  + if (paws_reject) {
> > 
> >  + *drop_reason = SKB_DROP_REASON_TCP_RFC7323_PAWS;
> > 
> >  __NET_INC_STATS(twsk_net(tw), LINUX_MIB_PAWSESTABREJECTED);
> > 
> 
> I think we should add a new SNMP value and drop reason for TW sockets.
> 
> SNMP_MIB_ITEM("PAWSTimewait", LINUX_MIB_PAWSTIMEWAIT),
> 
> and SKB_DROP_REASON_TCP_RFC7323_TW_PAWS ?
>

That makes sense, we've done similar things before, such as adding
PAWS_OLD_ACK previously.

Thanks for the suggestion!
Jakub Kicinski March 25, 2025, 3:43 p.m. UTC | #3
On Tue, 25 Mar 2025 12:20:18 +0000 Jiayuan Chen wrote:
> pw-bot: cr

Interesting, so you know about pw-bot commands but neither about 
the 24h reposting cool down time, nor about the fact that net-next 
is closed..

Please read:
https://www.kernel.org/doc/html/next/process/maintainer-netdev.html
Jiayuan Chen March 25, 2025, 4:18 p.m. UTC | #4
March 25, 2025 at 23:43, "Jakub Kicinski" <kuba@kernel.org> wrote:

> 
> On Tue, 25 Mar 2025 12:20:18 +0000 Jiayuan Chen wrote:
> 
> > 
> > pw-bot: cr
> > 
> 
> Interesting, so you know about pw-bot commands but neither about 
> the 24h reposting cool down time, nor about the fact that net-next 
> is closed..
> 
> Please read:
> 
> https://www.kernel.org/doc/html/next/process/maintainer-netdev.html
>

My apologies, I learned the pw-bot commands from other maintainers' email
responses and then discovered that I could set the patchwork status.
I'll learn the documentation more thoroughly next.

Please ingore my patch; I'll resubmit it after the merge window closes.
diff mbox series

Patch

diff --git a/include/net/tcp.h b/include/net/tcp.h
index f8efe56bbccb..e1574e804530 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -427,7 +427,8 @@  enum tcp_tw_status {
 enum tcp_tw_status tcp_timewait_state_process(struct inet_timewait_sock *tw,
 					      struct sk_buff *skb,
 					      const struct tcphdr *th,
-					      u32 *tw_isn);
+					      u32 *tw_isn,
+					      enum skb_drop_reason *drop_reason);
 struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
 			   struct request_sock *req, bool fastopen,
 			   bool *lost_race, enum skb_drop_reason *drop_reason);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 1cd0938d47e0..a9dde473a23f 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2417,7 +2417,7 @@  int tcp_v4_rcv(struct sk_buff *skb)
 		goto csum_error;
 	}
 
-	tw_status = tcp_timewait_state_process(inet_twsk(sk), skb, th, &isn);
+	tw_status = tcp_timewait_state_process(inet_twsk(sk), skb, th, &isn, &drop_reason);
 	switch (tw_status) {
 	case TCP_TW_SYN: {
 		struct sock *sk2 = inet_lookup_listener(net,
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index fb9349be36b8..d16dfd41397e 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -97,7 +97,8 @@  static void twsk_rcv_nxt_update(struct tcp_timewait_sock *tcptw, u32 seq,
  */
 enum tcp_tw_status
 tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb,
-			   const struct tcphdr *th, u32 *tw_isn)
+			   const struct tcphdr *th, u32 *tw_isn,
+			   enum skb_drop_reason *drop_reason)
 {
 	struct tcp_timewait_sock *tcptw = tcp_twsk((struct sock *)tw);
 	u32 rcv_nxt = READ_ONCE(tcptw->tw_rcv_nxt);
@@ -245,8 +246,10 @@  tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb,
 		return TCP_TW_SYN;
 	}
 
-	if (paws_reject)
+	if (paws_reject) {
+		*drop_reason = SKB_DROP_REASON_TCP_RFC7323_PAWS;
 		__NET_INC_STATS(twsk_net(tw), LINUX_MIB_PAWSESTABREJECTED);
+	}
 
 	if (!th->rst) {
 		/* In this case we must reset the TIMEWAIT timer.
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index c134cf1a603a..8862e315edc5 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1970,7 +1970,7 @@  INDIRECT_CALLABLE_SCOPE int tcp_v6_rcv(struct sk_buff *skb)
 		goto csum_error;
 	}
 
-	tw_status = tcp_timewait_state_process(inet_twsk(sk), skb, th, &isn);
+	tw_status = tcp_timewait_state_process(inet_twsk(sk), skb, th, &isn, &drop_reason);
 	switch (tw_status) {
 	case TCP_TW_SYN:
 	{