diff mbox series

[net-next] tcp: add a scheduling point in established_get_first()

Message ID 20230630071827.2078604-1-wenjian1@xiaomi.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series [net-next] tcp: add a scheduling point in established_get_first()

Checks

Context                         Check    Description
netdev/series_format            success  Single patches do not need cover letters
netdev/tree_selection           success  Clearly marked for net-next
netdev/fixes_present            success  Fixes tag not required for -next series
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 27 this patch: 27
netdev/cc_maintainers           warning  3 maintainers not CCed: kuba@kernel.org dsahern@kernel.org pabeni@redhat.com
netdev/build_clang              fail     Errors and warnings before: 18 this patch: 18
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           success  None detected
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  No Fixes tag
netdev/build_allmodconfig_warn  success  Errors and warnings before: 27 this patch: 27
netdev/checkpatch               warning  WARNING: From:/Signed-off-by: email address mismatch: 'From: Jian Wen <wenjianhn@gmail.com>' != 'Signed-off-by: Jian Wen <wenjian1@xiaomi.com>'
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0

Commit Message

Jian Wen June 30, 2023, 7:18 a.m. UTC
Kubernetes[1] is going to stick with /proc/net/tcp for a while.

This commit reduces the scheduling latency introduced by established_get_first(),
similar to commit acffb584cda7 ("net: diag: add a scheduling point in inet_diag_dump_icsk()").

In our environment, the scheduling latency affects:
1. the performance of latency-sensitive services like Redis
2. the delay of synchronize_net(), which is called 12 times with RTNL
   held when Dockerd is deleting a container

[1] https://github.com/google/cadvisor/blob/v0.47.2/container/libcontainer/handler.go#L130

Signed-off-by: Jian Wen <wenjian1@xiaomi.com>
---
 net/ipv4/tcp_ipv4.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Eric Dumazet June 30, 2023, 10:49 a.m. UTC | #1
On Fri, Jun 30, 2023 at 9:18 AM Jian Wen <wenjianhn@gmail.com> wrote:
>
> Kubernetes[1] is going to stick with /proc/net/tcp for a while.
>
> This commit reduces the scheduling latency introduced by established_get_first(),
> similar to commit acffb584cda7 ("net: diag: add a scheduling point in inet_diag_dump_icsk()").
>
> In our environment, the scheduling latency affects:
> 1. the performance of latency-sensitive services like Redis
> 2. the delay of synchronize_net(), which is called 12 times with RTNL
>    held when Dockerd is deleting a container
>
> [1] https://github.com/google/cadvisor/blob/v0.47.2/container/libcontainer/handler.go#L130
>
> Signed-off-by: Jian Wen <wenjian1@xiaomi.com>
> ---
>  net/ipv4/tcp_ipv4.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index fd365de4d5ff..3271848e9c9a 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -57,6 +57,7 @@
>  #include <linux/init.h>
>  #include <linux/times.h>
>  #include <linux/slab.h>
> +#include <linux/sched.h>
>
>  #include <net/net_namespace.h>
>  #include <net/icmp.h>
> @@ -2456,6 +2457,7 @@ static void *established_get_first(struct seq_file *seq)
>                                 return sk;
>                 }
>                 spin_unlock_bh(lock);
> +               cond_resched();
>         }
>
>         return NULL;
> --
> 2.25.1
>
Hi Jian, thanks for your patch.

A few points:

- Note that net-next is currently closed (merge window).

- Also, the /proc interface does not hold RTNL, so I am not sure why
the changelog mentions RTNL and not the other kernel mutexes that
would also be affected by the long duration of established_get_first().

- Shouldn't the cond_resched() be done even if all buckets are empty?

- Using inet_diag, Kubernetes could list both IPv4 and IPv6 sockets in
one dump and benefit from a more modern interface (with cond_resched()
already there).

Patch

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index fd365de4d5ff..3271848e9c9a 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -57,6 +57,7 @@ 
 #include <linux/init.h>
 #include <linux/times.h>
 #include <linux/slab.h>
+#include <linux/sched.h>
 
 #include <net/net_namespace.h>
 #include <net/icmp.h>
@@ -2456,6 +2457,7 @@  static void *established_get_first(struct seq_file *seq)
 				return sk;
 		}
 		spin_unlock_bh(lock);
+		cond_resched();
 	}
 
 	return NULL;
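
Comment #1 asks whether cond_resched() should also run when all buckets are empty. The following is a minimal userspace sketch of that concern, not kernel code. It assumes the loop in established_get_first() skips empty buckets with a `continue` before taking the lock, which the hunk context above does not show. The names bucket_state, yield_point, scan_yield_after_unlock and scan_yield_every_bucket are hypothetical, and sched_yield() only stands in for cond_resched().

```c
#include <sched.h>
#include <stddef.h>

/* Hypothetical bucket states for this model; these are not kernel types. */
enum bucket_state {
	BUCKET_EMPTY,		/* chain has no sockets */
	BUCKET_NO_MATCH,	/* chain has sockets, none wanted by the reader */
	BUCKET_MATCH		/* chain holds a socket the reader wants */
};

/* Userspace stand-in for cond_resched(); counts how often it runs. */
void yield_point(unsigned long *yields)
{
	(*yields)++;
	sched_yield();
}

/*
 * Placement as in the posted hunk: yield only after a bucket was locked,
 * walked and unlocked. Assumed empty-bucket fast path skips the yield.
 * Returns the index of the first matching bucket, or -1 if none.
 */
long scan_yield_after_unlock(const enum bucket_state *tbl, size_t n,
			     unsigned long *yields)
{
	*yields = 0;
	for (size_t i = 0; i < n; i++) {
		if (tbl[i] == BUCKET_EMPTY)
			continue;
		/* spin_lock_bh() and the chain walk would happen here */
		if (tbl[i] == BUCKET_MATCH)
			return (long)i;
		/* spin_unlock_bh() would happen here */
		yield_point(yields);
	}
	return -1;
}

/*
 * Alternative placement: yield at the top of every iteration, before
 * the empty-bucket check, so a sparse table still offers yield points.
 */
long scan_yield_every_bucket(const enum bucket_state *tbl, size_t n,
			     unsigned long *yields)
{
	*yields = 0;
	for (size_t i = 0; i < n; i++) {
		yield_point(yields);
		if (tbl[i] == BUCKET_EMPTY)
			continue;
		if (tbl[i] == BUCKET_MATCH)
			return (long)i;
	}
	return -1;
}
```

In this model, a table whose buckets are all BUCKET_EMPTY never reaches yield_point() in the first variant, while the second variant reaches it once per bucket. Whether yielding on every iteration is acceptable in the real function is a trade-off this sketch does not measure.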