net: Provide sysctl to tune local port range to IANA specification

Message ID 202407241403542217WOxM8U3ABv-nWZT068xe@zte.com.cn (mailing list archive)
State Changes Requested

Checks

Context                         Check    Description
netdev/series_format            warning  Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection           success  Guessed tree name to be net-next
netdev/ynl                      success  Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present            success  Fixes tag not required for -next series
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 273 this patch: 273
netdev/build_tools              success  No tools touched, skip
netdev/cc_maintainers           success  CCed 7 of 7 maintainers
netdev/build_clang              success  Errors and warnings before: 281 this patch: 281
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           success  None detected
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  No Fixes tag
netdev/build_allmodconfig_warn  fail     Errors and warnings before: 281 this patch: 282
netdev/checkpatch               warning  WARNING: Co-developed-by and Signed-off-by: name/email do not match; WARNING: line length of 83 exceeds 80 columns
netdev/build_clang_rust         success  No Rust files in patch. Skipping build
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0

Commit Message

jiang.kun2@zte.com.cn July 24, 2024, 6:03 a.m. UTC
From: Fan Yu <fan.yu9@zte.com.cn>

The Importance of Following IANA Standards
========================================
IANA specifies User Ports as 1024-49151, and my application uses port
33060 (reserved for MySQL Database Extended), which falls inside the
Linux default dynamic port range (32768-60999)[1].

IANA assigns port numbers in the range 32768-49151, and those
assignments are widely accepted by the industry. To avoid such
conflicts, the kernel needs a way to follow the IANA specification.

Drawbacks of existing implementations
========================================
In past discussions, following the IANA specification by changing the
system defaults has been discouraged, because it would greatly affect
existing users[2].

In theory, the same result can be achieved by tuning
net.ipv4.ip_local_port_range, but this has drawbacks:
(1) In cloud-native scenarios, every container is expected to follow
the IANA specification, so the sysctl must be configured in each
container individually, which increases the user's resource
management costs.
(2) For new applications, since net.ipv4.ip_local_port_range is
isolated per network namespace, the container cannot inherit the
host's value. After startup it stays at the kernel default of
32768-60999, which makes the system harder to use.

Solution
========================================
To maintain compatibility, we provide a sysctl interface in the
host namespace that makes it easy to tune the local port range to the
IANA specification.

When ip_local_port_range_use_iana=1, the local port range of all network
namespaces is set to the IANA-compliant range (49152-60999), and the
same range is used for newly created network namespaces. Each container
therefore no longer needs its own sysctl settings, which simplifies
configuration.
When ip_local_port_range_use_iana=0, the local port range of all network
namespaces is reset to the original kernel default (32768-60999).
For example:
	# cat /proc/sys/net/ipv4/ip_local_port_range 
	32768   60999
	# echo 1 > /proc/sys/net/ipv4/ip_local_port_range_use_iana
	# cat /proc/sys/net/ipv4/ip_local_port_range 
	49152   60999

	# unshare -n
	# cat /proc/sys/net/ipv4/ip_local_port_range 
	49152   60999

Notes
========================================
The lower limit (49152) matches the lower limit of the IANA dynamic
port range. The upper limit (60999) differs from the IANA dynamic upper
limit (65535) because Linux uses 61000-65535 for masquerading/NAT,
but this does not conflict with the IANA specification[3].

Note that following the above specification cuts the number of ephemeral
ports by more than half (from 28232 to 11848), increasing the risk of
port exhaustion[2].
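
The size of that reduction can be checked with a few lines of arithmetic.
The sketch below is illustrative only and is not part of the patch; the
helper names port_count() and print_port_budget() are made up for this
example, and both ranges are treated as inclusive.

```c
#include <stdio.h>

/* Number of ports in the inclusive range [low, high]. */
unsigned int port_count(unsigned int low, unsigned int high)
{
	return high - low + 1;
}

/* Print the size of the two ranges discussed in the commit message. */
void print_port_budget(void)
{
	unsigned int def  = port_count(32768, 60999);	/* kernel default */
	unsigned int iana = port_count(49152, 60999);	/* proposed range */

	printf("default: %u ports, proposed: %u ports (%.0f%% of default)\n",
	       def, iana, 100.0 * iana / def);
}
```

With these inputs the default range holds 28232 ports and the proposed
range 11848, so the proposed range keeps well under half of the default
ephemeral port budget.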

[1]:https://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.txt
[2]:https://lore.kernel.org/all/bf42f6fd-cd06-02d6-d7b6-233a0602c437@gmail.com/
[3]:https://lore.kernel.org/all/20070512210830.514c7709@the-village.bc.nu/

Co-developed-by: Kun Jiang <jiang.kun2@zte.com.cn>
Signed-off-by: Fan Yu <fan.yu9@zte.com.cn>
Signed-off-by: Kun Jiang <jiang.kun2@zte.com.cn>
Reviewed-by: xu xin <xu.xin16@zte.com.cn>
Reviewed-by: Yunkai Zhang <zhang.yunkai@zte.com.cn>
Reviewed-by: Qiang Tu <tu.qiang35@zte.com.cn>
Reviewed-by: Peilin He <he.peilin@zte.com.cn>
Cc: Yang Yang <yang.yang29@zte.com.cn>
---
 Documentation/networking/ip-sysctl.rst | 13 +++++++++++++
 net/ipv4/af_inet.c                     |  7 ++++++-
 net/ipv4/sysctl_net_ipv4.c             | 31 +++++++++++++++++++++++++++++++
 3 files changed, 50 insertions(+), 1 deletion(-)

Comments

Eric Dumazet July 24, 2024, 9:59 a.m. UTC | #1
On Wed, Jul 24, 2024 at 8:04 AM <jiang.kun2@zte.com.cn> wrote:
> diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
> index bd50df6a5a42..27f4928c2a1d 100644
> --- a/Documentation/networking/ip-sysctl.rst
> +++ b/Documentation/networking/ip-sysctl.rst
> @@ -1320,6 +1320,19 @@ ip_local_port_range - 2 INTEGERS
>         Must be greater than or equal to ip_unprivileged_port_start.
>         The default values are 32768 and 60999 respectively.
>
> +ip_local_port_range_use_iana - BOOLEAN
> +       Tune ip_local_port_range to IANA specification easily.
> +       When ip_local_port_range_use_iana=1, the local port range of
> +       all network namespaces is tuned to IANA specification (49152-60999),
> +       and IANA specification is also used for newly created network namespaces.
> +       Therefore, each container does not need to do sysctl settings separately,
> +       which improves the convenience of configuration.
> +       When ip_local_port_range_use_iana=0, the local port range of
> +       all network namespaces are tuned to the original kernel
> +       defaults (32768-60999).
> +

IANA means: Internet Assigned Numbers Authority

It is very possible that a future RFC changes the actual ranges.

I would have used RFC 6335, because when a new RFC comes in 2030, we
will have to add a new sysctl, right?

> +       Default: 0
> +
>  ip_local_reserved_ports - list of comma separated ranges
>         Specify the ports which are reserved for known third-party
>         applications. These ports will not be used by automatic port
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index b24d74616637..42b6bc58dc45 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -123,6 +123,8 @@
>
>  #include <trace/events/sock.h>
>
> +extern u8 sysctl_ip_local_port_range_use_iana;
> +
>  /* The inetsw table contains everything that inet_create needs to
>   * build a new socket.
>   */
> @@ -1802,7 +1804,10 @@ static __net_init int inet_init_net(struct net *net)
>         /*
>          * Set defaults for local port range
>          */
> -       net->ipv4.ip_local_ports.range = 60999u << 16 | 32768u;
> +       if (sysctl_ip_local_port_range_use_iana)
> +               net->ipv4.ip_local_ports.range = 60999u << 16 | 49152u;
> +       else
> +               net->ipv4.ip_local_ports.range = 60999u << 16 | 32768u;
>
>         seqlock_init(&net->ipv4.ping_group_range.lock);
>         /*
> diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
> index 162a0a3b6ba5..a38447889072 100644
> --- a/net/ipv4/sysctl_net_ipv4.c
> +++ b/net/ipv4/sysctl_net_ipv4.c
> @@ -45,6 +45,8 @@ static unsigned int tcp_child_ehash_entries_max = 16 * 1024 * 1024;
>  static unsigned int udp_child_hash_entries_max = UDP_HTABLE_SIZE_MAX;
>  static int tcp_plb_max_rounds = 31;
>  static int tcp_plb_max_cong_thresh = 256;
> +u8 sysctl_ip_local_port_range_use_iana;
> +EXPORT_SYMBOL(sysctl_ip_local_port_range_use_iana);
>
>  /* obsolete */
>  static int sysctl_tcp_low_latency __read_mostly;
> @@ -95,6 +97,26 @@ static int ipv4_local_port_range(struct ctl_table *table, int write,
>         return ret;
>  }
>
> +static int ipv4_local_port_range_use_iana(struct ctl_table *table, int write,
> +                                         void *buffer, size_t *lenp, loff_t *ppos)
> +{
> +       struct net *net;
> +       int ret;
> +
> +       ret = proc_dou8vec_minmax(table, write, buffer, lenp, ppos);
> +
> +       if (write && ret == 0) {
> +               for_each_net(net) {

This is quite buggy.

for_each_net() can only be used with care; otherwise the list can be
corrupted, or a netns can disappear under you.

Eric Dumazet July 24, 2024, 10:04 a.m. UTC | #2
On Wed, Jul 24, 2024 at 8:04 AM <jiang.kun2@zte.com.cn> wrote:


...

>
> Co-developed-by: Kun Jiang <jiang.kun2@zte.com.cn>
> Signed-off-by: Fan Yu <fan.yu9@zte.com.cn>
> Signed-off-by: Kun Jiang <jiang.kun2@zte.com.cn>
> Reviewed-by: xu xin <xu.xin16@zte.com.cn>
> Reviewed-by: Yunkai Zhang <zhang.yunkai@zte.com.cn>
> Reviewed-by: Qiang Tu <tu.qiang35@zte.com.cn>
> Reviewed-by: Peilin He <he.peilin@zte.com.cn>
> Cc: Yang Yang <yang.yang29@zte.com.cn>

This long list of reviewers, all of them missing the net_rwsem
requirement for using for_each_net(), is alarming.

Stephen Hemminger July 24, 2024, 4:25 p.m. UTC | #3
On Wed, 24 Jul 2024 14:03:54 +0800 (CST)
<jiang.kun2@zte.com.cn> wrote:

Yet another NAK

Rather than a buggy and verbose new sysctl, why not just set the port
range you want through the existing sysctls?

You can configure this through the existing sysctl files and startup
scripts in your distro.
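
As a rough illustration of that route, a container's startup code could
write the range itself to the existing ip_local_port_range file in its
own network namespace. The helper below is a sketch under that
assumption; write_port_range() is a name invented for this example, and
the caller needs enough privilege to write the sysctl.

```c
#include <stdio.h>

/*
 * Write "low high" to a sysctl-style file.
 * Returns 0 on success, -1 if the file cannot be opened or written.
 */
int write_port_range(const char *path, unsigned int low, unsigned int high)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%u %u\n", low, high) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

A call such as
write_port_range("/proc/sys/net/ipv4/ip_local_port_range", 49152, 60999)
has the same effect as echoing the two values into that file, and only
affects the network namespace of the calling process.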

Patch

diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
index bd50df6a5a42..27f4928c2a1d 100644
--- a/Documentation/networking/ip-sysctl.rst
+++ b/Documentation/networking/ip-sysctl.rst
@@ -1320,6 +1320,19 @@  ip_local_port_range - 2 INTEGERS
 	Must be greater than or equal to ip_unprivileged_port_start.
 	The default values are 32768 and 60999 respectively.

+ip_local_port_range_use_iana - BOOLEAN
+	Tune ip_local_port_range to IANA specification easily.
+	When ip_local_port_range_use_iana=1, the local port range of
+	all network namespaces is tuned to IANA specification (49152-60999),
+	and IANA specification is also used for newly created network namespaces.
+	Therefore, each container does not need to do sysctl settings separately,
+	which improves the convenience of configuration.
+	When ip_local_port_range_use_iana=0, the local port range of
+	all network namespaces are tuned to the original kernel
+	defaults (32768-60999).
+
+	Default: 0
+
 ip_local_reserved_ports - list of comma separated ranges
 	Specify the ports which are reserved for known third-party
 	applications. These ports will not be used by automatic port
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index b24d74616637..42b6bc58dc45 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -123,6 +123,8 @@ 

 #include <trace/events/sock.h>

+extern u8 sysctl_ip_local_port_range_use_iana;
+
 /* The inetsw table contains everything that inet_create needs to
  * build a new socket.
  */
@@ -1802,7 +1804,10 @@  static __net_init int inet_init_net(struct net *net)
 	/*
 	 * Set defaults for local port range
 	 */
-	net->ipv4.ip_local_ports.range = 60999u << 16 | 32768u;
+	if (sysctl_ip_local_port_range_use_iana)
+		net->ipv4.ip_local_ports.range = 60999u << 16 | 49152u;
+	else
+		net->ipv4.ip_local_ports.range = 60999u << 16 | 32768u;

 	seqlock_init(&net->ipv4.ping_group_range.lock);
 	/*
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 162a0a3b6ba5..a38447889072 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -45,6 +45,8 @@  static unsigned int tcp_child_ehash_entries_max = 16 * 1024 * 1024;
 static unsigned int udp_child_hash_entries_max = UDP_HTABLE_SIZE_MAX;
 static int tcp_plb_max_rounds = 31;
 static int tcp_plb_max_cong_thresh = 256;
+u8 sysctl_ip_local_port_range_use_iana;
+EXPORT_SYMBOL(sysctl_ip_local_port_range_use_iana);

 /* obsolete */
 static int sysctl_tcp_low_latency __read_mostly;
@@ -95,6 +97,26 @@  static int ipv4_local_port_range(struct ctl_table *table, int write,
 	return ret;
 }

+static int ipv4_local_port_range_use_iana(struct ctl_table *table, int write,
+					  void *buffer, size_t *lenp, loff_t *ppos)
+{
+	struct net *net;
+	int ret;
+
+	ret = proc_dou8vec_minmax(table, write, buffer, lenp, ppos);
+
+	if (write && ret == 0) {
+		for_each_net(net) {
+			if (sysctl_ip_local_port_range_use_iana)
+				set_local_port_range(net, 49152u, 60999u);
+			else
+				set_local_port_range(net, 32768u, 60999u);
+		}
+	}
+
+	return ret;
+}
+
 /* Validate changes from /proc interface. */
 static int ipv4_privileged_ports(struct ctl_table *table, int write,
 				void *buffer, size_t *lenp, loff_t *ppos)
@@ -575,6 +597,15 @@  static struct ctl_table ipv4_table[] = {
 		.extra1		= &sysctl_fib_sync_mem_min,
 		.extra2		= &sysctl_fib_sync_mem_max,
 	},
+	{
+		.procname	= "ip_local_port_range_use_iana",
+		.data		= &sysctl_ip_local_port_range_use_iana,
+		.maxlen		= sizeof(u8),
+		.mode		= 0644,
+		.proc_handler	= ipv4_local_port_range_use_iana,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE,
+	},
 };

 static struct ctl_table ipv4_net_table[] = {
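
For readers unfamiliar with the `60999u << 16 | 32768u` expressions in
the af_inet.c hunk: both bounds are stored in one 32-bit word, with the
upper bound in the high 16 bits and the lower bound in the low 16 bits.
The standalone sketch below shows that packing and its inverse; the
function names are made up for this example and are not kernel APIs.

```c
#include <stdint.h>

/* Pack an inclusive port range the way the patch does: high << 16 | low. */
uint32_t pack_port_range(uint16_t low, uint16_t high)
{
	return (uint32_t)high << 16 | low;
}

/* Recover both bounds from the packed 32-bit value. */
void unpack_port_range(uint32_t range, uint16_t *low, uint16_t *high)
{
	*low = range & 0xffff;	/* lower bound lives in bits 0-15 */
	*high = range >> 16;	/* upper bound lives in bits 16-31 */
}
```

Keeping both bounds in a single word means a reader can fetch a
consistent pair with one load, which is presumably why the field is laid
out this way.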