sock: allow reading and changing sk_userlocks with setsockopt

Message ID 20210730105406.318726-1-ptikhomirov@virtuozzo.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series sock: allow reading and changing sk_userlocks with setsockopt

Checks

Context                 Check    Description
netdev/tree_selection   success  Not a local patch

Commit Message

Pavel Tikhomirov July 30, 2021, 10:54 a.m. UTC
The SOCK_SNDBUF_LOCK and SOCK_RCVBUF_LOCK flags disable the automatic
socket buffer adjustment done by the kernel (see tcp_fixup_rcvbuf() and
tcp_sndbuf_expand()). On a newly created socket this adjustment is
enabled, but once the buffer size is changed with
setsockopt(SO_{SND,RCV}BUF*) it becomes disabled.

CRIU needs to call setsockopt(SO_{SND,RCV}BUF*) on each socket on
restore: first to increase the buffer sizes while restoring the packet
queues, and then to restore the original buffer sizes. So after a CRIU
restore all sockets become non-auto-adjustable, which can significantly
decrease the network performance of restored applications.

CRIU needs to be able to restore each socket with adjustment enabled or
disabled, exactly as it was before dump, so let's add a special
setsockopt for it.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
---
Here are the corresponding CRIU commits that use this new feature to fix
the slow download speed problem after migration:
https://github.com/checkpoint-restore/criu/pull/1568 

Origin of the problem:

We have a customer at Virtuozzo who noticed that an nginx server becomes
slower after container migration. It is especially easy to notice when
you wget a big file via localhost from the same container right after it
has been migrated.

By strace-ing all nginx processes I see that, before c/r, the nginx
worker process sends data to the local wget in big chunks of ~1.5 MB, but
after c/r it only manages to send small chunks of ~64 KB.

Before: 
sendfile(12, 13, [7984974] => [9425600], 11479629) = 1440626 <0.000180> 
 
After: 
sendfile(8, 13, [1507275] => [1568768], 17957328) = 61493 <0.000675> 

A smaller buffer can explain the decrease in download speed. So as a POC
I just commented out all the buffer-setting manipulations, and that
helped.
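
As a rough illustration of how a checkpoint/restore tool could use the
proposed option, here is a minimal user-space sketch. The helper names
dump_buf_locks() and restore_buf_state() are hypothetical and are not
taken from CRIU. The fallback value 72 for SO_BUF_LOCK is the
asm-generic value from this patch, and the lock bit values 1 and 2 are
assumed to match the kernel's SOCK_SNDBUF_LOCK and SOCK_RCVBUF_LOCK
definitions.

```c
#include <errno.h>
#include <sys/socket.h>

/*
 * Fallbacks for headers that predate this patch. 72 is the asm-generic
 * value from the patch (parisc and sparc use different numbers). The lock
 * bit values are an assumption based on the kernel's internal definitions.
 */
#ifndef SO_BUF_LOCK
#define SO_BUF_LOCK 72
#endif
#ifndef SOCK_SNDBUF_LOCK
#define SOCK_SNDBUF_LOCK 1
#endif
#ifndef SOCK_RCVBUF_LOCK
#define SOCK_RCVBUF_LOCK 2
#endif

/* Dump side (hypothetical helper): read the buffer lock bits of a socket.
 * Returns 0 on success or a negative errno value. */
static int dump_buf_locks(int fd, int *locks)
{
	socklen_t len = sizeof(*locks);

	if (getsockopt(fd, SOL_SOCKET, SO_BUF_LOCK, locks, &len) < 0)
		return -errno;
	return 0;
}

/* Restore side (hypothetical helper): set the buffer sizes, which locks
 * both buffers as a side effect, then put back the dumped lock bits.
 * Returns 0 on success or a negative errno value. */
static int restore_buf_state(int fd, int sndbuf, int rcvbuf, int locks)
{
	if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, sizeof(sndbuf)) < 0)
		return -errno;
	if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf)) < 0)
		return -errno;
	if (setsockopt(fd, SOL_SOCKET, SO_BUF_LOCK, &locks, sizeof(locks)) < 0)
		return -errno;
	return 0;
}
```

On a kernel without this patch both helpers are expected to fail on the
SO_BUF_LOCK call with -ENOPROTOOPT, so a caller would need a fallback
path for that case.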

---
 arch/alpha/include/uapi/asm/socket.h  |  2 ++
 arch/mips/include/uapi/asm/socket.h   |  2 ++
 arch/parisc/include/uapi/asm/socket.h |  2 ++
 arch/sparc/include/uapi/asm/socket.h  |  2 ++
 include/uapi/asm-generic/socket.h     |  2 ++
 net/core/sock.c                       | 12 ++++++++++++
 6 files changed, 22 insertions(+)

Comments

Paolo Abeni July 30, 2021, 1:13 p.m. UTC | #1
On Fri, 2021-07-30 at 13:54 +0300, Pavel Tikhomirov wrote:
> The SOCK_SNDBUF_LOCK and SOCK_RCVBUF_LOCK flags disable the automatic
> socket buffer adjustment done by the kernel (see tcp_fixup_rcvbuf() and
> tcp_sndbuf_expand()). On a newly created socket this adjustment is
> enabled, but once the buffer size is changed with
> setsockopt(SO_{SND,RCV}BUF*) it becomes disabled.
> 
> CRIU needs to call setsockopt(SO_{SND,RCV}BUF*) on each socket on
> restore: first to increase the buffer sizes while restoring the packet
> queues, and then to restore the original buffer sizes. So after a CRIU
> restore all sockets become non-auto-adjustable, which can significantly
> decrease the network performance of restored applications.

I'm wondering if you could just tune tcp_rmem instead?

> diff --git a/net/core/sock.c b/net/core/sock.c
> index a3eea6e0b30a..843094f069f3 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -1357,6 +1357,14 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
>  		ret = sock_bindtoindex_locked(sk, val);
>  		break;
>  
> +	case SO_BUF_LOCK:
> +		{
> +		int mask = SOCK_SNDBUF_LOCK | SOCK_RCVBUF_LOCK;

What about defining a macro for the above mask, to avoid the local
variable declaration and the braces?

Thanks!

Paolo
Pavel Tikhomirov July 30, 2021, 2:21 p.m. UTC | #2
On 30.07.2021 16:13, Paolo Abeni wrote:
> On Fri, 2021-07-30 at 13:54 +0300, Pavel Tikhomirov wrote:
>> The SOCK_SNDBUF_LOCK and SOCK_RCVBUF_LOCK flags disable the automatic
>> socket buffer adjustment done by the kernel (see tcp_fixup_rcvbuf() and
>> tcp_sndbuf_expand()). On a newly created socket this adjustment is
>> enabled, but once the buffer size is changed with
>> setsockopt(SO_{SND,RCV}BUF*) it becomes disabled.
>>
>> CRIU needs to call setsockopt(SO_{SND,RCV}BUF*) on each socket on
>> restore: first to increase the buffer sizes while restoring the packet
>> queues, and then to restore the original buffer sizes. So after a CRIU
>> restore all sockets become non-auto-adjustable, which can significantly
>> decrease the network performance of restored applications.
> 
> I'm wondering if you could just tune tcp_rmem instead?

It would not help with the lack of information about whether a socket is
in auto-adjusted mode or not.

Though, yes, it helps in part. We can set tcp_rmem[1] to the buffer size
we want before creating each socket. This way we would leave all sockets
in the auto-adjusted state.

But a) it would be a TCP-only approach and b) with this approach we need
to create each socket with its final size and never change it. We would
not be able to temporarily increase the buffer size like we do in the
socket queue restore code... (I see that initially we had some problem
that required increasing the buffer size
https://lists.openvz.org/pipermail/criu/2012-November/005477.html but
I'm not sure about the real reason here; Andrey will probably be able to
remember, though almost 10 years have passed since then.)
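
For reference, tcp_rmem holds three values (minimum, default, maximum),
and tcp_rmem[1] above is the default applied to newly created TCP
sockets. A minimal sketch of reading it from user space could look like
this; the struct and function names are hypothetical, and only the
/proc path is the real sysctl location.

```c
#include <stdio.h>

/* Hypothetical holder for the three tcp_rmem fields. */
struct tcp_mem_triple {
	long min;
	long def;	/* tcp_rmem[1]: default size for new TCP sockets */
	long max;
};

/* Parse a "min default max" line. Returns 0 on success, -1 otherwise. */
static int parse_tcp_mem(const char *text, struct tcp_mem_triple *out)
{
	if (sscanf(text, "%ld %ld %ld", &out->min, &out->def, &out->max) != 3)
		return -1;
	return 0;
}

/* Read the current tcp_rmem setting. Returns 0 on success, -1 otherwise. */
static int read_tcp_rmem(struct tcp_mem_triple *out)
{
	char buf[128];
	FILE *f = fopen("/proc/sys/net/ipv4/tcp_rmem", "r");
	int ret = -1;

	if (!f)
		return -1;
	if (fgets(buf, sizeof(buf), f))
		ret = parse_tcp_mem(buf, out);
	fclose(f);
	return ret;
}
```

This only reads the sysctl. It does not tell whether a given socket has
its buffers locked, which is the missing piece the patch addresses.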

> 
>> diff --git a/net/core/sock.c b/net/core/sock.c
>> index a3eea6e0b30a..843094f069f3 100644
>> --- a/net/core/sock.c
>> +++ b/net/core/sock.c
>> @@ -1357,6 +1357,14 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
>>   		ret = sock_bindtoindex_locked(sk, val);
>>   		break;
>>   
>> +	case SO_BUF_LOCK:
>> +		{
>> +		int mask = SOCK_SNDBUF_LOCK | SOCK_RCVBUF_LOCK;
> 
> What about defining a macro for the above mask, to avoid the local
> variable declaration and the braces?

Sure, will do.

> 
> Thanks!
> 
> Paolo
>
Patch

diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
index 6b3daba60987..1dd9baf4a6c2 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -129,6 +129,8 @@ 
 
 #define SO_NETNS_COOKIE		71
 
+#define SO_BUF_LOCK		72
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64
diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
index cdf404a831b2..1eaf6a1ca561 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -140,6 +140,8 @@ 
 
 #define SO_NETNS_COOKIE		71
 
+#define SO_BUF_LOCK		72
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64
diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
index 5b5351cdcb33..8baaad52d799 100644
--- a/arch/parisc/include/uapi/asm/socket.h
+++ b/arch/parisc/include/uapi/asm/socket.h
@@ -121,6 +121,8 @@ 
 
 #define SO_NETNS_COOKIE		0x4045
 
+#define SO_BUF_LOCK		0x4046
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64
diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
index 92675dc380fa..e80ee8641ac3 100644
--- a/arch/sparc/include/uapi/asm/socket.h
+++ b/arch/sparc/include/uapi/asm/socket.h
@@ -122,6 +122,8 @@ 
 
 #define SO_NETNS_COOKIE          0x0050
 
+#define SO_BUF_LOCK              0x0051
+
 #if !defined(__KERNEL__)
 
 
diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
index d588c244ec2f..1f0a2b4864e4 100644
--- a/include/uapi/asm-generic/socket.h
+++ b/include/uapi/asm-generic/socket.h
@@ -124,6 +124,8 @@ 
 
 #define SO_NETNS_COOKIE		71
 
+#define SO_BUF_LOCK		72
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__))
diff --git a/net/core/sock.c b/net/core/sock.c
index a3eea6e0b30a..843094f069f3 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1357,6 +1357,14 @@  int sock_setsockopt(struct socket *sock, int level, int optname,
 		ret = sock_bindtoindex_locked(sk, val);
 		break;
 
+	case SO_BUF_LOCK:
+		{
+		int mask = SOCK_SNDBUF_LOCK | SOCK_RCVBUF_LOCK;
+
+		sk->sk_userlocks = (sk->sk_userlocks & ~mask) | (val & mask);
+		break;
+		}
+
 	default:
 		ret = -ENOPROTOOPT;
 		break;
@@ -1719,6 +1727,10 @@  int sock_getsockopt(struct socket *sock, int level, int optname,
 		v.val64 = sock_net(sk)->net_cookie;
 		break;
 
+	case SO_BUF_LOCK:
+		v.val = sk->sk_userlocks & (SOCK_SNDBUF_LOCK | SOCK_RCVBUF_LOCK);
+		break;
+
 	default:
 		/* We implement the SO_SNDLOWAT etc to not be settable
 		 * (1003.1g 7).
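
The bit manipulation in the two SO_BUF_LOCK cases above can be tried in
isolation with the following user-space sketch. BUF_LOCK_MASK and the two
helper names are hypothetical; the macro only illustrates the review
suggestion to name the mask. The lock bit values are assumed to match the
kernel's definitions.

```c
/* Assumed to match the kernel's sk_userlocks bit definitions. */
#define SOCK_SNDBUF_LOCK	1
#define SOCK_RCVBUF_LOCK	2
#define SOCK_BINDADDR_LOCK	4

/* Hypothetical name for the mask that the patch keeps in a local variable. */
#define BUF_LOCK_MASK	(SOCK_SNDBUF_LOCK | SOCK_RCVBUF_LOCK)

/* setsockopt path: replace only the two buffer lock bits with those from
 * val and leave every other lock bit untouched. */
static unsigned int set_buf_locks(unsigned int userlocks, int val)
{
	return (userlocks & ~BUF_LOCK_MASK) | (val & BUF_LOCK_MASK);
}

/* getsockopt path: report only the two buffer lock bits. */
static int get_buf_locks(unsigned int userlocks)
{
	return userlocks & BUF_LOCK_MASK;
}
```

Note that, as in the patch, bits of val outside the mask are silently
ignored rather than rejected.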