[net-next,v2,1/2] net: retrieve netns cookie via getsocketopt

Message ID 20210623135646.1632083-1-m@lambda.lt (mailing list archive)
State Accepted
Commit e8b9eab99232c4e62ada9d7976c80fd5e8118289
Delegated to: Netdev Maintainers
Series [net-next,v2,1/2] net: retrieve netns cookie via getsocketopt

Checks

Context                         Check    Description
netdev/cover_letter             success  Link
netdev/fixes_present            success  Link
netdev/patch_count              success  Link
netdev/tree_selection           success  Clearly marked for net-next
netdev/subject_prefix           success  Link
netdev/cc_maintainers           warning  29 maintainers not CCed: kpsingh@kernel.org kafai@fb.com john.fastabend@gmail.com deller@gmx.de linux-parisc@vger.kernel.org aahringo@redhat.com rth@twiddle.net ink@jurassic.park.msu.ru bpf@vger.kernel.org andrii@kernel.org keescook@chromium.org James.Bottomley@HansenPartnership.com tsbogend@alpha.franken.de pabeni@redhat.com arnd@arndb.de fw@strlen.de xiangxia.m.yue@gmail.com yhs@fb.com mattst88@gmail.com sparclinux@vger.kernel.org mathew.j.martineau@linux.intel.com davem@davemloft.net linux-arch@vger.kernel.org kuba@kernel.org linux-alpha@vger.kernel.org linux-mips@vger.kernel.org linmiaohe@huawei.com songliubraving@fb.com bjorn@kernel.org
netdev/source_inline            success  Was 0 now: 0
netdev/verify_signedoff         success  Link
netdev/module_param             success  Was 0 now: 0
netdev/build_32bit              success  Errors and warnings before: 7408 this patch: 7408
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/verify_fixes             success  Link
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 53 lines checked
netdev/build_allmodconfig_warn  success  Errors and warnings before: 12375 this patch: 12375
netdev/header_inline            success  Link

Commit Message

Martynas Pumputis June 23, 2021, 1:56 p.m. UTC
It's getting more common to run nested container environments for
testing cloud software. One such example is Kind [1], which runs a
Kubernetes cluster in Docker containers on a single host. Each container
acts as a Kubernetes node, and can thus run any Pod (aka container)
inside it. This approach simplifies testing a lot, as it eliminates
complicated VM setups.

Unfortunately, such a setup breaks some functionality when cgroupv2 BPF
programs are used for load-balancing. The load-balancer BPF program
needs to detect whether a request originates from the host netns or a
container netns in order to allow some access, e.g. to a service via a
loopback IP address. Typically, the programs detect this by comparing
netns cookies with that of the init ns via a call to
bpf_get_netns_cookie(NULL). However, in nested environments the latter
cannot be used given the Kubernetes node's netns is outside the init ns.
To fix this, we need to pass the Kubernetes node netns cookie to the
program in a different way: by extending getsockopt() with a
SO_NETNS_COOKIE option, the orchestrator which runs in the Kubernetes
node netns can retrieve the cookie and pass it to the program instead.

Thus, this is following up on Eric's commit 3d368ab87cf6 ("net:
initialize net->net_cookie at netns setup") to allow retrieval via
SO_NETNS_COOKIE.  This is also in line with how we retrieve the socket
cookie via SO_COOKIE.

  [1] https://kind.sigs.k8s.io/
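
For illustration only, the userspace side described above might look
roughly like the sketch below. The helper name get_netns_cookie() is
hypothetical and not part of this patch. The fallback define assumes the
asm-generic value 71 (also used by alpha and mips in this patch); parisc
and sparc use different values, so real code should take the constant
from the kernel headers. Since the kernel side rejects any length other
than sizeof(u64) with -EINVAL, the caller passes exactly 8 bytes.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/socket.h>

/* Assumption: fallback for libc headers that predate this option.
 * 71 is the asm-generic value; parisc (0x4045) and sparc (0x0050)
 * differ, so prefer the definition from the kernel uapi headers. */
#ifndef SO_NETNS_COOKIE
#define SO_NETNS_COOKIE 71
#endif

/* Hypothetical helper: fetch the cookie of the netns that the socket
 * fd belongs to. Returns 0 on success or a negative errno value, e.g.
 * -ENOPROTOOPT on kernels that do not know the option. */
static int get_netns_cookie(int fd, uint64_t *cookie)
{
	/* The kernel requires optlen == sizeof(u64), else -EINVAL. */
	socklen_t len = sizeof(*cookie);

	if (getsockopt(fd, SOL_SOCKET, SO_NETNS_COOKIE, cookie, &len) < 0)
		return -errno;
	return 0;
}
```

An orchestrator running in the Kubernetes node netns could call this on
any socket it opens there and hand the resulting value to the BPF
program, for example through a map or a constant set at load time. How
the value is passed on is outside the scope of this patch.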

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Martynas Pumputis <m@lambda.lt>
Cc: Eric Dumazet <edumazet@google.com>
---
 arch/alpha/include/uapi/asm/socket.h  | 2 ++
 arch/mips/include/uapi/asm/socket.h   | 2 ++
 arch/parisc/include/uapi/asm/socket.h | 2 ++
 arch/sparc/include/uapi/asm/socket.h  | 2 ++
 include/uapi/asm-generic/socket.h     | 2 ++
 net/core/sock.c                       | 7 +++++++
 6 files changed, 17 insertions(+)

Comments

Eric Dumazet June 23, 2021, 2:07 p.m. UTC | #1
On Wed, Jun 23, 2021 at 3:55 PM Martynas Pumputis <m@lambda.lt> wrote:
>
> It's getting more common to run nested container environments for
> testing cloud software. One such example is Kind [1], which runs a
> Kubernetes cluster in Docker containers on a single host. Each container
> acts as a Kubernetes node, and can thus run any Pod (aka container)
> inside it. This approach simplifies testing a lot, as it eliminates
> complicated VM setups.
>
> Unfortunately, such a setup breaks some functionality when cgroupv2 BPF
> programs are used for load-balancing. The load-balancer BPF program
> needs to detect whether a request originates from the host netns or a
> container netns in order to allow some access, e.g. to a service via a
> loopback IP address. Typically, the programs detect this by comparing
> netns cookies with that of the init ns via a call to
> bpf_get_netns_cookie(NULL). However, in nested environments the latter
> cannot be used given the Kubernetes node's netns is outside the init ns.
> To fix this, we need to pass the Kubernetes node netns cookie to the
> program in a different way: by extending getsockopt() with a
> SO_NETNS_COOKIE option, the orchestrator which runs in the Kubernetes
> node netns can retrieve the cookie and pass it to the program instead.
>
> Thus, this is following up on Eric's commit 3d368ab87cf6 ("net:
> initialize net->net_cookie at netns setup") to allow retrieval via
> SO_NETNS_COOKIE.  This is also in line with how we retrieve the socket
> cookie via SO_COOKIE.
>
>   [1] https://kind.sigs.k8s.io/
>
> Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
> Signed-off-by: Martynas Pumputis <m@lambda.lt>
> Cc: Eric Dumazet <edumazet@google.com>
> ---

This looks fine, thanks !

Reviewed-by: Eric Dumazet <edumazet@google.com>

patchwork-bot+netdevbpf@kernel.org June 24, 2021, 6:20 p.m. UTC | #2
Hello:

This series was applied to netdev/net-next.git (refs/heads/master):

On Wed, 23 Jun 2021 15:56:45 +0200 you wrote:
> It's getting more common to run nested container environments for
> testing cloud software. One such example is Kind [1], which runs a
> Kubernetes cluster in Docker containers on a single host. Each container
> acts as a Kubernetes node, and can thus run any Pod (aka container)
> inside it. This approach simplifies testing a lot, as it eliminates
> complicated VM setups.
> 
> [...]

Here is the summary with links:
  - [net-next,v2,1/2] net: retrieve netns cookie via getsocketopt
    https://git.kernel.org/netdev/net-next/c/e8b9eab99232
  - [net-next,v2,2/2] tools/testing: add a selftest for SO_NETNS_COOKIE
    https://git.kernel.org/netdev/net-next/c/ae24bab257bb

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html

Patch

diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
index 57420356ce4c..6b3daba60987 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -127,6 +127,8 @@ 
 #define SO_PREFER_BUSY_POLL	69
 #define SO_BUSY_POLL_BUDGET	70
 
+#define SO_NETNS_COOKIE		71
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64
diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
index 2d949969313b..cdf404a831b2 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -138,6 +138,8 @@ 
 #define SO_PREFER_BUSY_POLL	69
 #define SO_BUSY_POLL_BUDGET	70
 
+#define SO_NETNS_COOKIE		71
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64
diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
index f60904329bbc..5b5351cdcb33 100644
--- a/arch/parisc/include/uapi/asm/socket.h
+++ b/arch/parisc/include/uapi/asm/socket.h
@@ -119,6 +119,8 @@ 
 #define SO_PREFER_BUSY_POLL	0x4043
 #define SO_BUSY_POLL_BUDGET	0x4044
 
+#define SO_NETNS_COOKIE		0x4045
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64
diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
index 848a22fbac20..92675dc380fa 100644
--- a/arch/sparc/include/uapi/asm/socket.h
+++ b/arch/sparc/include/uapi/asm/socket.h
@@ -120,6 +120,8 @@ 
 #define SO_PREFER_BUSY_POLL	 0x0048
 #define SO_BUSY_POLL_BUDGET	 0x0049
 
+#define SO_NETNS_COOKIE          0x0050
+
 #if !defined(__KERNEL__)
 
 
diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
index 4dcd13d097a9..d588c244ec2f 100644
--- a/include/uapi/asm-generic/socket.h
+++ b/include/uapi/asm-generic/socket.h
@@ -122,6 +122,8 @@ 
 #define SO_PREFER_BUSY_POLL	69
 #define SO_BUSY_POLL_BUDGET	70
 
+#define SO_NETNS_COOKIE		71
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__))
diff --git a/net/core/sock.c b/net/core/sock.c
index ddfa88082a2b..a2337b37eba6 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1635,6 +1635,13 @@  int sock_getsockopt(struct socket *sock, int level, int optname,
 		v.val = sk->sk_bound_dev_if;
 		break;
 
+	case SO_NETNS_COOKIE:
+		lv = sizeof(u64);
+		if (len != lv)
+			return -EINVAL;
+		v.val64 = sock_net(sk)->net_cookie;
+		break;
+
 	default:
 		/* We implement the SO_SNDLOWAT etc to not be settable
 		 * (1003.1g 7).