[v2] net: bpf: fix request_sock leak in filter.c

Message ID 20220615011540.813025-1-jmaxwell37@gmail.com (mailing list archive)
State Accepted
Commit 3046a827316c0e55fc563b4fb78c93b9ca5c7c37
Delegated to: BPF

Checks

Context                         Check    Description
netdev/tree_selection           success  Guessed tree name to be net-next
netdev/fixes_present            success  Fixes tag not required for -next series
netdev/subject_prefix           warning  Target tree name not specified in the subject
netdev/cover_letter             success  Single patches do not need cover letters
netdev/patch_count              success  Link
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 25 this patch: 25
netdev/cc_maintainers           fail     3 blamed authors not CCed: ast@kernel.org joe@isovalent.com lmb@cloudflare.com; 8 maintainers not CCed: lmb@cloudflare.com songliubraving@fb.com ast@kernel.org joe@isovalent.com yhs@fb.com john.fastabend@gmail.com andrii@kernel.org kpsingh@kernel.org
netdev/build_clang              success  Errors and warnings before: 6 this patch: 6
netdev/module_param             success  Was 0 now: 0
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  Fixes tag looks correct
netdev/build_allmodconfig_warn  success  Errors and warnings before: 25 this patch: 25
netdev/checkpatch               warning  WARNING: line length of 94 exceeds 80 columns
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0
bpf/vmtest-bpf-next-VM_Test-1   success  Logs for Kernel LATEST on ubuntu-latest with gcc
bpf/vmtest-bpf-next-VM_Test-2   success  Logs for Kernel LATEST on ubuntu-latest with llvm-15
bpf/vmtest-bpf-next-PR          success  PR summary
bpf/vmtest-bpf-next-VM_Test-3   success  Logs for Kernel LATEST on z15 with gcc

Commit Message

Jonathan Maxwell June 15, 2022, 1:15 a.m. UTC
v2 of this patch is refactored as per Daniel Borkmann's suggestions to
validate the RCU flags on the listen socket so that it balances with
bpf_sk_release(), and updates the comments as per Martin KaFai Lau's
suggestion. One small change to Daniel's suggestion: put "sk = sk2" under
"if (sk2 != sk)" to avoid an extra instruction.
 
A customer reported a request_sock leak in a Calico cloud environment. We
found that a BPF program was doing a socket lookup, which takes a refcnt on
the socket, and that it was finding the request_sock but returning the parent
LISTEN socket via sk_to_full_sk() without first dropping the refcnt on the
child request_sock, resulting in a request_sock slab object leak. This patch
retains the existing behaviour of returning full socks to the caller, but it
also drops the refcnt on the child request_sock, if one is present, before
doing so to prevent the leak.

Thanks to Curtis Taylor for all the help in diagnosing and testing this. And 
thanks to Antoine Tenart for the reproducer and patch input.

Fixes: f7355a6c0497 ("bpf: Check sk_fullsock() before returning from bpf_sk_lookup()")
Fixes: edbf8c01de5a ("bpf: add skc_lookup_tcp helper")
Tested-by: Curtis Taylor <cutaylor-pub@yahoo.com>
Co-developed-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
---
 net/core/filter.c | 34 ++++++++++++++++++++++++++++------
 1 file changed, 28 insertions(+), 6 deletions(-)
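
The refcount imbalance described in the commit message can be illustrated
with a small user-space model. This is only a sketch, not kernel code: every
toy_* name, lookup_old(), lookup_new() and leaked_refs() is hypothetical, and
toy_release() assumes that bpf_sk_release() skips the put for full sockets
flagged SOCK_RCU_FREE, which is what the "balances with bpf_sk_release()"
remark above relies on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy user-space model of the refcounting; all names are hypothetical. */
struct toy_sock {
	int refcnt;			/* references held on this object */
	bool fullsock;			/* false for a request (mini) socket */
	bool rcu_free;			/* models SOCK_RCU_FREE on a listener */
	struct toy_sock *listener;	/* parent listener of a request socket */
};

/* Models sk_to_full_sk(): a request socket maps to its listener. */
static struct toy_sock *toy_to_full_sk(struct toy_sock *sk)
{
	return (!sk->fullsock && sk->listener) ? sk->listener : sk;
}

/* Models sock_gen_put(): drop one reference. */
static void toy_put(struct toy_sock *sk)
{
	assert(sk->refcnt > 0);
	sk->refcnt--;
}

/* Models the underlying lookup: it takes a reference on what it finds. */
static struct toy_sock *toy_lookup(struct toy_sock *found)
{
	found->refcnt++;
	return found;
}

/* Pre-patch flow: the pointer to the request socket is overwritten. */
static struct toy_sock *lookup_old(struct toy_sock *found)
{
	struct toy_sock *sk = toy_to_full_sk(toy_lookup(found));

	if (!sk->fullsock) {
		toy_put(sk);
		return NULL;
	}
	return sk;	/* reference on the request socket is never dropped */
}

/* Post-patch flow, following the structure of the diff in this patch. */
static struct toy_sock *lookup_new(struct toy_sock *found)
{
	struct toy_sock *sk = toy_lookup(found);
	struct toy_sock *sk2 = toy_to_full_sk(sk);

	if (!sk2->fullsock)
		sk2 = NULL;
	if (sk2 != sk) {
		toy_put(sk);		/* drop the request socket reference */
		if (sk2 && !sk2->rcu_free)
			return NULL;	/* listener would need its own ref */
		sk = sk2;
	}
	return sk;
}

/* Assumed model of bpf_sk_release(): no put for RCU-freed full sockets. */
static void toy_release(struct toy_sock *sk)
{
	if (sk && (!sk->fullsock || !sk->rcu_free))
		toy_put(sk);
}

/* Returns how many request socket references remain after lookup+release. */
static int leaked_refs(bool patched)
{
	struct toy_sock listener = { .refcnt = 1, .fullsock = true, .rcu_free = true };
	struct toy_sock req = { .refcnt = 1, .fullsock = false, .listener = &listener };
	struct toy_sock *sk = patched ? lookup_new(&req) : lookup_old(&req);

	toy_release(sk);
	return req.refcnt - 1;
}
```

In this model leaked_refs(false) leaves one reference outstanding on the
request socket, while leaked_refs(true) leaves none; the listener's count is
unchanged in both cases because it is never bumped and never put.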

Comments

patchwork-bot+netdevbpf@kernel.org June 15, 2022, 2:40 p.m. UTC | #1
Hello:

This patch was applied to bpf/bpf.git (master)
by Daniel Borkmann <daniel@iogearbox.net>:

On Wed, 15 Jun 2022 11:15:40 +1000 you wrote:
> v2 of this patch is refactored as per Daniel Borkmann's suggestions to
> validate the RCU flags on the listen socket so that it balances with
> bpf_sk_release(), and updates the comments as per Martin KaFai Lau's
> suggestion. One small change to Daniel's suggestion: put "sk = sk2" under
> "if (sk2 != sk)" to avoid an extra instruction.
> 
> A customer reported a request_sock leak in a Calico cloud environment. We
> found that a BPF program was doing a socket lookup, which takes a refcnt on
> the socket, and that it was finding the request_sock but returning the parent
> LISTEN socket via sk_to_full_sk() without first dropping the refcnt on the
> child request_sock, resulting in a request_sock slab object leak. This patch
> retains the existing behaviour of returning full socks to the caller, but it
> also drops the refcnt on the child request_sock, if one is present, before
> doing so to prevent the leak.
> 
> [...]

Here is the summary with links:
  - [v2] net: bpf: fix request_sock leak in filter.c
    https://git.kernel.org/bpf/bpf/c/3046a827316c

You are awesome, thank you!

Patch

diff --git a/net/core/filter.c b/net/core/filter.c
index 2e32cee2c469..ec2a1e68af12 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -6204,10 +6204,21 @@  __bpf_sk_lookup(struct sk_buff *skb, struct bpf_sock_tuple *tuple, u32 len,
 					   ifindex, proto, netns_id, flags);
 
 	if (sk) {
-		sk = sk_to_full_sk(sk);
-		if (!sk_fullsock(sk)) {
+		struct sock *sk2 = sk_to_full_sk(sk);
+
+		/* sk_to_full_sk() may return (sk)->rsk_listener, so make sure the original sk
+		 * sock refcnt is decremented to prevent a request_sock leak.
+		 */
+		if (!sk_fullsock(sk2))
+			sk2 = NULL;
+		if (sk2 != sk) {
 			sock_gen_put(sk);
-			return NULL;
+			/* Ensure there is no need to bump sk2 refcnt */
+			if (unlikely(sk2 && !sock_flag(sk2, SOCK_RCU_FREE))) {
+				WARN_ONCE(1, "Found non-RCU, unreferenced socket!");
+				return NULL;
+			}
+			sk = sk2;
 		}
 	}
 
@@ -6241,10 +6252,21 @@  bpf_sk_lookup(struct sk_buff *skb, struct bpf_sock_tuple *tuple, u32 len,
 					 flags);
 
 	if (sk) {
-		sk = sk_to_full_sk(sk);
-		if (!sk_fullsock(sk)) {
+		struct sock *sk2 = sk_to_full_sk(sk);
+
+		/* sk_to_full_sk() may return (sk)->rsk_listener, so make sure the original sk
+		 * sock refcnt is decremented to prevent a request_sock leak.
+		 */
+		if (!sk_fullsock(sk2))
+			sk2 = NULL;
+		if (sk2 != sk) {
 			sock_gen_put(sk);
-			return NULL;
+			/* Ensure there is no need to bump sk2 refcnt */
+			if (unlikely(sk2 && !sock_flag(sk2, SOCK_RCU_FREE))) {
+				WARN_ONCE(1, "Found non-RCU, unreferenced socket!");
+				return NULL;
+			}
+			sk = sk2;
 		}
 	}