From patchwork Wed Nov 29 01:25:56 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Fastabend X-Patchwork-Id: 13472090 X-Patchwork-Delegate: bpf@iogearbox.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="mT6B1YiU" Received: from mail-pj1-x1035.google.com (mail-pj1-x1035.google.com [IPv6:2607:f8b0:4864:20::1035]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3806010C0; Tue, 28 Nov 2023 17:26:02 -0800 (PST) Received: by mail-pj1-x1035.google.com with SMTP id 98e67ed59e1d1-2858f58ed3cso4346414a91.2; Tue, 28 Nov 2023 17:26:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1701221161; x=1701825961; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=GT6mSDV+NdeJ/FpSSchkcdTZdswOexVq0uvo5Kro5z8=; b=mT6B1YiUd5Y73vw//aUQqWpcfAxdUoKkmnBuyNqVZM1LPo/8iviyl7s59GH+DR1MnH 9zHL2ldjuqumZhDBwUyX5NzLiSl80Lijm6gAOoNLceStud/+kuTEqRytBLRN4+dyNPWa 0Qpy6L+UYHtZ4TlWx+eODY8zTqpgCrFizXBnwRWlD0Wt6i7GV1TlKqfc8swnrXVXMiFu bVa6GNvIXDT2SZMumgYrAuMP/dN1itoCiaZwTK1a2JGrjkJYg7RN8UfVh9h+ho8fkzgF +p4I8RUxhMgLUmhrrVX8PJ5ESTMPWevAj1VGZM7IR7laPik/wmhXAr9CFxcpcj9GjbaS 2KSw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701221161; x=1701825961; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=GT6mSDV+NdeJ/FpSSchkcdTZdswOexVq0uvo5Kro5z8=; b=A+rByNeEL7N+H5Q3JqXmRMGL2JGlrAids7MxcTsh9IJBzQ+neeSQ136FNsfcO6aNci H5BK6T9SDH79aYFkTfMrH19HCdnxDw0apj7BWDE1Lj28Aviz7Jgx2BZreccrd8DpwRST 2Q5pF3OH/mt3WfiSXLy9FOrafTMDBVDcZ/R+nEMD5TvOq4iLYAd6U45EdGLDx6usbKwb qxnm7+HYGQq8+rEQTmskyp97eYAw0Bizdkp3RNEyJiz7HF11bWZNw08yL9MreS7dcUVx hWtRXJRnpLUGp7OphE8NqfgvDAsxf2py/SgOIopCfiFbVdDg/euUmzPnI0QpwCe0Sj0n wegg== X-Gm-Message-State: AOJu0YyB7A7kgoG02gJ5uoH6Ca2EIxbkcy4HDFMkKT6f1CxjWB0QPIvP txwuaspvCMGLWuKeDBjHGs0= X-Google-Smtp-Source: AGHT+IFL+TDFUo9MtyF5xHWknut5sXjhy4mdMKy8Ok7tBW8+aj6AU2XinTzqu3ytiJfP0MiB1LQ6Yg== X-Received: by 2002:a17:90a:df04:b0:280:74ce:ae8d with SMTP id gp4-20020a17090adf0400b0028074ceae8dmr18982446pjb.20.1701221161617; Tue, 28 Nov 2023 17:26:01 -0800 (PST) Received: from john.lan ([2605:59c8:148:ba10:9a79:36c7:502e:91e9]) by smtp.gmail.com with ESMTPSA id gb23-20020a17090b061700b00285114454b4sm122757pjb.22.2023.11.28.17.26.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 28 Nov 2023 17:26:00 -0800 (PST) From: John Fastabend To: martin.lau@kernel.org, jakub@cloudflare.com Cc: john.fastabend@gmail.com, bpf@vger.kernel.org, netdev@vger.kernel.org Subject: [PATCH bpf v4 1/2] bpf: sockmap, af_unix stream sockets need to hold ref for pair sock Date: Tue, 28 Nov 2023 17:25:56 -0800 Message-Id: <20231129012557.95371-2-john.fastabend@gmail.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20231129012557.95371-1-john.fastabend@gmail.com> References: <20231129012557.95371-1-john.fastabend@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: bpf@iogearbox.net AF_UNIX stream sockets are a paired socket. So sending on one of the pairs will lookup the paired socket as part of the send operation. It is possible however to put just one of the pairs in a BPF map. This currently increments the refcnt on the sock in the sockmap to ensure it is not free'd by the stack before sockmap cleans up its state and stops any skbs being sent/recv'd to that socket. But we missed a case. If the peer socket is closed it will be free'd by the stack. However, the paired socket can still be referenced from BPF sockmap side because we hold a reference there. Then if we are sending traffic through BPF sockmap to that socket it will try to dereference the free'd pair in its send logic creating a use after free. And following splat, [59.900375] BUG: KASAN: slab-use-after-free in sk_wake_async+0x31/0x1b0 [59.901211] Read of size 8 at addr ffff88811acbf060 by task kworker/1:2/954 [...] [59.905468] Call Trace: [59.905787] [59.906066] dump_stack_lvl+0x130/0x1d0 [59.908877] print_report+0x16f/0x740 [59.910629] kasan_report+0x118/0x160 [59.912576] sk_wake_async+0x31/0x1b0 [59.913554] sock_def_readable+0x156/0x2a0 [59.914060] unix_stream_sendmsg+0x3f9/0x12a0 [59.916398] sock_sendmsg+0x20e/0x250 [59.916854] skb_send_sock+0x236/0xac0 [59.920527] sk_psock_backlog+0x287/0xaa0 To fix let BPF sockmap hold a refcnt on both the socket in the sockmap and its paired socket. It wasn't obvious how to contain the fix to bpf_unix logic. The primarily problem with keeping this logic in bpf_unix was: In the sock close() we could handle the deref by having a close handler. But, when we are destroying the psock through a map delete operation we wouldn't have gotten any signal thorugh the proto struct other than it being replaced. If we do the deref from the proto replace its too early because we need to deref the skpair after the backlog worker has been stopped. Given all this it seems best to just cache it at the end of the psock and eat 8B for the af_unix and vsock users. Notice dgram sockets are OK because they handle locking already. Fixes: 94531cfcbe79 ("af_unix: Add unix_stream_proto for sockmap") Reviewed-by: Jakub Sitnicki Signed-off-by: John Fastabend --- include/linux/skmsg.h | 1 + include/net/af_unix.h | 1 + net/core/skmsg.c | 2 ++ net/unix/af_unix.c | 2 -- net/unix/unix_bpf.c | 5 +++++ 5 files changed, 9 insertions(+), 2 deletions(-) diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h index c1637515a8a4..fbe628961cf8 100644 --- a/include/linux/skmsg.h +++ b/include/linux/skmsg.h @@ -106,6 +106,7 @@ struct sk_psock { struct mutex work_mutex; struct sk_psock_work_state work_state; struct delayed_work work; + struct sock *skpair; struct rcu_work rwork; }; diff --git a/include/net/af_unix.h b/include/net/af_unix.h index 824c258143a3..49c4640027d8 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -75,6 +75,7 @@ struct unix_sock { }; #define unix_sk(ptr) container_of_const(ptr, struct unix_sock, sk) +#define unix_peer(sk) (unix_sk(sk)->peer) #define peer_wait peer_wq.wait diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 6c31eefbd777..6236164b9bce 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -826,6 +826,8 @@ static void sk_psock_destroy(struct work_struct *work) if (psock->sk_redir) sock_put(psock->sk_redir); + if (psock->skpair) + sock_put(psock->skpair); sock_put(psock->sk); kfree(psock); } diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index a357dc5f2404..ac1f2bc18fc9 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -213,8 +213,6 @@ static inline bool unix_secdata_eq(struct scm_cookie *scm, struct sk_buff *skb) } #endif /* CONFIG_SECURITY_NETWORK */ -#define unix_peer(sk) (unix_sk(sk)->peer) - static inline int unix_our_peer(struct sock *sk, struct sock *osk) { return unix_peer(osk) == sk; diff --git a/net/unix/unix_bpf.c b/net/unix/unix_bpf.c index 2f9d8271c6ec..074ab91485f1 100644 --- a/net/unix/unix_bpf.c +++ b/net/unix/unix_bpf.c @@ -159,12 +159,17 @@ int unix_dgram_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool re int unix_stream_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore) { + struct sock *skpair; + if (restore) { sk->sk_write_space = psock->saved_write_space; sock_replace_proto(sk, psock->sk_proto); return 0; } + skpair = unix_peer(sk); + sock_hold(skpair); + psock->skpair = skpair; unix_stream_bpf_check_needs_rebuild(psock->sk_proto); sock_replace_proto(sk, &unix_stream_bpf_prot); return 0; From patchwork Wed Nov 29 01:25:57 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Fastabend X-Patchwork-Id: 13472091 X-Patchwork-Delegate: bpf@iogearbox.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="OkV+AXH/" Received: from mail-pg1-x531.google.com (mail-pg1-x531.google.com [IPv6:2607:f8b0:4864:20::531]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 35FDE19A6; Tue, 28 Nov 2023 17:26:04 -0800 (PST) Received: by mail-pg1-x531.google.com with SMTP id 41be03b00d2f7-517ab9a4a13so4804203a12.1; Tue, 28 Nov 2023 17:26:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1701221163; x=1701825963; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=66Orb0A/FBk2ud9bubDjtOxTZ6JhqUJSm1nBS/v96J4=; b=OkV+AXH/k1m1AEex9GBN0X1eFTPdy0/FhnhnAuE0x2HFVXTHJbEgfungYHJS7yfPqq jNLSv25ibrLkq7m9xW1ApsoQ5wzM+BFPCwJM5xlz3xs4J3P/sczJQkVjlkLqqZjb9ctA CpbvnFw8VCzfKLtKtY5ri80FFDwqXC+ia42h2/7yzcIWLpAWkJJ3c5srl/1OjPZwAFqI F6219iDT0VsaCkNsEJ/qjg2STS6ZBB/dX39TK/cjg0iBdgWYEioMwa575ZXp78j9MOF5 lFS8PCHatsdjjsuIXl1UsjXBahFQJYdRWgxN/qwXGu2VuqfyafyLTIelZLVZkaMHKcd7 BRiA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701221163; x=1701825963; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=66Orb0A/FBk2ud9bubDjtOxTZ6JhqUJSm1nBS/v96J4=; b=n8/ZyM+gBs1Mt4UZ6+UrLllVqj5qSlehXVZGVyR9Wb7glRbKNckChN/FlcpfE5DmiH OBXVh9ueL5kub4XEzyj+GT0r30iErU+ExS5CWFAcVhl8B/jHqG/hsGHESIUM2JdQcN8i CMHHVjJEha2WXnBqghm5hYfWYAnC/wTgEA/xxUQ2Ymdo5Fn7AaQsv+TsRStrQsZmUsnt vInLe/D5saHpHROpPbYeu32oD2vxQp04OEWrAsXXsBwCMJhAjRoj+EyUA6n1+5cXDaLu xEH4TGM/G3ZWL7jwtYu3W1fp/1UEcNHrWxy9M23uBOo9LRzjgqT+A0MSpgHcGnT7LDxH czOg== X-Gm-Message-State: AOJu0YyQAaLIglx28Sa6Kw1KZGVCvM2DurP34TairwXv3rNCxIV0xucU tIctS0y8alrZZM2TFuap1dkqNm52QSJjzA== X-Google-Smtp-Source: AGHT+IE4T4hbdizCtpFu02Q+bGu2pCGmdjKMlZowHW7tYSiIvTonYzT2zMaBF+ARfNwCrQf1+QJPUQ== X-Received: by 2002:a05:6a20:144b:b0:18c:548d:3d0f with SMTP id a11-20020a056a20144b00b0018c548d3d0fmr17277576pzi.5.1701221163647; Tue, 28 Nov 2023 17:26:03 -0800 (PST) Received: from john.lan ([2605:59c8:148:ba10:9a79:36c7:502e:91e9]) by smtp.gmail.com with ESMTPSA id gb23-20020a17090b061700b00285114454b4sm122757pjb.22.2023.11.28.17.26.01 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 28 Nov 2023 17:26:02 -0800 (PST) From: John Fastabend To: martin.lau@kernel.org, jakub@cloudflare.com Cc: john.fastabend@gmail.com, bpf@vger.kernel.org, netdev@vger.kernel.org Subject: [PATCH bpf v4 2/2] bpf: sockmap, add af_unix test with both sockets in map Date: Tue, 28 Nov 2023 17:25:57 -0800 Message-Id: <20231129012557.95371-3-john.fastabend@gmail.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20231129012557.95371-1-john.fastabend@gmail.com> References: <20231129012557.95371-1-john.fastabend@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: bpf@iogearbox.net This adds a test where both pairs of a af_unix paired socket are put into a BPF map. This ensures that when we tear down the af_unix pair we don't have any issues on sockmap side with ordering and reference counting. Reviewed-by: Jakub Sitnicki Signed-off-by: John Fastabend --- .../selftests/bpf/prog_tests/sockmap_listen.c | 51 +++++++++++++++---- .../selftests/bpf/progs/test_sockmap_listen.c | 7 +++ 2 files changed, 47 insertions(+), 11 deletions(-) diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c index a934d430c20c..a92807bfcd13 100644 --- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c @@ -1337,7 +1337,8 @@ static void test_redir(struct test_sockmap_listen *skel, struct bpf_map *map, } static void pairs_redir_to_connected(int cli0, int peer0, int cli1, int peer1, - int sock_mapfd, int verd_mapfd, enum redir_mode mode) + int sock_mapfd, int nop_mapfd, + int verd_mapfd, enum redir_mode mode) { const char *log_prefix = redir_mode_str(mode); unsigned int pass; @@ -1351,6 +1352,12 @@ static void pairs_redir_to_connected(int cli0, int peer0, int cli1, int peer1, if (err) return; + if (nop_mapfd >= 0) { + err = add_to_sockmap(nop_mapfd, cli0, cli1); + if (err) + return; + } + n = write(cli1, "a", 1); if (n < 0) FAIL_ERRNO("%s: write", log_prefix); @@ -1387,7 +1394,7 @@ static void unix_redir_to_connected(int sotype, int sock_mapfd, goto close0; c1 = sfd[0], p1 = sfd[1]; - pairs_redir_to_connected(c0, p0, c1, p1, sock_mapfd, verd_mapfd, mode); + pairs_redir_to_connected(c0, p0, c1, p1, sock_mapfd, -1, verd_mapfd, mode); xclose(c1); xclose(p1); @@ -1677,7 +1684,7 @@ static void udp_redir_to_connected(int family, int sock_mapfd, int verd_mapfd, if (err) goto close_cli0; - pairs_redir_to_connected(c0, p0, c1, p1, sock_mapfd, verd_mapfd, mode); + pairs_redir_to_connected(c0, p0, c1, p1, sock_mapfd, -1, verd_mapfd, mode); xclose(c1); xclose(p1); @@ -1735,7 +1742,7 @@ static void inet_unix_redir_to_connected(int family, int type, int sock_mapfd, if (err) goto close; - pairs_redir_to_connected(c0, p0, c1, p1, sock_mapfd, verd_mapfd, mode); + pairs_redir_to_connected(c0, p0, c1, p1, sock_mapfd, -1, verd_mapfd, mode); xclose(c1); xclose(p1); @@ -1770,8 +1777,10 @@ static void inet_unix_skb_redir_to_connected(struct test_sockmap_listen *skel, xbpf_prog_detach2(verdict, sock_map, BPF_SK_SKB_VERDICT); } -static void unix_inet_redir_to_connected(int family, int type, int sock_mapfd, - int verd_mapfd, enum redir_mode mode) +static void unix_inet_redir_to_connected(int family, int type, + int sock_mapfd, int nop_mapfd, + int verd_mapfd, + enum redir_mode mode) { int c0, c1, p0, p1; int sfd[2]; @@ -1785,7 +1794,8 @@ static void unix_inet_redir_to_connected(int family, int type, int sock_mapfd, goto close_cli0; c1 = sfd[0], p1 = sfd[1]; - pairs_redir_to_connected(c0, p0, c1, p1, sock_mapfd, verd_mapfd, mode); + pairs_redir_to_connected(c0, p0, c1, p1, + sock_mapfd, nop_mapfd, verd_mapfd, mode); xclose(c1); xclose(p1); @@ -1799,6 +1809,7 @@ static void unix_inet_skb_redir_to_connected(struct test_sockmap_listen *skel, struct bpf_map *inner_map, int family) { int verdict = bpf_program__fd(skel->progs.prog_skb_verdict); + int nop_map = bpf_map__fd(skel->maps.nop_map); int verdict_map = bpf_map__fd(skel->maps.verdict_map); int sock_map = bpf_map__fd(inner_map); int err; @@ -1808,14 +1819,32 @@ static void unix_inet_skb_redir_to_connected(struct test_sockmap_listen *skel, return; skel->bss->test_ingress = false; - unix_inet_redir_to_connected(family, SOCK_DGRAM, sock_map, verdict_map, + unix_inet_redir_to_connected(family, SOCK_DGRAM, + sock_map, -1, verdict_map, REDIR_EGRESS); - unix_inet_redir_to_connected(family, SOCK_STREAM, sock_map, verdict_map, + unix_inet_redir_to_connected(family, SOCK_DGRAM, + sock_map, -1, verdict_map, + REDIR_EGRESS); + + unix_inet_redir_to_connected(family, SOCK_DGRAM, + sock_map, nop_map, verdict_map, + REDIR_EGRESS); + unix_inet_redir_to_connected(family, SOCK_STREAM, + sock_map, nop_map, verdict_map, REDIR_EGRESS); skel->bss->test_ingress = true; - unix_inet_redir_to_connected(family, SOCK_DGRAM, sock_map, verdict_map, + unix_inet_redir_to_connected(family, SOCK_DGRAM, + sock_map, -1, verdict_map, + REDIR_INGRESS); + unix_inet_redir_to_connected(family, SOCK_STREAM, + sock_map, -1, verdict_map, + REDIR_INGRESS); + + unix_inet_redir_to_connected(family, SOCK_DGRAM, + sock_map, nop_map, verdict_map, REDIR_INGRESS); - unix_inet_redir_to_connected(family, SOCK_STREAM, sock_map, verdict_map, + unix_inet_redir_to_connected(family, SOCK_STREAM, + sock_map, nop_map, verdict_map, REDIR_INGRESS); xbpf_prog_detach2(verdict, sock_map, BPF_SK_SKB_VERDICT); diff --git a/tools/testing/selftests/bpf/progs/test_sockmap_listen.c b/tools/testing/selftests/bpf/progs/test_sockmap_listen.c index 464d35bd57c7..b7250eb9c30c 100644 --- a/tools/testing/selftests/bpf/progs/test_sockmap_listen.c +++ b/tools/testing/selftests/bpf/progs/test_sockmap_listen.c @@ -14,6 +14,13 @@ struct { __type(value, __u64); } sock_map SEC(".maps"); +struct { + __uint(type, BPF_MAP_TYPE_SOCKMAP); + __uint(max_entries, 2); + __type(key, __u32); + __type(value, __u64); +} nop_map SEC(".maps"); + struct { __uint(type, BPF_MAP_TYPE_SOCKHASH); __uint(max_entries, 2);