From patchwork Mon Apr 8 15:58:45 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michal Luczaj
X-Patchwork-Id: 13621379
X-Patchwork-Delegate: kuba@kernel.org
From: Michal Luczaj
To: netdev@vger.kernel.org
Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, kuniyu@amazon.com, Michal Luczaj
Subject: [PATCH net 1/2] af_unix: Fix garbage collector racing against connect()
Date: Mon, 8 Apr 2024 17:58:45 +0200
Message-ID: <20240408161336.612064-2-mhal@rbox.co>
X-Mailer: git-send-email 2.44.0
In-Reply-To: <20240408161336.612064-1-mhal@rbox.co>
References: <20240408161336.612064-1-mhal@rbox.co>
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

The garbage collector does not take into account the risk of an embryo
getting enqueued during garbage collection. If such an embryo has a peer
that carries SCM_RIGHTS, two consecutive passes of scan_children() may see
a different set of children, leading to an incorrectly elevated inflight
count and then a dangling pointer within the gc_inflight_list.

sockets are AF_UNIX/SOCK_STREAM
S is an unconnected socket
L is a listening in-flight socket bound to addr, not in fdtable
V's fd will be passed via sendmsg(), gets inflight count bumped

connect(S, addr)        sendmsg(S, [V]); close(V)   __unix_gc()
----------------        -------------------------   -----------

NS = unix_create1()
skb1 = sock_wmalloc(NS)
L = unix_find_other(addr)
unix_state_lock(L)
unix_peer(S) = NS
                        // V count=1 inflight=0

                        NS = unix_peer(S)
                        skb2 = sock_alloc()
                        skb_queue_tail(NS, skb2[V])

                        // V became in-flight
                        // V count=2 inflight=1

                        close(V)

                        // V count=1 inflight=1
                        // GC candidate condition met

                                                    for u in gc_inflight_list:
                                                      if (total_refs == inflight_refs)
                                                        add u to gc_candidates

                                                    // gc_candidates={L, V}

                                                    for u in gc_candidates:
                                                      scan_children(u, dec_inflight)

                                                    // embryo (skb1) was not
                                                    // reachable from L yet, so V's
                                                    // inflight remains unchanged
__skb_queue_tail(L, skb1)
unix_state_unlock(L)
                                                    for u in gc_candidates:
                                                      if (u.inflight)
                                                        scan_children(u, inc_inflight_move_tail)

                                                    // V count=1 inflight=2 (!)

If there is a GC-candidate listening socket, lock/unlock its state. This
makes GC wait until the end of any ongoing connect() to that socket. Once
the lock has been flipped, any possibly SCM-laden embryo of that connect()
is already enqueued. And if there is another connect() coming, its embryo
won't carry SCM_RIGHTS, as we already took the unix_gc_lock.

Fixes: 1fd05ba5a2f2 ("[AF_UNIX]: Rewrite garbage collector, fixes race.")
Signed-off-by: Michal Luczaj
---
 net/unix/garbage.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index fa39b6265238..cd3e8585ceb2 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -274,11 +274,20 @@ static void __unix_gc(struct work_struct *work)
 	 * receive queues.  Other, non candidate sockets _can_ be
 	 * added to queue, so we must make sure only to touch
 	 * candidates.
+	 *
+	 * Embryos, though never candidates themselves, affect which
+	 * candidates are reachable by the garbage collector.  Before
+	 * being added to a listener's queue, an embryo may already
+	 * receive data carrying SCM_RIGHTS, potentially making the
+	 * passed socket a candidate that is not yet reachable by the
+	 * collector.  It becomes reachable once the embryo is
+	 * enqueued.  Therefore, we must ensure that no SCM-laden
+	 * embryo appears in a (candidate) listener's queue between
+	 * consecutive scan_children() calls.
 	 */
 	list_for_each_entry_safe(u, next, &gc_inflight_list, link) {
-		long total_refs;
-
-		total_refs = file_count(u->sk.sk_socket->file);
+		struct sock *sk = &u->sk;
+		long total_refs = file_count(sk->sk_socket->file);
 
 		WARN_ON_ONCE(!u->inflight);
 		WARN_ON_ONCE(total_refs < u->inflight);
@@ -286,6 +295,11 @@ static void __unix_gc(struct work_struct *work)
 			list_move_tail(&u->link, &gc_candidates);
 			__set_bit(UNIX_GC_CANDIDATE, &u->gc_flags);
 			__set_bit(UNIX_GC_MAYBE_CYCLE, &u->gc_flags);
+
+			if (sk->sk_state == TCP_LISTEN) {
+				unix_state_lock(sk);
+				unix_state_unlock(sk);
+			}
 		}
 	}
 

From patchwork Mon Apr 8 15:58:46 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michal Luczaj
X-Patchwork-Id: 13621390
X-Patchwork-Delegate: kuba@kernel.org
From: Michal Luczaj
To: netdev@vger.kernel.org
Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, kuniyu@amazon.com, Michal Luczaj
Subject: [PATCH net 2/2] af_unix: Add GC race reproducer + slow down unix_stream_connect()
Date: Mon, 8 Apr 2024 17:58:46 +0200
Message-ID: <20240408161336.612064-3-mhal@rbox.co>
X-Mailer: git-send-email 2.44.0
In-Reply-To: <20240408161336.612064-1-mhal@rbox.co>
References: <20240408161336.612064-1-mhal@rbox.co>
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

Attempt to crash the kernel by racing connect() against the unix socket
garbage collector. The mdelay() added to unix_stream_connect() widens the
race window.

Signed-off-by: Michal Luczaj
---
 net/unix/af_unix.c                            |   2 +
 tools/testing/selftests/net/af_unix/Makefile  |   2 +-
 .../selftests/net/af_unix/gc_vs_connect.c     | 158 ++++++++++++++++++
 3 files changed, 161 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/net/af_unix/gc_vs_connect.c

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 5b41e2321209..8e56a094dc80 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1636,6 +1636,8 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
 
 	unix_state_unlock(sk);
 
+	mdelay(1);
+
 	/* take ten and send info to listening sock */
 	spin_lock(&other->sk_receive_queue.lock);
 	__skb_queue_tail(&other->sk_receive_queue, skb);
diff --git a/tools/testing/selftests/net/af_unix/Makefile b/tools/testing/selftests/net/af_unix/Makefile
index 221c387a7d7f..3b12a9290e06 100644
--- a/tools/testing/selftests/net/af_unix/Makefile
+++ b/tools/testing/selftests/net/af_unix/Makefile
@@ -1,4 +1,4 @@
 CFLAGS += $(KHDR_INCLUDES)
-TEST_GEN_PROGS := diag_uid test_unix_oob unix_connect scm_pidfd
+TEST_GEN_PROGS := diag_uid test_unix_oob unix_connect scm_pidfd gc_vs_connect
 
 include ../../lib.mk
diff --git a/tools/testing/selftests/net/af_unix/gc_vs_connect.c b/tools/testing/selftests/net/af_unix/gc_vs_connect.c
new file mode 100644
index 000000000000..8b724f1616dd
--- /dev/null
+++ b/tools/testing/selftests/net/af_unix/gc_vs_connect.c
@@ -0,0 +1,158 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#define SOCK_TYPE SOCK_STREAM /* or SOCK_SEQPACKET */
+
+union {
+	char buf[CMSG_SPACE(sizeof(int))];
+	struct cmsghdr align;
+} cbuf;
+
+struct iovec io = {
+	.iov_base = (char[1]) {0},
+	.iov_len = 1
+};
+
+struct msghdr msg = {
+	.msg_iov = &io,
+	.msg_iovlen = 1,
+	.msg_control = cbuf.buf,
+	.msg_controllen = sizeof(cbuf.buf)
+};
+
+pthread_barrier_t barr;
+struct sockaddr_un saddr;
+int salen, client;
+
+static void barrier(void)
+{
+	int ret = pthread_barrier_wait(&barr);
+
+	assert(!ret || ret == PTHREAD_BARRIER_SERIAL_THREAD);
+}
+
+static int socket_unix(void)
+{
+	int sock = socket(AF_UNIX, SOCK_TYPE, 0);
+
+	assert(sock != -1);
+	return sock;
+}
+
+static int recv_fd(int socket)
+{
+	struct cmsghdr *cmsg;
+	int ret, fd;
+
+	ret = recvmsg(socket, &msg, 0);
+	assert(ret == 1 && !(msg.msg_flags & MSG_CTRUNC));
+
+	cmsg = CMSG_FIRSTHDR(&msg);
+	assert(cmsg);
+	memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));
+	assert(fd >= 0);
+
+	return fd;
+}
+
+static void send_fd(int socket, int fd)
+{
+	struct cmsghdr *cmsg;
+	int ret;
+
+	cmsg = CMSG_FIRSTHDR(&msg);
+	assert(cmsg);
+	cmsg->cmsg_level = SOL_SOCKET;
+	cmsg->cmsg_type = SCM_RIGHTS;
+	cmsg->cmsg_len = CMSG_LEN(sizeof(fd));
+
+	memcpy(CMSG_DATA(cmsg), &fd, sizeof(fd));
+
+	do {
+		ret = sendmsg(socket, &msg, 0);
+		assert(ret == 1 || (ret == -1 && errno == ENOTCONN));
+	} while (ret != 1);
+}
+
+static void *racer_connect(void *arg)
+{
+	for (;;) {
+		int ret;
+
+		barrier();
+		ret = connect(client, (struct sockaddr *)&saddr, salen);
+		assert(!ret);
+		barrier();
+	}
+
+	return NULL;
+}
+
+static void *racer_gc(void *arg)
+{
+	for (;;)
+		close(socket_unix()); /* trigger GC */
+
+	return NULL;
+}
+
+int main(void)
+{
+	pthread_t thread_conn, thread_gc;
+	int ret, pair[2];
+
+	printf("running\n");
+
+	ret = pthread_barrier_init(&barr, NULL, 2);
+	assert(!ret);
+
+	ret = pthread_create(&thread_conn, NULL, racer_connect, NULL);
+	assert(!ret);
+
+	ret = pthread_create(&thread_gc, NULL, racer_gc, NULL);
+	assert(!ret);
+
+	ret = socketpair(AF_UNIX, SOCK_TYPE, 0, pair);
+	assert(!ret);
+
+	saddr.sun_family = AF_UNIX;
+	salen = sizeof(saddr.sun_family) +
+		sprintf(saddr.sun_path, "%c/unix-gc-%d", '\0', getpid());
+
+	for (;;) {
+		int server, victim;
+
+		server = socket_unix();
+		ret = bind(server, (struct sockaddr *)&saddr, salen);
+		assert(!ret);
+		ret = listen(server, -1);
+		assert(!ret);
+
+		send_fd(pair[0], server);
+		close(server);
+
+		client = socket_unix();
+		victim = socket_unix();
+
+		barrier();
+		send_fd(client, victim);
+		close(victim);
+		barrier();
+
+		server = recv_fd(pair[1]);
+		close(client);
+		close(server);
+	}
+
+	return 0;
+}
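
For readers who want to see the "flip the lock" idea from patch 1/2 in isolation, here is a minimal userspace sketch. It is not kernel code and not part of either patch: listener_lock, embryo_enqueued, connector() and flip_lock_demo() are hypothetical stand-ins for unix_state_lock(L), "skb1 is on L's receive queue", connect() and the GC respectively. It only assumes POSIX threads. The point it tries to show is that an empty lock/unlock pair acts as a wait-for-completion barrier against whoever currently holds the lock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-ins; none of these names exist in the kernel. */
static pthread_mutex_t listener_lock = PTHREAD_MUTEX_INITIALIZER; /* plays unix_state_lock(L) */
static pthread_mutex_t start_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t start_cond = PTHREAD_COND_INITIALIZER;
static int connect_started;	/* set once the connector holds listener_lock */
static int embryo_enqueued;	/* plays "skb1 is on L's receive queue" */

/* Plays connect(): lock the listener, do some work, enqueue, unlock. */
static void *connector(void *arg)
{
	struct timespec delay = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };

	(void)arg;
	pthread_mutex_lock(&listener_lock);

	/* Tell the "GC" that a connect() is now in progress. */
	pthread_mutex_lock(&start_lock);
	connect_started = 1;
	pthread_cond_signal(&start_cond);
	pthread_mutex_unlock(&start_lock);

	nanosleep(&delay, NULL);	/* widen the window, in the spirit of mdelay(1) */
	embryo_enqueued = 1;		/* enqueue while still holding the lock */
	pthread_mutex_unlock(&listener_lock);
	return NULL;
}

/* Plays the GC: returns what it observes after flipping the lock. */
static int flip_lock_demo(void)
{
	pthread_t thread;
	int seen, ret;

	connect_started = 0;
	embryo_enqueued = 0;

	ret = pthread_create(&thread, NULL, connector, NULL);
	assert(!ret);

	/* Wait until the connector is inside its critical section. */
	pthread_mutex_lock(&start_lock);
	while (!connect_started)
		pthread_cond_wait(&start_cond, &start_lock);
	pthread_mutex_unlock(&start_lock);

	/* The flip: blocks until the connector has enqueued and unlocked. */
	pthread_mutex_lock(&listener_lock);
	pthread_mutex_unlock(&listener_lock);

	seen = embryo_enqueued;		/* ordered after the connector's unlock */

	ret = pthread_join(thread, NULL);
	assert(!ret);
	return seen;
}
```

Calling flip_lock_demo() from a main() and checking that it returns 1 exercises the barrier. Dropping the lock/unlock pair makes the read race with the connector, which is the userspace analogue of scan_children() running before __skb_queue_tail(L, skb1).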