From patchwork Fri Apr 5 22:10:57 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kuniyuki Iwashima X-Patchwork-Id: 13619544 X-Patchwork-Delegate: kuba@kernel.org Received: from smtp-fw-6002.amazon.com (smtp-fw-6002.amazon.com [52.95.49.90]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D3B9A1E87C for ; Fri, 5 Apr 2024 22:11:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=52.95.49.90 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712355076; cv=none; b=CY/RDJcNJEVO7+hW2hoIYEHzDCR0J4U07Z/ekxMhYfcxtUjpMgkkb0nOe456ZgOV/14kxFqF5IccBXfhmL/CaTrg1t/3ryiMans2EuUhyJkB0wyyrn0MwUn273noWn+xlhui0c/JnQ+fCd4EFah/FhdEsmcrKg3xiJn+4D7MnH4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712355076; c=relaxed/simple; bh=FK0G/FCjSXEososHM46ugi1L4cjmlaSB1pTi1GOWgpM=; h=From:To:CC:Subject:Date:Message-ID:MIME-Version:Content-Type; b=nIQNICavoyQX3RWJV4QyQGMAtS9VOcrnTCT64WuOiGGPEG08l9U5DhQGC+Qdgs8q76n6iR5xbNA5cDuQJHNQmvq0HS8gjcIg5HNQM0JdM7+Q1R+Zgwrf+8/DIuxQ+zrLDEwffMu3973QSeZJ8vCKSmdnI9cQ8ufoAZyQIENCWt8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com; spf=pass smtp.mailfrom=amazon.co.jp; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b=KgCtWAYS; arc=none smtp.client-ip=52.95.49.90 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=amazon.co.jp Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="KgCtWAYS" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1712355075; x=1743891075; h=from:to:cc:subject:date:message-id:mime-version: content-transfer-encoding; bh=F9S5i89Ur2NiLYRkrWpvWTzaR3VdlsNWetaXGaaNT+g=; b=KgCtWAYSeSukp1sPYnDRABmF0l30cv3NNxTXjgc7qZkJC/7jUN/GIJ87 /XU5bJqd8SAupmVJz57FzypzvF3hQ0+WBLJ1WncJrtH500vCYz/VQLFg6 jmsrWwUhivkI800nJleEk8hKP34iWT2ebbvr1SrkyxCJXE0yhwkYjn/jZ M=; X-IronPort-AV: E=Sophos;i="6.07,182,1708387200"; d="scan'208";a="398518618" Received: from iad12-co-svc-p1-lb1-vlan3.amazon.com (HELO smtpout.prod.us-west-2.prod.farcaster.email.amazon.dev) ([10.43.8.6]) by smtp-border-fw-6002.iad6.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Apr 2024 22:11:12 +0000 Received: from EX19MTAUWC002.ant.amazon.com [10.0.38.20:8111] by smtpin.naws.us-west-2.prod.farcaster.email.amazon.dev [10.0.44.181:2525] with esmtp (Farcaster) id 0a098e4b-ae03-44b7-8490-00b918663e98; Fri, 5 Apr 2024 22:11:11 +0000 (UTC) X-Farcaster-Flow-ID: 0a098e4b-ae03-44b7-8490-00b918663e98 Received: from EX19D004ANA001.ant.amazon.com (10.37.240.138) by EX19MTAUWC002.ant.amazon.com (10.250.64.143) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1258.28; Fri, 5 Apr 2024 22:11:10 +0000 Received: from 88665a182662.ant.amazon.com (10.106.101.45) by EX19D004ANA001.ant.amazon.com (10.37.240.138) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1258.28; Fri, 5 Apr 2024 22:11:08 +0000 From: Kuniyuki Iwashima To: "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni CC: Rao shoaib , Kuniyuki Iwashima , Kuniyuki Iwashima , , Subject: [PATCH v2 net] af_unix: Clear stale u->oob_skb. Date: Fri, 5 Apr 2024 15:10:57 -0700 Message-ID: <20240405221057.2406-1-kuniyu@amazon.com> X-Mailer: git-send-email 2.30.2 Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: EX19D037UWC003.ant.amazon.com (10.13.139.231) To EX19D004ANA001.ant.amazon.com (10.37.240.138) X-Patchwork-Delegate: kuba@kernel.org syzkaller started to report deadlock of unix_gc_lock after commit 4090fa373f0e ("af_unix: Replace garbage collection algorithm."), but it just uncovers the bug that has been there since commit 314001f0bf92 ("af_unix: Add OOB support"). The repro basically does the following. from socket import * from array import array c1, c2 = socketpair(AF_UNIX, SOCK_STREAM) c1.sendmsg([b'a'], [(SOL_SOCKET, SCM_RIGHTS, array("i", [c2.fileno()]))], MSG_OOB) c2.recv(1) # blocked as no normal data in recv queue c2.close() # done async and unblock recv() c1.close() # done async and trigger GC A socket sends its file descriptor to itself as OOB data and tries to receive normal data, but finally recv() fails due to async close(). The problem here is wrong handling of OOB skb in manage_oob(). When recvmsg() is called without MSG_OOB, manage_oob() is called to check if the peeked skb is OOB skb. In such a case, manage_oob() pops it out of the receive queue but does not clear unix_sock(sk)->oob_skb. This is wrong in terms of uAPI. Let's say we send "hello" with MSG_OOB, and "world" without MSG_OOB. The 'o' is handled as OOB data. When recv() is called twice without MSG_OOB, the OOB data should be lost. >>> from socket import * >>> c1, c2 = socketpair(AF_UNIX, SOCK_STREAM, 0) >>> c1.send(b'hello', MSG_OOB) # 'o' is OOB data 5 >>> c1.send(b'world') 5 >>> c2.recv(5) # OOB data is not received b'hell' >>> c2.recv(5) # OOB date is skipped b'world' >>> c2.recv(5, MSG_OOB) # This should return an error b'o' In the same situation, TCP actually returns -EINVAL for the last recv(). Also, if we do not clear unix_sk(sk)->oob_skb, unix_poll() always set EPOLLPRI even though the data has passed through by previous recv(). To avoid these issues, we must clear unix_sk(sk)->oob_skb when dequeuing it from recv queue. The reason why the old GC did not trigger the deadlock is because the old GC relied on the receive queue to detect the loop. When it is triggered, the socket with OOB data is marked as GC candidate because file refcount == inflight count (1). However, after traversing all inflight sockets, the socket still has a positive inflight count (1), thus the socket is excluded from candidates. Then, the old GC lose the chance to garbage-collect the socket. With the old GC, the repro continues to create true garbage that will never be freed nor detected by kmemleak as it's linked to the global inflight list. That's why we couldn't even notice the issue. Fixes: 314001f0bf92 ("af_unix: Add OOB support") Reported-by: syzbot+7f7f201cc2668a8fd169@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=7f7f201cc2668a8fd169 Signed-off-by: Kuniyuki Iwashima Reviewed-by: Eric Dumazet --- v2: Use skb_unref() instead of kfree_skb() v1: https://lore.kernel.org/netdev/20240405204145.93169-1-kuniyu@amazon.com/ --- net/unix/af_unix.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 5b41e2321209..d032eb5fa6df 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -2665,7 +2665,9 @@ static struct sk_buff *manage_oob(struct sk_buff *skb, struct sock *sk, } } else if (!(flags & MSG_PEEK)) { skb_unlink(skb, &sk->sk_receive_queue); - consume_skb(skb); + WRITE_ONCE(u->oob_skb, NULL); + if (!WARN_ON_ONCE(skb_unref(skb))) + kfree_skb(skb); skb = skb_peek(&sk->sk_receive_queue); } }