From patchwork Mon Aug 15 21:45:32 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kirill Tkhai
X-Patchwork-Id: 12944198
X-Patchwork-Delegate: kuba@kernel.org
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 2A81FC00140
	for ; Tue, 16 Aug 2022 01:54:19 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S234327AbiHPByR (ORCPT );
	Mon, 15 Aug 2022 21:54:17 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48150 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S234305AbiHPBxy (ORCPT );
	Mon, 15 Aug 2022 21:53:54 -0400
Received: from forward500p.mail.yandex.net (forward500p.mail.yandex.net
	[IPv6:2a02:6b8:0:1472:2741:0:8b7:110])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 053EA20E98E
	for ; Mon, 15 Aug 2022 14:45:35 -0700 (PDT)
Received: from iva4-143b1447cf50.qloud-c.yandex.net
	(iva4-143b1447cf50.qloud-c.yandex.net
	[IPv6:2a02:6b8:c0c:7511:0:640:143b:1447])
	by forward500p.mail.yandex.net (Yandex) with ESMTP id 0F3B5F01683;
	Tue, 16 Aug 2022 00:45:34 +0300 (MSK)
Received: by iva4-143b1447cf50.qloud-c.yandex.net (smtp/Yandex) with ESMTPSA
	id L10zMfevH2-jWi8ZYbk; Tue, 16 Aug 2022 00:45:33 +0300
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client certificate not present)
X-Yandex-Fwd: 1
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ya.ru; s=mail;
	t=1660599933; bh=6DfvZKkoYrBZQY9QKmpT6aw2IaRKeGGolubxlTMgOyY=;
	h=Cc:References:Date:Message-ID:In-Reply-To:From:To:Subject;
	b=nj9cldexlErytqtxJ1BjOuiWLBZDgpporLA3Jc0lP/+H0qaRtTp9+DWTm3zR5r700
	 XFS6BvM50olZyEf3znLvATcmXYBFv8GTPDoMkj32ZUL9bhHyKsvgV/WJIKosfAb0p5
	 7bAyAth39dDLAGq2sc0rj+IscL8UKp4Wq+AG3O8Q=
Authentication-Results:
	iva4-143b1447cf50.qloud-c.yandex.net; dkim=pass header.i=@ya.ru
Subject: [PATCH v2 2/2] af_unix: Add ioctl(SIOCUNIXGRABFDS) to grab files of
	receive queue skbs
To: Linux Kernel Network Developers
Cc: davem@davemloft.net, edumazet@google.com, viro@zeniv.linux.org.uk
References: <0b07a55f-0713-7ba4-9b6b-88bc8cc6f1f5@ya.ru>
From: Kirill Tkhai
Message-ID: <694a6e0a-ed8f-17c0-f85f-77b56cd98357@ya.ru>
Date: Tue, 16 Aug 2022 00:45:32 +0300
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
	Thunderbird/78.14.0
In-Reply-To: <0b07a55f-0713-7ba4-9b6b-88bc8cc6f1f5@ya.ru>
Content-Language: en-US
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

When an fd holds a reference to some critical resource, say a mount, it is
impossible to unmount that mount and disconnect the related block device.
Such an fd may sit in an skb in the receive queue of some unix socket.

Although we have an interface for detecting such socket queues
(/proc/[PID]/fdinfo/[fd] shows a non-zero scm_fds count in that case), and
it is possible to kill the owning process to release the reference, there
may be several such processes, and killing each of them is not a good
solution.

This patch adds a simple interface to grab files from the receive queue,
so that the caller can analyze them, and even do so recursively if a
grabbed file is itself a unix socket. The problem described above can then
be solved by this ioctl() paired with pidfd_getfd().

Note that the existing recvmsg(,,MSG_PEEK) is not suitable for this
purpose, since it modifies the peek offset inside the socket, which causes
problems if the examined process uses the peek offset itself. Additionally
freezing that task with ptrace and using setsockopt(SO_PEEK_OFF) does not
help either, since the socket may belong to several tasks, and there is no
reliable, non-racy way to detect that. Also, if the caller of such a trick
dies, the examined task will remain frozen forever.
The newly proposed ioctl(SIOCUNIXGRABFDS) does not have these problems.

The implementation of ioctl(SIOCUNIXGRABFDS) is pretty simple. The only
interesting part is the protocol with userspace. First, we let userspace
know the total number of files in the receive queue skbs. Then we receive
fds one by one, starting from the requested offset. We return the number
of received fds if at least one fd was received successfully; this number
may be less than requested in case of an error or if fewer fds are
available. Userspace can detect these situations by comparing the returned
value with out.nr_all minus in.nr_skip. Of the variants I looked at, this
one looks the best to me (I also considered returning an error when an
error occurs after an fd has already been received, and returning the
number of received files as one more member of struct unix_ioc_grab_fds).

Signed-off-by: Kirill Tkhai
---
 include/uapi/linux/un.h | 12 ++++++++
 net/unix/af_unix.c      | 70 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 82 insertions(+)

diff --git a/include/uapi/linux/un.h b/include/uapi/linux/un.h
index 0ad59dc8b686..995b358263dd 100644
--- a/include/uapi/linux/un.h
+++ b/include/uapi/linux/un.h
@@ -11,6 +11,18 @@ struct sockaddr_un {
 	char sun_path[UNIX_PATH_MAX];	/* pathname */
 };
 
+struct unix_ioc_grab_fds {
+	struct {
+		int nr_grab;
+		int nr_skip;
+		int *fds;
+	} in;
+	struct {
+		int nr_all;
+	} out;
+};
+
 #define SIOCUNIXFILE (SIOCPROTOPRIVATE + 0) /* open a socket file with O_PATH */
+#define SIOCUNIXGRABFDS (SIOCPROTOPRIVATE + 1) /* grab files from recv queue */
 
 #endif /* _LINUX_UN_H */
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index bf338b782fc4..3c7e8049eba1 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -3079,6 +3079,73 @@ static int unix_open_file(struct sock *sk)
 	return fd;
 }
 
+static int unix_ioc_grab_fds(struct sock *sk, struct unix_ioc_grab_fds __user *uarg)
+{
+	int i, todo, skip, count, all, err, done = 0;
+	struct unix_sock *u = unix_sk(sk);
+	struct unix_ioc_grab_fds arg;
+	struct sk_buff *skb = NULL;
+	struct scm_fp_list *fp;
+
+	if (copy_from_user(&arg, uarg, sizeof(arg)))
+		return -EFAULT;
+
+	skip = arg.in.nr_skip;
+	todo = arg.in.nr_grab;
+
+	if (skip < 0 || todo <= 0)
+		return -EINVAL;
+	if (mutex_lock_interruptible(&u->iolock))
+		return -EINTR;
+
+	all = atomic_read(&u->scm_stat.nr_fds);
+	err = -EFAULT;
+	/* Set uarg->out.nr_all before the first file is received. */
+	if (put_user(all, &uarg->out.nr_all))
+		goto unlock;
+	err = 0;
+	if (all <= skip)
+		goto unlock;
+	if (all - skip < todo)
+		todo = all - skip;
+	while (todo) {
+		spin_lock(&sk->sk_receive_queue.lock);
+		if (!skb)
+			skb = skb_peek(&sk->sk_receive_queue);
+		else
+			skb = skb_peek_next(skb, &sk->sk_receive_queue);
+		spin_unlock(&sk->sk_receive_queue.lock);
+
+		if (!skb)
+			goto unlock;
+
+		fp = UNIXCB(skb).fp;
+		count = fp->count;
+		if (skip >= count) {
+			skip -= count;
+			continue;
+		}
+
+		for (i = skip; i < count && todo; i++) {
+			err = receive_fd_user(fp->fp[i], &arg.in.fds[done], 0);
+			if (err < 0)
+				goto unlock;
+			done++;
+			todo--;
+		}
+		skip = 0;
+	}
+unlock:
+	mutex_unlock(&u->iolock);
+
+	/* Return number of fds (non-error) if there is a received file. */
+	if (done)
+		return done;
+	if (err < 0)
+		return err;
+	return 0;
+}
+
 static int unix_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
 {
 	struct sock *sk = sock->sk;
@@ -3113,6 +3180,9 @@ static int unix_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
 		}
 		break;
 #endif
+	case SIOCUNIXGRABFDS:
+		err = unix_ioc_grab_fds(sk, (struct unix_ioc_grab_fds __user *)arg);
+		break;
 	default:
 		err = -ENOIOCTLCMD;
 		break;
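
A minimal userspace sketch of the skip/grab accounting described in the
commit message, assuming it mirrors the loop in unix_ioc_grab_fds().
Nothing here calls the kernel: struct grab_request, model_grab() and
expected_grab() are hypothetical names introduced only for illustration,
and the receive queue is modelled as an array of per-skb fd counts. The
model treats an skb with no attached files as contributing zero fds, which
is an assumption and not something the patch states. expected_grab()
computes min(in.nr_grab, out.nr_all - in.nr_skip); a return value below
that would indicate a short grab.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace mirror of the request fields of the proposed
 * struct unix_ioc_grab_fds (the fds output array is left out). */
struct grab_request {
	int nr_grab;	/* in: maximum number of fds to grab */
	int nr_skip;	/* in: number of queued fds to skip first */
	int nr_all;	/* out: total number of fds in the receive queue */
};

/*
 * Model of the queue walk: fds_per_skb[s] is the number of files attached
 * to the s-th queued skb. Returns how many fds would be grabbed, or -1 for
 * arguments that the ioctl rejects with -EINVAL.
 */
int model_grab(const int *fds_per_skb, size_t nr_skbs,
	       struct grab_request *req)
{
	int skip = req->nr_skip, todo = req->nr_grab, all = 0, done = 0;
	size_t s;

	assert(fds_per_skb != NULL || nr_skbs == 0);
	if (skip < 0 || todo <= 0)
		return -1;

	for (s = 0; s < nr_skbs; s++)
		all += fds_per_skb[s];
	req->nr_all = all;	/* reported before any fd is handed out */

	if (all <= skip)
		return 0;
	if (all - skip < todo)
		todo = all - skip;

	for (s = 0; s < nr_skbs && todo; s++) {
		int count = fds_per_skb[s];
		int i;

		if (skip >= count) {	/* skb lies fully in the skipped range */
			skip -= count;
			continue;
		}
		for (i = skip; i < count && todo; i++) {
			done++;		/* stands in for a successful receive_fd_user() */
			todo--;
		}
		skip = 0;
	}
	return done;
}

/* Number of fds a caller should expect back for a completed request. */
int expected_grab(const struct grab_request *req)
{
	int avail = req->nr_all - req->nr_skip;

	if (avail < 0)
		avail = 0;
	return avail < req->nr_grab ? avail : req->nr_grab;
}
```

For a queue of three skbs carrying 2, 0 and 3 fds, a request with
nr_skip = 1 and nr_grab = 4 reports nr_all = 5 and grabs 4 fds; with
nr_skip = 3 and nr_grab = 10 it grabs 2.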