From patchwork Mon Aug 3 13:06:27 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 11698047 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0C80A14E3 for ; Mon, 3 Aug 2020 13:06:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E9A01207FB for ; Mon, 3 Aug 2020 13:06:39 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="MuV+mJpI" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727092AbgHCNGh (ORCPT ); Mon, 3 Aug 2020 09:06:37 -0400 Received: from us-smtp-1.mimecast.com ([207.211.31.81]:58136 "EHLO us-smtp-delivery-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727064AbgHCNGh (ORCPT ); Mon, 3 Aug 2020 09:06:37 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1596459995; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=j/v002Gxe6nBxuXIo8xXJ4Ruh4kteZD7I/mhEJxj0MM=; b=MuV+mJpI9HWCI/yVeXv5uoperMrxw6iZZ4yWmzf5HEbQNIcRduKyBVHx9oLMp5iXSa2mRS N33KJ0NOWR9FTtGihXkFeZNNV4JsW2WGmxHzJS3OAfJ9Bc6GQ+uM2A7baewtmmVBc3T00m tBmNEsONIIbvXco6u/tjOCRJDziQYWc= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-421-NWK5erupNA6dLy11xyeTNg-1; Mon, 03 Aug 2020 09:06:34 -0400 X-MC-Unique: NWK5erupNA6dLy11xyeTNg-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 
bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 29543106B20A; Mon, 3 Aug 2020 13:06:32 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-112-32.rdu2.redhat.com [10.10.112.32]) by smtp.corp.redhat.com (Postfix) with ESMTP id 214D78A19E; Mon, 3 Aug 2020 13:06:27 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH 1/5] watch_queue: Limit the number of watches a user can hold [ver #2] From: David Howells To: viro@zeniv.linux.org.uk Cc: Linus Torvalds , Jarkko Sakkinen , dhowells@redhat.com, torvalds@linux-foundation.org, casey@schaufler-ca.com, sds@tycho.nsa.gov, nicolas.dichtel@6wind.com, raven@themaw.net, christian@brauner.io, jlayton@redhat.com, kzak@redhat.com, mszeredi@redhat.com, linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-security-module@vger.kernel.org, linux-kernel@vger.kernel.org Date: Mon, 03 Aug 2020 14:06:27 +0100 Message-ID: <159645998732.1779777.6694081988484186124.stgit@warthog.procyon.org.uk> In-Reply-To: <159645997768.1779777.8286723139418624756.stgit@warthog.procyon.org.uk> References: <159645997768.1779777.8286723139418624756.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Impose a limit on the number of watches that a user can hold so that they can't use this mechanism to fill up all the available memory. This is done by putting a counter in user_struct that's incremented when a watch is allocated and decreased when it is released. If the number exceeds the RLIMIT_NOFILE limit, the watch is rejected with EAGAIN. 
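
As a rough userspace model of the accounting described above (not the kernel code itself), the charge-then-check pattern can be sketched with C11 atomics. The names sketch_user, sketch_charge_watch() and sketch_uncharge_watch() are hypothetical, and the limit is passed in as a plain integer instead of being read from RLIMIT_NOFILE:

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the per-user counter added to user_struct. */
struct sketch_user {
	atomic_int nr_watches;
};

/*
 * Charge one watch to the user.  The counter is bumped first and rolled
 * back if the new value exceeds the limit, so a failed attempt never
 * leaves the counter permanently above the limit.
 */
static int sketch_charge_watch(struct sketch_user *user, int limit)
{
	int now = atomic_fetch_add(&user->nr_watches, 1) + 1;

	if (now > limit) {
		atomic_fetch_sub(&user->nr_watches, 1);
		return -EAGAIN;
	}
	return 0;
}

/* Release one previously charged watch. */
static void sketch_uncharge_watch(struct sketch_user *user)
{
	int before = atomic_fetch_sub(&user->nr_watches, 1);

	assert(before > 0);	/* must pair with a successful charge */
	(void)before;
}
```

In this model, with a limit of 99, the 100th call to sketch_charge_watch() returns -EAGAIN and the counter stays at 99.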
This can be tested by the following means: (1) Create a watch queue and attach it to fd 5 in the program given - in this case, bash: keyctl watch_session /tmp/nlog /tmp/gclog 5 bash (2) In the shell, set the maximum number of files to, say, 99: ulimit -n 99 (3) Add 200 keyrings: for ((i=0; i<200; i++)); do keyctl newring a$i @s || break; done (4) Try to watch all of the keyrings: for ((i=0; i<200; i++)); do echo $i; keyctl watch_add 5 %:a$i || break; done This should fail when the number of watches belonging to the user hits 99. (5) Remove all the keyrings and all of those watches should go away: for ((i=0; i<200; i++)); do keyctl unlink %:a$i; done (6) Kill off the watch queue by exiting the shell spawned by watch_session. Fixes: c73be61cede5 ("pipe: Add general notification queue support") Reported-by: Linus Torvalds Signed-off-by: David Howells Reviewed-by: Jarkko Sakkinen --- include/linux/sched/user.h | 3 +++ kernel/watch_queue.c | 8 ++++++++ 2 files changed, 11 insertions(+) diff --git a/include/linux/sched/user.h b/include/linux/sched/user.h index 917d88edb7b9..a8ec3b6093fc 100644 --- a/include/linux/sched/user.h +++ b/include/linux/sched/user.h @@ -36,6 +36,9 @@ struct user_struct { defined(CONFIG_NET) || defined(CONFIG_IO_URING) atomic_long_t locked_vm; #endif +#ifdef CONFIG_WATCH_QUEUE + atomic_t nr_watches; /* The number of watches this user currently has */ +#endif /* Miscellaneous per-user rate limit */ struct ratelimit_state ratelimit; diff --git a/kernel/watch_queue.c b/kernel/watch_queue.c index f74020f6bd9d..0ef8f65bd2d7 100644 --- a/kernel/watch_queue.c +++ b/kernel/watch_queue.c @@ -393,6 +393,7 @@ static void free_watch(struct rcu_head *rcu) struct watch *watch = container_of(rcu, struct watch, rcu); put_watch_queue(rcu_access_pointer(watch->queue)); + atomic_dec(&watch->cred->user->nr_watches); put_cred(watch->cred); } @@ -452,6 +453,13 @@ int add_watch_to_object(struct watch *watch, struct watch_list *wlist) watch->cred = get_current_cred(); 
rcu_assign_pointer(watch->watch_list, wlist); + if (atomic_inc_return(&watch->cred->user->nr_watches) > + task_rlimit(current, RLIMIT_NOFILE)) { + atomic_dec(&watch->cred->user->nr_watches); + put_cred(watch->cred); + return -EAGAIN; + } + spin_lock_bh(&wqueue->lock); kref_get(&wqueue->usage); kref_get(&watch->usage); From patchwork Mon Aug 3 13:06:37 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 11698051 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 931C313B1 for ; Mon, 3 Aug 2020 13:06:51 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7B55E2076B for ; Mon, 3 Aug 2020 13:06:51 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="WOe4nwI7" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727815AbgHCNGr (ORCPT ); Mon, 3 Aug 2020 09:06:47 -0400 Received: from us-smtp-2.mimecast.com ([205.139.110.61]:36889 "EHLO us-smtp-delivery-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727785AbgHCNGr (ORCPT ); Mon, 3 Aug 2020 09:06:47 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1596460006; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ylt84/QjuXGByWDOUhtjvYR5Fe+CedjDeH1XsQdkc6w=; b=WOe4nwI7C1e8gDhclejmEILJJ8eS85SQf6BwBLqeh3jvonxVJ+n8eMscW920kQZBZmKMe8 NL31MOgwdZxu5SgL6xJ0zKveSDi25Jsf/I4uozZdtBD4kCFh9LkKNMXQdha5ntVZ0Hdyl7 BpLoARJ14DPiFQLuKoSK9YiwC88gx3Y= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com 
[209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-506-XkGgyoVaPC-uDhmHYj-5Pg-1; Mon, 03 Aug 2020 09:06:42 -0400 X-MC-Unique: XkGgyoVaPC-uDhmHYj-5Pg-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id AC3008017FB; Mon, 3 Aug 2020 13:06:40 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-112-32.rdu2.redhat.com [10.10.112.32]) by smtp.corp.redhat.com (Postfix) with ESMTP id 3399471760; Mon, 3 Aug 2020 13:06:38 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH 2/5] watch_queue: Make watch_sizeof() check record size [ver #2] From: David Howells To: viro@zeniv.linux.org.uk Cc: Miklos Szeredi , dhowells@redhat.com, torvalds@linux-foundation.org, casey@schaufler-ca.com, sds@tycho.nsa.gov, nicolas.dichtel@6wind.com, raven@themaw.net, christian@brauner.io, jlayton@redhat.com, kzak@redhat.com, mszeredi@redhat.com, linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-security-module@vger.kernel.org, linux-kernel@vger.kernel.org Date: Mon, 03 Aug 2020 14:06:37 +0100 Message-ID: <159645999738.1779777.4882634293182825979.stgit@warthog.procyon.org.uk> In-Reply-To: <159645997768.1779777.8286723139418624756.stgit@warthog.procyon.org.uk> References: <159645997768.1779777.8286723139418624756.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Make watch_sizeof() give a build error if the size of the struct won't fit into the size field in the header. 
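
The idea can be tried out in userspace with a sketch like the following, which swaps BUILD_BUG_ON() for C11 _Static_assert inside a GNU statement expression (so it needs GCC or Clang). The SKETCH_* names are hypothetical, and the mask and shift values are assumptions modelled on the uapi header, not taken from this patch:

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values, modelled on include/uapi/linux/watch_queue.h. */
#define SKETCH_INFO_LENGTH		0x0000007fU
#define SKETCH_INFO_LENGTH__SHIFT	0

/*
 * Compute the length part of the info word for a record type and refuse
 * to compile if the structure cannot be described by the length field.
 */
#define sketch_watch_sizeof(STRUCT)					\
	({								\
		_Static_assert(sizeof(STRUCT) <=			\
			       (SKETCH_INFO_LENGTH >>			\
				SKETCH_INFO_LENGTH__SHIFT),		\
			       "record too large for the length field"); \
		sizeof(STRUCT) << SKETCH_INFO_LENGTH__SHIFT;		\
	})

/* Hypothetical record type that fits comfortably in the length field. */
struct sketch_small_record {
	unsigned int		info;
	unsigned long long	id;
};

static size_t sketch_small_record_info(void)
{
	return sketch_watch_sizeof(struct sketch_small_record);
}
```

Passing a structure larger than the assumed 127-byte maximum to sketch_watch_sizeof() should stop the build at the _Static_assert instead of silently truncating the length.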
Reported-by: Miklos Szeredi Signed-off-by: David Howells --- include/linux/watch_queue.h | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/include/linux/watch_queue.h b/include/linux/watch_queue.h index 5e08db2adc31..38e04c7a7951 100644 --- a/include/linux/watch_queue.h +++ b/include/linux/watch_queue.h @@ -120,7 +120,12 @@ static inline void remove_watch_list(struct watch_list *wlist, u64 id) * watch_sizeof - Calculate the information part of the size of a watch record, * given the structure size. */ -#define watch_sizeof(STRUCT) (sizeof(STRUCT) << WATCH_INFO_LENGTH__SHIFT) +#define watch_sizeof(STRUCT) \ + ({ \ + size_t max = WATCH_INFO_LENGTH >> WATCH_INFO_LENGTH__SHIFT; \ + BUILD_BUG_ON(sizeof(STRUCT) > max); \ + sizeof(STRUCT) << WATCH_INFO_LENGTH__SHIFT; \ + }) #endif From patchwork Mon Aug 3 13:06:45 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 11698055 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1E341913 for ; Mon, 3 Aug 2020 13:06:55 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 06520206D7 for ; Mon, 3 Aug 2020 13:06:55 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="dtLs8wc7" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727907AbgHCNGy (ORCPT ); Mon, 3 Aug 2020 09:06:54 -0400 Received: from us-smtp-delivery-1.mimecast.com ([207.211.31.120]:32590 "EHLO us-smtp-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726946AbgHCNGy (ORCPT ); Mon, 3 Aug 2020 09:06:54 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1596460012; 
h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=rnmF7PFcUQLOvfsrRmGZL1mkv1G/Vz60+pFjt89D08Q=; b=dtLs8wc7zC8c8aXq14TJV8JMc6BcGcQlnh4VZsdXFAxavOfYf4AjIqfzitpddxfxPZKJXM 8vwliwq+Qeg2tLE0h4z7q0e0QVBGJ8bMgAeSakm9or0ZUZGX1X28J44fV6rqBgjyPfxyWJ LaO8na6B2sxjc6zf+CX7TLY5NDfEeb0= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-443-DJMHCUvMPvubiTk4YXaZ6g-1; Mon, 03 Aug 2020 09:06:51 -0400 X-MC-Unique: DJMHCUvMPvubiTk4YXaZ6g-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 60FF5800685; Mon, 3 Aug 2020 13:06:49 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-112-32.rdu2.redhat.com [10.10.112.32]) by smtp.corp.redhat.com (Postfix) with ESMTP id B1BA21A8EC; Mon, 3 Aug 2020 13:06:46 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 
3798903 Subject: [PATCH 3/5] watch_queue: Add security hooks to rule on setting mount watches [ver #2] From: David Howells To: viro@zeniv.linux.org.uk Cc: James Morris , Casey Schaufler , Stephen Smalley , linux-security-module@vger.kernel.org, dhowells@redhat.com, torvalds@linux-foundation.org, casey@schaufler-ca.com, sds@tycho.nsa.gov, nicolas.dichtel@6wind.com, raven@themaw.net, christian@brauner.io, jlayton@redhat.com, kzak@redhat.com, mszeredi@redhat.com, linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-security-module@vger.kernel.org, linux-kernel@vger.kernel.org Date: Mon, 03 Aug 2020 14:06:45 +0100 Message-ID: <159646000592.1779777.17854239538736953711.stgit@warthog.procyon.org.uk> In-Reply-To: <159645997768.1779777.8286723139418624756.stgit@warthog.procyon.org.uk> References: <159645997768.1779777.8286723139418624756.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Add a security hook that will allow an LSM to rule on whether or not a watch may be set on a mount. 
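
How such a hook is consulted can be modelled in userspace roughly as follows. This is a simplified sketch, assuming that each registered hook is called in turn and that the first non-zero return value denies the watch; the sketch_* names are hypothetical and the real hook also receives the watch object:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, much-reduced stand-in for struct path. */
struct sketch_path {
	const char *name;
};

typedef int (*sketch_watch_mount_hook)(const struct sketch_path *path);

/*
 * Ask every registered hook in order.  The first hook that returns a
 * non-zero value decides the outcome; with no objections the default
 * of 0 (allow) is returned.
 */
static int sketch_security_watch_mount(const sketch_watch_mount_hook *hooks,
				       size_t nr_hooks,
				       const struct sketch_path *path)
{
	assert(path != NULL);

	for (size_t i = 0; i < nr_hooks; i++) {
		int rc = hooks[i](path);

		if (rc != 0)
			return rc;
	}
	return 0;
}

/* Example policy: allow everything. */
static int sketch_allow_all(const struct sketch_path *path)
{
	(void)path;
	return 0;
}

/* Example policy: refuse watches rooted at "/proc". */
static int sketch_deny_proc(const struct sketch_path *path)
{
	return strcmp(path->name, "/proc") == 0 ? -EACCES : 0;
}
```

With both example policies registered, a watch on "/" is allowed and a watch on "/proc" is refused with -EACCES. With no policies registered, everything is allowed, which mirrors the stub that returns 0 when the hook is compiled out.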
Signed-off-by: David Howells cc: James Morris cc: Casey Schaufler cc: Stephen Smalley cc: linux-security-module@vger.kernel.org --- include/linux/lsm_hook_defs.h | 3 +++ include/linux/lsm_hooks.h | 6 ++++++ include/linux/security.h | 8 ++++++++ security/security.c | 7 +++++++ 4 files changed, 24 insertions(+) diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h index af998f93d256..f6eaf8bd617b 100644 --- a/include/linux/lsm_hook_defs.h +++ b/include/linux/lsm_hook_defs.h @@ -264,6 +264,9 @@ LSM_HOOK(int, 0, post_notification, const struct cred *w_cred, #if defined(CONFIG_SECURITY) && defined(CONFIG_KEY_NOTIFICATIONS) LSM_HOOK(int, 0, watch_key, struct key *key) #endif /* CONFIG_SECURITY && CONFIG_KEY_NOTIFICATIONS */ +#ifdef CONFIG_MOUNT_NOTIFICATIONS +LSM_HOOK(int, 0, watch_mount, struct watch *watch, struct path *path) +#endif #ifdef CONFIG_SECURITY_NETWORK LSM_HOOK(int, 0, unix_stream_connect, struct sock *sock, struct sock *other, diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h index 95b7c1d32062..56275145b91d 100644 --- a/include/linux/lsm_hooks.h +++ b/include/linux/lsm_hooks.h @@ -1468,6 +1468,12 @@ * from a key or keyring. * @key: The key to watch. * + * @watch_mount: + * Check to see if a process is allowed to watch for mount topology change + * notifications on a mount subtree. + * @watch: The watch object + * @path: The root of the subtree to watch. + * * Security hooks for using the eBPF maps and programs functionalities through * eBPF syscalls. 
* diff --git a/include/linux/security.h b/include/linux/security.h index 0a0a03b36a3b..318fdfe7f4d6 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -1314,6 +1314,14 @@ static inline int security_watch_key(struct key *key) return 0; } #endif +#if defined(CONFIG_SECURITY) && defined(CONFIG_MOUNT_NOTIFICATIONS) +int security_watch_mount(struct watch *watch, struct path *path); +#else +static inline int security_watch_mount(struct watch *watch, struct path *path) +{ + return 0; +} +#endif #ifdef CONFIG_SECURITY_NETWORK diff --git a/security/security.c b/security/security.c index 70a7ad357bc6..3cdf5039f727 100644 --- a/security/security.c +++ b/security/security.c @@ -2067,6 +2067,13 @@ int security_watch_key(struct key *key) } #endif +#ifdef CONFIG_MOUNT_NOTIFICATIONS +int security_watch_mount(struct watch *watch, struct path *path) +{ + return call_int_hook(watch_mount, 0, watch, path); +} +#endif + #ifdef CONFIG_SECURITY_NETWORK int security_unix_stream_connect(struct sock *sock, struct sock *other, struct sock *newsk) From patchwork Mon Aug 3 13:06:54 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 11698059 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 32CFC913 for ; Mon, 3 Aug 2020 13:07:20 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 02CB72076B for ; Mon, 3 Aug 2020 13:07:20 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="AiN4FUE5" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728030AbgHCNHO (ORCPT ); Mon, 3 Aug 2020 09:07:14 -0400 Received: from us-smtp-delivery-1.mimecast.com ([205.139.110.120]:41523 "EHLO us-smtp-1.mimecast.com" 
rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726946AbgHCNHK (ORCPT ); Mon, 3 Aug 2020 09:07:10 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1596460026; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Cea/IehPpoGxuAOZbqQXRBwhcpKtg6PHiqDM+Rx6Uv0=; b=AiN4FUE5NlYX4trfjthZJfHO26znfza2JuGmVfK2+JGfyir82eeQO3nh8IjbgfSmjamv1y Ddr9YL8Cw9Oss9yX3ti2+FDhFna3+IvZ4GspEBSpsGMID6RcRUD5d4QzjwneZMNN7hiLh4 vYLTf+pYfiozPeezWyUAeV4q/IWTeD4= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-256-yhIRpZSsP4-maYXKFtJqfA-1; Mon, 03 Aug 2020 09:07:01 -0400 X-MC-Unique: yhIRpZSsP4-maYXKFtJqfA-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 97D3618C63C7; Mon, 3 Aug 2020 13:06:58 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-112-32.rdu2.redhat.com [10.10.112.32]) by smtp.corp.redhat.com (Postfix) with ESMTP id 6DCDA8AD1C; Mon, 3 Aug 2020 13:06:55 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 
3798903 Subject: [PATCH 4/5] watch_queue: Implement mount topology and attribute change notifications [ver #2] From: David Howells To: viro@zeniv.linux.org.uk Cc: dhowells@redhat.com, torvalds@linux-foundation.org, casey@schaufler-ca.com, sds@tycho.nsa.gov, nicolas.dichtel@6wind.com, raven@themaw.net, christian@brauner.io, jlayton@redhat.com, kzak@redhat.com, mszeredi@redhat.com, linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-security-module@vger.kernel.org, linux-kernel@vger.kernel.org Date: Mon, 03 Aug 2020 14:06:54 +0100 Message-ID: <159646001456.1779777.5833836537798006352.stgit@warthog.procyon.org.uk> In-Reply-To: <159645997768.1779777.8286723139418624756.stgit@warthog.procyon.org.uk> References: <159645997768.1779777.8286723139418624756.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Add a mount notification facility whereby notifications about changes in mount topology and configuration can be received. Note that this only covers vfsmount topology changes and not superblock events. A separate facility will be added for that. Firstly, a watch queue needs to be created: pipe2(fds, O_NOTIFICATION_PIPE); ioctl(fds[1], IOC_WATCH_QUEUE_SET_SIZE, 256); then a notification can be set up to report notifications via that queue: struct watch_notification_filter filter = { .nr_filters = 1, .filters = { [0] = { .type = WATCH_TYPE_MOUNT_NOTIFY, .subtype_filter[0] = UINT_MAX, }, }, }; ioctl(fds[1], IOC_WATCH_QUEUE_SET_FILTER, &filter); watch_mount(AT_FDCWD, "/", 0, fds[1], 0x02); In this case, it would let me monitor the mount topology subtree rooted at "/" for events. Mount notifications propagate up the tree towards the root, so a watch will catch all of the events happening in the subtree rooted at the watch. 
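
The upward propagation can be pictured with a small userspace model. This is only a sketch under stated assumptions: struct sketch_mount, sketch_notify() and the SKETCH_IN_SUBTREE bit are hypothetical and do not reflect the kernel's data structures or flag values:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit; the real NOTIFY_MOUNT_IN_SUBTREE value is not assumed. */
#define SKETCH_IN_SUBTREE 0x1u

struct sketch_mount {
	struct sketch_mount	*parent;	/* NULL at the root */
	bool			watched;	/* a watch is set on this mount */
	unsigned int		events_seen;	/* records delivered to the watch */
	unsigned int		last_flags;	/* flags of the latest record */
};

/*
 * Walk from the mount where the event happened up to the root and
 * deliver a record to every watched mount on the way.  A watch on an
 * ancestor sees the event flagged as having happened in its subtree.
 * Returns the number of watches that received a record.
 */
static unsigned int sketch_notify(struct sketch_mount *triggered)
{
	unsigned int delivered = 0;

	for (struct sketch_mount *m = triggered; m != NULL; m = m->parent) {
		if (!m->watched)
			continue;
		m->events_seen++;
		m->last_flags = (m == triggered) ? 0 : SKETCH_IN_SUBTREE;
		delivered++;
	}
	return delivered;
}
```

In this model a single watch on the root mount receives a record for an event on any descendant, and the record carries the in-subtree flag unless the event happened on the watched mount itself.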
After setting the watch, records will be placed into the queue when, for example, a mount switches between read-write and read-only. Records are of the following format:

	struct mount_notification {
		struct watch_notification watch;
		__u64	triggered_on;
		__u64	auxiliary_mount;
	} *n;

Where:

	n->watch.type will be WATCH_TYPE_MOUNT_NOTIFY.

	n->watch.subtype will indicate the type of event, such as NOTIFY_MOUNT_NEW_MOUNT.

	n->watch.info & WATCH_INFO_LENGTH will indicate the length of the record.

	n->watch.info & WATCH_INFO_ID will be the fifth argument to watch_mount(), shifted.

	n->watch.info & NOTIFY_MOUNT_IN_SUBTREE if true indicates that the notification was generated in the mount subtree rooted at the watch, and not actually in the watch itself.

	n->watch.info & NOTIFY_MOUNT_IS_RECURSIVE if true indicates that the notification was generated by an event (e.g. SETATTR) that was applied recursively. The notification is only generated for the object that initially triggered it.

	n->watch.info & NOTIFY_MOUNT_IS_NOW_RO will be used for NOTIFY_MOUNT_READONLY, being set if the mount becomes R/O, and being cleared otherwise, and for NOTIFY_MOUNT_NEW_MOUNT, being set if the new mount is read-only.

	n->watch.info & NOTIFY_MOUNT_IS_SUBMOUNT if true indicates that the NOTIFY_MOUNT_NEW_MOUNT notification is in response to a mount performed by the kernel (e.g. an automount).

	n->triggered_on indicates the ID of the mount to which the change was accounted (e.g. the new parent of a new mount).

	n->auxiliary_mount indicates the ID of an additional mount that was affected (e.g. a new mount itself) or 0.

Note that it is permissible for event records to be of variable length - or, at least, the length may be dependent on the subtype. Note also that the queue can be shared between multiple notifications of various types.
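
Decoding the info word of a received record can be sketched as below. The SKETCH_* masks are assumptions modelled on include/uapi/linux/watch_queue.h, the in-subtree bit is a placeholder for NOTIFY_MOUNT_IN_SUBTREE, and sketch_decode_info() is a hypothetical helper, so check the values against the installed headers before relying on them:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed field layout of watch_notification::info. */
#define SKETCH_INFO_LENGTH	0x0000007fu	/* record length in bytes */
#define SKETCH_INFO_ID		0xff000000u	/* ID given when the watch was set */
#define SKETCH_INFO_ID__SHIFT	24
/* Placeholder bit standing in for NOTIFY_MOUNT_IN_SUBTREE. */
#define SKETCH_MOUNT_IN_SUBTREE	0x00010000u

struct sketch_decoded_info {
	unsigned int	length;		/* total record length */
	unsigned int	id;		/* watch ID, shifted back down */
	bool		in_subtree;	/* event happened below the watch */
};

/* Split the info word into the fields described above. */
static struct sketch_decoded_info sketch_decode_info(uint32_t info)
{
	struct sketch_decoded_info d;

	d.length = info & SKETCH_INFO_LENGTH;
	d.id = (info & SKETCH_INFO_ID) >> SKETCH_INFO_ID__SHIFT;
	d.in_subtree = (info & SKETCH_MOUNT_IN_SUBTREE) != 0;
	return d;
}
```

Under these assumptions, a watch set with an ID of 0x02 yields records whose decoded id is 2, and a consumer would advance through the buffer by the decoded length because records may differ in size.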
Signed-off-by: David Howells --- Documentation/watch_queue.rst | 12 + arch/alpha/kernel/syscalls/syscall.tbl | 1 arch/arm/tools/syscall.tbl | 1 arch/arm64/include/asm/unistd.h | 2 arch/arm64/include/asm/unistd32.h | 2 arch/ia64/kernel/syscalls/syscall.tbl | 1 arch/m68k/kernel/syscalls/syscall.tbl | 1 arch/microblaze/kernel/syscalls/syscall.tbl | 1 arch/mips/kernel/syscalls/syscall_n32.tbl | 1 arch/mips/kernel/syscalls/syscall_n64.tbl | 1 arch/mips/kernel/syscalls/syscall_o32.tbl | 1 arch/parisc/kernel/syscalls/syscall.tbl | 1 arch/powerpc/kernel/syscalls/syscall.tbl | 1 arch/s390/kernel/syscalls/syscall.tbl | 1 arch/sh/kernel/syscalls/syscall.tbl | 1 arch/sparc/kernel/syscalls/syscall.tbl | 1 arch/x86/entry/syscalls/syscall_32.tbl | 1 arch/x86/entry/syscalls/syscall_64.tbl | 1 arch/xtensa/kernel/syscalls/syscall.tbl | 1 fs/Kconfig | 9 + fs/Makefile | 1 fs/mount.h | 18 ++ fs/mount_notify.c | 222 +++++++++++++++++++++++++++ fs/namespace.c | 22 +++ include/linux/dcache.h | 1 include/linux/syscalls.h | 2 include/uapi/asm-generic/unistd.h | 4 include/uapi/linux/watch_queue.h | 31 ++++ kernel/sys_ni.c | 3 29 files changed, 341 insertions(+), 4 deletions(-) create mode 100644 fs/mount_notify.c diff --git a/Documentation/watch_queue.rst b/Documentation/watch_queue.rst index 849fad6893ef..3e647992be31 100644 --- a/Documentation/watch_queue.rst +++ b/Documentation/watch_queue.rst @@ -8,6 +8,7 @@ opened by userspace. This can be used in conjunction with:: * Key/keyring notifications + * Mount notifications. The notifications buffers can be enabled by: @@ -233,6 +234,11 @@ Any particular buffer can be fed from multiple sources. Sources include: See Documentation/security/keys/core.rst for more information. + * WATCH_TYPE_MOUNT_NOTIFY + + Notifications of this type indicate changes to mount attributes and the + mount topology within the subtree at the indicated point. 
+ Event Filtering =============== @@ -292,9 +298,10 @@ A buffer is created with something like the following:: pipe2(fds, O_TMPFILE); ioctl(fds[1], IOC_WATCH_QUEUE_SET_SIZE, 256); -It can then be set to receive keyring change notifications:: +It can then be set to receive notifications:: keyctl(KEYCTL_WATCH_KEY, KEY_SPEC_SESSION_KEYRING, fds[1], 0x01); + watch_mount(AT_FDCWD, "/", 0, fds[1], 0x02); The notifications can then be consumed by something like the following:: @@ -331,6 +338,9 @@ The notifications can then be consumed by something like the following:: case WATCH_TYPE_KEY_NOTIFY: saw_key_change(&n.n); break; + case WATCH_TYPE_MOUNT_NOTIFY: + saw_mount_change(&n.n); + break; } p += len; diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl index 5ddd128d4b7a..b6cf8403da35 100644 --- a/arch/alpha/kernel/syscalls/syscall.tbl +++ b/arch/alpha/kernel/syscalls/syscall.tbl @@ -478,3 +478,4 @@ 547 common openat2 sys_openat2 548 common pidfd_getfd sys_pidfd_getfd 549 common faccessat2 sys_faccessat2 +550 common watch_mount sys_watch_mount diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl index d5cae5ffede0..27cc1f53f4a0 100644 --- a/arch/arm/tools/syscall.tbl +++ b/arch/arm/tools/syscall.tbl @@ -452,3 +452,4 @@ 437 common openat2 sys_openat2 438 common pidfd_getfd sys_pidfd_getfd 439 common faccessat2 sys_faccessat2 +440 common watch_mount sys_watch_mount diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h index 3b859596840d..b3b2019f8d16 100644 --- a/arch/arm64/include/asm/unistd.h +++ b/arch/arm64/include/asm/unistd.h @@ -38,7 +38,7 @@ #define __ARM_NR_compat_set_tls (__ARM_NR_COMPAT_BASE + 5) #define __ARM_NR_COMPAT_END (__ARM_NR_COMPAT_BASE + 0x800) -#define __NR_compat_syscalls 440 +#define __NR_compat_syscalls 441 #endif #define __ARCH_WANT_SYS_CLONE diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h index 6d95d0c8bf2f..4f9cf98cdf0f 100644 --- 
a/arch/arm64/include/asm/unistd32.h +++ b/arch/arm64/include/asm/unistd32.h @@ -885,6 +885,8 @@ __SYSCALL(__NR_openat2, sys_openat2) __SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd) #define __NR_faccessat2 439 __SYSCALL(__NR_faccessat2, sys_faccessat2) +#define __NR_watch_mount 440 +__SYSCALL(__NR_watch_mount, sys_watch_mount) /* * Please add new compat syscalls above this comment and update diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl index 49e325b604b3..fc6d87903781 100644 --- a/arch/ia64/kernel/syscalls/syscall.tbl +++ b/arch/ia64/kernel/syscalls/syscall.tbl @@ -359,3 +359,4 @@ 437 common openat2 sys_openat2 438 common pidfd_getfd sys_pidfd_getfd 439 common faccessat2 sys_faccessat2 +440 common watch_mount sys_watch_mount diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl index f71b1bbcc198..c671aa0e4d25 100644 --- a/arch/m68k/kernel/syscalls/syscall.tbl +++ b/arch/m68k/kernel/syscalls/syscall.tbl @@ -438,3 +438,4 @@ 437 common openat2 sys_openat2 438 common pidfd_getfd sys_pidfd_getfd 439 common faccessat2 sys_faccessat2 +440 common watch_mount sys_watch_mount diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl index edacc4561f2b..65cc53f129ef 100644 --- a/arch/microblaze/kernel/syscalls/syscall.tbl +++ b/arch/microblaze/kernel/syscalls/syscall.tbl @@ -444,3 +444,4 @@ 437 common openat2 sys_openat2 438 common pidfd_getfd sys_pidfd_getfd 439 common faccessat2 sys_faccessat2 +440 common watch_mount sys_watch_mount diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl index f777141f5256..7f034a239930 100644 --- a/arch/mips/kernel/syscalls/syscall_n32.tbl +++ b/arch/mips/kernel/syscalls/syscall_n32.tbl @@ -377,3 +377,4 @@ 437 n32 openat2 sys_openat2 438 n32 pidfd_getfd sys_pidfd_getfd 439 n32 faccessat2 sys_faccessat2 +440 n32 watch_mount sys_watch_mount diff --git 
a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl index da8c76394e17..d39b90de3642 100644 --- a/arch/mips/kernel/syscalls/syscall_n64.tbl +++ b/arch/mips/kernel/syscalls/syscall_n64.tbl @@ -353,3 +353,4 @@ 437 n64 openat2 sys_openat2 438 n64 pidfd_getfd sys_pidfd_getfd 439 n64 faccessat2 sys_faccessat2 +440 n64 watch_mount sys_watch_mount diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl index 13280625d312..09f426cb45b1 100644 --- a/arch/mips/kernel/syscalls/syscall_o32.tbl +++ b/arch/mips/kernel/syscalls/syscall_o32.tbl @@ -426,3 +426,4 @@ 437 o32 openat2 sys_openat2 438 o32 pidfd_getfd sys_pidfd_getfd 439 o32 faccessat2 sys_faccessat2 +440 o32 watch_mount sys_watch_mount diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl index 5a758fa6ec52..52ff3454baa1 100644 --- a/arch/parisc/kernel/syscalls/syscall.tbl +++ b/arch/parisc/kernel/syscalls/syscall.tbl @@ -436,3 +436,4 @@ 437 common openat2 sys_openat2 438 common pidfd_getfd sys_pidfd_getfd 439 common faccessat2 sys_faccessat2 +440 common watch_mount sys_watch_mount diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl index f833a3190822..10b7ed3c7a1b 100644 --- a/arch/powerpc/kernel/syscalls/syscall.tbl +++ b/arch/powerpc/kernel/syscalls/syscall.tbl @@ -528,3 +528,4 @@ 437 common openat2 sys_openat2 438 common pidfd_getfd sys_pidfd_getfd 439 common faccessat2 sys_faccessat2 +440 common watch_mount sys_watch_mount diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl index bfdcb7633957..86f317bf52df 100644 --- a/arch/s390/kernel/syscalls/syscall.tbl +++ b/arch/s390/kernel/syscalls/syscall.tbl @@ -441,3 +441,4 @@ 437 common openat2 sys_openat2 sys_openat2 438 common pidfd_getfd sys_pidfd_getfd sys_pidfd_getfd 439 common faccessat2 sys_faccessat2 sys_faccessat2 +440 common watch_mount sys_watch_mount 
sys_watch_mount diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl index acc35daa1b79..0bb0f0b372c7 100644 --- a/arch/sh/kernel/syscalls/syscall.tbl +++ b/arch/sh/kernel/syscalls/syscall.tbl @@ -441,3 +441,4 @@ 437 common openat2 sys_openat2 438 common pidfd_getfd sys_pidfd_getfd 439 common faccessat2 sys_faccessat2 +440 common watch_mount sys_watch_mount diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl index 8004a276cb74..369ab65c1e9a 100644 --- a/arch/sparc/kernel/syscalls/syscall.tbl +++ b/arch/sparc/kernel/syscalls/syscall.tbl @@ -484,3 +484,4 @@ 437 common openat2 sys_openat2 438 common pidfd_getfd sys_pidfd_getfd 439 common faccessat2 sys_faccessat2 +440 common watch_mount sys_watch_mount diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl index d8f8a1a69ed1..e760ba92c58d 100644 --- a/arch/x86/entry/syscalls/syscall_32.tbl +++ b/arch/x86/entry/syscalls/syscall_32.tbl @@ -443,3 +443,4 @@ 437 i386 openat2 sys_openat2 438 i386 pidfd_getfd sys_pidfd_getfd 439 i386 faccessat2 sys_faccessat2 +440 i386 watch_mount sys_watch_mount diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index 78847b32e137..5b58621d4f75 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -360,6 +360,7 @@ 437 common openat2 sys_openat2 438 common pidfd_getfd sys_pidfd_getfd 439 common faccessat2 sys_faccessat2 +440 common watch_mount sys_watch_mount # # x32-specific system call numbers start at 512 to avoid cache impact diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl index 69d0d73876b3..5b28ee39f70f 100644 --- a/arch/xtensa/kernel/syscalls/syscall.tbl +++ b/arch/xtensa/kernel/syscalls/syscall.tbl @@ -409,3 +409,4 @@ 437 common openat2 sys_openat2 438 common pidfd_getfd sys_pidfd_getfd 439 common faccessat2 sys_faccessat2 +440 common 
watch_mount sys_watch_mount diff --git a/fs/Kconfig b/fs/Kconfig index a88aa3af73c1..1a55e56d5c54 100644 --- a/fs/Kconfig +++ b/fs/Kconfig @@ -117,6 +117,15 @@ source "fs/verity/Kconfig" source "fs/notify/Kconfig" +config MOUNT_NOTIFICATIONS + bool "Mount topology change notifications" + select WATCH_QUEUE + help + This option provides support for getting change notifications on the + mount tree topology. This makes use of the /dev/watch_queue misc + device to handle the notification buffer and provides the + watch_mount() system call to enable/disable watchpoints. + source "fs/quota/Kconfig" source "fs/autofs/Kconfig" diff --git a/fs/Makefile b/fs/Makefile index 2ce5112b02c8..dd0d87e2ef19 100644 --- a/fs/Makefile +++ b/fs/Makefile @@ -22,6 +22,7 @@ obj-y += no-block.o endif obj-$(CONFIG_PROC_FS) += proc_namespace.o +obj-$(CONFIG_MOUNT_NOTIFICATIONS) += mount_notify.o obj-y += notify/ obj-$(CONFIG_EPOLL) += eventpoll.o diff --git a/fs/mount.h b/fs/mount.h index c7abb7b394d8..85456a5f5a3a 100644 --- a/fs/mount.h +++ b/fs/mount.h @@ -4,6 +4,7 @@ #include #include #include +#include struct mnt_namespace { atomic_t count; @@ -78,6 +79,9 @@ struct mount { int mnt_expiry_mark; /* true if marked for expiry */ struct hlist_head mnt_pins; struct hlist_head mnt_stuck_children; +#ifdef CONFIG_MOUNT_NOTIFICATIONS + struct watch_list *mnt_watchers; /* Watches on dentries within this mount */ +#endif } __randomize_layout; #define MNT_NS_INTERNAL ERR_PTR(-EINVAL) /* distinct from any mnt_namespace */ @@ -159,3 +163,17 @@ static inline bool is_anon_ns(struct mnt_namespace *ns) } extern void mnt_cursor_del(struct mnt_namespace *ns, struct mount *cursor); + +#ifdef CONFIG_MOUNT_NOTIFICATIONS +extern void notify_mount(struct mount *triggered, + struct mount *aux, + enum mount_notification_subtype subtype, + u32 info_flags); +#else +static inline void notify_mount(struct mount *triggered, + struct mount *aux, + enum mount_notification_subtype subtype, + u32 info_flags) +{ +} +#endif 
diff --git a/fs/mount_notify.c b/fs/mount_notify.c new file mode 100644 index 000000000000..44f570e4cebe --- /dev/null +++ b/fs/mount_notify.c @@ -0,0 +1,222 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Provide mount topology/attribute change notifications. + * + * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + */ + +#include +#include +#include +#include +#include +#include "mount.h" + +/* + * Post mount notifications to all watches going rootwards along the tree. + * + * Must be called with the mount_lock held. + */ +static void post_mount_notification(struct mount *changed, + struct mount_notification *notify) +{ + const struct cred *cred = current_cred(); + struct path cursor; + struct mount *mnt; + unsigned seq; + + seq = 0; + rcu_read_lock(); +restart: + cursor.mnt = &changed->mnt; + cursor.dentry = changed->mnt.mnt_root; + mnt = real_mount(cursor.mnt); + notify->watch.info &= ~NOTIFY_MOUNT_IN_SUBTREE; + + read_seqbegin_or_lock(&rename_lock, &seq); + for (;;) { + if (mnt->mnt_watchers && + !hlist_empty(&mnt->mnt_watchers->watchers)) { + if (cursor.dentry->d_flags & DCACHE_MOUNT_WATCH) + post_watch_notification(mnt->mnt_watchers, + &notify->watch, cred, + (unsigned long)cursor.dentry); + } else { + cursor.dentry = mnt->mnt.mnt_root; + } + notify->watch.info |= NOTIFY_MOUNT_IN_SUBTREE; + + if (cursor.dentry == cursor.mnt->mnt_root || + IS_ROOT(cursor.dentry)) { + struct mount *parent = READ_ONCE(mnt->mnt_parent); + + /* Escaped? */ + if (cursor.dentry != cursor.mnt->mnt_root) + break; + + /* Global root? */ + if (mnt == parent) + break; + + cursor.dentry = READ_ONCE(mnt->mnt_mountpoint); + mnt = parent; + cursor.mnt = &mnt->mnt; + } else { + cursor.dentry = cursor.dentry->d_parent; + } + } + + if (need_seqretry(&rename_lock, seq)) { + seq = 1; + goto restart; + } + + done_seqretry(&rename_lock, seq); + rcu_read_unlock(); +} + +/* + * Generate a mount notification. 
+ */ +void notify_mount(struct mount *trigger, + struct mount *aux, + enum mount_notification_subtype subtype, + u32 info_flags) +{ + + struct mount_notification n; + + memset(&n, 0, sizeof(n)); + n.watch.type = WATCH_TYPE_MOUNT_NOTIFY; + n.watch.subtype = subtype; + n.watch.info = info_flags | watch_sizeof(n); + n.triggered_on = trigger->mnt_id; + + switch (subtype) { + case NOTIFY_MOUNT_EXPIRY: + case NOTIFY_MOUNT_READONLY: + case NOTIFY_MOUNT_SETATTR: + break; + + case NOTIFY_MOUNT_NEW_MOUNT: + case NOTIFY_MOUNT_UNMOUNT: + case NOTIFY_MOUNT_MOVE_FROM: + case NOTIFY_MOUNT_MOVE_TO: + n.auxiliary_mount = aux->mnt_id; + break; + + default: + BUG(); + } + + post_mount_notification(trigger, &n); +} + +static void release_mount_watch(struct watch *watch) +{ + struct dentry *dentry = (struct dentry *)(unsigned long)watch->id; + + dput(dentry); +} + +/** + * sys_watch_mount - Watch for mount topology/attribute changes + * @dfd: Base directory to pathwalk from or fd referring to mount. + * @filename: Path to mount to place the watch upon + * @at_flags: Pathwalk control flags + * @watch_fd: The watch queue to send notifications to. 
+ * @watch_id: The watch ID to be placed in the notification (-1 to remove watch) + */ +SYSCALL_DEFINE5(watch_mount, + int, dfd, + const char __user *, filename, + unsigned int, at_flags, + int, watch_fd, + int, watch_id) +{ + struct watch_queue *wqueue; + struct watch_list *wlist = NULL; + struct watch *watch = NULL; + struct mount *m; + struct path path; + unsigned int lookup_flags = + LOOKUP_DIRECTORY | LOOKUP_FOLLOW | LOOKUP_AUTOMOUNT; + int ret; + + if (watch_id < -1 || watch_id > 0xff) + return -EINVAL; + if ((at_flags & ~(AT_NO_AUTOMOUNT | AT_EMPTY_PATH)) != 0) + return -EINVAL; + if (at_flags & AT_NO_AUTOMOUNT) + lookup_flags &= ~LOOKUP_AUTOMOUNT; + if (at_flags & AT_EMPTY_PATH) + lookup_flags |= LOOKUP_EMPTY; + + ret = user_path_at(dfd, filename, lookup_flags, &path); + if (ret) + return ret; + + ret = inode_permission(path.dentry->d_inode, MAY_EXEC); + if (ret) + goto err_path; + + wqueue = get_watch_queue(watch_fd); + if (IS_ERR(wqueue)) + goto err_path; + + m = real_mount(path.mnt); + + if (watch_id >= 0) { + ret = -ENOMEM; + if (!READ_ONCE(m->mnt_watchers)) { + wlist = kzalloc(sizeof(*wlist), GFP_KERNEL); + if (!wlist) + goto err_wqueue; + init_watch_list(wlist, release_mount_watch); + } + + watch = kzalloc(sizeof(*watch), GFP_KERNEL); + if (!watch) + goto err_wlist; + + init_watch(watch, wqueue); + watch->id = (unsigned long)path.dentry; + watch->info_id = (u32)watch_id << WATCH_INFO_ID__SHIFT; + + ret = security_watch_mount(watch, &path); + if (ret < 0) + goto err_watch; + + down_write(&m->mnt.mnt_sb->s_umount); + if (!m->mnt_watchers) { + m->mnt_watchers = wlist; + wlist = NULL; + } + + ret = add_watch_to_object(watch, m->mnt_watchers); + if (ret == 0) { + spin_lock(&path.dentry->d_lock); + path.dentry->d_flags |= DCACHE_MOUNT_WATCH; + spin_unlock(&path.dentry->d_lock); + dget(path.dentry); + watch = NULL; + } + up_write(&m->mnt.mnt_sb->s_umount); + } else { + down_write(&m->mnt.mnt_sb->s_umount); + ret = remove_watch_from_object(m->mnt_watchers, 
wqueue, + (unsigned long)path.dentry, + false); + up_write(&m->mnt.mnt_sb->s_umount); + } + +err_watch: + kfree(watch); +err_wlist: + kfree(wlist); +err_wqueue: + put_watch_queue(wqueue); +err_path: + path_put(&path); + return ret; +} diff --git a/fs/namespace.c b/fs/namespace.c index 4a0f600a3328..73ff5bf0c9af 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -498,6 +498,9 @@ static int mnt_make_readonly(struct mount *mnt) smp_wmb(); mnt->mnt.mnt_flags &= ~MNT_WRITE_HOLD; unlock_mount_hash(); + if (ret == 0) + notify_mount(mnt, NULL, NOTIFY_MOUNT_READONLY, + NOTIFY_MOUNT_IS_NOW_RO); return ret; } @@ -506,6 +509,7 @@ static int __mnt_unmake_readonly(struct mount *mnt) lock_mount_hash(); mnt->mnt.mnt_flags &= ~MNT_READONLY; unlock_mount_hash(); + notify_mount(mnt, NULL, NOTIFY_MOUNT_READONLY, 0); return 0; } @@ -835,6 +839,7 @@ static struct mountpoint *unhash_mnt(struct mount *mnt) */ static void umount_mnt(struct mount *mnt) { + notify_mount(mnt->mnt_parent, mnt, NOTIFY_MOUNT_UNMOUNT, 0); put_mountpoint(unhash_mnt(mnt)); } @@ -1175,6 +1180,11 @@ static void mntput_no_expire(struct mount *mnt) mnt->mnt.mnt_flags |= MNT_DOOMED; rcu_read_unlock(); +#ifdef CONFIG_MOUNT_NOTIFICATIONS + if (mnt->mnt_watchers) + remove_watch_list(mnt->mnt_watchers, mnt->mnt_id); +#endif + list_del(&mnt->mnt_instance); if (unlikely(!list_empty(&mnt->mnt_mounts))) { @@ -1503,6 +1513,7 @@ static void umount_tree(struct mount *mnt, enum umount_tree_flags how) p = list_first_entry(&tmp_list, struct mount, mnt_list); list_del_init(&p->mnt_expire); list_del_init(&p->mnt_list); + ns = p->mnt_ns; if (ns) { ns->mounts--; @@ -2137,7 +2148,10 @@ static int attach_recursive_mnt(struct mount *source_mnt, } if (moving) { unhash_mnt(source_mnt); + notify_mount(source_mnt->mnt_parent, source_mnt, + NOTIFY_MOUNT_MOVE_FROM, 0); attach_mnt(source_mnt, dest_mnt, dest_mp); + notify_mount(dest_mnt, source_mnt, NOTIFY_MOUNT_MOVE_TO, 0); touch_mnt_namespace(source_mnt->mnt_ns); } else { if (source_mnt->mnt_ns) 
{ @@ -2146,6 +2160,11 @@ static int attach_recursive_mnt(struct mount *source_mnt, } mnt_set_mountpoint(dest_mnt, dest_mp, source_mnt); commit_tree(source_mnt); + notify_mount(dest_mnt, source_mnt, NOTIFY_MOUNT_NEW_MOUNT, + (source_mnt->mnt.mnt_sb->s_flags & SB_RDONLY ? + NOTIFY_MOUNT_IS_NOW_RO : 0) | + (source_mnt->mnt.mnt_sb->s_flags & SB_SUBMOUNT ? + NOTIFY_MOUNT_IS_SUBMOUNT : 0)); } hlist_for_each_entry_safe(child, n, &tree_list, mnt_hash) { @@ -2522,6 +2541,8 @@ static void set_mount_attributes(struct mount *mnt, unsigned int mnt_flags) mnt->mnt.mnt_flags = mnt_flags; touch_mnt_namespace(mnt->mnt_ns); unlock_mount_hash(); + notify_mount(mnt, NULL, NOTIFY_MOUNT_SETATTR, + (mnt_flags & SB_RDONLY ? NOTIFY_MOUNT_IS_NOW_RO : 0)); } static void mnt_warn_timestamp_expiry(struct path *mountpoint, struct vfsmount *mnt) @@ -2992,6 +3013,7 @@ void mark_mounts_for_expiry(struct list_head *mounts) propagate_mount_busy(mnt, 1)) continue; list_move(&mnt->mnt_expire, &graveyard); + notify_mount(mnt, NULL, NOTIFY_MOUNT_EXPIRY, 0); } while (!list_empty(&graveyard)) { mnt = list_first_entry(&graveyard, struct mount, mnt_expire); diff --git a/include/linux/dcache.h b/include/linux/dcache.h index a81f0c3cf352..a94c551c62a3 100644 --- a/include/linux/dcache.h +++ b/include/linux/dcache.h @@ -219,6 +219,7 @@ struct dentry_operations { #define DCACHE_PAR_LOOKUP 0x10000000 /* being looked up (with parent locked shared) */ #define DCACHE_DENTRY_CURSOR 0x20000000 #define DCACHE_NORCU 0x40000000 /* No RCU delay for freeing */ +#define DCACHE_MOUNT_WATCH 0x80000000 /* There's a mount watch here */ extern seqlock_t rename_lock; diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index b951a87da987..88d03fd627ab 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -1005,6 +1005,8 @@ asmlinkage long sys_pidfd_send_signal(int pidfd, int sig, siginfo_t __user *info, unsigned int flags); asmlinkage long sys_pidfd_getfd(int pidfd, int fd, unsigned int flags); 
+asmlinkage long sys_watch_mount(int dfd, const char __user *path, + unsigned int at_flags, int watch_fd, int watch_id); /* * Architecture-specific system calls diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h index f4a01305d9a6..fcdca8c7d30a 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -857,9 +857,11 @@ __SYSCALL(__NR_openat2, sys_openat2) __SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd) #define __NR_faccessat2 439 __SYSCALL(__NR_faccessat2, sys_faccessat2) +#define __NR_watch_mount 440 +__SYSCALL(__NR_watch_mount, sys_watch_mount) #undef __NR_syscalls -#define __NR_syscalls 440 +#define __NR_syscalls 441 /* * 32 bit systems traditionally used different diff --git a/include/uapi/linux/watch_queue.h b/include/uapi/linux/watch_queue.h index c3d8320b5d3a..83b11242c10e 100644 --- a/include/uapi/linux/watch_queue.h +++ b/include/uapi/linux/watch_queue.h @@ -14,7 +14,8 @@ enum watch_notification_type { WATCH_TYPE_META = 0, /* Special record */ WATCH_TYPE_KEY_NOTIFY = 1, /* Key change event notification */ - WATCH_TYPE__NR = 2 + WATCH_TYPE_MOUNT_NOTIFY = 2, /* Mount topology change notification */ + WATCH_TYPE__NR = 3 }; enum watch_meta_notification_subtype { @@ -101,4 +102,32 @@ struct key_notification { __u32 aux; /* Per-type auxiliary data */ }; +/* + * Type of mount topology change notification. 
+ */ +enum mount_notification_subtype { + NOTIFY_MOUNT_NEW_MOUNT = 0, /* New mount added */ + NOTIFY_MOUNT_UNMOUNT = 1, /* Mount removed manually */ + NOTIFY_MOUNT_EXPIRY = 2, /* Automount expired */ + NOTIFY_MOUNT_READONLY = 3, /* Mount R/O state changed */ + NOTIFY_MOUNT_SETATTR = 4, /* Mount attributes changed */ + NOTIFY_MOUNT_MOVE_FROM = 5, /* Mount moved from here */ + NOTIFY_MOUNT_MOVE_TO = 6, /* Mount moved to here (compare op_id) */ +}; + +#define NOTIFY_MOUNT_IN_SUBTREE WATCH_INFO_FLAG_0 /* Event not actually at watched dentry */ +#define NOTIFY_MOUNT_IS_NOW_RO WATCH_INFO_FLAG_1 /* Mount changed to R/O */ +#define NOTIFY_MOUNT_IS_SUBMOUNT WATCH_INFO_FLAG_2 /* New mount is submount */ + +/* + * Mount topology/configuration change notification record. + * - watch.type = WATCH_TYPE_MOUNT_NOTIFY + * - watch.subtype = enum mount_notification_subtype + */ +struct mount_notification { + struct watch_notification watch; /* WATCH_TYPE_MOUNT_NOTIFY */ + __u64 triggered_on; /* The mount that triggered the notification */ + __u64 auxiliary_mount; /* Added/moved/removed mount or 0 */ +}; + #endif /* _UAPI_LINUX_WATCH_QUEUE_H */ diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c index 3b69a560a7ac..3e1c5c9d2efe 100644 --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -85,6 +85,9 @@ COND_SYSCALL(ioprio_get); /* fs/locks.c */ COND_SYSCALL(flock); +/* fs/mount_notify.c */ +COND_SYSCALL(watch_mount); + /* fs/namei.c */ /* fs/namespace.c */ From patchwork Mon Aug 3 13:07:03 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 11698057 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4A96213B1 for ; Mon, 3 Aug 2020 13:07:17 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 325602076B for ; Mon, 3 Aug 2020 
13:07:17 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="QAiSaF5c" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728048AbgHCNHP (ORCPT ); Mon, 3 Aug 2020 09:07:15 -0400 Received: from us-smtp-1.mimecast.com ([205.139.110.61]:55079 "EHLO us-smtp-delivery-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1728031AbgHCNHL (ORCPT ); Mon, 3 Aug 2020 09:07:11 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1596460030; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=+akrbQCHsfmHstoMDlxnOA8hU+6bUdSUePBNiUwfOiE=; b=QAiSaF5c5EMxIaz8gxCz6jli4QmdLCnwWiOpuQTj0XuiLbOmPMk/b6yU/EG6TDAbVpP9PU 8568SRe2SLFRF2pDva4ZO3kjgqH3AeJex1TXigkRbrjpVKgLRanAWbc0XgUiETm5oWRq4P cYr5SipPCzcfi7wtJzyP3Ytw1Ehhaqw= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-142-n-QvCHE6OKCHkayOe4njFw-1; Mon, 03 Aug 2020 09:07:08 -0400 X-MC-Unique: n-QvCHE6OKCHkayOe4njFw-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 1B69457; Mon, 3 Aug 2020 13:07:07 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-112-32.rdu2.redhat.com [10.10.112.32]) by smtp.corp.redhat.com (Postfix) with ESMTP id 9B2495F9DC; Mon, 3 Aug 2020 13:07:04 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 
3798903 Subject: [PATCH 5/5] watch_queue: sample: Display mount tree change notifications [ver #2] From: David Howells To: viro@zeniv.linux.org.uk Cc: dhowells@redhat.com, torvalds@linux-foundation.org, casey@schaufler-ca.com, sds@tycho.nsa.gov, nicolas.dichtel@6wind.com, raven@themaw.net, christian@brauner.io, jlayton@redhat.com, kzak@redhat.com, mszeredi@redhat.com, linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-security-module@vger.kernel.org, linux-kernel@vger.kernel.org Date: Mon, 03 Aug 2020 14:07:03 +0100 Message-ID: <159646002382.1779777.15736595451734580261.stgit@warthog.procyon.org.uk> In-Reply-To: <159645997768.1779777.8286723139418624756.stgit@warthog.procyon.org.uk> References: <159645997768.1779777.8286723139418624756.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org This is run like: ./watch_test and watches "/" for changes to the mount topology and the attributes of individual mount objects. 
# mount -t tmpfs none /mnt # mount -o remount,ro /mnt # mount -o remount,rw /mnt producing: # ./watch_test read() = 16 NOTIFY[000]: ty=000002 sy=00 i=02000010 MOUNT 00000060 change=0[new_mount] aux=416 read() = 16 NOTIFY[000]: ty=000002 sy=04 i=02010010 MOUNT 000001a0 change=4[setattr] aux=0 read() = 16 NOTIFY[000]: ty=000002 sy=04 i=02010010 MOUNT 000001a0 change=4[setattr] aux=0 Signed-off-by: David Howells --- samples/watch_queue/watch_test.c | 41 +++++++++++++++++++++++++++++++++++++- 1 file changed, 40 insertions(+), 1 deletion(-) diff --git a/samples/watch_queue/watch_test.c b/samples/watch_queue/watch_test.c index 46e618a897fe..d82554436152 100644 --- a/samples/watch_queue/watch_test.c +++ b/samples/watch_queue/watch_test.c @@ -26,6 +26,9 @@ #ifndef __NR_keyctl #define __NR_keyctl -1 #endif +#ifndef __NR_watch_mount +#define __NR_watch_mount -1 +#endif #define BUF_SIZE 256 @@ -58,6 +61,29 @@ static void saw_key_change(struct watch_notification *n, size_t len) k->key_id, n->subtype, key_subtypes[n->subtype], k->aux); } +static const char *mount_subtypes[256] = { + [NOTIFY_MOUNT_NEW_MOUNT] = "new_mount", + [NOTIFY_MOUNT_UNMOUNT] = "unmount", + [NOTIFY_MOUNT_EXPIRY] = "expiry", + [NOTIFY_MOUNT_READONLY] = "readonly", + [NOTIFY_MOUNT_SETATTR] = "setattr", + [NOTIFY_MOUNT_MOVE_FROM] = "move_from", + [NOTIFY_MOUNT_MOVE_TO] = "move_to", +}; + +static void saw_mount_change(struct watch_notification *n, size_t len) +{ + struct mount_notification *m = (struct mount_notification *)n; + + if (len != sizeof(struct mount_notification)) + return; + + printf("MOUNT %08llx change=%u[%s] aux=%llx\n", + (unsigned long long)m->triggered_on, + n->subtype, mount_subtypes[n->subtype], + (unsigned long long)m->auxiliary_mount); +} + /* * Consume and display events. 
*/ @@ -134,6 +160,9 @@ static void consumer(int fd) default: printf("other type\n"); break; + case WATCH_TYPE_MOUNT_NOTIFY: + saw_mount_change(&n.n, len); + break; } p += len; @@ -142,12 +171,17 @@ static void consumer(int fd) } static struct watch_notification_filter filter = { - .nr_filters = 1, + .nr_filters = 2, .filters = { [0] = { .type = WATCH_TYPE_KEY_NOTIFY, .subtype_filter[0] = UINT_MAX, }, + [1] = { + .type = WATCH_TYPE_MOUNT_NOTIFY, + // Reject move-from notifications + .subtype_filter[0] = UINT_MAX & ~(1 << NOTIFY_MOUNT_MOVE_FROM), + }, }, }; @@ -181,6 +215,11 @@ int main(int argc, char **argv) exit(1); } + if (syscall(__NR_watch_mount, AT_FDCWD, "/", 0, fd, 0xde) == -1) { + perror("watch_mount"); + exit(1); + } + consumer(fd); exit(0); }