From patchwork Thu Jul 7 21:58:22 2022
X-Patchwork-Submitter: Beau Belgrave
X-Patchwork-Id: 12910281
From: Beau Belgrave
To: rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com
Cc: linux-trace-devel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH 1/7] tracing/user_events: Remove BROKEN and restore user_events.h location
Date: Thu, 7 Jul 2022 14:58:22 -0700
Message-Id: <20220707215828.2021-2-beaub@linux.microsoft.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20220707215828.2021-1-beaub@linux.microsoft.com>
References: <20220707215828.2021-1-beaub@linux.microsoft.com>

After having discussions and addressing the ABI issues, user_events can
now be marked as working and used by others.

As part of the BROKEN status, user_events.h was moved from its original
uapi location to the kernel location. This needs to be moved back so it
can be used by others.

Link: https://lore.kernel.org/all/20220330155835.5e1f6669@gandalf.local.home
Link: https://lkml.kernel.org/r/20220330201755.29319-1-mathieu.desnoyers@efficios.com
Link: https://lore.kernel.org/all/2059213643.196683.1648499088753.JavaMail.zimbra@efficios.com/
Link: https://lore.kernel.org/all/1651771383.54437.1652370439159.JavaMail.zimbra@efficios.com/

Signed-off-by: Beau Belgrave
---
 include/{ => uapi}/linux/user_events.h | 0
 kernel/trace/Kconfig                   | 1 -
 kernel/trace/trace_events_user.c       | 5 -----
 3 files changed, 6 deletions(-)
 rename include/{ => uapi}/linux/user_events.h (100%)

diff --git a/include/linux/user_events.h b/include/uapi/linux/user_events.h
similarity index 100%
rename from include/linux/user_events.h
rename to include/uapi/linux/user_events.h

diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 2c43e327a619..9bb54c0b3b2d 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -767,7 +767,6 @@ config USER_EVENTS
 	bool "User trace events"
 	select TRACING
 	select DYNAMIC_EVENTS
-	depends on BROKEN || COMPILE_TEST # API needs to be straighten out
 	help
 	  User trace events are user-defined trace events that can be used
 	  like an existing kernel trace event.

diff --git a/kernel/trace/trace_events_user.c b/kernel/trace/trace_events_user.c
index 4afc41e321ac..7bff4c8b90f2 100644
--- a/kernel/trace/trace_events_user.c
+++ b/kernel/trace/trace_events_user.c
@@ -19,12 +19,7 @@
 #include
 #include
 #include
-/* Reminder to move to uapi when everything works */
-#ifdef CONFIG_COMPILE_TEST
-#include
-#else
 #include
-#endif
 #include "trace.h"
 #include "trace_dynevent.h"

From patchwork Thu Jul 7 21:58:23 2022
X-Patchwork-Submitter: Beau Belgrave
X-Patchwork-Id: 12910282
From: Beau Belgrave
To: rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com
Cc: linux-trace-devel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH 2/7] tracing: Add namespace instance directory to tracefs
Date: Thu, 7 Jul 2022 14:58:23 -0700
Message-Id: <20220707215828.2021-3-beaub@linux.microsoft.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20220707215828.2021-1-beaub@linux.microsoft.com>
References: <20220707215828.2021-1-beaub@linux.microsoft.com>

Some tracing systems require group or namespace isolation, such as
user_events.

The namespace directory in tracefs is a singleton like the instances
directory. It also acts like the instances directory in that user-mode
processes can create a directory within the namespace if they have
appropriate permissions.

This change only covers adding the ability for a tracing system to
create the namespace directory. A system for adding and managing
namespaces will reside within another tracing API.
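The routing that this patch introduces (one mkdir entry point that dispatches to per-directory callbacks) can be modeled in plain C. The sketch below is a hypothetical userspace toy, not the kernel API; the names `dir_ops`, `do_mkdir`, and the recording helpers are all made up for illustration, but the switch mirrors the shape of the patch's `tracefs_syscall_mkdir_core()`:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy model of the patch's dispatch: one mkdir entry point that routes
 * to per-directory callbacks. All names here are hypothetical. */
enum dir_type { DIR_INSTANCES, DIR_NAMESPACES };

struct dir_ops {
	int (*mkdir)(const char *name);    /* instances callback */
	int (*ns_mkdir)(const char *name); /* namespaces callback */
};

static char last_created[64];

static int record_inst_mkdir(const char *name)
{
	snprintf(last_created, sizeof(last_created), "inst:%s", name);
	return 0;
}

static int record_ns_mkdir(const char *name)
{
	snprintf(last_created, sizeof(last_created), "ns:%s", name);
	return 0;
}

static struct dir_ops ops = {
	.mkdir = record_inst_mkdir,
	.ns_mkdir = record_ns_mkdir,
};

/* Same switch shape as the kernel-side mkdir core: unknown directory
 * types are rejected (the kernel returns -ENOENT there). */
int do_mkdir(enum dir_type type, const char *name)
{
	switch (type) {
	case DIR_INSTANCES:
		return ops.mkdir(name);
	case DIR_NAMESPACES:
		return ops.ns_mkdir(name);
	default:
		return -2; /* stands in for -ENOENT */
	}
}
```

Keeping both directory types behind one core function, as the patch does, means the lock-drop/reacquire dance around the callback only has to be written once.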
Link: https://lore.kernel.org/all/20220312010140.1880-1-beaub@linux.microsoft.com/

Signed-off-by: Beau Belgrave
---
 fs/tracefs/inode.c      | 121 +++++++++++++++++++++++++++++++++++++---
 include/linux/tracefs.h |   5 ++
 2 files changed, 119 insertions(+), 7 deletions(-)

diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index de7252715b12..aba1f9cb6b75 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -24,6 +24,11 @@
 #define TRACEFS_DEFAULT_MODE	0700

+enum tracefs_dir_type {
+	TRACEFS_DIR_INSTANCES,
+	TRACEFS_DIR_NAMESPACES,
+};
+
 static struct vfsmount *tracefs_mount;
 static int tracefs_mount_count;
 static bool tracefs_registered;
@@ -50,6 +55,8 @@ static const struct file_operations tracefs_file_operations = {
 static struct tracefs_dir_ops {
 	int (*mkdir)(const char *name);
 	int (*rmdir)(const char *name);
+	int (*ns_mkdir)(const char *name);
+	int (*ns_rmdir)(const char *name);
 } tracefs_ops __ro_after_init;

 static char *get_dname(struct dentry *dentry)
@@ -67,9 +74,8 @@ static char *get_dname(struct dentry *dentry)
 	return name;
 }

-static int tracefs_syscall_mkdir(struct user_namespace *mnt_userns,
-				 struct inode *inode, struct dentry *dentry,
-				 umode_t mode)
+static int tracefs_syscall_mkdir_core(int type, struct inode *inode,
+				      struct dentry *dentry)
 {
 	char *name;
 	int ret;
@@ -84,7 +90,22 @@ static int tracefs_syscall_mkdir(struct user_namespace *mnt_userns,
 	 * mkdir routine to handle races.
 	 */
 	inode_unlock(inode);
-	ret = tracefs_ops.mkdir(name);
+
+	switch (type) {
+	case TRACEFS_DIR_INSTANCES:
+		ret = tracefs_ops.mkdir(name);
+		break;
+
+	case TRACEFS_DIR_NAMESPACES:
+		ret = tracefs_ops.ns_mkdir(name);
+		break;
+
+	default:
+		pr_debug("tracefs: unknown mkdir type '%d'\n", type);
+		ret = -ENOENT;
+		break;
+	}
+
 	inode_lock(inode);

 	kfree(name);
@@ -92,7 +113,24 @@ static int tracefs_syscall_mkdir(struct user_namespace *mnt_userns,
 	return ret;
 }

-static int tracefs_syscall_rmdir(struct inode *inode, struct dentry *dentry)
+static int tracefs_syscall_mkdir(struct user_namespace *mnt_userns,
+				 struct inode *inode, struct dentry *dentry,
+				 umode_t mode)
+{
+	return tracefs_syscall_mkdir_core(TRACEFS_DIR_INSTANCES,
+					  inode, dentry);
+}
+
+static int tracefs_syscall_ns_mkdir(struct user_namespace *mnt_userns,
+				    struct inode *inode, struct dentry *dentry,
+				    umode_t mode)
+{
+	return tracefs_syscall_mkdir_core(TRACEFS_DIR_NAMESPACES,
+					  inode, dentry);
+}
+
+static int tracefs_syscall_rmdir_core(int type, struct inode *inode,
+				      struct dentry *dentry)
 {
 	char *name;
 	int ret;
@@ -111,7 +149,20 @@ static int tracefs_syscall_rmdir(struct inode *inode, struct dentry *dentry)
 	inode_unlock(inode);
 	inode_unlock(d_inode(dentry));

-	ret = tracefs_ops.rmdir(name);
+	switch (type) {
+	case TRACEFS_DIR_INSTANCES:
+		ret = tracefs_ops.rmdir(name);
+		break;
+
+	case TRACEFS_DIR_NAMESPACES:
+		ret = tracefs_ops.ns_rmdir(name);
+		break;
+
+	default:
+		pr_debug("tracefs: unknown rmdir type '%d'\n", type);
+		ret = -ENOENT;
+		break;
+	}

 	inode_lock_nested(inode, I_MUTEX_PARENT);
 	inode_lock(d_inode(dentry));
@@ -121,12 +172,30 @@ static int tracefs_syscall_rmdir(struct inode *inode, struct dentry *dentry)
 	return ret;
 }

+static int tracefs_syscall_rmdir(struct inode *inode, struct dentry *dentry)
+{
+	return tracefs_syscall_rmdir_core(TRACEFS_DIR_INSTANCES,
+					  inode, dentry);
+}
+
+static int tracefs_syscall_ns_rmdir(struct inode *inode, struct dentry *dentry)
+{
+	return tracefs_syscall_rmdir_core(TRACEFS_DIR_NAMESPACES,
+					  inode, dentry);
+}
+
 static const struct inode_operations tracefs_dir_inode_operations = {
 	.lookup		= simple_lookup,
 	.mkdir		= tracefs_syscall_mkdir,
 	.rmdir		= tracefs_syscall_rmdir,
 };

+static const struct inode_operations tracefs_dir_inode_ns_operations = {
+	.lookup		= simple_lookup,
+	.mkdir		= tracefs_syscall_ns_mkdir,
+	.rmdir		= tracefs_syscall_ns_rmdir,
+};
+
 static struct inode *tracefs_get_inode(struct super_block *sb)
 {
 	struct inode *inode = new_inode(sb);
@@ -554,7 +623,7 @@ struct dentry *tracefs_create_dir(const char *name, struct dentry *parent)
  * Only one instances directory is allowed.
  *
  * The instances directory is special as it allows for mkdir and rmdir to
- * to be done by userspace. When a mkdir or rmdir is performed, the inode
+ * be done by userspace. When a mkdir or rmdir is performed, the inode
  * locks are released and the methods passed in (@mkdir and @rmdir) are
  * called without locks and with the name of the directory being created
  * within the instances directory.
@@ -582,6 +651,44 @@ __init struct dentry *tracefs_create_instance_dir(const char *name,
 	return dentry;
 }

+/**
+ * tracefs_create_namespace_dir - create the tracing namespaces directory
+ * @name: The name of the namespaces directory to create
+ * @parent: The parent directory that the namespaces directory will exist in
+ * @mkdir: The function to call when a mkdir is performed.
+ * @rmdir: The function to call when a rmdir is performed.
+ *
+ * Only one namespaces directory is allowed.
+ *
+ * The namespaces directory is special as it allows for mkdir and rmdir to
+ * be done by userspace. When a mkdir or rmdir is performed, the inode
+ * locks are released and the methods passed in (@mkdir and @rmdir) are
+ * called without locks and with the name of the directory being created
+ * within the namespaces directory.
+ *
+ * Returns the dentry of the namespaces directory.
+ */
+__init struct dentry *tracefs_create_namespace_dir(const char *name,
+					struct dentry *parent,
+					int (*mkdir)(const char *name),
+					int (*rmdir)(const char *name))
+{
+	struct dentry *dentry;
+
+	/* Only allow one instance of the namespaces directory. */
+	if (WARN_ON(tracefs_ops.ns_mkdir || tracefs_ops.ns_rmdir))
+		return NULL;
+
+	dentry = __create_dir(name, parent, &tracefs_dir_inode_ns_operations);
+	if (!dentry)
+		return NULL;
+
+	tracefs_ops.ns_mkdir = mkdir;
+	tracefs_ops.ns_rmdir = rmdir;
+
+	return dentry;
+}
+
 static void remove_one(struct dentry *victim)
 {
 	simple_release_fs(&tracefs_mount, &tracefs_mount_count);
diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h
index 99912445974c..04870dee6c87 100644
--- a/include/linux/tracefs.h
+++ b/include/linux/tracefs.h
@@ -33,6 +33,11 @@ struct dentry *tracefs_create_instance_dir(const char *name, struct dentry *parent,
 					   int (*mkdir)(const char *name),
 					   int (*rmdir)(const char *name));

+struct dentry *tracefs_create_namespace_dir(const char *name,
+					struct dentry *parent,
+					int (*mkdir)(const char *name),
+					int (*rmdir)(const char *name));
+
 bool tracefs_initialized(void);

 #endif /* CONFIG_TRACING */

From patchwork Thu Jul 7 21:58:24 2022
X-Patchwork-Submitter: Beau Belgrave
X-Patchwork-Id: 12910288
From: Beau Belgrave
To: rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com
Cc: linux-trace-devel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH 3/7] tracing: Add tracing namespace API for systems to register with
Date: Thu, 7 Jul 2022 14:58:24 -0700
Message-Id: <20220707215828.2021-4-beaub@linux.microsoft.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20220707215828.2021-1-beaub@linux.microsoft.com>
References: <20220707215828.2021-1-beaub@linux.microsoft.com>

User facing tracing systems, such as user_events and LTTng, sometimes
require multiple events with the same name, but from different
containers. This can cause event name conflicts and leak out details of
events not owned by the container.

To create a tracing namespace, run mkdir under the new tracefs directory
named "namespaces" (/sys/kernel/tracing/namespaces typically). This
directory largely works the same as "instances", where the new directory
will have files populated within it via the tracing system
automatically.
The tracing systems will put their files under the "root" directory,
which is meant to be the directory that you can bind mount out to
containers. The "options" file is meant to allow operators to configure
the namespaces via the registered systems.

The tracing namespace allows those user facing systems to register with
it. When an operator creates a namespace directory under
/sys/kernel/tracing/namespaces, the registered systems will have their
create operation run for that namespace. The systems can then create
files in the new directory used for tracing via user programs. These
files will then isolate events between each namespace the operator
creates.

Typically the system name of the event will have the tracing namespace
name appended onto it. For example, if a namespace directory was created
named "mygroup", then the system name would end in ".mygroup". Since the
system names are different for each namespace, per-namespace
recording/playback can be done by specifying the per-namespace system
name and the event name. However, this decision is up to the registered
tracing system for each namespace.

The operator can then bind mount each namespace directory into
containers. This provides isolation between events and containers, if
required. It's also possible for several containers to share an
isolation via bind mounts instead of having an isolation per-container.

With these files being isolated, different permissions can be added for
these files than for normal tracefs files. This helps scenarios where
non-admin processes would like to trace, but currently cannot.
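The naming scheme described above can be illustrated with a small helper. This is a hypothetical userspace sketch, not a kernel function; `ns_system_name` and the "user_events.mygroup" result are assumptions about how a system might append the namespace name, made only to show the string shape:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: derive a per-namespace event system name by
 * appending "." plus the namespace name, e.g. system "user_events" in
 * namespace "mygroup" becomes "user_events.mygroup". */
int ns_system_name(char *buf, size_t len, const char *system, const char *ns)
{
	int n = snprintf(buf, len, "%s.%s", system, ns);

	/* Fail if the result did not fit (was truncated). */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Because the suffix makes every namespace's system name unique, a recorder can select one namespace's events purely by system name, which is the per-namespace recording/playback property the commit message relies on.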
Link: https://lore.kernel.org/all/20220312010140.1880-1-beaub@linux.microsoft.com/

Signed-off-by: Beau Belgrave
---
 kernel/trace/Kconfig           |  11 +
 kernel/trace/Makefile          |   1 +
 kernel/trace/trace.c           |  39 +++
 kernel/trace/trace_namespace.c | 567 +++++++++++++++++++++++++++++++++
 kernel/trace/trace_namespace.h |  57 ++++
 5 files changed, 675 insertions(+)
 create mode 100644 kernel/trace/trace_namespace.c
 create mode 100644 kernel/trace/trace_namespace.h

diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 9bb54c0b3b2d..89550287275c 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -777,6 +777,17 @@ config USER_EVENTS

 	  If in doubt, say N.

+config TRACE_NAMESPACE
+	bool "Tracing namespaces"
+	select TRACING
+	help
+	  Tracing namespaces are isolated directories within tracefs
+	  that can be used to isolate tracing events from other events
+	  and processes. Typically this is most useful for user-defined
+	  trace events.
+
+	  If in doubt, say N.
+
 config HIST_TRIGGERS
 	bool "Histogram triggers"
 	depends on ARCH_HAVE_NMI_SAFE_CMPXCHG
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index 0d261774d6f3..b88241164eb3 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -87,6 +87,7 @@ obj-$(CONFIG_TRACE_EVENT_INJECT) += trace_events_inject.o
 obj-$(CONFIG_SYNTH_EVENTS) += trace_events_synth.o
 obj-$(CONFIG_HIST_TRIGGERS) += trace_events_hist.o
 obj-$(CONFIG_USER_EVENTS) += trace_events_user.o
+obj-$(CONFIG_TRACE_NAMESPACE) += trace_namespace.o
 obj-$(CONFIG_BPF_EVENTS) += bpf_trace.o
 obj-$(CONFIG_KPROBE_EVENTS) += trace_kprobe.o
 obj-$(CONFIG_TRACEPOINTS) += error_report-traces.o
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index f400800bc910..4fdc35db8d5f 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -53,6 +53,10 @@
 #include "trace.h"
 #include "trace_output.h"

+#ifdef CONFIG_TRACE_NAMESPACE
+#include "trace_namespace.h"
+#endif
+
 /*
  * On boot up, the ring buffer is set to the minimum size, so that
  * we do not waste memory on systems that are not using tracing.
@@ -9079,6 +9083,10 @@ static const struct file_operations buffer_percent_fops = {

 static struct dentry *trace_instance_dir;

+#ifdef CONFIG_TRACE_NAMESPACE
+static struct dentry *trace_namespace_dir;
+#endif
+
 static void
 init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer);

@@ -9321,6 +9329,18 @@ static int instance_mkdir(const char *name)
 	return ret;
 }

+#ifdef CONFIG_TRACE_NAMESPACE
+static int namespace_mkdir(const char *name)
+{
+	return trace_namespace_add(name);
+}
+
+static int namespace_rmdir(const char *name)
+{
+	return trace_namespace_remove(name);
+}
+#endif
+
 /**
  * trace_array_get_by_name - Create/Lookup a trace array, given its name.
  * @name: The name of the trace array to be looked up/created.
@@ -9472,6 +9492,21 @@ static __init void create_trace_instances(struct dentry *d_tracer)
 	mutex_unlock(&event_mutex);
 }

+#ifdef CONFIG_TRACE_NAMESPACE
+static __init void create_trace_namespaces(struct dentry *d_tracer)
+{
+	trace_namespace_dir = tracefs_create_namespace_dir("namespaces",
+							   d_tracer,
+							   namespace_mkdir,
+							   namespace_rmdir);
+
+	if (MEM_FAIL(!trace_namespace_dir, "Failed to create namespaces directory\n"))
+		return;
+
+	trace_namespace_init(trace_namespace_dir);
+}
+#endif
+
 static void
 init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer)
 {
@@ -9760,6 +9795,10 @@ static __init void tracer_init_tracefs_work_func(struct work_struct *work)

 	create_trace_instances(NULL);

+#ifdef CONFIG_TRACE_NAMESPACE
+	create_trace_namespaces(NULL);
+#endif
+
 	update_tracer_options(&global_trace);
 }

diff --git a/kernel/trace/trace_namespace.c b/kernel/trace/trace_namespace.c
new file mode 100644
index 000000000000..934649e4db49
--- /dev/null
+++ b/kernel/trace/trace_namespace.c
@@ -0,0 +1,567 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2022, Microsoft Corporation.
+ *
+ * Authors:
+ *   Beau Belgrave
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include "trace.h"
+#include "trace_namespace.h"
+
+static struct dentry *root_namespace_dir;
+#define TRACE_ROOT_DIR_NAME "root"
+#define TRACE_OPTIONS_NAME "options"
+
+static LIST_HEAD(namespace_systems);
+static LIST_HEAD(namespace_groups);
+static DEFINE_IDR(namespace_idr);
+
+/*
+ * Stores a registered system's operations.
+ */
+struct namespace_system {
+	struct list_head link;
+	struct trace_namespace_operations *ops;
+};
+
+/*
+ * Stores namespace specific data about the group. The group can either
+ * be looked up by name or by the id of the trace_namespace property.
+ */
+struct namespace_group {
+	struct list_head link;
+	struct trace_namespace ns;
+	refcount_t refcnt;
+	struct dentry *trace_dir;
+	struct dentry *trace_options;
+};
+
+/* Current parsing group to allow using trace_parse_run_command */
+static struct namespace_group *parsing_group;
+
+#define TRACE_NS_FROM_GROUP(group) (&(group)->ns)
+
+/*
+ * Runs the parse operation for each registered system for the group.
+ */
+static int namespace_systems_parse(struct namespace_group *group,
+				   const char *command)
+{
+	struct list_head *head = &namespace_systems;
+	struct namespace_system *system;
+	struct trace_namespace *ns;
+	int ret = -ENODEV;
+
+	ns = TRACE_NS_FROM_GROUP(group);
+
+	list_for_each_entry(system, head, link) {
+		ret = system->ops->parse(ns, command);
+
+		if (!ret || ret != -ECANCELED)
+			break;
+	}
+
+	if (ret == -ECANCELED)
+		ret = -EINVAL;
+
+	return ret;
+}
+
+/*
+ * Runs the is_busy operation for each registered system for the group.
+ */
+static bool namespace_systems_busy(struct namespace_group *group)
+{
+	struct list_head *head = &namespace_systems;
+	struct namespace_system *system;
+	struct trace_namespace *ns;
+
+	ns = TRACE_NS_FROM_GROUP(group);
+
+	list_for_each_entry(system, head, link)
+		if (system->ops->is_busy(ns))
+			return true;
+
+	return false;
+}
+
+/*
+ * Runs the remove operation for each registered system for the group.
+ *
+ * NOTE: If a system has a failure it does not stop the other systems from
+ * having their remove operation run for the group.
+ */
+static int namespace_systems_remove(struct namespace_group *group, int max)
+{
+	struct list_head *head = &namespace_systems;
+	struct namespace_system *system;
+	struct trace_namespace *ns;
+	int error, ret = 0, i = 0;
+
+	ns = TRACE_NS_FROM_GROUP(group);
+
+	list_for_each_entry(system, head, link) {
+		error = system->ops->remove(ns);
+		i++;
+
+		/* Save last error (if not no entity), but keep removing */
+		if (error && error != -ENOENT)
+			ret = error;
+
+		if (max != -1 && i >= max)
+			break;
+	}
+
+	return ret;
+}
+
+/*
+ * Runs the create operation for each registered system for the group.
+ *
+ * NOTE: If a system has a failure, then the previously successful systems
+ * will have their remove operation run for the group.
+ */
+static int namespace_systems_create(struct namespace_group *group)
+{
+	struct list_head *head = &namespace_systems;
+	struct namespace_system *system;
+	struct trace_namespace *ns;
+	int ret = 0, count = 0;
+
+	ns = TRACE_NS_FROM_GROUP(group);
+
+	list_for_each_entry(system, head, link) {
+		ret = system->ops->create(ns);
+
+		if (ret)
+			break;
+
+		count++;
+	}
+
+	/* If we had a failure, remove systems that were created */
+	if (ret)
+		namespace_systems_remove(group, count);
+
+	return ret;
+}
+
+/*
+ * Release a previously acquired reference to a namespace group.
+ */
+static __always_inline
+void namespace_group_release(struct namespace_group *group)
+{
+	refcount_dec(&group->refcnt);
+}
+
+/*
+ * Looks up a namespace group by ID and increments the ref count.
+ */
+static struct namespace_group *namespace_group_ref(int id)
+{
+	struct namespace_group *group;
+
+	mutex_lock(&event_mutex);
+
+	group = idr_find(&namespace_idr, id);
+
+	if (group)
+		refcount_inc(&group->refcnt);
+
+	mutex_unlock(&event_mutex);
+
+	return group;
+}
+
+/*
+ * Looks up a namespace group by name, without increasing the ref count.
+ */
+static struct namespace_group *namespace_group_find(const char *name)
+{
+	struct list_head *head = &namespace_groups;
+	struct namespace_group *group;
+	struct trace_namespace *ns;
+
+	lockdep_assert_held(&event_mutex);
+
+	list_for_each_entry(group, head, link) {
+		ns = TRACE_NS_FROM_GROUP(group);
+
+		if (!strcmp(ns->name, name))
+			return group;
+	}
+
+	return NULL;
+}
+
+/*
+ * Frees group resources and removes the directory of a namespace.
+ */
+static void namespace_group_destroy(struct namespace_group *group)
+{
+	struct trace_namespace *ns = TRACE_NS_FROM_GROUP(group);
+
+	lockdep_assert_held(&event_mutex);
+
+	if (ns->id > 0)
+		idr_remove(&namespace_idr, ns->id);
+
+	if (ns->dir)
+		tracefs_remove(ns->dir);
+
+	if (group->trace_options)
+		tracefs_remove(group->trace_options);
+
+	if (group->trace_dir)
+		tracefs_remove(group->trace_dir);
+
+	kfree(ns->name);
+	kfree(group);
+}
+
+void *group_options_seq_start(struct seq_file *m, loff_t *pos)
+{
+	mutex_lock(&event_mutex);
+
+	return seq_list_start(&namespace_systems, *pos);
+}
+
+void *group_options_seq_next(struct seq_file *m, void *v, loff_t *pos)
+{
+	return seq_list_next(v, &namespace_systems, pos);
+}
+
+void group_options_seq_stop(struct seq_file *m, void *v)
+{
+	mutex_unlock(&event_mutex);
+}
+
+static int group_options_seq_show(struct seq_file *m, void *v)
+{
+	struct namespace_system *system = v;
+	struct namespace_group *group = m->private;
+
+	if (system && system->ops && group)
+		return system->ops->show(TRACE_NS_FROM_GROUP(group), m);
+
+	return 0;
+}
+
+static const struct seq_operations group_options_seq_op = {
+	.start = group_options_seq_start,
+	.next = group_options_seq_next,
+	.stop = group_options_seq_stop,
+	.show = group_options_seq_show
+};
+
+/*
+ * Gets the group associated with the current seq_file.
+ */
+static struct namespace_group *seq_file_namespace_group(struct file *file)
+{
+	struct seq_file *seq = file->private_data;
+
+	if (!seq)
+		return NULL;
+
+	return seq->private;
+}
+
+static int group_options_open(struct inode *node, struct file *file)
+{
+	struct namespace_group *group;
+	int ret;
+
+	group = namespace_group_ref((int)(uintptr_t)node->i_private);
+
+	if (!group)
+		return -ENOENT;
+
+	ret = seq_open(file, &group_options_seq_op);
+
+	if (!ret) {
+		/* Chain group into seq_file private data */
+		struct seq_file *seq = file->private_data;
+
+		seq->private = group;
+	}
+
+	return ret;
+}
+
+static int group_options_parse(const char *command)
+{
+	return namespace_systems_parse(parsing_group, command);
+}
+
+static ssize_t group_options_write(struct file *file, const char __user *buffer,
+				   size_t count, loff_t *ppos)
+{
+	struct namespace_group *group = seq_file_namespace_group(file);
+	int ret;
+
+	if (!group)
+		return -EINVAL;
+
+	mutex_lock(&event_mutex);
+
+	/* Set group to use for commands */
+	parsing_group = group;
+
+	ret = trace_parse_run_command(file, buffer, count, ppos,
+				      group_options_parse);
+
+	parsing_group = NULL;
+
+	mutex_unlock(&event_mutex);
+
+	return ret;
+}
+
+static int group_options_release(struct inode *node, struct file *file)
+{
+	struct namespace_group *group = seq_file_namespace_group(file);
+
+	if (group)
+		namespace_group_release(group);
+
+	return seq_release(node, file);
+}
+
+static const struct file_operations group_options_fops = {
+	.open = group_options_open,
+	.read = seq_read,
+	.llseek = seq_lseek,
+	.release = group_options_release,
+	.write = group_options_write,
+};
+
+/*
+ * Creates a group that tracks the name and directory of a namespace.
+ */
+static struct namespace_group *namespace_group_create(const char *name)
+{
+	struct namespace_group *group;
+	struct trace_namespace *ns;
+
+	group = kzalloc(sizeof(*group), GFP_KERNEL);
+
+	if (!group)
+		goto error;
+
+	refcount_set(&group->refcnt, 1);
+
+	ns = TRACE_NS_FROM_GROUP(group);
+	ns->name = kstrdup(name, GFP_KERNEL);
+
+	if (!ns->name)
+		goto error;
+
+	/*
+	 * 0 is reserved for non-namespace lookups for systems to use.
+	 * If this were not the case, systems would have to pivot code
+	 * between namespace cases and non-namespace cases.
+	 *
+	 * Cyclic is used here to reduce the chances of the same id being
+	 * reused very quickly. This allows for less chance of an id lookup
+	 * getting the wrong namespace during file open cases.
+	 */
+	ns->id = idr_alloc_cyclic(&namespace_idr, group, 1, 0, GFP_KERNEL);
+
+	if (ns->id < 0)
+		goto error;
+
+	group->trace_dir = tracefs_create_dir(ns->name, root_namespace_dir);
+
+	if (!group->trace_dir)
+		goto error;
+
+	group->trace_options = tracefs_create_file(TRACE_OPTIONS_NAME,
+						   TRACE_MODE_WRITE,
+						   group->trace_dir,
+						   (void *)(uintptr_t)ns->id,
+						   &group_options_fops);
+
+	if (!group->trace_options)
+		goto error;
+
+	ns->dir = tracefs_create_dir(TRACE_ROOT_DIR_NAME, group->trace_dir);
+
+	if (!ns->dir)
+		goto error;
+
+	return group;
+error:
+	if (group)
+		namespace_group_destroy(group);
+
+	return NULL;
+}
+
+/**
+ * trace_namespace_register - register a system for tracing namespaces.
+ * @ops: operations to perform for each namespace
+ *
+ * Registers a system that runs operations for each namespace on the system.
+ * This will fail if any operation is not specified.
+ */
+int trace_namespace_register(struct trace_namespace_operations *ops)
+{
+	struct namespace_system *system;
+
+	if (!ops->create || !ops->remove || !ops->is_busy ||
+	    !ops->parse || !ops->show)
+		return -EINVAL;
+
+	system = kmalloc(sizeof(*system), GFP_KERNEL);
+
+	if (!system)
+		return -ENOMEM;
+
+	system->ops = ops;
+
+	mutex_lock(&event_mutex);
+
+	list_add(&system->link, &namespace_systems);
+
+	mutex_unlock(&event_mutex);
+
+	return 0;
+}
+
+/**
+ * trace_namespace_init - configures namespaces to be used on the system.
+ * @dir: directory to use for namespaces
+ *
+ * Configures the directory to be used for namespaces.
+ *
+ * NOTE: Can only be called once.
+ */
+int trace_namespace_init(struct dentry *dir)
+{
+	int ret = 0;
+
+	mutex_lock(&event_mutex);
+
+	if (root_namespace_dir) {
+		pr_warn("trace namespace init called more than once\n");
+		ret = -EEXIST;
+		goto out;
+	}
+
+	root_namespace_dir = dir;
+out:
+	mutex_unlock(&event_mutex);
+
+	return ret;
+}
+
+/**
+ * trace_namespace_add - adds a trace namespace to the system.
+ * @name: name of the namespace
+ *
+ * Adds a new trace namespace to the system. This can fail if the
+ * namespace already exists or on internal errors within sub-systems
+ * registered for namespaces.
+ */ +int trace_namespace_add(const char *name) +{ + struct namespace_group *group; + int ret = 0; + + mutex_lock(&event_mutex); + + if (!root_namespace_dir) { + ret = -ENODEV; + goto out; + } + + /* Ensure we don't already have this group */ + group = namespace_group_find(name); + + if (group) { + /* Clear group so the cleanup below can't destroy it */ + group = NULL; + ret = -EEXIST; + goto out; + } + + /* Create the group */ + group = namespace_group_create(name); + + if (!group) { + ret = -ENOMEM; + goto out; + } + + /* Notify the systems of a new group */ + ret = namespace_systems_create(group); + + if (!ret) + list_add(&group->link, &namespace_groups); +out: + /* Ensure we cleanup on failure */ + if (ret && group) + namespace_group_destroy(group); + + mutex_unlock(&event_mutex); + + return ret; +} + +/** + * trace_namespace_remove - remove a trace namespace from the system. + * @name: name of the namespace + * + * Removes an existing trace namespace from the system. This can fail if + * the namespace doesn't exist, the namespace is busy, or if internal errors + * occur within sub-systems registered for namespaces. 
+ */ +int trace_namespace_remove(const char *name) +{ + struct namespace_group *group; + int ret = 0; + + mutex_lock(&event_mutex); + + if (!root_namespace_dir) { + ret = -ENODEV; + goto out; + } + + group = namespace_group_find(name); + + if (!group) { + ret = -ENOENT; + goto out; + } + + if (refcount_read(&group->refcnt) != 1) { + ret = -EBUSY; + goto out; + } + + if (namespace_systems_busy(group)) { + ret = -EBUSY; + goto out; + } + + ret = namespace_systems_remove(group, -1); + + if (!ret) { + list_del(&group->link); + + namespace_group_destroy(group); + } + +out: + mutex_unlock(&event_mutex); + + return ret; +} diff --git a/kernel/trace/trace_namespace.h b/kernel/trace/trace_namespace.h new file mode 100644 index 000000000000..644e2d6c4802 --- /dev/null +++ b/kernel/trace/trace_namespace.h @@ -0,0 +1,57 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef _LINUX_KERNEL_TRACE_NAMESPACE_H +#define _LINUX_KERNEL_TRACE_NAMESPACE_H + +/** + * struct trace_namespace - Trace namespace information + * + * @name: Unique name of the namespace, can be used for event system names, + * etc. + * @dir: Directory of the namespace, can be used for creating system files. + * @id: Id of the namespace, can be used for looking up associated data by + * namespace. NOTE: 0 is reserved for non-namespace lookups for systems. + */ +struct trace_namespace { + const char *name; + struct dentry *dir; + int id; +}; + +/** + * struct trace_namespace_operations - Methods to run for each trace namespace + * + * These methods must be set for each system using trace namespaces. + * + * @create: Run when a trace namespace is being created. Systems create files + * for the namespace with appropriate options. Return 0 if successful. + * @is_busy: Check whether the system is busy within the namespace. Return + * true if it is busy, otherwise false. + * @remove: Removes the namespace from the system. Return 0 if successful, + * return -ENOENT if the namespace is not within the system. 
All other return + * values are treated as errors. + * @parse: Parses a command to configure a namespace. Return 0 if successful, + * return -ECANCELED if the command is not for your system. All other return + * values are treated as errors. + * @show: Shows the configured options for the namespace. This is run when a + * user reads the options of the namespace. + * + * NOTE: These methods are called while holding event_mutex. + */ +struct trace_namespace_operations { + int (*create)(struct trace_namespace *ns); + int (*remove)(struct trace_namespace *ns); + int (*parse)(struct trace_namespace *ns, const char *command); + int (*show)(struct trace_namespace *ns, struct seq_file *m); + bool (*is_busy)(struct trace_namespace *ns); +}; + +int trace_namespace_register(struct trace_namespace_operations *ops); + +int trace_namespace_init(struct dentry *dir); + +int trace_namespace_add(const char *name); + +int trace_namespace_remove(const char *name); + +#endif /* _LINUX_KERNEL_TRACE_NAMESPACE_H */ From patchwork Thu Jul 7 21:58:25 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Beau Belgrave X-Patchwork-Id: 12910284 
From: Beau Belgrave To: rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com Cc: linux-trace-devel@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [RFC PATCH 4/7] tracing/user_events: Move pages/locks into groups to prepare for namespaces Date: Thu, 7 Jul 2022 14:58:25 -0700 Message-Id: <20220707215828.2021-5-beaub@linux.microsoft.com> In-Reply-To: <20220707215828.2021-1-beaub@linux.microsoft.com> References: <20220707215828.2021-1-beaub@linux.microsoft.com> In order to enable namespaces or any sort of isolation within user_events, the register lock and pages need to be broken up into groups. Each event and file now has a group pointer which stores the actual pages to map, lookup data, and synchronization objects. 
Signed-off-by: Beau Belgrave --- kernel/trace/trace_events_user.c | 381 ++++++++++++++++++++++++------- 1 file changed, 304 insertions(+), 77 deletions(-) diff --git a/kernel/trace/trace_events_user.c b/kernel/trace/trace_events_user.c index 7bff4c8b90f2..8ffbb9ce2f1a 100644 --- a/kernel/trace/trace_events_user.c +++ b/kernel/trace/trace_events_user.c @@ -51,11 +51,23 @@ #define EVENT_STATUS_PERF (1 << 1) #define EVENT_STATUS_OTHER (1 << 7) -static char *register_page_data; +struct user_event_group { + struct page *pages; + char *register_page_data; + char *system_name; + struct dentry *status_file; + struct dentry *data_file; + struct hlist_node node; + struct mutex reg_mutex; + DECLARE_HASHTABLE(register_table, 8); + DECLARE_BITMAP(page_bitmap, MAX_EVENTS); + refcount_t refcnt; + int id; +}; -static DEFINE_MUTEX(reg_mutex); -static DEFINE_HASHTABLE(register_table, 8); -static DECLARE_BITMAP(page_bitmap, MAX_EVENTS); +static DEFINE_HASHTABLE(group_table, 8); +static DEFINE_MUTEX(group_mutex); +static struct user_event_group *root_group; /* * Stores per-event properties, as users register events @@ -65,6 +77,7 @@ static DECLARE_BITMAP(page_bitmap, MAX_EVENTS); * refcnt reaches one. 
*/ struct user_event { + struct user_event_group *group; struct tracepoint tracepoint; struct trace_event_call call; struct trace_event_class class; @@ -91,6 +104,11 @@ struct user_event_refs { struct user_event *events[]; }; +struct user_event_file_info { + struct user_event_group *group; + struct user_event_refs *refs; +}; + #define VALIDATOR_ENSURE_NULL (1 << 0) #define VALIDATOR_REL (1 << 1) @@ -103,7 +121,8 @@ struct user_event_validator { typedef void (*user_event_func_t) (struct user_event *user, struct iov_iter *i, void *tpdata, bool *faulted); -static int user_event_parse(char *name, char *args, char *flags, +static int user_event_parse(struct user_event_group *group, char *name, + char *args, char *flags, struct user_event **newuser); static u32 user_event_key(char *name) @@ -111,12 +130,132 @@ static u32 user_event_key(char *name) return jhash(name, strlen(name), 0); } +static void set_page_reservations(char *pages, bool set) +{ + int page; + + for (page = 0; page < MAX_PAGES; ++page) { + void *addr = pages + (PAGE_SIZE * page); + + if (set) + SetPageReserved(virt_to_page(addr)); + else + ClearPageReserved(virt_to_page(addr)); + } +} + +static void user_event_group_destroy(struct user_event_group *group) +{ + if (group->status_file) + tracefs_remove(group->status_file); + + if (group->data_file) + tracefs_remove(group->status_file); + + if (group->register_page_data) + set_page_reservations(group->register_page_data, false); + + if (group->pages) + __free_pages(group->pages, MAX_PAGE_ORDER); + + kfree(group->system_name); + kfree(group); +} + +static char *user_event_group_system_name(const char *name) +{ + char *system_name; + int len = strlen(name) + sizeof(USER_EVENTS_SYSTEM) + 1; + + system_name = kmalloc(len, GFP_KERNEL); + + if (!system_name) + return NULL; + + snprintf(system_name, len, "%s.%s", USER_EVENTS_SYSTEM, name); + + return system_name; +} + +static __always_inline +void user_event_group_release(struct user_event_group *group) +{ + 
refcount_dec(&group->refcnt); +} + +static struct user_event_group *user_event_group_find(int id) +{ + struct user_event_group *group; + + mutex_lock(&group_mutex); + + hash_for_each_possible(group_table, group, node, id) + if (group->id == id) { + refcount_inc(&group->refcnt); + mutex_unlock(&group_mutex); + return group; + } + + mutex_unlock(&group_mutex); + + return NULL; +} + +static struct user_event_group *user_event_group_create(const char *name, + int id) +{ + struct user_event_group *group; + + group = kzalloc(sizeof(*group), GFP_KERNEL); + + if (!group) + return NULL; + + if (name) { + group->system_name = user_event_group_system_name(name); + + if (!group->system_name) + goto error; + } + + group->pages = alloc_pages(GFP_KERNEL | __GFP_ZERO, MAX_PAGE_ORDER); + + if (!group->pages) + goto error; + + group->register_page_data = page_address(group->pages); + + set_page_reservations(group->register_page_data, true); + + /* Zero all bits beside 0 (which is reserved for failures) */ + bitmap_zero(group->page_bitmap, MAX_EVENTS); + set_bit(0, group->page_bitmap); + + mutex_init(&group->reg_mutex); + hash_init(group->register_table); + + /* Mark and add to lookup */ + group->id = id; + refcount_set(&group->refcnt, 2); + + mutex_lock(&group_mutex); + hash_add(group_table, &group->node, group->id); + mutex_unlock(&group_mutex); + + return group; +error: + if (group) + user_event_group_destroy(group); + + return NULL; +}; + static __always_inline void user_event_register_set(struct user_event *user) { int i = user->index; - register_page_data[STATUS_BYTE(i)] |= STATUS_MASK(i); + user->group->register_page_data[STATUS_BYTE(i)] |= STATUS_MASK(i); } static __always_inline @@ -124,7 +263,7 @@ void user_event_register_clear(struct user_event *user) { int i = user->index; - register_page_data[STATUS_BYTE(i)] &= ~STATUS_MASK(i); + user->group->register_page_data[STATUS_BYTE(i)] &= ~STATUS_MASK(i); } static __always_inline __must_check @@ -168,7 +307,8 @@ static struct 
list_head *user_event_get_fields(struct trace_event_call *call) * * Upon success user_event has its ref count increased by 1. */ -static int user_event_parse_cmd(char *raw_command, struct user_event **newuser) +static int user_event_parse_cmd(struct user_event_group *group, + char *raw_command, struct user_event **newuser) { char *name = raw_command; char *args = strpbrk(name, " "); @@ -182,7 +322,7 @@ static int user_event_parse_cmd(char *raw_command, struct user_event **newuser) if (flags) *flags++ = '\0'; - return user_event_parse(name, args, flags, newuser); + return user_event_parse(group, name, args, flags, newuser); } static int user_field_array_size(const char *type) @@ -670,7 +810,7 @@ static int destroy_user_event(struct user_event *user) dyn_event_remove(&user->devent); user_event_register_clear(user); - clear_bit(user->index, page_bitmap); + clear_bit(user->index, user->group->page_bitmap); hash_del(&user->node); user_event_destroy_validators(user); @@ -681,14 +821,15 @@ static int destroy_user_event(struct user_event *user) return ret; } -static struct user_event *find_user_event(char *name, u32 *outkey) +static struct user_event *find_user_event(struct user_event_group *group, + char *name, u32 *outkey) { struct user_event *user; u32 key = user_event_key(name); *outkey = key; - hash_for_each_possible(register_table, user, node, key) + hash_for_each_possible(group->register_table, user, node, key) if (!strcmp(EVENT_NAME(user), name)) { refcount_inc(&user->refcnt); return user; @@ -935,14 +1076,14 @@ static int user_event_create(const char *raw_command) if (!name) return -ENOMEM; - mutex_lock(®_mutex); + mutex_lock(&root_group->reg_mutex); - ret = user_event_parse_cmd(name, &user); + ret = user_event_parse_cmd(root_group, name, &user); if (!ret) refcount_dec(&user->refcnt); - mutex_unlock(®_mutex); + mutex_unlock(&root_group->reg_mutex); if (ret) kfree(name); @@ -1096,7 +1237,8 @@ static int user_event_trace_register(struct user_event *user) * The name 
buffer lifetime is owned by this method for success cases only. * Upon success the returned user_event has its ref count increased by 1. */ -static int user_event_parse(char *name, char *args, char *flags, +static int user_event_parse(struct user_event_group *group, char *name, + char *args, char *flags, struct user_event **newuser) { int ret; @@ -1106,7 +1248,7 @@ static int user_event_parse(char *name, char *args, char *flags, /* Prevent dyn_event from racing */ mutex_lock(&event_mutex); - user = find_user_event(name, &key); + user = find_user_event(group, name, &key); mutex_unlock(&event_mutex); if (user) { @@ -1119,7 +1261,7 @@ static int user_event_parse(char *name, char *args, char *flags, return 0; } - index = find_first_zero_bit(page_bitmap, MAX_EVENTS); + index = find_first_zero_bit(group->page_bitmap, MAX_EVENTS); if (index == MAX_EVENTS) return -EMFILE; @@ -1133,6 +1275,7 @@ static int user_event_parse(char *name, char *args, char *flags, INIT_LIST_HEAD(&user->fields); INIT_LIST_HEAD(&user->validators); + user->group = group; user->tracepoint.name = name; ret = user_event_parse_fields(user, args); @@ -1152,7 +1295,11 @@ static int user_event_parse(char *name, char *args, char *flags, user->call.tp = &user->tracepoint; user->call.event.funcs = &user_event_funcs; - user->class.system = USER_EVENTS_SYSTEM; + if (group->system_name) + user->class.system = group->system_name; + else + user->class.system = USER_EVENTS_SYSTEM; + user->class.fields_array = user_event_fields_array; user->class.get_fields = user_event_get_fields; user->class.reg = user_event_reg; @@ -1175,8 +1322,8 @@ static int user_event_parse(char *name, char *args, char *flags, dyn_event_init(&user->devent, &user_event_dops); dyn_event_add(&user->devent, &user->call); - set_bit(user->index, page_bitmap); - hash_add(register_table, &user->node, key); + set_bit(user->index, group->page_bitmap); + hash_add(group->register_table, &user->node, key); mutex_unlock(&event_mutex); @@ -1194,10 +1341,10 
@@ static int user_event_parse(char *name, char *args, char *flags, /* * Deletes a previously created event if it is no longer being used. */ -static int delete_user_event(char *name) +static int delete_user_event(struct user_event_group *group, char *name) { u32 key; - struct user_event *user = find_user_event(name, &key); + struct user_event *user = find_user_event(group, name, &key); if (!user) return -ENOENT; @@ -1215,6 +1362,7 @@ static int delete_user_event(char *name) */ static ssize_t user_events_write_core(struct file *file, struct iov_iter *i) { + struct user_event_file_info *info = file->private_data; struct user_event_refs *refs; struct user_event *user = NULL; struct tracepoint *tp; @@ -1226,7 +1374,7 @@ static ssize_t user_events_write_core(struct file *file, struct iov_iter *i) rcu_read_lock_sched(); - refs = rcu_dereference_sched(file->private_data); + refs = rcu_dereference_sched(info->refs); /* * The refs->events array is protected by RCU, and new items may be @@ -1284,6 +1432,30 @@ static ssize_t user_events_write_core(struct file *file, struct iov_iter *i) return ret; } +static int user_events_open(struct inode *node, struct file *file) +{ + struct user_event_group *group; + struct user_event_file_info *info; + + group = user_event_group_find((int)(uintptr_t)node->i_private); + + if (!group) + return -ENOENT; + + info = kzalloc(sizeof(*info), GFP_KERNEL); + + if (!info) { + user_event_group_release(group); + return -ENOMEM; + } + + info->group = group; + + file->private_data = info; + + return 0; +} + static ssize_t user_events_write(struct file *file, const char __user *ubuf, size_t count, loff_t *ppos) { @@ -1305,13 +1477,15 @@ static ssize_t user_events_write_iter(struct kiocb *kp, struct iov_iter *i) return user_events_write_core(kp->ki_filp, i); } -static int user_events_ref_add(struct file *file, struct user_event *user) +static int user_events_ref_add(struct user_event_file_info *info, + struct user_event *user) { + struct 
user_event_group *group = info->group; struct user_event_refs *refs, *new_refs; int i, size, count = 0; - refs = rcu_dereference_protected(file->private_data, - lockdep_is_held(®_mutex)); + refs = rcu_dereference_protected(info->refs, + lockdep_is_held(&group->reg_mutex)); if (refs) { count = refs->count; @@ -1337,7 +1511,7 @@ static int user_events_ref_add(struct file *file, struct user_event *user) refcount_inc(&user->refcnt); - rcu_assign_pointer(file->private_data, new_refs); + rcu_assign_pointer(info->refs, new_refs); if (refs) kfree_rcu(refs, rcu); @@ -1374,7 +1548,8 @@ static long user_reg_get(struct user_reg __user *ureg, struct user_reg *kreg) /* * Registers a user_event on behalf of a user process. */ -static long user_events_ioctl_reg(struct file *file, unsigned long uarg) +static long user_events_ioctl_reg(struct user_event_file_info *info, + unsigned long uarg) { struct user_reg __user *ureg = (struct user_reg __user *)uarg; struct user_reg reg; @@ -1395,14 +1570,14 @@ static long user_events_ioctl_reg(struct file *file, unsigned long uarg) return ret; } - ret = user_event_parse_cmd(name, &user); + ret = user_event_parse_cmd(info->group, name, &user); if (ret) { kfree(name); return ret; } - ret = user_events_ref_add(file, user); + ret = user_events_ref_add(info, user); /* No longer need parse ref, ref_add either worked or not */ refcount_dec(&user->refcnt); @@ -1420,7 +1595,8 @@ static long user_events_ioctl_reg(struct file *file, unsigned long uarg) /* * Deletes a user_event on behalf of a user process. 
*/ -static long user_events_ioctl_del(struct file *file, unsigned long uarg) +static long user_events_ioctl_del(struct user_event_file_info *info, + unsigned long uarg) { void __user *ubuf = (void __user *)uarg; char *name; @@ -1433,7 +1609,7 @@ static long user_events_ioctl_del(struct file *file, unsigned long uarg) /* event_mutex prevents dyn_event from racing */ mutex_lock(&event_mutex); - ret = delete_user_event(name); + ret = delete_user_event(info->group, name); mutex_unlock(&event_mutex); kfree(name); @@ -1447,19 +1623,21 @@ static long user_events_ioctl_del(struct file *file, unsigned long uarg) static long user_events_ioctl(struct file *file, unsigned int cmd, unsigned long uarg) { + struct user_event_file_info *info = file->private_data; + struct user_event_group *group = info->group; long ret = -ENOTTY; switch (cmd) { case DIAG_IOCSREG: - mutex_lock(®_mutex); - ret = user_events_ioctl_reg(file, uarg); - mutex_unlock(®_mutex); + mutex_lock(&group->reg_mutex); + ret = user_events_ioctl_reg(info, uarg); + mutex_unlock(&group->reg_mutex); break; case DIAG_IOCSDEL: - mutex_lock(®_mutex); - ret = user_events_ioctl_del(file, uarg); - mutex_unlock(®_mutex); + mutex_lock(&group->reg_mutex); + ret = user_events_ioctl_del(info, uarg); + mutex_unlock(&group->reg_mutex); break; } @@ -1471,17 +1649,24 @@ static long user_events_ioctl(struct file *file, unsigned int cmd, */ static int user_events_release(struct inode *node, struct file *file) { + struct user_event_file_info *info = file->private_data; + struct user_event_group *group; struct user_event_refs *refs; struct user_event *user; int i; + if (!info) + return -EINVAL; + + group = info->group; + /* * Ensure refs cannot change under any situation by taking the * register mutex during the final freeing of the references. 
*/ - mutex_lock(®_mutex); + mutex_lock(&group->reg_mutex); - refs = file->private_data; + refs = info->refs; if (!refs) goto out; @@ -1500,32 +1685,54 @@ static int user_events_release(struct inode *node, struct file *file) out: file->private_data = NULL; - mutex_unlock(®_mutex); + mutex_unlock(&group->reg_mutex); kfree(refs); + kfree(info); + + /* No longer using group */ + user_event_group_release(group); return 0; } static const struct file_operations user_data_fops = { + .open = user_events_open, .write = user_events_write, .write_iter = user_events_write_iter, .unlocked_ioctl = user_events_ioctl, .release = user_events_release, }; +static struct user_event_group *user_status_group(struct file *file) +{ + struct seq_file *m = file->private_data; + + if (!m) + return NULL; + + return m->private; +} + /* * Maps the shared page into the user process for checking if event is enabled. */ static int user_status_mmap(struct file *file, struct vm_area_struct *vma) { + char *pages; + struct user_event_group *group = user_status_group(file); unsigned long size = vma->vm_end - vma->vm_start; if (size != MAX_BYTES) return -EINVAL; + if (!group) + return -EINVAL; + + pages = group->register_page_data; + return remap_pfn_range(vma, vma->vm_start, - virt_to_phys(register_page_data) >> PAGE_SHIFT, + virt_to_phys(pages) >> PAGE_SHIFT, size, vm_get_page_prot(VM_READ)); } @@ -1549,13 +1756,17 @@ static void user_seq_stop(struct seq_file *m, void *p) static int user_seq_show(struct seq_file *m, void *p) { + struct user_event_group *group = m->private; struct user_event *user; char status; int i, active = 0, busy = 0, flags; - mutex_lock(®_mutex); + if (!group) + return -EINVAL; + + mutex_lock(&group->reg_mutex); - hash_for_each(register_table, i, user, node) { + hash_for_each(group->register_table, i, user, node) { status = user->status; flags = user->flags; @@ -1579,7 +1790,7 @@ static int user_seq_show(struct seq_file *m, void *p) active++; } - mutex_unlock(®_mutex); + 
mutex_unlock(&group->reg_mutex); seq_puts(m, "\n"); seq_printf(m, "Active: %d\n", active); @@ -1598,7 +1809,38 @@ static const struct seq_operations user_seq_ops = { static int user_status_open(struct inode *node, struct file *file) { - return seq_open(file, &user_seq_ops); + struct user_event_group *group; + int ret; + + group = user_event_group_find((int)(uintptr_t)node->i_private); + + if (!group) + return -ENOENT; + + ret = seq_open(file, &user_seq_ops); + + if (!ret) { + /* Chain group to seq_file */ + struct seq_file *m = file->private_data; + + m->private = group; + } else { + user_event_group_release(group); + } + + return ret; +} + +static int user_status_release(struct inode *node, struct file *file) +{ + struct user_event_group *group = user_status_group(file); + + if (group) + user_event_group_release(group); + else + pr_warn("user_events: No group attached to status file\n"); + + return seq_release(node, file); } static const struct file_operations user_status_fops = { @@ -1606,18 +1848,20 @@ static const struct file_operations user_status_fops = { .mmap = user_status_mmap, .read = seq_read, .llseek = seq_lseek, - .release = seq_release, + .release = user_status_release, }; /* * Creates a set of tracefs files to allow user mode interactions. 
*/ -static int create_user_tracefs(void) +static int create_user_tracefs(struct dentry *parent, + struct user_event_group *group) { struct dentry *edata, *emmap; edata = tracefs_create_file("user_events_data", TRACE_MODE_WRITE, - NULL, NULL, &user_data_fops); + parent, (void *)(uintptr_t)group->id, + &user_data_fops); if (!edata) { pr_warn("Could not create tracefs 'user_events_data' entry\n"); @@ -1626,7 +1870,8 @@ static int create_user_tracefs(void) /* mmap with MAP_SHARED requires writable fd */ emmap = tracefs_create_file("user_events_status", TRACE_MODE_WRITE, - NULL, NULL, &user_status_fops); + parent, (void *)(uintptr_t)group->id, + &user_status_fops); if (!emmap) { tracefs_remove(edata); @@ -1634,47 +1879,29 @@ static int create_user_tracefs(void) goto err; } + group->data_file = edata; + group->status_file = emmap; + return 0; err: return -ENODEV; } -static void set_page_reservations(bool set) -{ - int page; - - for (page = 0; page < MAX_PAGES; ++page) { - void *addr = register_page_data + (PAGE_SIZE * page); - - if (set) - SetPageReserved(virt_to_page(addr)); - else - ClearPageReserved(virt_to_page(addr)); - } -} - static int __init trace_events_user_init(void) { - struct page *pages; int ret; - /* Zero all bits beside 0 (which is reserved for failures) */ - bitmap_zero(page_bitmap, MAX_EVENTS); - set_bit(0, page_bitmap); + root_group = user_event_group_create(NULL, 0); - pages = alloc_pages(GFP_KERNEL | __GFP_ZERO, MAX_PAGE_ORDER); - if (!pages) + if (!root_group) return -ENOMEM; - register_page_data = page_address(pages); - - set_page_reservations(true); - ret = create_user_tracefs(); + ret = create_user_tracefs(NULL, root_group); if (ret) { pr_warn("user_events could not register with tracefs\n"); - set_page_reservations(false); - __free_pages(pages, MAX_PAGE_ORDER); + user_event_group_destroy(root_group); + root_group = NULL; return ret; } From patchwork Thu Jul 7 21:58:26 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 
Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Beau Belgrave X-Patchwork-Id: 12910285 From: Beau Belgrave To: rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com Cc: linux-trace-devel@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [RFC PATCH 5/7] tracing/user_events: Register with trace namespace API Date: Thu, 7 Jul 2022 14:58:26 -0700 Message-Id: <20220707215828.2021-6-beaub@linux.microsoft.com> In-Reply-To: <20220707215828.2021-1-beaub@linux.microsoft.com> References: 
<20220707215828.2021-1-beaub@linux.microsoft.com> Register user_events with the trace namespace API to allow user programs to interface with isolated events when required. Each namespace will have its own user_events_status and user_events_data files with the same ABI as before; however, the system name for created events will differ (user_events. vs user_events). Signed-off-by: Beau Belgrave --- kernel/trace/trace_events_user.c | 169 ++++++++++++++++++++++++++++++- 1 file changed, 167 insertions(+), 2 deletions(-) diff --git a/kernel/trace/trace_events_user.c b/kernel/trace/trace_events_user.c index 8ffbb9ce2f1a..ef11706c310f 100644 --- a/kernel/trace/trace_events_user.c +++ b/kernel/trace/trace_events_user.c @@ -23,6 +23,10 @@ #include "trace.h" #include "trace_dynevent.h" +#ifdef CONFIG_TRACE_NAMESPACE +#include "trace_namespace.h" +#endif + #define USER_EVENTS_PREFIX_LEN (sizeof(USER_EVENTS_PREFIX)-1) #define FIELD_DEPTH_TYPE 0 @@ -150,7 +154,7 @@ static void user_event_group_destroy(struct user_event_group *group) tracefs_remove(group->status_file); if (group->data_file) - tracefs_remove(group->status_file); + tracefs_remove(group->data_file); if (group->register_page_data) set_page_reservations(group->register_page_data, false); @@ -162,6 +166,18 @@ static void user_event_group_destroy(struct user_event_group *group) kfree(group); } +static void user_event_group_unlink(struct user_event_group *group) +{ + if (WARN_ON(refcount_read(&group->refcnt) != 1)) + pr_warn("user_event: Group unlink with more than 1 ref\n"); + + mutex_lock(&group_mutex); + hash_del(&group->node); + mutex_unlock(&group_mutex); + + user_event_group_destroy(group); +} + static char *user_event_group_system_name(const char *name) { char *system_name; @@ -244,6 +260,7 @@ static struct user_event_group *user_event_group_create(const char *name, return group; error: + /* Hash table not 
added, safe to destroy vs unlink */ if (group) user_event_group_destroy(group); @@ -1887,6 +1904,148 @@ static int create_user_tracefs(struct dentry *parent, return -ENODEV; } +#ifdef CONFIG_TRACE_NAMESPACE +static int user_event_ns_create(struct trace_namespace *ns) +{ + struct user_event_group *group; + int ret; + + group = user_event_group_create(ns->name, ns->id); + + if (!group) + return -ENOMEM; + + ret = create_user_tracefs(ns->dir, group); + + user_event_group_release(group); + + if (ret) { + user_event_group_unlink(group); + return ret; + } + + return 0; +} + +static int user_event_ns_remove(struct trace_namespace *ns) +{ + struct user_event_group *group = user_event_group_find(ns->id); + struct user_event *user; + struct hlist_node *tmp; + int i, ret = 0; + + if (!group) + return -ENOENT; + + /* + * Lock out finding this namespace while we are doing this so that + * user programs trying to open a file owned by this group will block + * until we are done here. The user program upon unblocking will then + * fail to find the group if we removed it. + */ + mutex_lock(&group_mutex); + + /* Ensure we have the only reference */ + if (refcount_read(&group->refcnt) != 2) { + ret = -EBUSY; + goto out; + } + + /* + * At this point no more files can be opened by user space programs + * while we are holding the group_mutex (they'll block on group_mutex). + * To ensure other parts of the kernel aren't registering something we + * also grab the group register mutex as an extra precaution. + * + * The events might be being recorded, which will result in their + * being busy and we'll bail out. + * + * NOTE: event_mutex is held, locking reg_mutex could deadlock so we + * must try to lock it and treat as busy if we cannot. 
+ */ + if (!mutex_trylock(&group->reg_mutex)) { + ret = -EBUSY; + goto out; + } + + hash_for_each_safe(group->register_table, i, tmp, user, node) { + if (!user_event_last_ref(user)) { + ret = -EBUSY; + break; + } + + ret = destroy_user_event(user); + + if (ret) + break; + } + + mutex_unlock(&group->reg_mutex); +out: + mutex_unlock(&group_mutex); + + user_event_group_release(group); + + if (!ret) + user_event_group_unlink(group); + + return ret; +} + +static int user_event_ns_parse(struct trace_namespace *ns, const char *command) +{ + return 0; +} + +static int user_event_ns_show(struct trace_namespace *ns, struct seq_file *m) +{ + return 0; +} + +static bool user_event_ns_is_busy(struct trace_namespace *ns) +{ + struct user_event_group *group = user_event_group_find(ns->id); + struct user_event *user; + int i; + bool busy = false; + + if (!group) + return false; + + /* + * Quick check that no events are busy: + * the actual remove will do a more exhaustive check, including + * whether any outstanding files are still open, etc. + * + * NOTE: event_mutex is held, so locking reg_mutex could deadlock; we + * must trylock it and treat the group as busy if we cannot.
+ */ + if (!mutex_trylock(&group->reg_mutex)) + return true; + + hash_for_each(group->register_table, i, user, node) { + if (!user_event_last_ref(user)) { + busy = true; + break; + } + } + + mutex_unlock(&group->reg_mutex); + + user_event_group_release(group); + + return busy; +} + +static struct trace_namespace_operations user_event_ns_ops = { + .create = user_event_ns_create, + .remove = user_event_ns_remove, + .parse = user_event_ns_parse, + .show = user_event_ns_show, + .is_busy = user_event_ns_is_busy, +}; +#endif + static int __init trace_events_user_init(void) { int ret; @@ -1900,7 +2059,8 @@ static int __init trace_events_user_init(void) if (ret) { pr_warn("user_events could not register with tracefs\n"); - user_event_group_destroy(root_group); + user_event_group_release(root_group); + user_event_group_unlink(root_group); root_group = NULL; return ret; } @@ -1908,6 +2068,11 @@ static int __init trace_events_user_init(void) if (dyn_event_register(&user_event_dops)) pr_warn("user_events could not register with dyn_events\n"); +#ifdef CONFIG_TRACE_NAMESPACE + if (trace_namespace_register(&user_event_ns_ops)) + pr_warn("user_events could not register with namespaces\n"); +#endif + return 0; } From patchwork Thu Jul 7 21:58:27 2022 X-Patchwork-Submitter: Beau Belgrave X-Patchwork-Id: 12910286 From: Beau Belgrave To: rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com Cc: linux-trace-devel@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [RFC PATCH 6/7] tracing/user_events: Enable setting event limit within namespace Date: Thu, 7 Jul 2022 14:58:27 -0700 Message-Id: <20220707215828.2021-7-beaub@linux.microsoft.com> In-Reply-To: <20220707215828.2021-1-beaub@linux.microsoft.com> References: <20220707215828.2021-1-beaub@linux.microsoft.com> List-ID: X-Mailing-List: linux-trace-devel@vger.kernel.org When granting non-admin users the ability to register and write data to user events, a limit should be imposed on them. Using the namespace options file, operators can change the limit on how many events may be created. A new line in the user_events_status file also lets users know the current limit (and ask the operator for more if required). For example, to limit the namespace to only 256 events: echo user_events_limit=256 > options From within the namespace root: cat user_events_status ...
Limit: 256 Signed-off-by: Beau Belgrave --- kernel/trace/trace_events_user.c | 54 +++++++++++++++++++++++++++++--- 1 file changed, 49 insertions(+), 5 deletions(-) diff --git a/kernel/trace/trace_events_user.c b/kernel/trace/trace_events_user.c index ef11706c310f..b3b6f14099fe 100644 --- a/kernel/trace/trace_events_user.c +++ b/kernel/trace/trace_events_user.c @@ -67,6 +67,7 @@ struct user_event_group { DECLARE_BITMAP(page_bitmap, MAX_EVENTS); refcount_t refcnt; int id; + int reg_limit; }; static DEFINE_HASHTABLE(group_table, 8); @@ -234,6 +235,9 @@ static struct user_event_group *user_event_group_create(const char *name, goto error; } + /* Register limit is 1 under max for bitmap logic */ + group->reg_limit = MAX_EVENTS - 1; + group->pages = alloc_pages(GFP_KERNEL | __GFP_ZERO, MAX_PAGE_ORDER); if (!group->pages) @@ -1258,8 +1262,7 @@ static int user_event_parse(struct user_event_group *group, char *name, char *args, char *flags, struct user_event **newuser) { - int ret; - int index; + int ret, index, limit; u32 key; struct user_event *user; @@ -1278,9 +1281,18 @@ static int user_event_parse(struct user_event_group *group, char *name, return 0; } - index = find_first_zero_bit(group->page_bitmap, MAX_EVENTS); + /* + * Bitmap returns actual max when too big: + * We need to add 1 to this limit to ensure proper logic + */ + limit = group->reg_limit + 1; + + if (limit > MAX_EVENTS) + return -E2BIG; + + index = find_first_zero_bit(group->page_bitmap, limit); - if (index == MAX_EVENTS) + if (index == limit) return -EMFILE; user = kzalloc(sizeof(*user), GFP_KERNEL); @@ -1813,6 +1825,7 @@ static int user_seq_show(struct seq_file *m, void *p) seq_printf(m, "Active: %d\n", active); seq_printf(m, "Busy: %d\n", busy); seq_printf(m, "Max: %ld\n", MAX_EVENTS); + seq_printf(m, "Limit: %d\n", group->reg_limit); return 0; } @@ -1992,13 +2005,44 @@ static int user_event_ns_remove(struct trace_namespace *ns) return ret; } +#define NS_EVENT_LIMIT_PREFIX "user_events_limit=" + static 
int user_event_ns_parse(struct trace_namespace *ns, const char *command) { - return 0; + struct user_event_group *group = user_event_group_find(ns->id); + int len, value, ret = -ECANCELED; + + if (!group) + return -ECANCELED; + + len = str_has_prefix(command, NS_EVENT_LIMIT_PREFIX); + if (len && !kstrtoint(command + len, 0, &value)) { + if (value <= 0 || value > MAX_EVENTS) { + ret = -EINVAL; + goto out; + } + + group->reg_limit = value; + ret = 0; + goto out; + } +out: + user_event_group_release(group); + + return ret; } static int user_event_ns_show(struct trace_namespace *ns, struct seq_file *m) { + struct user_event_group *group = user_event_group_find(ns->id); + + if (!group) + return 0; + + seq_printf(m, "%s%d\n", NS_EVENT_LIMIT_PREFIX, group->reg_limit); + + user_event_group_release(group); + return 0; } From patchwork Thu Jul 7 21:58:28 2022 X-Patchwork-Submitter: Beau Belgrave X-Patchwork-Id: 12910287 From: Beau Belgrave To: rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com Cc: linux-trace-devel@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [RFC PATCH 7/7] tracing/user_events: Add self-test for namespace integration Date: Thu, 7 Jul 2022 14:58:28 -0700 Message-Id: <20220707215828.2021-8-beaub@linux.microsoft.com> In-Reply-To: <20220707215828.2021-1-beaub@linux.microsoft.com> References: <20220707215828.2021-1-beaub@linux.microsoft.com> List-ID: X-Mailing-List: linux-trace-devel@vger.kernel.org Add tests to ensure the namespace cases work correctly. They verify that namespaces behave as before for the status/write cases and validate removing a namespace while files are open, tracing is enabled, etc.
Signed-off-by: Beau Belgrave --- .../selftests/user_events/ftrace_test.c | 150 ++++++++++++++++++ 1 file changed, 150 insertions(+) diff --git a/tools/testing/selftests/user_events/ftrace_test.c b/tools/testing/selftests/user_events/ftrace_test.c index 404a2713dcae..5d384c1b31c4 100644 --- a/tools/testing/selftests/user_events/ftrace_test.c +++ b/tools/testing/selftests/user_events/ftrace_test.c @@ -22,6 +22,16 @@ const char *enable_file = "/sys/kernel/debug/tracing/events/user_events/__test_e const char *trace_file = "/sys/kernel/debug/tracing/trace"; const char *fmt_file = "/sys/kernel/debug/tracing/events/user_events/__test_event/format"; +const char *namespace_dir = "/sys/kernel/debug/tracing/namespaces/self_test"; +const char *ns_data_file = "/sys/kernel/debug/tracing/namespaces/self_test/" + "root/user_events_data"; +const char *ns_status_file = "/sys/kernel/debug/tracing/namespaces/self_test/" + "root/user_events_status"; +const char *ns_enable_file = "/sys/kernel/debug/tracing/events/" + "user_events.self_test/__test_event/enable"; +const char *ns_options_file = "/sys/kernel/debug/tracing/namespaces/self_test/" + "options"; + static inline int status_check(char *status_page, int status_bit) { return status_page[status_bit >> 3] & (1 << (status_bit & 7)); } @@ -160,6 +170,53 @@ static int check_print_fmt(const char *event, const char *expected) return strcmp(print_fmt, expected); } +FIXTURE(ns) { + int status_fd; + int data_fd; + int enable_fd; + int options_fd; +}; + +FIXTURE_SETUP(ns) { + if (mkdir(namespace_dir, 0770)) { + ASSERT_EQ(EEXIST, errno); + } + + self->status_fd = open(ns_status_file, O_RDONLY); + ASSERT_NE(-1, self->status_fd); + + self->data_fd = open(ns_data_file, O_RDWR); + ASSERT_NE(-1, self->data_fd); + + self->options_fd = open(ns_options_file, O_RDWR); + ASSERT_NE(-1, self->options_fd); + + self->enable_fd = -1; +} + +FIXTURE_TEARDOWN(ns) { + if (self->status_fd != -1) + close(self->status_fd); + + if (self->data_fd != -1) +
close(self->data_fd); + + if (self->options_fd != -1) + close(self->options_fd); + + if (self->enable_fd != -1) { + write(self->enable_fd, "0", sizeof("0")); + close(self->enable_fd); + self->enable_fd = -1; + } + + ASSERT_EQ(0, clear()); + + if (rmdir(namespace_dir)) { + ASSERT_EQ(ENOENT, errno); + } +} + FIXTURE(user) { int status_fd; int data_fd; @@ -477,6 +534,99 @@ TEST_F(user, print_fmt) { ASSERT_EQ(0, ret); } +TEST_F(ns, namespaces) { + struct user_reg reg = {0}; + struct iovec io[3]; + __u32 field1, field2; + int before = 0, after = 0; + int page_size = sysconf(_SC_PAGESIZE); + char *status_page; + + reg.size = sizeof(reg); + reg.name_args = (__u64)"__test_event u32 field1; u32 field2"; + + field1 = 1; + field2 = 2; + + io[0].iov_base = &reg.write_index; + io[0].iov_len = sizeof(reg.write_index); + io[1].iov_base = &field1; + io[1].iov_len = sizeof(field1); + io[2].iov_base = &field2; + io[2].iov_len = sizeof(field2); + + /* Limit to 1 event */ + ASSERT_NE(-1, write(self->options_fd, + "user_events_limit=1\n", + sizeof("user_events_limit=1\n") - 1)); + + /* Register should work */ + ASSERT_EQ(0, ioctl(self->data_fd, DIAG_IOCSREG, &reg)); + ASSERT_EQ(0, reg.write_index); + ASSERT_NE(0, reg.status_bit); + + status_page = mmap(NULL, page_size, PROT_READ, MAP_SHARED, + self->status_fd, 0); + + /* MMAP should work and be zeroed */ + ASSERT_NE(MAP_FAILED, status_page); + ASSERT_NE(NULL, status_page); + ASSERT_EQ(0, status_check(status_page, reg.status_bit)); + + /* Enable event (start tracing) */ + self->enable_fd = open(ns_enable_file, O_RDWR); + ASSERT_NE(-1, write(self->enable_fd, "1", sizeof("1"))); + + /* Event should now be enabled */ + ASSERT_NE(0, status_check(status_page, reg.status_bit)); + + /* Write should make it out to ftrace buffers */ + before = trace_bytes(); + ASSERT_NE(-1, writev(self->data_fd, (const struct iovec *)io, 3)); + after = trace_bytes(); + ASSERT_GT(after, before); + + /* Register above limit should fail */ + reg.name_args =
(__u64)"__test_event_nope u32 field1; u32 field2"; + ASSERT_EQ(-1, ioctl(self->data_fd, DIAG_IOCSREG, &reg)); + ASSERT_EQ(EMFILE, errno); + + /* Removing namespace while files open should fail */ + ASSERT_EQ(-1, rmdir(namespace_dir)); + + close(self->options_fd); + self->options_fd = -1; + + /* Removing namespace while files open should fail */ + ASSERT_EQ(-1, rmdir(namespace_dir)); + + close(self->status_fd); + self->status_fd = -1; + + /* Removing namespace while files open should fail */ + ASSERT_EQ(-1, rmdir(namespace_dir)); + + close(self->data_fd); + self->data_fd = -1; + + /* Removing namespace while mmaps are open should fail */ + ASSERT_EQ(-1, rmdir(namespace_dir)); + + /* Unmap */ + ASSERT_EQ(0, munmap(status_page, page_size)); + + /* Removing namespace with no files but tracing should fail */ + ASSERT_EQ(-1, rmdir(namespace_dir)); + + /* Disable event (stop tracing) */ + ASSERT_NE(-1, write(self->enable_fd, "0", sizeof("0"))); + close(self->enable_fd); + self->enable_fd = -1; + + /* Removing namespace should now work */ + ASSERT_EQ(0, rmdir(namespace_dir)); +} + int main(int argc, char **argv) { return test_harness_run(argc, argv);