From patchwork Mon Apr 25 18:46:25 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Beau Belgrave X-Patchwork-Id: 12826073 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 04478C43219 for ; Mon, 25 Apr 2022 18:46:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S244565AbiDYStv (ORCPT ); Mon, 25 Apr 2022 14:49:51 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46766 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244562AbiDYSts (ORCPT ); Mon, 25 Apr 2022 14:49:48 -0400 Received: from linux.microsoft.com (linux.microsoft.com [13.77.154.182]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 2B4D06F497; Mon, 25 Apr 2022 11:46:44 -0700 (PDT) Received: from localhost.localdomain (c-73-140-2-214.hsd1.wa.comcast.net [73.140.2.214]) by linux.microsoft.com (Postfix) with ESMTPSA id 7B63320E8CA3; Mon, 25 Apr 2022 11:46:43 -0700 (PDT) DKIM-Filter: OpenDKIM Filter v2.11.0 linux.microsoft.com 7B63320E8CA3 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.microsoft.com; s=default; t=1650912403; bh=naE2yWyZaQaNiMnDIQWiov5WR90Psm2QKZv45rv75Kg=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=Bdhz+3ytOrUfwsnrUl/o3rX5h6yScjKr0KrbsOIvTqPCzrU84Bn/3jL5QLv9RWhe/ cLsq5ZqBiWFHpG6yiLcUabgT2PCN47Ehf6bnUy6x4mzuAbi5/5w8oA4oQlYBTt8OaX fdbPRfDZzolHKIH75p5u9rqVRShm5HY9/vkcA7no= From: Beau Belgrave To: rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com Cc: linux-trace-devel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org, beaub@linux.microsoft.com Subject: [PATCH v2 1/7] tracing/user_events: Fix repeated word in comments Date: Mon, 25 Apr 2022 11:46:25 -0700 Message-Id: <20220425184631.2068-2-beaub@linux.microsoft.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220425184631.2068-1-beaub@linux.microsoft.com> References: <20220425184631.2068-1-beaub@linux.microsoft.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-devel@vger.kernel.org Trivial fix to remove accidental double wording of have in comments. Signed-off-by: Beau Belgrave --- kernel/trace/trace_events_user.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/trace/trace_events_user.c b/kernel/trace/trace_events_user.c index 706e1686b5eb..a6621c52ce45 100644 --- a/kernel/trace/trace_events_user.c +++ b/kernel/trace/trace_events_user.c @@ -567,7 +567,7 @@ static int user_event_set_call_visible(struct user_event *user, bool visible) * to allow user_event files to be less locked down. The extreme case * being "other" has read/write access to user_events_data/status. * - * When not locked down, processes may not have have permissions to + * When not locked down, processes may not have permissions to * add/remove calls themselves to tracefs. We need to temporarily * switch to root file permission to allow for this scenario. */ From patchwork Mon Apr 25 18:46:26 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Beau Belgrave X-Patchwork-Id: 12826072 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 81B3CC433F5 for ; Mon, 25 Apr 2022 18:46:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S244574AbiDYStv (ORCPT ); Mon, 25 Apr 2022 14:49:51 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46752 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244561AbiDYSts (ORCPT ); Mon, 25 Apr 2022 14:49:48 -0400 Received: from linux.microsoft.com (linux.microsoft.com [13.77.154.182]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 31B9C6F49E; Mon, 25 Apr 2022 11:46:44 -0700 (PDT) Received: from localhost.localdomain (c-73-140-2-214.hsd1.wa.comcast.net [73.140.2.214]) by linux.microsoft.com (Postfix) with ESMTPSA id B9A2B20E8CA6; Mon, 25 Apr 2022 11:46:43 -0700 (PDT) DKIM-Filter: OpenDKIM Filter v2.11.0 linux.microsoft.com B9A2B20E8CA6 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.microsoft.com; s=default; t=1650912403; bh=Oh80tgumXtzVzhggUEiO+Gt+FYTYz042ycLpTb3589E=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=FMYMYZCUet8AdU+O5wAVlNQEAXDRLOGP+igAKz16cgji+xyP1PSxBBdYHzff18uiw N0SPsg+EYTgd2NrbUtQIcqCa/u97XnTKMi9pbqGhZVhGeCzRh0pT0AqEuH56/40BKk 1c+i24K/5RPJr9YJwatz9Jr6ZWUakbRLdWU7b8XM= From: Beau Belgrave To: rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com Cc: linux-trace-devel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org, beaub@linux.microsoft.com Subject: [PATCH v2 2/7] tracing/user_events: Use NULL for strstr checks Date: Mon, 25 Apr 2022 11:46:26 -0700 Message-Id: <20220425184631.2068-3-beaub@linux.microsoft.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220425184631.2068-1-beaub@linux.microsoft.com> References: <20220425184631.2068-1-beaub@linux.microsoft.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-devel@vger.kernel.org Trivial fix to ensure strstr checks use NULL instead of 0. Signed-off-by: Beau Belgrave --- kernel/trace/trace_events_user.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/kernel/trace/trace_events_user.c b/kernel/trace/trace_events_user.c index a6621c52ce45..075d694d20e3 100644 --- a/kernel/trace/trace_events_user.c +++ b/kernel/trace/trace_events_user.c @@ -277,7 +277,7 @@ static int user_event_add_field(struct user_event *user, const char *type, goto add_field; add_validator: - if (strstr(type, "char") != 0) + if (strstr(type, "char") != NULL) validator_flags |= VALIDATOR_ENSURE_NULL; validator = kmalloc(sizeof(*validator), GFP_KERNEL); @@ -458,7 +458,7 @@ static const char *user_field_format(const char *type) return "%d"; if (strcmp(type, "unsigned char") == 0) return "%u"; - if (strstr(type, "char[") != 0) + if (strstr(type, "char[") != NULL) return "%s"; /* Unknown, likely struct, allowed treat as 64-bit */ @@ -479,7 +479,7 @@ static bool user_field_is_dyn_string(const char *type, const char **str_func) return false; check: - return strstr(type, "char") != 0; + return strstr(type, "char") != NULL; } #define LEN_OR_ZERO (len ? len - pos : 0) From patchwork Mon Apr 25 18:46:27 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Beau Belgrave X-Patchwork-Id: 12826076 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D5AA0C433EF for ; Mon, 25 Apr 2022 18:46:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S244588AbiDYStx (ORCPT ); Mon, 25 Apr 2022 14:49:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46896 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244567AbiDYStt (ORCPT ); Mon, 25 Apr 2022 14:49:49 -0400 Received: from linux.microsoft.com (linux.microsoft.com [13.77.154.182]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id B87AF6F4AA; Mon, 25 Apr 2022 11:46:44 -0700 (PDT) Received: from localhost.localdomain (c-73-140-2-214.hsd1.wa.comcast.net [73.140.2.214]) by linux.microsoft.com (Postfix) with ESMTPSA id F22DA20E8CAC; Mon, 25 Apr 2022 11:46:43 -0700 (PDT) DKIM-Filter: OpenDKIM Filter v2.11.0 linux.microsoft.com F22DA20E8CAC DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.microsoft.com; s=default; t=1650912404; bh=ZmUiorrcv4hSBgHHc4SmHe16T+eq+bKCUHOlyXmVWb8=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=b+k0xqPXHFbAtfHWKHC34qI5dYzSEAkeqEGKU9kn83ypInT59deJfM81m5Z8ic58J SulDKl9SrMCc8ZGzQ2kL1/+CIJVv17zSaoQfqE6arA9WpFP3JH/WHaZkqSgBY0zxvd mSolsimvFe+KFG0BcJ60XpNvlPY5ASqe6xlC8H6o= From: Beau Belgrave To: rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com Cc: linux-trace-devel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org, beaub@linux.microsoft.com Subject: [PATCH v2 3/7] tracing/user_events: Use WRITE instead of READ for io vector import Date: Mon, 25 Apr 2022 11:46:27 -0700 Message-Id: <20220425184631.2068-4-beaub@linux.microsoft.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220425184631.2068-1-beaub@linux.microsoft.com> References: <20220425184631.2068-1-beaub@linux.microsoft.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-devel@vger.kernel.org import_single_range expects the direction/rw to be where it came from, not the protection/limit. Since the import is in a write path use WRITE. Link: https://lore.kernel.org/all/2059213643.196683.1648499088753.JavaMail.zimbra@efficios.com/ Reported-by: Mathieu Desnoyers Signed-off-by: Beau Belgrave --- kernel/trace/trace_events_user.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/kernel/trace/trace_events_user.c b/kernel/trace/trace_events_user.c index 075d694d20e3..15edbf6b1e2e 100644 --- a/kernel/trace/trace_events_user.c +++ b/kernel/trace/trace_events_user.c @@ -1245,7 +1245,8 @@ static ssize_t user_events_write(struct file *file, const char __user *ubuf, if (unlikely(*ppos != 0)) return -EFAULT; - if (unlikely(import_single_range(READ, (char *)ubuf, count, &iov, &i))) + if (unlikely(import_single_range(WRITE, (char __user *)ubuf, + count, &iov, &i))) return -EFAULT; return user_events_write_core(file, &i); From patchwork Mon Apr 25 18:46:28 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Beau Belgrave X-Patchwork-Id: 12826074 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2E8E9C4321E for ; Mon, 25 Apr 2022 18:46:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S244583AbiDYStw (ORCPT ); Mon, 25 Apr 2022 14:49:52 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46908 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244570AbiDYStu (ORCPT ); Mon, 25 Apr 2022 14:49:50 -0400 Received: from linux.microsoft.com (linux.microsoft.com [13.77.154.182]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id BB9FA6F4AC; Mon, 25 Apr 2022 11:46:44 -0700 (PDT) Received: from localhost.localdomain (c-73-140-2-214.hsd1.wa.comcast.net [73.140.2.214]) by linux.microsoft.com (Postfix) with ESMTPSA id 3785620E8CAE; Mon, 25 Apr 2022 11:46:44 -0700 (PDT) DKIM-Filter: OpenDKIM Filter v2.11.0 linux.microsoft.com 3785620E8CAE DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.microsoft.com; s=default; t=1650912404; bh=xWZo/CSo22CarPEM9asusj2BEE04dwPEtFudHgCqIKE=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=mibaR8Qe4s7/Wxy/GiQ+yUA9SmK2aC7/Y+uNfouYe1FJq2XfxHFCOlYNEK8CoEKhp bnvO5D8CLsTuUaN/eWVk8vvL8E3UCLM/iL8nXxQdWTI6+5qdsuahT24XnyKzA7scXz wIpyKGVR0MxYS4xwyxn/p7woGnK+hKQSX6tcB89c= From: Beau Belgrave To: rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com Cc: linux-trace-devel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org, beaub@linux.microsoft.com Subject: [PATCH v2 4/7] tracing/user_events: Ensure user provided strings are safely formatted Date: Mon, 25 Apr 2022 11:46:28 -0700 Message-Id: <20220425184631.2068-5-beaub@linux.microsoft.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220425184631.2068-1-beaub@linux.microsoft.com> References: <20220425184631.2068-1-beaub@linux.microsoft.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-devel@vger.kernel.org User processes can provide bad strings that may cause issues or leak kernel details back out. Don't trust the content of these strings when formatting strings for matching. This also moves to a consistent dynamic length string creation model. Link: https://lore.kernel.org/all/2059213643.196683.1648499088753.JavaMail.zimbra@efficios.com/ Reported-by: Mathieu Desnoyers Signed-off-by: Beau Belgrave --- kernel/trace/trace_events_user.c | 91 +++++++++++++++++++++----------- 1 file changed, 59 insertions(+), 32 deletions(-) diff --git a/kernel/trace/trace_events_user.c b/kernel/trace/trace_events_user.c index 15edbf6b1e2e..f9bb7d37d76f 100644 --- a/kernel/trace/trace_events_user.c +++ b/kernel/trace/trace_events_user.c @@ -45,7 +45,6 @@ #define MAX_EVENT_DESC 512 #define EVENT_NAME(user_event) ((user_event)->tracepoint.name) #define MAX_FIELD_ARRAY_SIZE 1024 -#define MAX_FIELD_ARG_NAME 256 static char *register_page_data; @@ -483,6 +482,48 @@ static bool user_field_is_dyn_string(const char *type, const char **str_func) } #define LEN_OR_ZERO (len ? len - pos : 0) +static int user_dyn_field_set_string(int argc, const char **argv, int *iout, + char *buf, int len, bool *colon) +{ + int pos = 0, i = *iout; + + *colon = false; + + for (; i < argc; ++i) { + if (i != *iout) + pos += snprintf(buf + pos, LEN_OR_ZERO, " "); + + pos += snprintf(buf + pos, LEN_OR_ZERO, "%s", argv[i]); + + if (strchr(argv[i], ';')) { + ++i; + *colon = true; + break; + } + } + + /* Actual set, advance i */ + if (len != 0) + *iout = i; + + return pos + 1; +} + +static int user_field_set_string(struct ftrace_event_field *field, + char *buf, int len, bool colon) +{ + int pos = 0; + + pos += snprintf(buf + pos, LEN_OR_ZERO, "%s", field->type); + pos += snprintf(buf + pos, LEN_OR_ZERO, " "); + pos += snprintf(buf + pos, LEN_OR_ZERO, "%s", field->name); + + if (colon) + pos += snprintf(buf + pos, LEN_OR_ZERO, ";"); + + return pos + 1; +} + static int user_event_set_print_fmt(struct user_event *user, char *buf, int len) { struct ftrace_event_field *field, *next; @@ -926,49 +967,35 @@ static int user_event_free(struct dyn_event *ev) static bool user_field_match(struct ftrace_event_field *field, int argc, const char **argv, int *iout) { - char *field_name, *arg_name; - int len, pos, i = *iout; + char *field_name = NULL, *dyn_field_name = NULL; bool colon = false, match = false; + int dyn_len, len; - if (i >= argc) + if (*iout >= argc) return false; - len = MAX_FIELD_ARG_NAME; - field_name = kmalloc(len, GFP_KERNEL); - arg_name = kmalloc(len, GFP_KERNEL); + dyn_len = user_dyn_field_set_string(argc, argv, iout, dyn_field_name, + 0, &colon); - if (!arg_name || !field_name) - goto out; - - pos = 0; - - for (; i < argc; ++i) { - if (i != *iout) - pos += snprintf(arg_name + pos, len - pos, " "); + len = user_field_set_string(field, field_name, 0, colon); - pos += snprintf(arg_name + pos, len - pos, argv[i]); - - if (strchr(argv[i], ';')) { - ++i; - colon = true; - break; - } - } + if (dyn_len != len) + return false; - pos = 0; + dyn_field_name = kmalloc(dyn_len, GFP_KERNEL); + field_name = kmalloc(len, GFP_KERNEL); - pos += snprintf(field_name + pos, len - pos, field->type); - pos += snprintf(field_name + pos, len - pos, " "); - pos += snprintf(field_name + pos, len - pos, field->name); + if (!dyn_field_name || !field_name) + goto out; - if (colon) - pos += snprintf(field_name + pos, len - pos, ";"); + user_dyn_field_set_string(argc, argv, iout, dyn_field_name, + dyn_len, &colon); - *iout = i; + user_field_set_string(field, field_name, len, colon); - match = strcmp(arg_name, field_name) == 0; + match = strcmp(dyn_field_name, field_name) == 0; out: - kfree(arg_name); + kfree(dyn_field_name); kfree(field_name); return match; From patchwork Mon Apr 25 18:46:29 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Beau Belgrave X-Patchwork-Id: 12826075 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EFCE4C41535 for ; Mon, 25 Apr 2022 18:46:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238202AbiDYStw (ORCPT ); Mon, 25 Apr 2022 14:49:52 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46874 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244571AbiDYStu (ORCPT ); Mon, 25 Apr 2022 14:49:50 -0400 Received: from linux.microsoft.com (linux.microsoft.com [13.77.154.182]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 7918E6F9D2; Mon, 25 Apr 2022 11:46:45 -0700 (PDT) Received: from localhost.localdomain (c-73-140-2-214.hsd1.wa.comcast.net [73.140.2.214]) by linux.microsoft.com (Postfix) with ESMTPSA id 7795520E8CB0; Mon, 25 Apr 2022 11:46:44 -0700 (PDT) DKIM-Filter: OpenDKIM Filter v2.11.0 linux.microsoft.com 7795520E8CB0 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.microsoft.com; s=default; t=1650912404; bh=Yas84WJ4NXuOO1nmdoxbY/IEroFAtoghAgKJn/H+CD0=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=jd/ZmzQF7E0HrpKZRTpBHi3yx04sc+A0TkV2kSU6FQ06l2E4Fs583mGy+343tDAbH Gx1P/rrJGoPOBocqCB0gwmnduvaoK+gN4koNDGm+Xu387L/Otm3+ZlcxNHUh44ItR0 0veSKqJpPS6rkgpXabewqi1RyNQt6THtmJXuw2bo= From: Beau Belgrave To: rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com Cc: linux-trace-devel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org, beaub@linux.microsoft.com Subject: [PATCH v2 5/7] tracing/user_events: Use refcount instead of atomic for ref tracking Date: Mon, 25 Apr 2022 11:46:29 -0700 Message-Id: <20220425184631.2068-6-beaub@linux.microsoft.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220425184631.2068-1-beaub@linux.microsoft.com> References: <20220425184631.2068-1-beaub@linux.microsoft.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-devel@vger.kernel.org User processes could open up enough event references to cause rollovers. These could cause use after free scenarios, which we do not want. Switching to refcount APIs prevent this, but will leak memory once saturated. Once saturated, user processes can still use the events. This prevents a bad user process from stopping existing telemetry from being emitted. Link: https://lore.kernel.org/all/2059213643.196683.1648499088753.JavaMail.zimbra@efficios.com/ Reported-by: Mathieu Desnoyers Signed-off-by: Beau Belgrave --- kernel/trace/trace_events_user.c | 53 +++++++++++++++----------------- 1 file changed, 24 insertions(+), 29 deletions(-) diff --git a/kernel/trace/trace_events_user.c b/kernel/trace/trace_events_user.c index f9bb7d37d76f..2bcae7abfa81 100644 --- a/kernel/trace/trace_events_user.c +++ b/kernel/trace/trace_events_user.c @@ -14,6 +14,7 @@ #include #include #include +#include #include #include #include @@ -57,7 +58,7 @@ static DECLARE_BITMAP(page_bitmap, MAX_EVENTS); * within a file a user_event might be created if it does not * already exist. These are globally used and their lifetime * is tied to the refcnt member. These cannot go away until the - * refcnt reaches zero. + * refcnt reaches one. */ struct user_event { struct tracepoint tracepoint; @@ -67,7 +68,7 @@ struct user_event { struct hlist_node node; struct list_head fields; struct list_head validators; - atomic_t refcnt; + refcount_t refcnt; int index; int flags; int min_size; @@ -105,6 +106,12 @@ static u32 user_event_key(char *name) return jhash(name, strlen(name), 0); } +static __always_inline __must_check +bool user_event_last_ref(struct user_event *user) +{ + return refcount_read(&user->refcnt) == 1; +} + static __always_inline __must_check size_t copy_nofault(void *addr, size_t bytes, struct iov_iter *i) { @@ -662,7 +669,7 @@ static struct user_event *find_user_event(char *name, u32 *outkey) hash_for_each_possible(register_table, user, node, key) if (!strcmp(EVENT_NAME(user), name)) { - atomic_inc(&user->refcnt); + refcount_inc(&user->refcnt); return user; } @@ -876,12 +883,12 @@ static int user_event_reg(struct trace_event_call *call, return ret; inc: - atomic_inc(&user->refcnt); + refcount_inc(&user->refcnt); update_reg_page_for(user); return 0; dec: update_reg_page_for(user); - atomic_dec(&user->refcnt); + refcount_dec(&user->refcnt); return 0; } @@ -907,7 +914,7 @@ static int user_event_create(const char *raw_command) ret = user_event_parse_cmd(name, &user); if (!ret) - atomic_dec(&user->refcnt); + refcount_dec(&user->refcnt); mutex_unlock(®_mutex); @@ -951,14 +958,14 @@ static bool user_event_is_busy(struct dyn_event *ev) { struct user_event *user = container_of(ev, struct user_event, devent); - return atomic_read(&user->refcnt) != 0; + return !user_event_last_ref(user); } static int user_event_free(struct dyn_event *ev) { struct user_event *user = container_of(ev, struct user_event, devent); - if (atomic_read(&user->refcnt) != 0) + if (!user_event_last_ref(user)) return -EBUSY; return destroy_user_event(user); @@ -1137,8 +1144,8 @@ static int user_event_parse(char *name, char *args, char *flags, user->index = index; - /* Ensure we track ref */ - atomic_inc(&user->refcnt); + /* Ensure we track self ref and caller ref (2) */ + refcount_set(&user->refcnt, 2); dyn_event_init(&user->devent, &user_event_dops); dyn_event_add(&user->devent, &user->call); @@ -1164,29 +1171,17 @@ static int user_event_parse(char *name, char *args, char *flags, static int delete_user_event(char *name) { u32 key; - int ret; struct user_event *user = find_user_event(name, &key); if (!user) return -ENOENT; - /* Ensure we are the last ref */ - if (atomic_read(&user->refcnt) != 1) { - ret = -EBUSY; - goto put_ref; - } - - ret = destroy_user_event(user); + refcount_dec(&user->refcnt); - if (ret) - goto put_ref; - - return ret; -put_ref: - /* No longer have this ref */ - atomic_dec(&user->refcnt); + if (!user_event_last_ref(user)) + return -EBUSY; - return ret; + return destroy_user_event(user); } /* @@ -1314,7 +1309,7 @@ static int user_events_ref_add(struct file *file, struct user_event *user) new_refs->events[i] = user; - atomic_inc(&user->refcnt); + refcount_inc(&user->refcnt); rcu_assign_pointer(file->private_data, new_refs); @@ -1374,7 +1369,7 @@ static long user_events_ioctl_reg(struct file *file, unsigned long uarg) ret = user_events_ref_add(file, user); /* No longer need parse ref, ref_add either worked or not */ - atomic_dec(&user->refcnt); + refcount_dec(&user->refcnt); /* Positive number is index and valid */ if (ret < 0) @@ -1464,7 +1459,7 @@ static int user_events_release(struct inode *node, struct file *file) user = refs->events[i]; if (user) - atomic_dec(&user->refcnt); + refcount_dec(&user->refcnt); } out: file->private_data = NULL; From patchwork Mon Apr 25 18:46:30 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Beau Belgrave X-Patchwork-Id: 12826078 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 69B72C43219 for ; Mon, 25 Apr 2022 18:46:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S244596AbiDYStz (ORCPT ); Mon, 25 Apr 2022 14:49:55 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46950 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244573AbiDYStu (ORCPT ); Mon, 25 Apr 2022 14:49:50 -0400 Received: from linux.microsoft.com (linux.microsoft.com [13.77.154.182]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 792416F9D3; Mon, 25 Apr 2022 11:46:45 -0700 (PDT) Received: from localhost.localdomain (c-73-140-2-214.hsd1.wa.comcast.net [73.140.2.214]) by linux.microsoft.com (Postfix) with ESMTPSA id B442220E8CB3; Mon, 25 Apr 2022 11:46:44 -0700 (PDT) DKIM-Filter: OpenDKIM Filter v2.11.0 linux.microsoft.com B442220E8CB3 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.microsoft.com; s=default; t=1650912404; bh=ajevIs95ENFLnsELFSTB6AMvBG1b7Z0g03qm2SuwDa8=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=WxBpEyGuHfs2LZ3K5yn6RLo0vMYHXO0yqzxnary6G+7TSNJfwLUq6GZFsyNVC49Bb vU9N2A10duHEC5cvB0JFKyTrJxEwhYb81VfqEby8MphVm6Ut01HDjAtGELLKKjjNjG 80C9nNc/W4TqVFI5+VapEYBvkU3fpY7YzLWQQYgk= From: Beau Belgrave To: rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com Cc: linux-trace-devel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org, beaub@linux.microsoft.com Subject: [PATCH v2 6/7] tracing/user_events: Use bits vs bytes for enabled status page data Date: Mon, 25 Apr 2022 11:46:30 -0700 Message-Id: <20220425184631.2068-7-beaub@linux.microsoft.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220425184631.2068-1-beaub@linux.microsoft.com> References: <20220425184631.2068-1-beaub@linux.microsoft.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-devel@vger.kernel.org User processes may require many events and when they do the cache performance of a byte index status check is less ideal than a bit index. The previous event limit per-page was 4096, the new limit is 32,768. This change adds a bitwise index to the user_reg struct. Programs check that the bit at status_bit has a bit set within the status page(s). Link: https://lore.kernel.org/all/2059213643.196683.1648499088753.JavaMail.zimbra@efficios.com/ Suggested-by: Mathieu Desnoyers Signed-off-by: Beau Belgrave --- include/linux/user_events.h | 15 +---- kernel/trace/trace_events_user.c | 57 ++++++++++++++++--- samples/user_events/example.c | 25 +++++--- .../selftests/user_events/ftrace_test.c | 47 ++++++++++++--- .../testing/selftests/user_events/perf_test.c | 11 +++- 5 files changed, 117 insertions(+), 38 deletions(-) diff --git a/include/linux/user_events.h b/include/linux/user_events.h index 736e05603463..592a3fbed98e 100644 --- a/include/linux/user_events.h +++ b/include/linux/user_events.h @@ -20,15 +20,6 @@ #define USER_EVENTS_SYSTEM "user_events" #define USER_EVENTS_PREFIX "u:" -/* Bits 0-6 are for known probe types, Bit 7 is for unknown probes */ -#define EVENT_BIT_FTRACE 0 -#define EVENT_BIT_PERF 1 -#define EVENT_BIT_OTHER 7 - -#define EVENT_STATUS_FTRACE (1 << EVENT_BIT_FTRACE) -#define EVENT_STATUS_PERF (1 << EVENT_BIT_PERF) -#define EVENT_STATUS_OTHER (1 << EVENT_BIT_OTHER) - /* Create dynamic location entry within a 32-bit value */ #define DYN_LOC(offset, size) ((size) << 16 | (offset)) @@ -45,12 +36,12 @@ struct user_reg { /* Input: Pointer to string with event name, description and flags */ __u64 name_args; - /* Output: Byte index of the event within the status page */ - __u32 status_index; + /* Output: Bitwise index of the event within the status page */ + __u32 status_bit; /* Output: Index of the event to use when writing data */ __u32 write_index; -}; +} __attribute__((__packed__)); #define DIAG_IOC_MAGIC '*' diff --git a/kernel/trace/trace_events_user.c b/kernel/trace/trace_events_user.c index 2bcae7abfa81..4afc41e321ac 100644 --- a/kernel/trace/trace_events_user.c +++ b/kernel/trace/trace_events_user.c @@ -40,17 +40,26 @@ */ #define MAX_PAGE_ORDER 0 #define MAX_PAGES (1 << MAX_PAGE_ORDER) -#define MAX_EVENTS (MAX_PAGES * PAGE_SIZE) +#define MAX_BYTES (MAX_PAGES * PAGE_SIZE) +#define MAX_EVENTS (MAX_BYTES * 8) /* Limit how long of an event name plus args within the subsystem. */ #define MAX_EVENT_DESC 512 #define EVENT_NAME(user_event) ((user_event)->tracepoint.name) #define MAX_FIELD_ARRAY_SIZE 1024 +#define STATUS_BYTE(bit) ((bit) >> 3) +#define STATUS_MASK(bit) (1 << ((bit) & 7)) + +/* Internal bits to keep track of connected probes */ +#define EVENT_STATUS_FTRACE (1 << 0) +#define EVENT_STATUS_PERF (1 << 1) +#define EVENT_STATUS_OTHER (1 << 7) + static char *register_page_data; static DEFINE_MUTEX(reg_mutex); -static DEFINE_HASHTABLE(register_table, 4); +static DEFINE_HASHTABLE(register_table, 8); static DECLARE_BITMAP(page_bitmap, MAX_EVENTS); /* @@ -72,6 +81,7 @@ struct user_event { int index; int flags; int min_size; + char status; }; /* @@ -106,6 +116,22 @@ static u32 user_event_key(char *name) return jhash(name, strlen(name), 0); } +static __always_inline +void user_event_register_set(struct user_event *user) +{ + int i = user->index; + + register_page_data[STATUS_BYTE(i)] |= STATUS_MASK(i); +} + +static __always_inline +void user_event_register_clear(struct user_event *user) +{ + int i = user->index; + + register_page_data[STATUS_BYTE(i)] &= ~STATUS_MASK(i); +} + static __always_inline __must_check bool user_event_last_ref(struct user_event *user) { @@ -648,7 +674,7 @@ static int destroy_user_event(struct user_event *user) dyn_event_remove(&user->devent); - register_page_data[user->index] = 0; + user_event_register_clear(user); clear_bit(user->index, page_bitmap); hash_del(&user->node); @@ -827,7 +853,12 @@ static void update_reg_page_for(struct user_event *user) rcu_read_unlock_sched(); } - register_page_data[user->index] = status; + if (status) + user_event_register_set(user); + else + user_event_register_clear(user); + + user->status = status; } /* @@ -1332,7 +1363,17 @@ static long user_reg_get(struct user_reg __user *ureg, struct user_reg *kreg) if (size > PAGE_SIZE) return -E2BIG; - return copy_struct_from_user(kreg, sizeof(*kreg), ureg, size); + if (size < offsetofend(struct user_reg, write_index)) + return -EINVAL; + + ret = copy_struct_from_user(kreg, sizeof(*kreg), ureg, size); + + if (ret) + return ret; + + kreg->size = size; + + return 0; } /* @@ -1376,7 +1417,7 @@ static long user_events_ioctl_reg(struct file *file, unsigned long uarg) return ret; put_user((u32)ret, &ureg->write_index); - put_user(user->index, &ureg->status_index); + put_user(user->index, &ureg->status_bit); return 0; } @@ -1485,7 +1526,7 @@ static int user_status_mmap(struct file *file, struct vm_area_struct *vma) { unsigned long size = vma->vm_end - vma->vm_start; - if (size != MAX_EVENTS) + if (size != MAX_BYTES) return -EINVAL; return remap_pfn_range(vma, vma->vm_start, @@ -1520,7 +1561,7 @@ static int user_seq_show(struct seq_file *m, void *p) mutex_lock(®_mutex); hash_for_each(register_table, i, user, node) { - status = register_page_data[user->index]; + status = user->status; flags = user->flags; seq_printf(m, "%d:%s", user->index, EVENT_NAME(user)); diff --git a/samples/user_events/example.c b/samples/user_events/example.c index 4f5778e441c0..d06dc24156ec 100644 --- a/samples/user_events/example.c +++ b/samples/user_events/example.c @@ -12,13 +12,21 @@ #include #include #include +#include +#include #include +#if __BITS_PER_LONG == 64 +#define endian_swap(x) htole64(x) +#else +#define endian_swap(x) htole32(x) +#endif + /* Assumes debugfs is mounted */ const char *data_file = "/sys/kernel/debug/tracing/user_events_data"; const char *status_file = "/sys/kernel/debug/tracing/user_events_status"; -static int event_status(char **status) +static int event_status(long **status) { int fd = open(status_file, O_RDONLY); @@ -33,7 +41,8 @@ static int event_status(char **status) return 0; } -static int event_reg(int fd, const char *command, int *status, int *write) +static int event_reg(int fd, const char *command, long *index, long *mask, + int *write) { struct user_reg reg = {0}; @@ -43,7 +52,8 @@ static int event_reg(int fd, const char *command, int *status, int *write) if (ioctl(fd, DIAG_IOCSREG, ®) == -1) return -1; - *status = reg.status_index; + *index = reg.status_bit / __BITS_PER_LONG; + *mask = endian_swap(1L << (reg.status_bit % __BITS_PER_LONG)); *write = reg.write_index; return 0; @@ -51,8 +61,9 @@ static int event_reg(int fd, const char *command, int *status, int *write) int main(int argc, char **argv) { - int data_fd, status, write; - char *status_page; + int data_fd, write; + long index, mask; + long *status_page; struct iovec io[2]; __u32 count = 0; @@ -61,7 +72,7 @@ int main(int argc, char **argv) data_fd = open(data_file, O_RDWR); - if (event_reg(data_fd, "test u32 count", &status, &write) == -1) + if (event_reg(data_fd, "test u32 count", &index, &mask, &write) == -1) return errno; /* Setup iovec */ @@ -75,7 +86,7 @@ int main(int argc, char **argv) getchar(); /* Check if anyone is listening */ - if (status_page[status]) { + if (status_page[index] & mask) { /* Yep, trace out our data */ writev(data_fd, (const struct iovec *)io, 2); diff --git a/tools/testing/selftests/user_events/ftrace_test.c b/tools/testing/selftests/user_events/ftrace_test.c index a80fb5ef61d5..404a2713dcae 100644 --- a/tools/testing/selftests/user_events/ftrace_test.c +++ b/tools/testing/selftests/user_events/ftrace_test.c @@ -22,6 +22,11 @@ const char *enable_file = "/sys/kernel/debug/tracing/events/user_events/__test_e const char *trace_file = "/sys/kernel/debug/tracing/trace"; const char *fmt_file = "/sys/kernel/debug/tracing/events/user_events/__test_event/format"; +static inline int status_check(char *status_page, int status_bit) +{ + return status_page[status_bit >> 3] & (1 << (status_bit & 7)); +} + static int trace_bytes(void) { int fd = open(trace_file, O_RDONLY); @@ -197,12 +202,12 @@ TEST_F(user, register_events) { /* Register should work */ ASSERT_EQ(0, ioctl(self->data_fd, DIAG_IOCSREG, ®)); ASSERT_EQ(0, reg.write_index); - ASSERT_NE(0, reg.status_index); + ASSERT_NE(0, reg.status_bit); /* Multiple registers should result in same index */ ASSERT_EQ(0, ioctl(self->data_fd, DIAG_IOCSREG, ®)); ASSERT_EQ(0, reg.write_index); - ASSERT_NE(0, reg.status_index); + ASSERT_NE(0, reg.status_bit); /* Ensure disabled */ self->enable_fd = open(enable_file, O_RDWR); @@ -212,15 +217,15 @@ TEST_F(user, register_events) { /* MMAP should work and be zero'd */ ASSERT_NE(MAP_FAILED, status_page); ASSERT_NE(NULL, status_page); - ASSERT_EQ(0, status_page[reg.status_index]); + ASSERT_EQ(0, status_check(status_page, reg.status_bit)); /* Enable event and ensure bits updated in status */ ASSERT_NE(-1, write(self->enable_fd, "1", sizeof("1"))) - ASSERT_EQ(EVENT_STATUS_FTRACE, status_page[reg.status_index]); + ASSERT_NE(0, status_check(status_page, reg.status_bit)); /* Disable event and ensure bits updated in status */ ASSERT_NE(-1, write(self->enable_fd, "0", sizeof("0"))) - ASSERT_EQ(0, status_page[reg.status_index]); + ASSERT_EQ(0, status_check(status_page, reg.status_bit)); /* File still open should return -EBUSY for delete */ ASSERT_EQ(-1, ioctl(self->data_fd, DIAG_IOCSDEL, "__test_event")); @@ -240,6 +245,8 @@ TEST_F(user, write_events) { struct iovec io[3]; __u32 field1, field2; int before = 0, after = 0; + int page_size = sysconf(_SC_PAGESIZE); + char *status_page; reg.size = sizeof(reg); reg.name_args = (__u64)"__test_event u32 field1; u32 field2"; @@ -254,10 +261,18 @@ TEST_F(user, write_events) { io[2].iov_base = &field2; io[2].iov_len = sizeof(field2); + status_page = mmap(NULL, page_size, PROT_READ, MAP_SHARED, + self->status_fd, 0); + /* Register should work */ ASSERT_EQ(0, ioctl(self->data_fd, DIAG_IOCSREG, ®)); ASSERT_EQ(0, reg.write_index); - ASSERT_NE(0, reg.status_index); + ASSERT_NE(0, reg.status_bit); + + /* MMAP should work and be zero'd */ + ASSERT_NE(MAP_FAILED, status_page); + ASSERT_NE(NULL, status_page); + ASSERT_EQ(0, status_check(status_page, reg.status_bit)); /* Write should fail on invalid slot with ENOENT */ io[0].iov_base = &field2; @@ -271,6 +286,9 @@ TEST_F(user, write_events) { self->enable_fd = open(enable_file, O_RDWR); ASSERT_NE(-1, write(self->enable_fd, "1", sizeof("1"))) + /* Event should now be enabled */ + ASSERT_NE(0, status_check(status_page, reg.status_bit)); + /* Write should make it out to ftrace buffers */ before = trace_bytes(); ASSERT_NE(-1, writev(self->data_fd, (const struct iovec *)io, 3)); @@ -298,7 +316,7 @@ TEST_F(user, write_fault) { /* Register should work */ ASSERT_EQ(0, ioctl(self->data_fd, DIAG_IOCSREG, ®)); ASSERT_EQ(0, reg.write_index); - ASSERT_NE(0, reg.status_index); + ASSERT_NE(0, reg.status_bit); /* Write should work normally */ ASSERT_NE(-1, writev(self->data_fd, (const struct iovec *)io, 2)); @@ -315,6 +333,11 @@ TEST_F(user, write_validator) { int loc, bytes; char data[8]; int before = 0, after = 0; + int page_size = sysconf(_SC_PAGESIZE); + char *status_page; + + status_page = mmap(NULL, page_size, PROT_READ, MAP_SHARED, + self->status_fd, 0); reg.size = sizeof(reg); reg.name_args = (__u64)"__test_event __rel_loc char[] data"; @@ -322,7 +345,12 @@ TEST_F(user, write_validator) { /* Register should work */ ASSERT_EQ(0, ioctl(self->data_fd, DIAG_IOCSREG, ®)); ASSERT_EQ(0, reg.write_index); - ASSERT_NE(0, reg.status_index); + ASSERT_NE(0, reg.status_bit); + + /* MMAP should work and be zero'd */ + ASSERT_NE(MAP_FAILED, status_page); + ASSERT_NE(NULL, status_page); + ASSERT_EQ(0, status_check(status_page, reg.status_bit)); io[0].iov_base = ®.write_index; io[0].iov_len = sizeof(reg.write_index); @@ -340,6 +368,9 @@ TEST_F(user, write_validator) { self->enable_fd = open(enable_file, O_RDWR); ASSERT_NE(-1, write(self->enable_fd, "1", sizeof("1"))) + /* Event should now be enabled */ + ASSERT_NE(0, status_check(status_page, reg.status_bit)); + /* Full in-bounds write should work */ before = trace_bytes(); loc = DYN_LOC(0, bytes); diff --git a/tools/testing/selftests/user_events/perf_test.c b/tools/testing/selftests/user_events/perf_test.c index 26851d51d6bb..8b4c7879d5a7 100644 --- a/tools/testing/selftests/user_events/perf_test.c +++ b/tools/testing/selftests/user_events/perf_test.c @@ -35,6 +35,11 @@ static long perf_event_open(struct perf_event_attr *pe, pid_t pid, return syscall(__NR_perf_event_open, pe, pid, cpu, group_fd, flags); } +static inline int status_check(char *status_page, int status_bit) +{ + return status_page[status_bit >> 3] & (1 << (status_bit & 7)); +} + static int get_id(void) { FILE *fp = fopen(id_file, "r"); @@ -120,8 +125,8 @@ TEST_F(user, perf_write) { /* Register should work */ ASSERT_EQ(0, ioctl(self->data_fd, DIAG_IOCSREG, ®)); ASSERT_EQ(0, reg.write_index); - ASSERT_NE(0, reg.status_index); - ASSERT_EQ(0, status_page[reg.status_index]); + ASSERT_NE(0, reg.status_bit); + ASSERT_EQ(0, status_check(status_page, reg.status_bit)); /* Id should be there */ id = get_id(); @@ -144,7 +149,7 @@ TEST_F(user, perf_write) { ASSERT_NE(MAP_FAILED, perf_page); /* Status should be updated */ - ASSERT_EQ(EVENT_STATUS_PERF, status_page[reg.status_index]); + ASSERT_NE(0, status_check(status_page, reg.status_bit)); event.index = reg.write_index; event.field1 = 0xc001; From patchwork Mon Apr 25 18:46:31 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Beau Belgrave X-Patchwork-Id: 12826077 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 33154C4332F for ; Mon, 25 Apr 2022 18:46:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S244593AbiDYSty (ORCPT ); Mon, 25 Apr 2022 14:49:54 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46934 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244572AbiDYStu (ORCPT ); Mon, 25 Apr 2022 14:49:50 -0400 Received: from linux.microsoft.com (linux.microsoft.com [13.77.154.182]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 963286F9E0; Mon, 25 Apr 2022 11:46:45 -0700 (PDT) Received: from localhost.localdomain (c-73-140-2-214.hsd1.wa.comcast.net [73.140.2.214]) by linux.microsoft.com (Postfix) with ESMTPSA id F163520E8CB6; Mon, 25 Apr 2022 11:46:44 -0700 (PDT) DKIM-Filter: OpenDKIM Filter v2.11.0 linux.microsoft.com F163520E8CB6 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.microsoft.com; s=default; t=1650912405; bh=XIzO1LLVQCb9A01YoSu8+i/AWBjBDCwqrp4WnDv6/j4=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=rTHAwlZUH3k98C7f21qPFMAu4n4y+aqsM1JjbN+5oq6EdjxeKa++t4q5Zq0njMiKZ swL7AhompWdTaMku2hQEs4aQhLiJFcA+utTWyIPzMewjjXSv+m6QQrKCJt/6H4scz+ TdKgAnk5GtsPBenBO3Y23zEVvJKucSFzHSqysBtY= From: Beau Belgrave To: rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com Cc: linux-trace-devel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org, beaub@linux.microsoft.com Subject: [PATCH v2 7/7] tracing/user_events: Update ABI documentation to align to bits vs bytes Date: Mon, 25 Apr 2022 11:46:31 -0700 Message-Id: <20220425184631.2068-8-beaub@linux.microsoft.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220425184631.2068-1-beaub@linux.microsoft.com> References: <20220425184631.2068-1-beaub@linux.microsoft.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-devel@vger.kernel.org Update the documentation to reflect the new ABI requirements and how to use the byte index with the mask properly to check event status. Signed-off-by: Beau Belgrave --- Documentation/trace/user_events.rst | 86 +++++++++++++++++++---------- 1 file changed, 58 insertions(+), 28 deletions(-) diff --git a/Documentation/trace/user_events.rst b/Documentation/trace/user_events.rst index c180936f49fc..9f181f342a70 100644 --- a/Documentation/trace/user_events.rst +++ b/Documentation/trace/user_events.rst @@ -20,14 +20,14 @@ dynamic_events is the same as the ioctl with the u: prefix applied. Typically programs will register a set of events that they wish to expose to tools that can read trace_events (such as ftrace and perf). The registration -process gives back two ints to the program for each event. The first int is the -status index. This index describes which byte in the +process gives back two ints to the program for each event. The first int is +the status bit. This describes which bit in little-endian format in the /sys/kernel/debug/tracing/user_events_status file represents this event. The -second int is the write index. This index describes the data when a write() or +second int is the write index which describes the data when a write() or writev() is called on the /sys/kernel/debug/tracing/user_events_data file. -The structures referenced in this document are contained with the -/include/uap/linux/user_events.h file in the source tree. +The structures referenced in this document are contained within the +/include/uapi/linux/user_events.h file in the source tree. **NOTE:** *Both user_events_status and user_events_data are under the tracefs filesystem and may be mounted at different paths than above.* @@ -38,18 +38,18 @@ Registering within a user process is done via ioctl() out to the /sys/kernel/debug/tracing/user_events_data file. The command to issue is DIAG_IOCSREG. -This command takes a struct user_reg as an argument:: +This command takes a packed struct user_reg as an argument:: struct user_reg { u32 size; u64 name_args; - u32 status_index; + u32 status_bit; u32 write_index; }; The struct user_reg requires two inputs, the first is the size of the structure to ensure forward and backward compatibility. The second is the command string -to issue for registering. Upon success two outputs are set, the status index +to issue for registering. Upon success two outputs are set, the status bit and the write index. User based events show up under tracefs like any other event under the @@ -111,15 +111,56 @@ in realtime. This allows user programs to only incur the cost of the write() or writev() calls when something is actively attached to the event. User programs call mmap() on /sys/kernel/debug/tracing/user_events_status to -check the status for each event that is registered. The byte to check in the -file is given back after the register ioctl() via user_reg.status_index. +check the status for each event that is registered. The bit to check in the +file is given back after the register ioctl() via user_reg.status_bit. The bit +is always in little-endian format. Programs can check if the bit is set either +using a byte-wise index with a mask or a long-wise index with a little-endian +mask. + Currently the size of user_events_status is a single page, however, custom kernel configurations can change this size to allow more user based events. In all cases the size of the file is a multiple of a page size. -For example, if the register ioctl() gives back a status_index of 3 you would -check byte 3 of the returned mmap data to see if anything is attached to that -event. +For example, if the register ioctl() gives back a status_bit of 3 you would +check byte 0 (3 / 8) of the returned mmap data and then AND the result with 8 +(1 << (3 % 8)) to see if anything is attached to that event. + +A byte-wise index check is performed as follows:: + + int index, mask; + char *status_page; + + index = status_bit / 8; + mask = 1 << (status_bit % 8); + + ... + + if (status_page[index] & mask) { + /* Enabled */ + } + +A long-wise index check is performed as follows:: + + #include + #include + + #if __BITS_PER_LONG == 64 + #define endian_swap(x) htole64(x) + #else + #define endian_swap(x) htole32(x) + #endif + + long index, mask, *status_page; + + index = status_bit / __BITS_PER_LONG; + mask = 1L << (status_bit % __BITS_PER_LONG); + mask = endian_swap(mask); + + ... + + if (status_page[index] & mask) { + /* Enabled */ + } Administrators can easily check the status of all registered events by reading the user_events_status file directly via a terminal. The output is as follows:: @@ -137,7 +178,7 @@ For example, on a system that has a single event the output looks like this:: Active: 1 Busy: 0 - Max: 4096 + Max: 32768 If a user enables the user event via ftrace, the output would change to this:: @@ -145,21 +186,10 @@ If a user enables the user event via ftrace, the output would change to this:: Active: 1 Busy: 1 - Max: 4096 - -**NOTE:** *A status index of 0 will never be returned. This allows user -programs to have an index that can be used on error cases.* - -Status Bits -^^^^^^^^^^^ -The byte being checked will be non-zero if anything is attached. Programs can -check specific bits in the byte to see what mechanism has been attached. - -The following values are defined to aid in checking what has been attached: - -**EVENT_STATUS_FTRACE** - Bit set if ftrace has been attached (Bit 0). + Max: 32768 -**EVENT_STATUS_PERF** - Bit set if perf has been attached (Bit 1). +**NOTE:** *A status bit of 0 will never be returned. This allows user programs +to have a bit that can be used on error cases.* Writing Data ------------