From patchwork Wed Jul 24 22:52:09 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrii Nakryiko
X-Patchwork-Id: 13741437
From: Andrii Nakryiko
To: bpf@vger.kernel.org
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, adobriyan@gmail.com, shakeel.butt@linux.dev, hannes@cmpxchg.org, ak@linux.intel.com, osandov@osandov.com, song@kernel.org, Andrii Nakryiko
Subject: [PATCH v2 bpf-next 09/10] bpf: wire up sleepable bpf_get_stack() and bpf_get_task_stack() helpers
Date: Wed, 24 Jul 2024 15:52:09 -0700
Message-ID: <20240724225210.545423-10-andrii@kernel.org>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240724225210.545423-1-andrii@kernel.org>
References: <20240724225210.545423-1-andrii@kernel.org>

Add sleepable implementations of the bpf_get_stack() and
bpf_get_task_stack() helpers and allow them to be used from sleepable BPF
programs (e.g., sleepable uprobes).

Note that capturing the stack trace IPs is itself still not sleepable (that
would need to be a separate project); only build ID fetching is sleepable
and thus more reliable, as it will wait for data to be paged in, if
necessary. For that we make use of the sleepable build_id_parse()
implementation.

Now that the build ID related internals in kernel/bpf/stackmap.c can be
used in both sleepable and non-sleepable contexts, we need to add
rcu_read_lock()/rcu_read_unlock() protection around fetching
perf_callchain_entry, but with the refactoring in the previous commit it's
now pretty straightforward. We make sure to do rcu_read_unlock() (in
sleepable mode only) right before the stack_map_get_build_id_offset() call,
which can sleep. By that time we no longer have any use for
perf_callchain_entry.

Note, bpf_get_task_stack() will fail for user mode if task != current. And
for kernel mode build IDs are irrelevant. So in that sense adding a
sleepable bpf_get_task_stack() implementation is a no-op. It feels right to
wire this up for symmetry and completeness, but I'm open to just dropping
it until we support the `user && crosstask` condition.

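The unlock-before-sleep ordering described above can be sketched as a small
userspace model. This is only an illustration under stated assumptions, not
the kernel code: every model_* name is a hypothetical stand-in (a counter
plays the role of the RCU read-side section, model_callchain the role of
perf_callchain_entry, model_resolve_build_ids() the role of the build ID
lookup that may sleep). The real logic is in __bpf_get_stack() in the diff
that follows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: depth > 0 means "inside a read-side section". */
static int model_rcu_depth;

static void model_rcu_read_lock(void)
{
	model_rcu_depth++;
}

static void model_rcu_read_unlock(void)
{
	assert(model_rcu_depth > 0);
	model_rcu_depth--;
}

/* Stand-in for perf_callchain_entry (hypothetical, fixed capacity). */
struct model_callchain {
	uint64_t nr;
	uint64_t ip[8];
};

/* Stand-in for one output element carrying an IP and a build ID status. */
struct model_build_id_entry {
	uint64_t ip;
	bool resolved;
};

/* Models a lookup that may sleep, so it must run outside the section. */
static void model_resolve_build_ids(struct model_build_id_entry *ents,
				    size_t n, bool may_fault)
{
	if (may_fault)
		assert(model_rcu_depth == 0);
	for (size_t i = 0; i < n; i++)
		ents[i].resolved = true;
}

static long model_get_stack(const struct model_callchain *trace,
			    struct model_build_id_entry *buf, size_t cap,
			    bool may_fault)
{
	size_t n;

	if (may_fault)
		model_rcu_read_lock();	/* protects 'trace' in sleepable mode */

	if (!trace) {
		if (may_fault)
			model_rcu_read_unlock();
		return -1;
	}

	/* Copy everything needed out of 'trace' while it is protected. */
	n = trace->nr < cap ? (size_t)trace->nr : cap;
	for (size_t i = 0; i < n; i++) {
		buf[i].ip = trace->ip[i];
		buf[i].resolved = false;
	}

	/* 'trace' must not be dereferenced after this point. */
	if (may_fault)
		model_rcu_read_unlock();

	model_resolve_build_ids(buf, n, may_fault);
	return (long)n;
}
```

Calling model_get_stack() with may_fault set to true takes and drops the
modeled read-side section around the copy only, so the resolving step always
runs with the section released. With may_fault set to false the model takes
no lock at all, which corresponds to the non-sleepable helpers relying on
the protection their calling context already provides.
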
Signed-off-by: Andrii Nakryiko
---
 include/linux/bpf.h      |  2 +
 kernel/bpf/stackmap.c    | 90 ++++++++++++++++++++++++++++++++--------
 kernel/trace/bpf_trace.c |  5 ++-
 3 files changed, 77 insertions(+), 20 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 7ad37cbdc815..8e7a9f5ccecf 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -3194,7 +3194,9 @@ extern const struct bpf_func_proto bpf_get_current_uid_gid_proto;
 extern const struct bpf_func_proto bpf_get_current_comm_proto;
 extern const struct bpf_func_proto bpf_get_stackid_proto;
 extern const struct bpf_func_proto bpf_get_stack_proto;
+extern const struct bpf_func_proto bpf_get_stack_sleepable_proto;
 extern const struct bpf_func_proto bpf_get_task_stack_proto;
+extern const struct bpf_func_proto bpf_get_task_stack_sleepable_proto;
 extern const struct bpf_func_proto bpf_get_stackid_proto_pe;
 extern const struct bpf_func_proto bpf_get_stack_proto_pe;
 extern const struct bpf_func_proto bpf_sock_map_update_proto;
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 6457222b0b46..3615c06b7dfa 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -124,6 +124,12 @@ static struct bpf_map *stack_map_alloc(union bpf_attr *attr)
 	return ERR_PTR(err);
 }
 
+static int fetch_build_id(struct vm_area_struct *vma, unsigned char *build_id, bool may_fault)
+{
+	return may_fault ? build_id_parse(vma, build_id, NULL)
+			 : build_id_parse_nofault(vma, build_id, NULL);
+}
+
 /*
  * Expects all id_offs[i].ip values to be set to correct initial IPs.
  * They will be subsequently:
@@ -135,7 +141,7 @@ static struct bpf_map *stack_map_alloc(union bpf_attr *attr)
  * BPF_STACK_BUILD_ID_IP.
  */
 static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
-					  u32 trace_nr, bool user)
+					  u32 trace_nr, bool user, bool may_fault)
 {
 	int i;
 	struct mmap_unlock_irq_work *work = NULL;
@@ -166,7 +172,7 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
 			goto build_id_valid;
 		}
 		vma = find_vma(current->mm, ip);
-		if (!vma || build_id_parse_nofault(vma, id_offs[i].build_id, NULL)) {
+		if (!vma || fetch_build_id(vma, id_offs[i].build_id, may_fault)) {
 			/* per entry fall back to ips */
 			id_offs[i].status = BPF_STACK_BUILD_ID_IP;
 			memset(id_offs[i].build_id, 0, BUILD_ID_SIZE_MAX);
@@ -257,7 +263,7 @@ static long __bpf_get_stackid(struct bpf_map *map,
 		id_offs = (struct bpf_stack_build_id *)new_bucket->data;
 		for (i = 0; i < trace_nr; i++)
 			id_offs[i].ip = ips[i];
-		stack_map_get_build_id_offset(id_offs, trace_nr, user);
+		stack_map_get_build_id_offset(id_offs, trace_nr, user, false /* !may_fault */);
 		trace_len = trace_nr * sizeof(struct bpf_stack_build_id);
 		if (hash_matches && bucket->nr == trace_nr &&
 		    memcmp(bucket->data, new_bucket->data, trace_len) == 0) {
@@ -398,7 +404,7 @@ const struct bpf_func_proto bpf_get_stackid_proto_pe = {
 
 static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
 			    struct perf_callchain_entry *trace_in,
-			    void *buf, u32 size, u64 flags)
+			    void *buf, u32 size, u64 flags, bool may_fault)
 {
 	u32 trace_nr, copy_len, elem_size, num_elem, max_depth;
 	bool user_build_id = flags & BPF_F_USER_BUILD_ID;
@@ -416,8 +422,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
 	if (kernel && user_build_id)
 		goto clear;
 
-	elem_size = (user && user_build_id) ? sizeof(struct bpf_stack_build_id)
-					    : sizeof(u64);
+	elem_size = user_build_id ? sizeof(struct bpf_stack_build_id) : sizeof(u64);
 	if (unlikely(size % elem_size))
 		goto clear;
 
@@ -438,6 +443,9 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
 	if (sysctl_perf_event_max_stack < max_depth)
 		max_depth = sysctl_perf_event_max_stack;
 
+	if (may_fault)
+		rcu_read_lock(); /* need RCU for perf's callchain below */
+
 	if (trace_in)
 		trace = trace_in;
 	else if (kernel && task)
@@ -445,28 +453,35 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
 	else
 		trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
 					   crosstask, false);
-	if (unlikely(!trace))
-		goto err_fault;
 
-	if (trace->nr < skip)
+	if (unlikely(!trace) || trace->nr < skip) {
+		if (may_fault)
+			rcu_read_unlock();
 		goto err_fault;
+	}
 
 	trace_nr = trace->nr - skip;
 	trace_nr = (trace_nr <= num_elem) ? trace_nr : num_elem;
 	copy_len = trace_nr * elem_size;
 
 	ips = trace->ip + skip;
-	if (user && user_build_id) {
+	if (user_build_id) {
 		struct bpf_stack_build_id *id_offs = buf;
 		u32 i;
 
 		for (i = 0; i < trace_nr; i++)
 			id_offs[i].ip = ips[i];
-		stack_map_get_build_id_offset(buf, trace_nr, user);
 	} else {
 		memcpy(buf, ips, copy_len);
 	}
 
+	/* trace/ips should not be dereferenced after this point */
+	if (may_fault)
+		rcu_read_unlock();
+
+	if (user_build_id)
+		stack_map_get_build_id_offset(buf, trace_nr, user, may_fault);
+
 	if (size > copy_len)
 		memset(buf + copy_len, 0, size - copy_len);
 	return copy_len;
@@ -481,7 +496,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
 BPF_CALL_4(bpf_get_stack, struct pt_regs *, regs, void *, buf, u32, size,
 	   u64, flags)
 {
-	return __bpf_get_stack(regs, NULL, NULL, buf, size, flags);
+	return __bpf_get_stack(regs, NULL, NULL, buf, size, flags, false /* !may_fault */);
 }
 
 const struct bpf_func_proto bpf_get_stack_proto = {
@@ -494,8 +509,24 @@ const struct bpf_func_proto bpf_get_stack_proto = {
 	.arg4_type	= ARG_ANYTHING,
 };
 
-BPF_CALL_4(bpf_get_task_stack, struct task_struct *, task, void *, buf,
-	   u32, size, u64, flags)
+BPF_CALL_4(bpf_get_stack_sleepable, struct pt_regs *, regs, void *, buf, u32, size,
+	   u64, flags)
+{
+	return __bpf_get_stack(regs, NULL, NULL, buf, size, flags, true /* may_fault */);
+}
+
+const struct bpf_func_proto bpf_get_stack_sleepable_proto = {
+	.func		= bpf_get_stack_sleepable,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg3_type	= ARG_CONST_SIZE_OR_ZERO,
+	.arg4_type	= ARG_ANYTHING,
+};
+
+static long __bpf_get_task_stack(struct task_struct *task, void *buf, u32 size,
+				 u64 flags, bool may_fault)
 {
 	struct pt_regs *regs;
 	long res = -EINVAL;
@@ -505,12 +536,18 @@ BPF_CALL_4(bpf_get_task_stack, struct task_struct *, task, void *, buf,
 
 	regs = task_pt_regs(task);
 	if (regs)
-		res = __bpf_get_stack(regs, task, NULL, buf, size, flags);
+		res = __bpf_get_stack(regs, task, NULL, buf, size, flags, may_fault);
 	put_task_stack(task);
 
 	return res;
 }
 
+BPF_CALL_4(bpf_get_task_stack, struct task_struct *, task, void *, buf,
+	   u32, size, u64, flags)
+{
+	return __bpf_get_task_stack(task, buf, size, flags, false /* !may_fault */);
+}
+
 const struct bpf_func_proto bpf_get_task_stack_proto = {
 	.func		= bpf_get_task_stack,
 	.gpl_only	= false,
@@ -522,6 +559,23 @@ const struct bpf_func_proto bpf_get_task_stack_proto = {
 	.arg4_type	= ARG_ANYTHING,
 };
 
+BPF_CALL_4(bpf_get_task_stack_sleepable, struct task_struct *, task, void *, buf,
+	   u32, size, u64, flags)
+{
+	return __bpf_get_task_stack(task, buf, size, flags, true /* may_fault */);
+}
+
+const struct bpf_func_proto bpf_get_task_stack_sleepable_proto = {
+	.func		= bpf_get_task_stack_sleepable,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_BTF_ID,
+	.arg1_btf_id	= &btf_tracing_ids[BTF_TRACING_TYPE_TASK],
+	.arg2_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg3_type	= ARG_CONST_SIZE_OR_ZERO,
+	.arg4_type	= ARG_ANYTHING,
+};
+
 BPF_CALL_4(bpf_get_stack_pe, struct bpf_perf_event_data_kern *, ctx,
 	   void *, buf, u32, size, u64, flags)
 {
@@ -533,7 +587,7 @@ BPF_CALL_4(bpf_get_stack_pe, struct bpf_perf_event_data_kern *, ctx,
 	__u64 nr_kernel;
 
 	if (!(event->attr.sample_type & PERF_SAMPLE_CALLCHAIN))
-		return __bpf_get_stack(regs, NULL, NULL, buf, size, flags);
+		return __bpf_get_stack(regs, NULL, NULL, buf, size, flags, false /* !may_fault */);
 
 	if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
 			       BPF_F_USER_BUILD_ID)))
@@ -553,7 +607,7 @@ BPF_CALL_4(bpf_get_stack_pe, struct bpf_perf_event_data_kern *, ctx,
 		__u64 nr = trace->nr;
 
 		trace->nr = nr_kernel;
-		err = __bpf_get_stack(regs, NULL, trace, buf, size, flags);
+		err = __bpf_get_stack(regs, NULL, trace, buf, size, flags, false /* !may_fault */);
 
 		/* restore nr */
 		trace->nr = nr;
@@ -565,7 +619,7 @@ BPF_CALL_4(bpf_get_stack_pe, struct bpf_perf_event_data_kern *, ctx,
 			goto clear;
 
 		flags = (flags & ~BPF_F_SKIP_FIELD_MASK) | skip;
-		err = __bpf_get_stack(regs, NULL, trace, buf, size, flags);
+		err = __bpf_get_stack(regs, NULL, trace, buf, size, flags, false /* !may_fault */);
 	}
 	return err;
 
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index cd098846e251..c3845470f56d 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1598,7 +1598,8 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_jiffies64:
 		return &bpf_jiffies64_proto;
 	case BPF_FUNC_get_task_stack:
-		return &bpf_get_task_stack_proto;
+		return prog->sleepable ? &bpf_get_task_stack_sleepable_proto
+				       : &bpf_get_task_stack_proto;
 	case BPF_FUNC_copy_from_user:
 		return &bpf_copy_from_user_proto;
 	case BPF_FUNC_copy_from_user_task:
@@ -1654,7 +1655,7 @@ kprobe_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_get_stackid:
 		return &bpf_get_stackid_proto;
 	case BPF_FUNC_get_stack:
-		return &bpf_get_stack_proto;
+		return prog->sleepable ? &bpf_get_stack_sleepable_proto : &bpf_get_stack_proto;
 #ifdef CONFIG_BPF_KPROBE_OVERRIDE
 	case BPF_FUNC_override_return:
 		return &bpf_override_return_proto;