From patchwork Tue Jul 30 20:39:13 2024
X-Patchwork-Submitter: Andrii Nakryiko
X-Patchwork-Id: 13747872
From: Andrii Nakryiko
To: bpf@vger.kernel.org
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, adobriyan@gmail.com,
    shakeel.butt@linux.dev, hannes@cmpxchg.org, ak@linux.intel.com,
    osandov@osandov.com, song@kernel.org, jannh@google.com,
    Andrii Nakryiko
Subject: [PATCH v3 bpf-next 09/10] bpf: wire up sleepable bpf_get_stack() and bpf_get_task_stack() helpers
Date: Tue, 30 Jul 2024 13:39:13 -0700
Message-ID: <20240730203914.1182569-10-andrii@kernel.org>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240730203914.1182569-1-andrii@kernel.org>
References: <20240730203914.1182569-1-andrii@kernel.org>

Add sleepable implementations of the bpf_get_stack() and
bpf_get_task_stack() helpers and allow them to be used from sleepable
BPF programs (e.g., sleepable uprobes).

Note that capturing the stack trace IPs is itself not sleepable (that
would need to be a separate project); only build ID fetching is
sleepable and thus more reliable, as it will wait for data to be paged
in, if necessary. For that we make use of the sleepable
build_id_parse() implementation.

Now that the build ID related internals in kernel/bpf/stackmap.c can be
used in both sleepable and non-sleepable contexts, we need to add
rcu_read_lock()/rcu_read_unlock() protection around fetching
perf_callchain_entry, but with the refactoring in the previous commit
it's now pretty straightforward. We make sure to do rcu_read_unlock()
(in sleepable mode only) right before the
stack_map_get_build_id_offset() call, which can sleep. By that time we
have no more use for perf_callchain_entry.

Note, bpf_get_task_stack() will fail for user mode if task != current.
And for kernel mode build IDs are irrelevant. So in that sense adding a
sleepable bpf_get_task_stack() implementation is a no-op. It feels
right to wire this up for symmetry and completeness, but I'm open to
just dropping it until we support the `user && crosstask` condition.

Signed-off-by: Andrii Nakryiko
---
 include/linux/bpf.h      |  2 +
 kernel/bpf/stackmap.c    | 90 ++++++++++++++++++++++++++++++++--------
 kernel/trace/bpf_trace.c |  5 ++-
 3 files changed, 77 insertions(+), 20 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index b9425e410bcb..0f3dc903bea8 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -3198,7 +3198,9 @@ extern const struct bpf_func_proto bpf_get_current_uid_gid_proto;
 extern const struct bpf_func_proto bpf_get_current_comm_proto;
 extern const struct bpf_func_proto bpf_get_stackid_proto;
 extern const struct bpf_func_proto bpf_get_stack_proto;
+extern const struct bpf_func_proto bpf_get_stack_sleepable_proto;
 extern const struct bpf_func_proto bpf_get_task_stack_proto;
+extern const struct bpf_func_proto bpf_get_task_stack_sleepable_proto;
 extern const struct bpf_func_proto bpf_get_stackid_proto_pe;
 extern const struct bpf_func_proto bpf_get_stack_proto_pe;
 extern const struct bpf_func_proto bpf_sock_map_update_proto;
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 6457222b0b46..3615c06b7dfa 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -124,6 +124,12 @@ static struct bpf_map *stack_map_alloc(union bpf_attr *attr)
 	return ERR_PTR(err);
 }
 
+static int fetch_build_id(struct vm_area_struct *vma, unsigned char *build_id, bool may_fault)
+{
+	return may_fault ? build_id_parse(vma, build_id, NULL)
+			 : build_id_parse_nofault(vma, build_id, NULL);
+}
+
 /*
  * Expects all id_offs[i].ip values to be set to correct initial IPs.
  * They will be subsequently:
@@ -135,7 +141,7 @@ static struct bpf_map *stack_map_alloc(union bpf_attr *attr)
  * BPF_STACK_BUILD_ID_IP.
  */
 static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
-					  u32 trace_nr, bool user)
+					  u32 trace_nr, bool user, bool may_fault)
 {
 	int i;
 	struct mmap_unlock_irq_work *work = NULL;
@@ -166,7 +172,7 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
 			goto build_id_valid;
 		}
 		vma = find_vma(current->mm, ip);
-		if (!vma || build_id_parse_nofault(vma, id_offs[i].build_id, NULL)) {
+		if (!vma || fetch_build_id(vma, id_offs[i].build_id, may_fault)) {
 			/* per entry fall back to ips */
 			id_offs[i].status = BPF_STACK_BUILD_ID_IP;
 			memset(id_offs[i].build_id, 0, BUILD_ID_SIZE_MAX);
@@ -257,7 +263,7 @@ static long __bpf_get_stackid(struct bpf_map *map,
 		id_offs = (struct bpf_stack_build_id *)new_bucket->data;
 		for (i = 0; i < trace_nr; i++)
 			id_offs[i].ip = ips[i];
-		stack_map_get_build_id_offset(id_offs, trace_nr, user);
+		stack_map_get_build_id_offset(id_offs, trace_nr, user, false /* !may_fault */);
 		trace_len = trace_nr * sizeof(struct bpf_stack_build_id);
 		if (hash_matches && bucket->nr == trace_nr &&
 		    memcmp(bucket->data, new_bucket->data, trace_len) == 0) {
@@ -398,7 +404,7 @@ const struct bpf_func_proto bpf_get_stackid_proto_pe = {
 
 static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
 			    struct perf_callchain_entry *trace_in,
-			    void *buf, u32 size, u64 flags)
+			    void *buf, u32 size, u64 flags, bool may_fault)
 {
 	u32 trace_nr, copy_len, elem_size, num_elem, max_depth;
 	bool user_build_id = flags & BPF_F_USER_BUILD_ID;
@@ -416,8 +422,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
 	if (kernel && user_build_id)
 		goto clear;
 
-	elem_size = (user && user_build_id) ? sizeof(struct bpf_stack_build_id)
-					    : sizeof(u64);
+	elem_size = user_build_id ? sizeof(struct bpf_stack_build_id) : sizeof(u64);
 	if (unlikely(size % elem_size))
 		goto clear;
 
@@ -438,6 +443,9 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
 	if (sysctl_perf_event_max_stack < max_depth)
 		max_depth = sysctl_perf_event_max_stack;
 
+	if (may_fault)
+		rcu_read_lock(); /* need RCU for perf's callchain below */
+
 	if (trace_in)
 		trace = trace_in;
 	else if (kernel && task)
@@ -445,28 +453,35 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
 	else
 		trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
 					   crosstask, false);
-	if (unlikely(!trace))
-		goto err_fault;
 
-	if (trace->nr < skip)
+	if (unlikely(!trace) || trace->nr < skip) {
+		if (may_fault)
+			rcu_read_unlock();
 		goto err_fault;
+	}
 
 	trace_nr = trace->nr - skip;
 	trace_nr = (trace_nr <= num_elem) ? trace_nr : num_elem;
 	copy_len = trace_nr * elem_size;
 
 	ips = trace->ip + skip;
-	if (user && user_build_id) {
+	if (user_build_id) {
 		struct bpf_stack_build_id *id_offs = buf;
 		u32 i;
 
 		for (i = 0; i < trace_nr; i++)
 			id_offs[i].ip = ips[i];
-		stack_map_get_build_id_offset(buf, trace_nr, user);
 	} else {
 		memcpy(buf, ips, copy_len);
 	}
 
+	/* trace/ips should not be dereferenced after this point */
+	if (may_fault)
+		rcu_read_unlock();
+
+	if (user_build_id)
+		stack_map_get_build_id_offset(buf, trace_nr, user, may_fault);
+
 	if (size > copy_len)
 		memset(buf + copy_len, 0, size - copy_len);
 	return copy_len;
@@ -481,7 +496,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
 BPF_CALL_4(bpf_get_stack, struct pt_regs *, regs, void *, buf, u32, size,
 	   u64, flags)
 {
-	return __bpf_get_stack(regs, NULL, NULL, buf, size, flags);
+	return __bpf_get_stack(regs, NULL, NULL, buf, size, flags, false /* !may_fault */);
 }
 
 const struct bpf_func_proto bpf_get_stack_proto = {
@@ -494,8 +509,24 @@ const struct bpf_func_proto bpf_get_stack_proto = {
 	.arg4_type	= ARG_ANYTHING,
 };
 
-BPF_CALL_4(bpf_get_task_stack, struct task_struct *, task, void *, buf,
-	   u32, size, u64, flags)
+BPF_CALL_4(bpf_get_stack_sleepable, struct pt_regs *, regs, void *, buf, u32, size,
+	   u64, flags)
+{
+	return __bpf_get_stack(regs, NULL, NULL, buf, size, flags, true /* may_fault */);
+}
+
+const struct bpf_func_proto bpf_get_stack_sleepable_proto = {
+	.func		= bpf_get_stack_sleepable,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg3_type	= ARG_CONST_SIZE_OR_ZERO,
+	.arg4_type	= ARG_ANYTHING,
+};
+
+static long __bpf_get_task_stack(struct task_struct *task, void *buf, u32 size,
+				 u64 flags, bool may_fault)
 {
 	struct pt_regs *regs;
 	long res = -EINVAL;
@@ -505,12 +536,18 @@ BPF_CALL_4(bpf_get_task_stack, struct task_struct *, task, void *, buf,
 
 	regs = task_pt_regs(task);
 	if (regs)
-		res = __bpf_get_stack(regs, task, NULL, buf, size, flags);
+		res = __bpf_get_stack(regs, task, NULL, buf, size, flags, may_fault);
 	put_task_stack(task);
 
 	return res;
 }
 
+BPF_CALL_4(bpf_get_task_stack, struct task_struct *, task, void *, buf,
+	   u32, size, u64, flags)
+{
+	return __bpf_get_task_stack(task, buf, size, flags, false /* !may_fault */);
+}
+
 const struct bpf_func_proto bpf_get_task_stack_proto = {
 	.func		= bpf_get_task_stack,
 	.gpl_only	= false,
@@ -522,6 +559,23 @@ const struct bpf_func_proto bpf_get_task_stack_proto = {
 	.arg4_type	= ARG_ANYTHING,
 };
 
+BPF_CALL_4(bpf_get_task_stack_sleepable, struct task_struct *, task, void *, buf,
+	   u32, size, u64, flags)
+{
+	return __bpf_get_task_stack(task, buf, size, flags, true /* may_fault */);
+}
+
+const struct bpf_func_proto bpf_get_task_stack_sleepable_proto = {
+	.func		= bpf_get_task_stack_sleepable,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_BTF_ID,
+	.arg1_btf_id	= &btf_tracing_ids[BTF_TRACING_TYPE_TASK],
+	.arg2_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg3_type	= ARG_CONST_SIZE_OR_ZERO,
+	.arg4_type	= ARG_ANYTHING,
+};
+
 BPF_CALL_4(bpf_get_stack_pe, struct bpf_perf_event_data_kern *, ctx,
 	   void *, buf, u32, size, u64, flags)
 {
@@ -533,7 +587,7 @@ BPF_CALL_4(bpf_get_stack_pe, struct bpf_perf_event_data_kern *, ctx,
 	__u64 nr_kernel;
 
 	if (!(event->attr.sample_type & PERF_SAMPLE_CALLCHAIN))
-		return __bpf_get_stack(regs, NULL, NULL, buf, size, flags);
+		return __bpf_get_stack(regs, NULL, NULL, buf, size, flags, false /* !may_fault */);
 
 	if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
 			       BPF_F_USER_BUILD_ID)))
@@ -553,7 +607,7 @@ BPF_CALL_4(bpf_get_stack_pe, struct bpf_perf_event_data_kern *, ctx,
 		__u64 nr = trace->nr;
 
 		trace->nr = nr_kernel;
-		err = __bpf_get_stack(regs, NULL, trace, buf, size, flags);
+		err = __bpf_get_stack(regs, NULL, trace, buf, size, flags, false /* !may_fault */);
 
 		/* restore nr */
 		trace->nr = nr;
@@ -565,7 +619,7 @@ BPF_CALL_4(bpf_get_stack_pe, struct bpf_perf_event_data_kern *, ctx,
 			goto clear;
 
 		flags = (flags & ~BPF_F_SKIP_FIELD_MASK) | skip;
-		err = __bpf_get_stack(regs, NULL, trace, buf, size, flags);
+		err = __bpf_get_stack(regs, NULL, trace, buf, size, flags, false /* !may_fault */);
 	}
 	return err;
 
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index cd098846e251..c3845470f56d 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1598,7 +1598,8 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_jiffies64:
 		return &bpf_jiffies64_proto;
 	case BPF_FUNC_get_task_stack:
-		return &bpf_get_task_stack_proto;
+		return prog->sleepable ? &bpf_get_task_stack_sleepable_proto
+				       : &bpf_get_task_stack_proto;
 	case BPF_FUNC_copy_from_user:
 		return &bpf_copy_from_user_proto;
 	case BPF_FUNC_copy_from_user_task:
@@ -1654,7 +1655,7 @@ kprobe_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_get_stackid:
 		return &bpf_get_stackid_proto;
 	case BPF_FUNC_get_stack:
-		return &bpf_get_stack_proto;
+		return prog->sleepable ? &bpf_get_stack_sleepable_proto : &bpf_get_stack_proto;
 #ifdef CONFIG_BPF_KPROBE_OVERRIDE
 	case BPF_FUNC_override_return:
 		return &bpf_override_return_proto;
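
Illustrative note, not part of the patch: the ordering that __bpf_get_stack() relies on (take the RCU read lock only in sleepable mode, drop it on every exit path, and drop it before the build ID step that may sleep) can be modelled in plain userspace C. This is only a sketch under stated assumptions. Every model_* name and both globals below are hypothetical stand-ins invented for illustration, not kernel APIs, and the real function does considerably more.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these are kernel symbols. */
static int rcu_depth;		/* models RCU read-side nesting depth */
static bool slept_under_rcu;	/* set if the sleeping step ran while "locked" */

static void model_rcu_read_lock(void)
{
	rcu_depth++;
}

static void model_rcu_read_unlock(void)
{
	assert(rcu_depth > 0);
	rcu_depth--;
}

/* Models stack_map_get_build_id_offset(): it may sleep only if may_fault. */
static void model_resolve_build_ids(bool may_fault)
{
	if (may_fault && rcu_depth > 0)
		slept_under_rcu = true;	/* would be a bug in the real code */
}

/* Models only the lock/unlock control flow of __bpf_get_stack(). */
static int model_get_stack(bool have_trace, bool user_build_id, bool may_fault)
{
	if (may_fault)
		model_rcu_read_lock();	/* callchain entry is RCU-protected */

	if (!have_trace) {
		if (may_fault)
			model_rcu_read_unlock();
		return -1;		/* stands in for the err_fault path */
	}

	/* ... the IPs would be copied out of the callchain entry here ... */

	/* the callchain entry must not be touched after this point */
	if (may_fault)
		model_rcu_read_unlock();

	if (user_build_id)
		model_resolve_build_ids(may_fault);

	return 0;
}
```

Calling model_get_stack(true, true, true) should return 0 with rcu_depth back at 0 and slept_under_rcu still false. The failure path, model_get_stack(false, true, true), should also leave rcu_depth at 0.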