From patchwork Thu Nov 4 07:00:15 2021
From: Song Liu
Subject: [PATCH v2 bpf-next 1/2] bpf: introduce helper bpf_find_vma
Date: Thu, 4 Nov 2021 00:00:15 -0700
Message-ID: <20211104070016.2463668-2-songliubraving@fb.com>
In-Reply-To: <20211104070016.2463668-1-songliubraving@fb.com>
References: <20211104070016.2463668-1-songliubraving@fb.com>

In some profiler use cases, it is necessary to map an address to the
backing file, e.g., a shared library. The bpf_find_vma helper provides a
flexible way to achieve this. bpf_find_vma maps an address of a task to
the vma (vm_area_struct) covering this address, and feeds the vma to a
callback BPF function. The callback function is necessary here, as we
need to ensure mmap_sem is unlocked after the lookup.

find_vma requires mmap_sem to be locked. To lock and unlock mmap_sem
safely when irqs are disabled, we use the same mechanism as stackmap
with build_id: when irqs are disabled, the unlock is postponed to an
irq_work. Refactor stackmap.c so that the irq_work is shared between
bpf_find_vma and the stackmap helpers.
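For example, a kprobe program can resolve an address roughly like the
sketch below (names such as check_vma, target_addr, and the attach point
are illustrative; the pattern mirrors the selftests added in the next
patch):

  struct callback_ctx {
          int dummy;
  };

  __u64 target_addr = 0;  /* illustrative: set from user space */
  int find_ret = -1;

  /* called with mmap_lock held; vma is only valid inside the callback */
  static long check_vma(struct task_struct *task, struct vm_area_struct *vma,
                        struct callback_ctx *data)
  {
          return 0;
  }

  SEC("kprobe/__x64_sys_getpgid")
  int handle_getpgid(void)
  {
          struct task_struct *task = bpf_get_current_task_btf();
          struct callback_ctx data = {};

          /* 0 on success; -ENOENT, -EBUSY, or -EINVAL on failure */
          find_ret = bpf_find_vma(task, target_addr, check_vma, &data, 0);
          return 0;
  }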
Signed-off-by: Song Liu
Reported-by: kernel test robot
---
 include/linux/bpf.h            |  1 +
 include/uapi/linux/bpf.h       | 20 ++++++++++
 kernel/bpf/mmap_unlock_work.h  | 65 +++++++++++++++++++++++++++++++
 kernel/bpf/stackmap.c          | 71 ++++++++--------------------------
 kernel/bpf/task_iter.c         | 49 +++++++++++++++++++++++
 kernel/bpf/verifier.c          | 36 +++++++++++++++++
 kernel/trace/bpf_trace.c       |  2 +
 tools/include/uapi/linux/bpf.h | 20 ++++++++++
 8 files changed, 209 insertions(+), 55 deletions(-)
 create mode 100644 kernel/bpf/mmap_unlock_work.h

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 2be6dfd68df99..df3410bff4b06 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2157,6 +2157,7 @@ extern const struct bpf_func_proto bpf_btf_find_by_name_kind_proto;
 extern const struct bpf_func_proto bpf_sk_setsockopt_proto;
 extern const struct bpf_func_proto bpf_sk_getsockopt_proto;
 extern const struct bpf_func_proto bpf_kallsyms_lookup_name_proto;
+extern const struct bpf_func_proto bpf_find_vma_proto;
 
 const struct bpf_func_proto *tracing_prog_func_proto(
   enum bpf_func_id func_id, const struct bpf_prog *prog);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index ba5af15e25f5c..22fa7b74de451 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -4938,6 +4938,25 @@ union bpf_attr {
  *		**-ENOENT** if symbol is not found.
  *
  *		**-EPERM** if caller does not have permission to obtain kernel address.
+ *
+ * long bpf_find_vma(struct task_struct *task, u64 addr, void *callback_fn, void *callback_ctx, u64 flags)
+ *	Description
+ *		Find vma of *task* that contains *addr*, call *callback_fn*
+ *		function with *task*, *vma*, and *callback_ctx*.
+ *		The *callback_fn* should be a static function and
+ *		the *callback_ctx* should be a pointer to the stack.
+ *		The *flags* is used to control certain aspects of the helper.
+ *		Currently, the *flags* must be 0.
+ *
+ *		The expected callback signature is
+ *
+ *		long (\*callback_fn)(struct task_struct \*task, struct vm_area_struct \*vma, void \*ctx);
+ *
+ *	Return
+ *		0 on success.
+ *		**-ENOENT** if *task->mm* is NULL, or no vma contains *addr*.
+ *		**-EBUSY** if failed to try lock mmap_lock.
+ *		**-EINVAL** for invalid **flags**.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5120,6 +5139,7 @@ union bpf_attr {
 	FN(trace_vprintk),		\
 	FN(skc_to_unix_sock),		\
 	FN(kallsyms_lookup_name),	\
+	FN(find_vma),			\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/bpf/mmap_unlock_work.h b/kernel/bpf/mmap_unlock_work.h
new file mode 100644
index 0000000000000..5d18d7d85bef9
--- /dev/null
+++ b/kernel/bpf/mmap_unlock_work.h
@@ -0,0 +1,65 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (c) 2021 Facebook
+ */
+
+#ifndef __MMAP_UNLOCK_WORK_H__
+#define __MMAP_UNLOCK_WORK_H__
+#include <linux/irq_work.h>
+
+/* irq_work to run mmap_read_unlock() in irq_work */
+struct mmap_unlock_irq_work {
+	struct irq_work irq_work;
+	struct mm_struct *mm;
+};
+
+DECLARE_PER_CPU(struct mmap_unlock_irq_work, mmap_unlock_work);
+
+/*
+ * We cannot do mmap_read_unlock() when the irq is disabled, because of
+ * risk to deadlock with rq_lock. To look up vma when the irqs are
+ * disabled, we need to run mmap_read_unlock() in irq_work. We use a
+ * percpu variable to do the irq_work. If the irq_work is already used
+ * by another lookup, we fall over.
+ */
+static inline bool bpf_mmap_unlock_get_irq_work(struct mmap_unlock_irq_work **work_ptr)
+{
+	struct mmap_unlock_irq_work *work = NULL;
+	bool irq_work_busy = false;
+
+	if (irqs_disabled()) {
+		if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {
+			work = this_cpu_ptr(&mmap_unlock_work);
+			if (irq_work_is_busy(&work->irq_work)) {
+				/* cannot queue more up_read, fallback */
+				irq_work_busy = true;
+			}
+		} else {
+			/*
+			 * PREEMPT_RT does not allow to trylock mmap sem in
+			 * interrupt disabled context. Force the fallback code.
+			 */
+			irq_work_busy = true;
+		}
+	}
+
+	*work_ptr = work;
+	return irq_work_busy;
+}
+
+static inline void bpf_mmap_unlock_mm(struct mmap_unlock_irq_work *work, struct mm_struct *mm)
+{
+	if (!work) {
+		mmap_read_unlock(mm);
+	} else {
+		work->mm = mm;
+
+		/* The lock will be released once we're out of interrupt
+		 * context. Tell lockdep that we've released it now so
+		 * it doesn't complain that we forgot to release it.
+		 */
+		rwsem_release(&mm->mmap_lock.dep_map, _RET_IP_);
+		irq_work_queue(&work->irq_work);
+	}
+}
+
+#endif /* __MMAP_UNLOCK_WORK_H__ */
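For reference, the calling pattern these two helpers are designed for
looks roughly like the sketch below (abbreviated; bpf_find_vma() in this
patch and stack_map_get_build_id_offset() are the actual users):

  struct mmap_unlock_irq_work *work = NULL;
  bool irq_work_busy = bpf_mmap_unlock_get_irq_work(&work);

  if (irq_work_busy || !mmap_read_trylock(mm))
          return -EBUSY;  /* cannot safely take mmap_lock here */

  /* ... walk vmas under mmap_lock ... */

  bpf_mmap_unlock_mm(work, mm);  /* unlocks now, or defers to irq_work */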
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 6e75bbee39f0b..77e366f69212b 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -7,10 +7,10 @@
 #include <linux/kernel.h>
 #include <linux/stacktrace.h>
 #include <linux/perf_event.h>
-#include <linux/irq_work.h>
 #include <linux/btf_ids.h>
 #include <linux/buildid.h>
 #include "percpu_freelist.h"
+#include "mmap_unlock_work.h"
 
 #define STACK_CREATE_FLAG_MASK					\
 	(BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY |	\
@@ -31,25 +31,19 @@ struct bpf_stack_map {
 	struct stack_map_bucket *buckets[];
 };
 
-/* irq_work to run up_read() for build_id lookup in nmi context */
-struct stack_map_irq_work {
-	struct irq_work irq_work;
-	struct mm_struct *mm;
-};
+DEFINE_PER_CPU(struct mmap_unlock_irq_work, mmap_unlock_work);
 
-static void do_up_read(struct irq_work *entry)
+static void do_mmap_read_unlock(struct irq_work *entry)
 {
-	struct stack_map_irq_work *work;
+	struct mmap_unlock_irq_work *work;
 
 	if (WARN_ON_ONCE(IS_ENABLED(CONFIG_PREEMPT_RT)))
 		return;
 
-	work = container_of(entry, struct stack_map_irq_work, irq_work);
+	work = container_of(entry, struct mmap_unlock_irq_work, irq_work);
 	mmap_read_unlock_non_owner(work->mm);
 }
 
-static DEFINE_PER_CPU(struct stack_map_irq_work, up_read_work);
-
 static inline bool stack_map_use_build_id(struct bpf_map *map)
 {
 	return (map->map_flags & BPF_F_STACK_BUILD_ID);
@@ -145,39 +139,18 @@ static struct bpf_map *stack_map_alloc(union bpf_attr *attr)
 	return ERR_PTR(err);
 }
 
+
 static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
 					  u64 *ips, u32 trace_nr, bool user)
 {
-	int i;
+	struct mmap_unlock_irq_work *work = NULL;
+	bool irq_work_busy = bpf_mmap_unlock_get_irq_work(&work);
 	struct vm_area_struct *vma;
-	bool irq_work_busy = false;
-	struct stack_map_irq_work *work = NULL;
-
-	if (irqs_disabled()) {
-		if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {
-			work = this_cpu_ptr(&up_read_work);
-			if (irq_work_is_busy(&work->irq_work)) {
-				/* cannot queue more up_read, fallback */
-				irq_work_busy = true;
-			}
-		} else {
-			/*
-			 * PREEMPT_RT does not allow to trylock mmap sem in
-			 * interrupt disabled context. Force the fallback code.
-			 */
-			irq_work_busy = true;
-		}
-	}
+	int i;
 
-	/*
-	 * We cannot do up_read() when the irq is disabled, because of
-	 * risk to deadlock with rq_lock. To do build_id lookup when the
-	 * irqs are disabled, we need to run up_read() in irq_work. We use
-	 * a percpu variable to do the irq_work. If the irq_work is
-	 * already used by another lookup, we fall back to report ips.
-	 *
-	 * Same fallback is used for kernel stack (!user) on a stackmap
-	 * with build_id.
+	/* If the irq_work is in use, fall back to report ips. Same
+	 * fallback is used for kernel stack (!user) on a stackmap with
+	 * build_id.
 	 */
 	if (!user || !current || !current->mm || irq_work_busy ||
 	    !mmap_read_trylock(current->mm)) {
@@ -203,19 +176,7 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
 			- vma->vm_start;
 		id_offs[i].status = BPF_STACK_BUILD_ID_VALID;
 	}
-
-	if (!work) {
-		mmap_read_unlock(current->mm);
-	} else {
-		work->mm = current->mm;
-
-		/* The lock will be released once we're out of interrupt
-		 * context. Tell lockdep that we've released it now so
-		 * it doesn't complain that we forgot to release it.
-		 */
-		rwsem_release(&current->mm->mmap_lock.dep_map, _RET_IP_);
-		irq_work_queue(&work->irq_work);
-	}
+	bpf_mmap_unlock_mm(work, current->mm);
 }
 
 static struct perf_callchain_entry *
@@ -723,11 +684,11 @@ const struct bpf_map_ops stack_trace_map_ops = {
 static int __init stack_map_init(void)
 {
 	int cpu;
-	struct stack_map_irq_work *work;
+	struct mmap_unlock_irq_work *work;
 
 	for_each_possible_cpu(cpu) {
-		work = per_cpu_ptr(&up_read_work, cpu);
-		init_irq_work(&work->irq_work, do_up_read);
+		work = per_cpu_ptr(&mmap_unlock_work, cpu);
+		init_irq_work(&work->irq_work, do_mmap_read_unlock);
 	}
 	return 0;
 }
diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index b48750bfba5aa..9d92e14a6eea4 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -8,6 +8,7 @@
 #include <linux/fdtable.h>
 #include <linux/filter.h>
 #include <linux/btf_ids.h>
+#include "mmap_unlock_work.h"
 
 struct bpf_iter_seq_task_common {
 	struct pid_namespace *ns;
@@ -586,6 +587,54 @@ static struct bpf_iter_reg task_vma_reg_info = {
 	.seq_info		= &task_vma_seq_info,
 };
 
+BPF_CALL_5(bpf_find_vma, struct task_struct *, task, u64, start,
+	   bpf_callback_t, callback_fn, void *, callback_ctx, u64, flags)
+{
+	struct mmap_unlock_irq_work *work = NULL;
+	struct vm_area_struct *vma;
+	bool irq_work_busy = false;
+	struct mm_struct *mm;
+	int ret = -ENOENT;
+
+	if (flags)
+		return -EINVAL;
+
+	if (!task)
+		return -ENOENT;
+
+	mm = task->mm;
+	if (!mm)
+		return -ENOENT;
+
+	irq_work_busy = bpf_mmap_unlock_get_irq_work(&work);
+
+	if (irq_work_busy || !mmap_read_trylock(mm))
+		return -EBUSY;
+
+	vma = find_vma(mm, start);
+
+	if (vma && vma->vm_start <= start && vma->vm_end > start) {
+		callback_fn((u64)(long)task, (u64)(long)vma,
+			    (u64)(long)callback_ctx, 0, 0);
+		ret = 0;
+	}
+	bpf_mmap_unlock_mm(work, mm);
+	return ret;
+}
+
+BTF_ID_LIST_SINGLE(btf_find_vma_ids, struct, task_struct)
+
+const struct bpf_func_proto bpf_find_vma_proto = {
+	.func		= bpf_find_vma,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_BTF_ID,
+	.arg1_btf_id	= &btf_find_vma_ids[0],
+	.arg2_type	= ARG_ANYTHING,
+	.arg3_type	= ARG_PTR_TO_FUNC,
+	.arg4_type	= ARG_PTR_TO_STACK_OR_NULL,
+	.arg5_type	= ARG_ANYTHING,
+};
+
 static int __init task_iter_init(void)
 {
 	int ret;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index f0dca726ebfde..a65526112924a 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -6132,6 +6132,35 @@ static int set_timer_callback_state(struct bpf_verifier_env *env,
 	return 0;
 }
 
+BTF_ID_LIST_SINGLE(btf_set_find_vma_ids, struct, vm_area_struct)
+
+static int set_find_vma_callback_state(struct bpf_verifier_env *env,
+				       struct bpf_func_state *caller,
+				       struct bpf_func_state *callee,
+				       int insn_idx)
+{
+	/* bpf_find_vma(struct task_struct *task, u64 start,
+	 *              void *callback_fn, void *callback_ctx, u64 flags)
+	 * (callback_fn)(struct task_struct *task,
+	 *               struct vm_area_struct *vma, void *ctx);
+	 */
+	callee->regs[BPF_REG_1] = caller->regs[BPF_REG_1];
+
+	callee->regs[BPF_REG_2].type = PTR_TO_BTF_ID;
+	__mark_reg_known_zero(&callee->regs[BPF_REG_2]);
+	callee->regs[BPF_REG_2].btf = btf_vmlinux;
+	callee->regs[BPF_REG_2].btf_id = btf_set_find_vma_ids[0];
+
+	/* pointer to stack or null */
+	callee->regs[BPF_REG_3] = caller->regs[BPF_REG_4];
+
+	/* unused */
+	__mark_reg_not_init(env, &callee->regs[BPF_REG_4]);
+	__mark_reg_not_init(env, &callee->regs[BPF_REG_5]);
+	callee->in_callback_fn = true;
+	return 0;
+}
+
 static int prepare_func_exit(struct bpf_verifier_env *env, int *insn_idx)
 {
 	struct bpf_verifier_state *state = env->cur_state;
@@ -6489,6 +6518,13 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 			return -EINVAL;
 	}
 
+	if (func_id == BPF_FUNC_find_vma) {
+		err = __check_func_call(env, insn, insn_idx_p, meta.subprogno,
+					set_find_vma_callback_state);
+		if (err < 0)
+			return -EINVAL;
+	}
+
 	if (func_id == BPF_FUNC_snprintf) {
 		err = check_bpf_snprintf_call(env, regs);
 		if (err < 0)
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 7396488793ff7..390176a3031ab 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1208,6 +1208,8 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_get_func_ip_proto_tracing;
 	case BPF_FUNC_get_branch_snapshot:
 		return &bpf_get_branch_snapshot_proto;
+	case BPF_FUNC_find_vma:
+		return &bpf_find_vma_proto;
 	case BPF_FUNC_trace_vprintk:
 		return bpf_get_trace_vprintk_proto();
 	default:
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index ba5af15e25f5c..22fa7b74de451 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -4938,6 +4938,25 @@ union bpf_attr {
  *		**-ENOENT** if symbol is not found.
  *
  *		**-EPERM** if caller does not have permission to obtain kernel address.
+ *
+ * long bpf_find_vma(struct task_struct *task, u64 addr, void *callback_fn, void *callback_ctx, u64 flags)
+ *	Description
+ *		Find vma of *task* that contains *addr*, call *callback_fn*
+ *		function with *task*, *vma*, and *callback_ctx*.
+ *		The *callback_fn* should be a static function and
+ *		the *callback_ctx* should be a pointer to the stack.
+ *		The *flags* is used to control certain aspects of the helper.
+ *		Currently, the *flags* must be 0.
+ *
+ *		The expected callback signature is
+ *
+ *		long (\*callback_fn)(struct task_struct \*task, struct vm_area_struct \*vma, void \*ctx);
+ *
+ *	Return
+ *		0 on success.
+ *		**-ENOENT** if *task->mm* is NULL, or no vma contains *addr*.
+ *		**-EBUSY** if failed to try lock mmap_lock.
+ *		**-EINVAL** for invalid **flags**.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5120,6 +5139,7 @@ union bpf_attr {
 	FN(trace_vprintk),		\
 	FN(skc_to_unix_sock),		\
 	FN(kallsyms_lookup_name),	\
+	FN(find_vma),			\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
From patchwork Thu Nov 4 07:00:16 2021
From: Song Liu
Subject: [PATCH v2 bpf-next 2/2] selftests/bpf: add tests for bpf_find_vma
Date: Thu, 4 Nov 2021 00:00:16 -0700
Message-ID: <20211104070016.2463668-3-songliubraving@fb.com>
In-Reply-To: <20211104070016.2463668-1-songliubraving@fb.com>
References: <20211104070016.2463668-1-songliubraving@fb.com>

Add tests for bpf_find_vma in a perf_event program and a kprobe program.
The perf_event program is triggered from NMI context, so the second call
of bpf_find_vma() will return -EBUSY (irq_work busy). The kprobe program,
on the other hand, does not have this constraint.

Also add tests for illegal writes to task or vma from the callback
function. The verifier should reject both cases.
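Assuming the usual selftests build, the new tests can be run from
tools/testing/selftests/bpf with something like:

  ./test_progs -t find_vma

(the test is registered as serial_test_find_vma, so it runs serialized
with respect to the other tests).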
Signed-off-by: Song Liu
---
 .../selftests/bpf/prog_tests/find_vma.c      | 115 ++++++++++++++++++
 tools/testing/selftests/bpf/progs/find_vma.c |  70 +++++++++++
 .../selftests/bpf/progs/find_vma_fail1.c     |  30 +++++
 .../selftests/bpf/progs/find_vma_fail2.c     |  30 +++++
 4 files changed, 245 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/find_vma.c
 create mode 100644 tools/testing/selftests/bpf/progs/find_vma.c
 create mode 100644 tools/testing/selftests/bpf/progs/find_vma_fail1.c
 create mode 100644 tools/testing/selftests/bpf/progs/find_vma_fail2.c

diff --git a/tools/testing/selftests/bpf/prog_tests/find_vma.c b/tools/testing/selftests/bpf/prog_tests/find_vma.c
new file mode 100644
index 0000000000000..3955a92d4c152
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/find_vma.c
@@ -0,0 +1,115 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Facebook */
+#include <test_progs.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include "find_vma.skel.h"
+#include "find_vma_fail1.skel.h"
+#include "find_vma_fail2.skel.h"
+
+static void test_and_reset_skel(struct find_vma *skel, int expected_find_zero_ret)
+{
+	ASSERT_EQ(skel->bss->found_vm_exec, 1, "found_vm_exec");
+	ASSERT_EQ(skel->data->find_addr_ret, 0, "find_addr_ret");
+	ASSERT_EQ(skel->data->find_zero_ret, expected_find_zero_ret, "find_zero_ret");
+	ASSERT_OK_PTR(strstr(skel->bss->d_iname, "test_progs"), "find_test_progs");
+
+	skel->bss->found_vm_exec = 0;
+	skel->data->find_addr_ret = -1;
+	skel->data->find_zero_ret = -1;
+	skel->bss->d_iname[0] = 0;
+}
+
+static int open_pe(void)
+{
+	struct perf_event_attr attr = {0};
+	int pfd;
+
+	/* create perf event */
+	attr.size = sizeof(attr);
+	attr.type = PERF_TYPE_HARDWARE;
+	attr.config = PERF_COUNT_HW_CPU_CYCLES;
+	attr.freq = 1;
+	attr.sample_freq = 4000;
+	pfd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, PERF_FLAG_FD_CLOEXEC);
+
+	return pfd >= 0 ? pfd : -errno;
+}
+
+static void test_find_vma_pe(struct find_vma *skel)
+{
+	struct bpf_link *link = NULL;
+	volatile int j = 0;
+	int pfd = -1, i;
+
+	pfd = open_pe();
+	if (pfd < 0) {
+		if (pfd == -ENOENT || pfd == -EOPNOTSUPP) {
+			printf("%s:SKIP:no PERF_COUNT_HW_CPU_CYCLES\n", __func__);
+			test__skip();
+		}
+		if (!ASSERT_GE(pfd, 0, "perf_event_open"))
+			goto cleanup;
+	}
+
+	link = bpf_program__attach_perf_event(skel->progs.handle_pe, pfd);
+	if (!ASSERT_OK_PTR(link, "attach_perf_event"))
+		goto cleanup;
+
+	for (i = 0; i < 1000000; ++i)
+		++j;
+
+	test_and_reset_skel(skel, -EBUSY /* in nmi, irq_work is busy */);
+cleanup:
+	bpf_link__destroy(link);
+	close(pfd);
+	/* caller will clean up skel */
+}
+
+static void test_find_vma_kprobe(struct find_vma *skel)
+{
+	int err;
+
+	err = find_vma__attach(skel);
+	if (!ASSERT_OK(err, "get_branch_snapshot__attach"))
+		return; /* caller will cleanup skel */
+
+	getpgid(skel->bss->target_pid);
+	test_and_reset_skel(skel, -ENOENT /* could not find vma for ptr 0 */);
+}
+
+static void test_illegal_write_vma(void)
+{
+	struct find_vma_fail1 *skel;
+
+	skel = find_vma_fail1__open_and_load();
+	ASSERT_ERR_PTR(skel, "find_vma_fail1__open_and_load");
+}
+
+static void test_illegal_write_task(void)
+{
+	struct find_vma_fail2 *skel;
+
+	skel = find_vma_fail2__open_and_load();
+	ASSERT_ERR_PTR(skel, "find_vma_fail2__open_and_load");
+}
+
+void serial_test_find_vma(void)
+{
+	struct find_vma *skel;
+
+	skel = find_vma__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "find_vma__open_and_load"))
+		return;
+
+	skel->bss->target_pid = getpid();
+	skel->bss->addr = (__u64)test_find_vma_pe;
+
+	test_find_vma_pe(skel);
+	usleep(100000); /* allow the irq_work to finish */
+	test_find_vma_kprobe(skel);
+
+	find_vma__destroy(skel);
+	test_illegal_write_vma();
+	test_illegal_write_task();
+}
diff --git a/tools/testing/selftests/bpf/progs/find_vma.c b/tools/testing/selftests/bpf/progs/find_vma.c
new file mode 100644
index 0000000000000..2776718a54e29
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/find_vma.c
@@ -0,0 +1,70 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Facebook */
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+struct callback_ctx {
+	int dummy;
+};
+
+#define VM_EXEC		0x00000004
+#define DNAME_INLINE_LEN 32
+
+pid_t target_pid = 0;
+char d_iname[DNAME_INLINE_LEN] = {0};
+__u32 found_vm_exec = 0;
+__u64 addr = 0;
+int find_zero_ret = -1;
+int find_addr_ret = -1;
+
+static __u64
+check_vma(struct task_struct *task, struct vm_area_struct *vma,
+	  struct callback_ctx *data)
+{
+	if (vma->vm_file)
+		bpf_probe_read_kernel_str(d_iname, DNAME_INLINE_LEN - 1,
+					  vma->vm_file->f_path.dentry->d_iname);
+
+	/* check for VM_EXEC */
+	if (vma->vm_flags & VM_EXEC)
+		found_vm_exec = 1;
+
+	return 0;
+}
+
+SEC("kprobe/__x64_sys_getpgid")
+int handle_getpid(void)
+{
+	struct task_struct *task = bpf_get_current_task_btf();
+	struct callback_ctx data = {0};
+
+	if (task->pid != target_pid)
+		return 0;
+
+	find_addr_ret = bpf_find_vma(task, addr, check_vma, &data, 0);
+
+	/* this should return -ENOENT */
+	find_zero_ret = bpf_find_vma(task, 0, check_vma, &data, 0);
+	return 0;
+}
+
+SEC("perf_event")
+int handle_pe(void)
+{
+	struct task_struct *task = bpf_get_current_task_btf();
+	struct callback_ctx data = {0};
+
+	if (task->pid != target_pid)
+		return 0;
+
+	find_addr_ret = bpf_find_vma(task, addr, check_vma, &data, 0);
+
+	/* In NMI, this should return -EBUSY, as the previous call is using
+	 * the irq_work.
+	 */
+	find_zero_ret = bpf_find_vma(task, 0, check_vma, &data, 0);
+	return 0;
+}
diff --git a/tools/testing/selftests/bpf/progs/find_vma_fail1.c b/tools/testing/selftests/bpf/progs/find_vma_fail1.c
new file mode 100644
index 0000000000000..d17bdcdf76f07
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/find_vma_fail1.c
@@ -0,0 +1,30 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Facebook */
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+
+char _license[] SEC("license") = "GPL";
+
+struct callback_ctx {
+	int dummy;
+};
+
+static __u64
+write_vma(struct task_struct *task, struct vm_area_struct *vma,
+	  struct callback_ctx *data)
+{
+	/* writing to vma, which is illegal */
+	vma->vm_flags |= 0x55;
+
+	return 0;
+}
+
+SEC("kprobe/__x64_sys_getpgid")
+int handle_getpid(void)
+{
+	struct task_struct *task = bpf_get_current_task_btf();
+	struct callback_ctx data = {0};
+
+	bpf_find_vma(task, 0, write_vma, &data, 0);
+	return 0;
+}
diff --git a/tools/testing/selftests/bpf/progs/find_vma_fail2.c b/tools/testing/selftests/bpf/progs/find_vma_fail2.c
new file mode 100644
index 0000000000000..079c4594c095d
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/find_vma_fail2.c
@@ -0,0 +1,30 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Facebook */
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+
+char _license[] SEC("license") = "GPL";
+
+struct callback_ctx {
+	int dummy;
+};
+
+static __u64
+write_task(struct task_struct *task, struct vm_area_struct *vma,
+	   struct callback_ctx *data)
+{
+	/* writing to task, which is illegal */
+	task->mm = NULL;
+
+	return 0;
+}
+
+SEC("kprobe/__x64_sys_getpgid")
+int handle_getpid(void)
+{
+	struct task_struct *task = bpf_get_current_task_btf();
+	struct callback_ctx data = {0};
+
+	bpf_find_vma(task, 0, write_task, &data, 0);
+	return 0;
+}