From patchwork Mon Mar 18 09:31:36 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Jiri Olsa X-Patchwork-Id: 13595114 X-Patchwork-Delegate: bpf@iogearbox.net Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D23892E413 for ; Mon, 18 Mar 2024 09:31:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1710754313; cv=none; b=Jp1iB7FT4QG9GT5KA2gKEgcaIoMcPv3eZLe7GoEhLEjO0gl+jctXqJP+E+fQo8qBg2uiYgeX/A/PAnf4Ss1unV5QqAh4byecygvi4+q3R78NqCy+b+hy9O5CAzC05vE/Wow0yZwg1O9dRv4+OoPQ/xvGumKKVBc/bu7GQwO3EV8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1710754313; c=relaxed/simple; bh=sJtRldE1jKSykBRamDo7kUT1GdmxrO/Lg8QfvT88p2c=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=bS0qsRszXKoljSvQiBnKinL6XrBHX+1JNvGwXbnD8HohnH3aG/Vor5NtDlGGgYWywln8L9CYFuxvlrPl2qUO7HIlwesfq+Eui7neG7y7UvBPS7dAdtxyL8EEDcnH6pkFZ/UETJCbh8CmW5vvlfFfRTGrS/WEDxYYnWzKi/fTS8I= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=IVqV8p4V; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="IVqV8p4V" Received: by smtp.kernel.org (Postfix) with ESMTPSA id A1D53C433F1; Mon, 18 Mar 2024 09:31:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1710754313; bh=sJtRldE1jKSykBRamDo7kUT1GdmxrO/Lg8QfvT88p2c=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; 
b=IVqV8p4VocPWsC7LvYsReF6AvmEbDJl3KXUsnqKpYpnSOaU2BDrF5pU0BhGLsfAyA 0tHLO3clyQR1FEXPi1ZqjmDdxq7QAM6NjHaeLzPQ59LUyJ5hGFAzfGqd7VCYMnM6gZ RN7P9FfxRHjO6YpU6P+6vxlLRdwUjjX7UE6kamJN3JOMNW3gWEFtjvGHKrjgFt6cNE 2Nqqbxih6ReK8r8Htp81SbS88UN1OwkQ8bk/OIrH3rdw6R9cXg1ZVziY297rmPkYS5 npLgUgL905TTmctKE3TvbeyBM2D4VaWTQIldgMW8Ozh8oVBKuTVb24yAt7XKjG3j7I 7SXjPEfyoRMrw==
From: Jiri Olsa
To: Oleg Nesterov , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko
Cc: bpf@vger.kernel.org, Song Liu , Yonghong Song , John Fastabend , Peter Zijlstra , Thomas Gleixner , "Borislav Petkov (AMD)" , x86@kernel.org
Subject: [PATCH RFC bpf-next 1/3] uprobe: Add uretprobe syscall to speed up return probe
Date: Mon, 18 Mar 2024 10:31:36 +0100
Message-ID: <20240318093139.293497-2-jolsa@kernel.org>
X-Mailer: git-send-email 2.44.0
In-Reply-To: <20240318093139.293497-1-jolsa@kernel.org>
References: <20240318093139.293497-1-jolsa@kernel.org>
Precedence: bulk
X-Mailing-List: bpf@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
X-Patchwork-Delegate: bpf@iogearbox.net
X-Patchwork-State: RFC

Adding uretprobe syscall instead of trap to speed up return probe.

At the moment the uretprobe setup/path is:

  - install entry uprobe

  - when the uprobe is hit, overwrite probed function's return address
    on stack with address of the trampoline that contains breakpoint
    instruction

  - the breakpoint trap code handles the uretprobe consumers execution
    and jumps back to original return address

This patch changes the above trampoline's breakpoint instruction to the new
uretprobe syscall call.
This syscall does exactly the same job as the trap with some extra work:

  - syscall trampoline must save original value for rax/r11/rcx registers
    on stack, because rax is set to syscall number and r11/rcx are changed
    and used by syscall instruction

  - the syscall code reads the original values of those registers and
    restores those values in task's pt_regs area

Even with the extra register handling code, having uretprobes handled
by syscalls shows a speed improvement.

  On Intel (11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz)

  current:

    base           :   15.173 ± 0.052M/s
    uprobe-nop     :    2.797 ± 0.090M/s
    uprobe-push    :    2.580 ± 0.007M/s
    uprobe-ret     :    1.001 ± 0.003M/s
    uretprobe-nop  :    1.373 ± 0.002M/s
    uretprobe-push :    1.346 ± 0.002M/s
    uretprobe-ret  :    0.747 ± 0.001M/s

  with the fix:

    base           :   15.704 ± 0.076M/s
    uprobe-nop     :    2.841 ± 0.008M/s
    uprobe-push    :    2.666 ± 0.029M/s
    uprobe-ret     :    1.037 ± 0.008M/s
    uretprobe-nop  :    1.718 ± 0.010M/s < ~25% speed up
    uretprobe-push :    1.658 ± 0.008M/s < ~23% speed up
    uretprobe-ret  :    0.853 ± 0.004M/s < ~14% speed up

  On AMD (AMD Ryzen 7 5700U)

  current:

    base           :    5.702 ± 0.003M/s
    uprobe-nop     :    1.505 ± 0.011M/s
    uprobe-push    :    1.388 ± 0.008M/s
    uprobe-ret     :    0.825 ± 0.001M/s
    uretprobe-nop  :    0.782 ± 0.001M/s
    uretprobe-push :    0.750 ± 0.001M/s
    uretprobe-ret  :    0.544 ± 0.001M/s

  with the fix:

    base           :    5.669 ± 0.004M/s
    uprobe-nop     :    1.539 ± 0.001M/s
    uprobe-push    :    1.385 ± 0.003M/s
    uprobe-ret     :    0.819 ± 0.001M/s
    uretprobe-nop  :    0.889 ± 0.001M/s < ~13% speed up
    uretprobe-push :    0.846 ± 0.001M/s < ~12% speed up
    uretprobe-ret  :    0.594 ± 0.000M/s < ~9% speed up

Suggested-by: Andrii Nakryiko
Signed-off-by: Jiri Olsa
Reviewed-by: Oleg Nesterov
Acked-by: Andrii Nakryiko
---
 arch/x86/entry/syscalls/syscall_64.tbl |  1 +
 arch/x86/kernel/uprobes.c              | 48 ++++++++++++++++++++++++++
 include/linux/syscalls.h               |  2 ++
 include/linux/uprobes.h                |  2 ++
 include/uapi/asm-generic/unistd.h      |  5 ++-
 kernel/events/uprobes.c                | 18 +++++++---
 kernel/sys_ni.c                        |  2 ++
 7 files changed, 73 insertions(+), 5 deletions(-)

diff --git
a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index 7e8d46f4147f..af0a33ab06ee 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -383,6 +383,7 @@ 459 common lsm_get_self_attr sys_lsm_get_self_attr 460 common lsm_set_self_attr sys_lsm_set_self_attr 461 common lsm_list_modules sys_lsm_list_modules +462 64 uretprobe sys_uretprobe # # Due to a historical design error, certain syscalls are numbered differently diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c index 6c07f6daaa22..069371e86180 100644 --- a/arch/x86/kernel/uprobes.c +++ b/arch/x86/kernel/uprobes.c @@ -12,6 +12,7 @@ #include #include #include +#include #include #include @@ -308,6 +309,53 @@ static int uprobe_init_insn(struct arch_uprobe *auprobe, struct insn *insn, bool } #ifdef CONFIG_X86_64 + +asm ( + ".pushsection .rodata\n" + ".global uretprobe_syscall_entry\n" + "uretprobe_syscall_entry:\n" + "pushq %rax\n" + "pushq %rcx\n" + "pushq %r11\n" + "movq $462, %rax\n" + "syscall\n" + ".global uretprobe_syscall_end\n" + "uretprobe_syscall_end:\n" + ".popsection\n" +); + +extern u8 uretprobe_syscall_entry[]; +extern u8 uretprobe_syscall_end[]; + +void *arch_uprobe_trampoline(unsigned long *psize) +{ + *psize = uretprobe_syscall_end - uretprobe_syscall_entry; + return uretprobe_syscall_entry; +} + +SYSCALL_DEFINE0(uretprobe) +{ + struct pt_regs *regs = task_pt_regs(current); + unsigned long sregs[3], err; + + /* + * We set rax and syscall itself changes rcx and r11, so the syscall + * trampoline saves their original values on stack. We need to read + * them and set original register values and fix the rsp pointer back. 
+ */ + err = copy_from_user((void *) &sregs, (void *) regs->sp, sizeof(sregs)); + WARN_ON_ONCE(err); + + regs->r11 = sregs[0]; + regs->cx = sregs[1]; + regs->ax = sregs[2]; + regs->orig_ax = -1; + regs->sp += sizeof(sregs); + + uprobe_handle_trampoline(regs); + return regs->ax; +} + /* * If arch_uprobe->insn doesn't use rip-relative addressing, return * immediately. Otherwise, rewrite the instruction so that it accesses diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 77eb9b0e7685..db150794f89d 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -972,6 +972,8 @@ asmlinkage long sys_lsm_list_modules(u64 *ids, size_t *size, u32 flags); /* x86 */ asmlinkage long sys_ioperm(unsigned long from, unsigned long num, int on); +asmlinkage long sys_uretprobe(void); + /* pciconfig: alpha, arm, arm64, ia64, sparc */ asmlinkage long sys_pciconfig_read(unsigned long bus, unsigned long dfn, unsigned long off, unsigned long len, diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h index f46e0ca0169c..a490146ad89d 100644 --- a/include/linux/uprobes.h +++ b/include/linux/uprobes.h @@ -138,6 +138,8 @@ extern bool arch_uretprobe_is_alive(struct return_instance *ret, enum rp_check c extern bool arch_uprobe_ignore(struct arch_uprobe *aup, struct pt_regs *regs); extern void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr, void *src, unsigned long len); +extern void uprobe_handle_trampoline(struct pt_regs *regs); +extern void *arch_uprobe_trampoline(unsigned long *psize); #else /* !CONFIG_UPROBES */ struct uprobes_state { }; diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h index 75f00965ab15..8a747cd1d735 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -842,8 +842,11 @@ __SYSCALL(__NR_lsm_set_self_attr, sys_lsm_set_self_attr) #define __NR_lsm_list_modules 461 __SYSCALL(__NR_lsm_list_modules, sys_lsm_list_modules) +#define __NR_uretprobe 462 
+__SYSCALL(__NR_uretprobe, sys_uretprobe) + #undef __NR_syscalls -#define __NR_syscalls 462 +#define __NR_syscalls 463 /* * 32 bit systems traditionally used different diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 929e98c62965..90395b16bde0 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -1474,11 +1474,20 @@ static int xol_add_vma(struct mm_struct *mm, struct xol_area *area) return ret; } +void * __weak arch_uprobe_trampoline(unsigned long *psize) +{ + static uprobe_opcode_t insn = UPROBE_SWBP_INSN; + + *psize = UPROBE_SWBP_INSN_SIZE; + return &insn; +} + static struct xol_area *__create_xol_area(unsigned long vaddr) { struct mm_struct *mm = current->mm; - uprobe_opcode_t insn = UPROBE_SWBP_INSN; + unsigned long insns_size; struct xol_area *area; + void *insns; area = kmalloc(sizeof(*area), GFP_KERNEL); if (unlikely(!area)) @@ -1502,7 +1511,8 @@ static struct xol_area *__create_xol_area(unsigned long vaddr) /* Reserve the 1st slot for get_trampoline_vaddr() */ set_bit(0, area->bitmap); atomic_set(&area->slot_count, 1); - arch_uprobe_copy_ixol(area->pages[0], 0, &insn, UPROBE_SWBP_INSN_SIZE); + insns = arch_uprobe_trampoline(&insns_size); + arch_uprobe_copy_ixol(area->pages[0], 0, insns, insns_size); if (!xol_add_vma(mm, area)) return area; @@ -2123,7 +2133,7 @@ static struct return_instance *find_next_ret_chain(struct return_instance *ri) return ri; } -static void handle_trampoline(struct pt_regs *regs) +void uprobe_handle_trampoline(struct pt_regs *regs) { struct uprobe_task *utask; struct return_instance *ri, *next; @@ -2188,7 +2198,7 @@ static void handle_swbp(struct pt_regs *regs) bp_vaddr = uprobe_get_swbp_addr(regs); if (bp_vaddr == get_trampoline_vaddr()) - return handle_trampoline(regs); + return uprobe_handle_trampoline(regs); uprobe = find_active_uprobe(bp_vaddr, &is_swbp); if (!uprobe) { diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c index faad00cce269..be6195e0d078 100644 --- a/kernel/sys_ni.c +++ 
b/kernel/sys_ni.c @@ -391,3 +391,5 @@ COND_SYSCALL(setuid16); /* restartable sequence */ COND_SYSCALL(rseq); + +COND_SYSCALL(uretprobe); From patchwork Mon Mar 18 09:31:37 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jiri Olsa X-Patchwork-Id: 13595115 X-Patchwork-Delegate: bpf@iogearbox.net Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A87372C690 for ; Mon, 18 Mar 2024 09:32:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1710754323; cv=none; b=FKxjOD+WNG8cvV4m6VHb5LAXw2vIqG1E6je5iD1AciYbFGRa0au7NGKT/HfeIWZS2Y6zVnndQCk4IuJY8VhmX2gIUF7LrG9DeigxDhzhomBXSoqbP4CZU/5NeBhVBrp6X66/9XdDeBJV5kmKl7C40niCe7zXWBYuU2KQv9bHBDA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1710754323; c=relaxed/simple; bh=5DboNuMwT5PdIDVwPvOeDhnMZylEb9eibCS4DHvnQU4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=HyAsn74nR+bOcaTXDPOakYHdSt8EYpazUxmU/ncdDb9lNueHa6LWwH5Cz2VXfmyNgzLO+IrJajD0B4AR4P2+qDwZjBjASflTW1j0HQ3ke0qiVcDn1BFYAdsubctigmD5dXMUPQa7NFtnbFthX6jTZSC9CkjFky45YNcIwezZa8A= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=tvPswqKL; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="tvPswqKL" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 5FFEAC433F1; Mon, 18 Mar 2024 09:32:00 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1710754323; 
bh=5DboNuMwT5PdIDVwPvOeDhnMZylEb9eibCS4DHvnQU4=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=tvPswqKLAiillPvOzVL2ytrZmiP2QirZqu7X8GN8GDqOngfUIJPq3w3gHX/jf69fI O9QrgyhZfuCNHOB5qcTHpm2plkw0RRJg7j5Ly0kSgJk0FR0SNbpA+hwlBteJOAwXfm jBPh0ZmqsOXtOmQZktp0kDlZ/yLvzsy0VzCrG3S9jmiukd2gf4gE3mmdtoVY2MXwVq M1gHEDtkkYN+BRmNzaSAm3mPMd2LWtRvrmP2R0k4UKtQq29GyrXVaN3T+b4bavkqwI SYRLc1cdo3MhAudJIoBz7a13x63IRqKNWXqGHb3cD08WV+MgQr1UZ3q4avAcBxDyc8 WZ+j11gZdOI6Q== From: Jiri Olsa To: Oleg Nesterov , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko Cc: bpf@vger.kernel.org, Song Liu , Yonghong Song , John Fastabend , Peter Zijlstra , Thomas Gleixner , "Borislav Petkov (AMD)" , x86@kernel.org Subject: [PATCH RFC bpf-next 2/3] selftests/bpf: Add uretprobe syscall test Date: Mon, 18 Mar 2024 10:31:37 +0100 Message-ID: <20240318093139.293497-3-jolsa@kernel.org> X-Mailer: git-send-email 2.44.0 In-Reply-To: <20240318093139.293497-1-jolsa@kernel.org> References: <20240318093139.293497-1-jolsa@kernel.org> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC Add uretprobe syscall test and compare register values before and after the uretprobe is hit. Also compare the register values seen from attached bpf program. 
Signed-off-by: Jiri Olsa --- tools/testing/selftests/bpf/Makefile | 13 ++- .../bpf/prog_tests/arch/x86/uprobe_syscall.S | 89 +++++++++++++++++++ .../selftests/bpf/prog_tests/uprobe_syscall.c | 84 +++++++++++++++++ .../selftests/bpf/progs/uprobe_syscall.c | 15 ++++ 4 files changed, 200 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/bpf/prog_tests/arch/x86/uprobe_syscall.S create mode 100644 tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c create mode 100644 tools/testing/selftests/bpf/progs/uprobe_syscall.c diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile index 3b9eb40d6343..e425a946276b 100644 --- a/tools/testing/selftests/bpf/Makefile +++ b/tools/testing/selftests/bpf/Makefile @@ -490,6 +490,9 @@ TRUNNER_TEST_OBJS := $$(patsubst %.c,$$(TRUNNER_OUTPUT)/%.test.o, \ $$(notdir $$(wildcard $(TRUNNER_TESTS_DIR)/*.c))) TRUNNER_EXTRA_OBJS := $$(patsubst %.c,$$(TRUNNER_OUTPUT)/%.o, \ $$(filter %.c,$(TRUNNER_EXTRA_SOURCES))) +TRUNNER_ASM_OBJS := $$(patsubst %.S,$$(TRUNNER_OUTPUT)/%.arch.o, \ + $$(notdir $$(wildcard $(TRUNNER_TESTS_DIR)/arch/$(SRCARCH)/*.S))) + TRUNNER_EXTRA_HDRS := $$(filter %.h,$(TRUNNER_EXTRA_SOURCES)) TRUNNER_TESTS_HDR := $(TRUNNER_TESTS_DIR)/tests.h TRUNNER_BPF_SRCS := $$(notdir $$(wildcard $(TRUNNER_BPF_PROGS_DIR)/*.c)) @@ -597,6 +600,13 @@ $(TRUNNER_EXTRA_OBJS): $(TRUNNER_OUTPUT)/%.o: \ $$(call msg,EXT-OBJ,$(TRUNNER_BINARY),$$@) $(Q)$$(CC) $$(CFLAGS) -c $$< $$(LDLIBS) -o $$@ +$(TRUNNER_ASM_OBJS): $(TRUNNER_OUTPUT)/%.arch.o: \ + $(TRUNNER_TESTS_DIR)/arch/$(SRCARCH)/%.S \ + $(TRUNNER_TESTS_HDR) \ + $$(BPFOBJ) | $(TRUNNER_OUTPUT) + $$(call msg,ASM-OBJ,$(TRUNNER_BINARY),$$@) + $(Q)$$(CC) $$(CFLAGS) -c $$< $$(LDLIBS) -o $$@ + # non-flavored in-srctree builds receive special treatment, in particular, we # do not need to copy extra resources (see e.g. 
test_btf_dump_case()) $(TRUNNER_BINARY)-extras: $(TRUNNER_EXTRA_FILES) | $(TRUNNER_OUTPUT) @@ -606,7 +616,8 @@ ifneq ($2:$(OUTPUT),:$(shell pwd)) endif $(OUTPUT)/$(TRUNNER_BINARY): $(TRUNNER_TEST_OBJS) \ - $(TRUNNER_EXTRA_OBJS) $$(BPFOBJ) \ + $(TRUNNER_EXTRA_OBJS) $(TRUNNER_ASM_OBJS) \ + $$(BPFOBJ) \ $(RESOLVE_BTFIDS) \ $(TRUNNER_BPFTOOL) \ | $(TRUNNER_BINARY)-extras diff --git a/tools/testing/selftests/bpf/prog_tests/arch/x86/uprobe_syscall.S b/tools/testing/selftests/bpf/prog_tests/arch/x86/uprobe_syscall.S new file mode 100644 index 000000000000..bcbad218c4d6 --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/arch/x86/uprobe_syscall.S @@ -0,0 +1,89 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef ASM_NL +#define ASM_NL ; +#endif + +#define SYM_ENTRY(name) \ + .globl name ASM_NL \ + name: + +#define SYM_END(name) \ + .type name STT_FUNC ASM_NL \ + .size name, .-name ASM_NL + +.code64 +.section .text, "ax" + +SYM_ENTRY(uprobe_syscall_arch_test) + movq $0xdeadbeef, %rax + ret +SYM_END(uprobe_syscall_arch_test) + +.globl uprobe_syscall_arch +uprobe_syscall_arch: + movq %r15, 0(%rdi) + movq %r14, 8(%rdi) + movq %r13, 16(%rdi) + movq %r12, 24(%rdi) + movq %rbp, 32(%rdi) + movq %rbx, 40(%rdi) + movq %r11, 48(%rdi) + movq %r10, 56(%rdi) + movq %r9, 64(%rdi) + movq %r8, 72(%rdi) + movq %rax, 80(%rdi) + movq %rcx, 88(%rdi) + movq %rdx, 96(%rdi) + movq %rsi, 104(%rdi) + movq %rdi, 112(%rdi) + movq $0, 120(%rdi) /* orig_rax */ + movq $0, 128(%rdi) /* rip */ + movq $0, 136(%rdi) /* cs */ + + pushf + pop %rax + + movq %rax, 144(%rdi) /* eflags */ + movq %rsp, 152(%rdi) /* rsp */ + movq $0, 160(%rdi) /* ss */ + + pushq %rsi + call uprobe_syscall_arch_test + + /* store return value and get second argument pointer to rax */ + pushq %rax + movq 8(%rsp), %rax + + movq %r15, 0(%rax) + movq %r14, 8(%rax) + movq %r13, 16(%rax) + movq %r12, 24(%rax) + movq %rbp, 32(%rax) + movq %rbx, 40(%rax) + movq %r11, 48(%rax) + movq %r10, 56(%rax) + movq %r9, 64(%rax) + movq %r8, 
72(%rax) + movq %rcx, 88(%rax) + movq %rdx, 96(%rax) + movq %rsi, 104(%rax) + movq %rdi, 112(%rax) + movq $0, 120(%rax) /* orig_rax */ + movq $0, 128(%rax) /* rip */ + movq $0, 136(%rax) /* cs */ + + pop %rax + pop %rsi + movq %rax, 80(%rsi) + + pushf + pop %rax + + movq %rax, 144(%rsi) /* eflags */ + movq %rsp, 152(%rsi) /* rsp */ + movq $0, 160(%rsi) /* ss */ + + ret + +.section .note.GNU-stack,"",@progbits diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c new file mode 100644 index 000000000000..0df205fea957 --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c @@ -0,0 +1,84 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include + +#ifdef __x86_64__ + +#include +#include +#include "uprobe_syscall.skel.h" + +extern int uprobe_syscall_arch(struct pt_regs *before, struct pt_regs *after); + +static void test_uretprobe(void) +{ + struct pt_regs before = {}, after = {}; + unsigned long *pb = (unsigned long *) &before; + unsigned long *pa = (unsigned long *) &after; + unsigned long *prog_regs; + struct uprobe_syscall *skel = NULL; + unsigned int i, cnt; + int err; + + skel = uprobe_syscall__open_and_load(); + if (!ASSERT_OK_PTR(skel, "uprobe_syscall__open_and_load")) + goto cleanup; + + err = uprobe_syscall__attach(skel); + if (!ASSERT_OK(err, "uprobe_syscall__attach")) + goto cleanup; + + uprobe_syscall_arch(&before, &after); + + prog_regs = (unsigned long *) &skel->bss->regs; + cnt = sizeof(before)/sizeof(*pb); + + for (i = 0; i < cnt; i++) { + unsigned int offset = i * sizeof(unsigned long); + + /* + * Check register before and after uprobe_syscall_arch_test call + * that triggers the uretprobe. 
+ */ + switch (offset) { + case offsetof(struct pt_regs, rax): + ASSERT_EQ(pa[i], 0xdeadbeef, "return value"); + break; + default: + if (!ASSERT_EQ(pb[i], pa[i], "register before-after value check")) + fprintf(stdout, "failed register offset %u\n", offset); + } + + /* + * Check register seen from bpf program and register after + * uprobe_syscall_arch_test call + */ + switch (offset) { + /* + * These will be different (not set in uprobe_syscall_arch), + * we don't care. + */ + case offsetof(struct pt_regs, orig_rax): + case offsetof(struct pt_regs, rip): + case offsetof(struct pt_regs, cs): + case offsetof(struct pt_regs, rsp): + case offsetof(struct pt_regs, ss): + break; + default: + if (!ASSERT_EQ(prog_regs[i], pa[i], "register prog-after value check")) + fprintf(stdout, "failed register offset %u\n", offset); + } + } + +cleanup: + uprobe_syscall__destroy(skel); +} +#else +static void test_uretprobe(void) { } +#endif + +void test_uprobe_syscall(void) +{ + if (test__start_subtest("uretprobe")) + test_uretprobe(); +} diff --git a/tools/testing/selftests/bpf/progs/uprobe_syscall.c b/tools/testing/selftests/bpf/progs/uprobe_syscall.c new file mode 100644 index 000000000000..0cc7e8761410 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/uprobe_syscall.c @@ -0,0 +1,15 @@ +// SPDX-License-Identifier: GPL-2.0 +#include "vmlinux.h" +#include +#include + +struct pt_regs regs; + +char _license[] SEC("license") = "GPL"; + +SEC("uretprobe//proc/self/exe:uprobe_syscall_arch_test") +int uretprobe(struct pt_regs *ctx) +{ + memcpy(&regs, ctx, sizeof(regs)); + return 0; +} From patchwork Mon Mar 18 09:31:38 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jiri Olsa X-Patchwork-Id: 13595116 X-Patchwork-Delegate: bpf@iogearbox.net Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client
certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 453A72C68D for ; Mon, 18 Mar 2024 09:32:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1710754333; cv=none; b=kKyxJQEyZIwSoFUMKUYW3ZSKwK1VIEktT32vYEzUWleOfH+WWT7MN/BWbjohe+o0g1EWkl8xNe8Ag8KMOuia2ZNZk2o/pouiLcR3L///J0mJ3O1euIrQLbV2Uqdr+/EwtIA3/IjmTWiQ+spzTvcW8DPW1r7rgpvZzjVOZj/LGBE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1710754333; c=relaxed/simple; bh=Znc57W0nRD+Iz1lbrmnDfpTHcuBN9RCnZRrXZq53IYw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=pD9W2oW6obW2WJjKgvEqkMRZsRGRYTIj9Ib3kRRNsk/3vUIf24yXEWWbrbiJzp7v/v/Hw9gysvEuIMGMvlW6nbS17NNOPUg9jV5uRjb/KsUyB0sZ9+oPdjH/2rf/+tzYiJYYH4kT0+UNnfkqH6jnZ6mUI9poDIS87I9UY2E3dGM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=VyzQG6x0; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="VyzQG6x0" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 1F28CC433F1; Mon, 18 Mar 2024 09:32:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1710754332; bh=Znc57W0nRD+Iz1lbrmnDfpTHcuBN9RCnZRrXZq53IYw=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=VyzQG6x0l5xo9y3sg4jkPckYRAGWYEtQzfWZxcKqOho3DK157BwFatamA1fo4W6HF B9sTSqcPXFWJEwcLczyU9k54TpATWNvjU6scX0Vui+v7VXg+hhd4Psumsg7MO1al0o SjMZ/2dHUmkG+XR5mfNZ7PbJlX6SFslNRb9urFbMR9zv9UVb0ND7UlJtJ019yWJBhz cBrbkwgu76Hv7sz6PJWSHYt0B5F2BDexwQkwWFWQGJudOUmNxMZO2z0kjWt5tH5Q5Z dL2ASOhQNCOeQwONQ3FkJl+US1E7Ra7G02ykaGfhwswNl3uhJVwk+IA6Qf1IU4xmPX JLugryTAZu/GA== From: Jiri Olsa To: Oleg Nesterov , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko Cc: bpf@vger.kernel.org, Song Liu , 
Yonghong Song , John Fastabend , Peter Zijlstra , Thomas Gleixner , "Borislav Petkov (AMD)" , x86@kernel.org
Subject: [PATCH RFC bpf-next 3/3] selftests/bpf: Mark uprobe trigger functions with nocf_check attribute
Date: Mon, 18 Mar 2024 10:31:38 +0100
Message-ID: <20240318093139.293497-4-jolsa@kernel.org>
X-Mailer: git-send-email 2.44.0
In-Reply-To: <20240318093139.293497-1-jolsa@kernel.org>
References: <20240318093139.293497-1-jolsa@kernel.org>
Precedence: bulk
X-Mailing-List: bpf@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
X-Patchwork-Delegate: bpf@iogearbox.net
X-Patchwork-State: RFC

Some distros seem to enable -fcf-protection=branch by default, which
breaks our setup by placing an endbr64 instruction as the first
instruction of the uprobe trigger functions.

Marking them with nocf_check attribute to skip that.

Adding -Wno-attributes for bench objects, because nocf_check can
be used only when -fcf-protection=branch is enabled, otherwise
we get a warning that breaks compilation.

Signed-off-by: Jiri Olsa
---
 tools/include/linux/compiler.h                     | 4 ++++
 tools/testing/selftests/bpf/Makefile               | 2 +-
 tools/testing/selftests/bpf/benchs/bench_trigger.c | 6 +++---
 3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/tools/include/linux/compiler.h b/tools/include/linux/compiler.h
index 7b65566f3e42..14038ce04ca4 100644
--- a/tools/include/linux/compiler.h
+++ b/tools/include/linux/compiler.h
@@ -58,6 +58,10 @@
 #define noinline
 #endif
 
+#ifndef __nocfcheck
+#define __nocfcheck __attribute__((nocf_check))
+#endif
+
 /* Are two types/vars the same type (ignoring qualifiers)?
*/ #ifndef __same_type # define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b)) diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile index e425a946276b..506d3d592093 100644 --- a/tools/testing/selftests/bpf/Makefile +++ b/tools/testing/selftests/bpf/Makefile @@ -726,7 +726,7 @@ $(OUTPUT)/test_cpp: test_cpp.cpp $(OUTPUT)/test_core_extern.skel.h $(BPFOBJ) # Benchmark runner $(OUTPUT)/bench_%.o: benchs/bench_%.c bench.h $(BPFOBJ) $(call msg,CC,,$@) - $(Q)$(CC) $(CFLAGS) -O2 -c $(filter %.c,$^) $(LDLIBS) -o $@ + $(Q)$(CC) $(CFLAGS) -O2 -Wno-attributes -c $(filter %.c,$^) $(LDLIBS) -o $@ $(OUTPUT)/bench_rename.o: $(OUTPUT)/test_overhead.skel.h $(OUTPUT)/bench_trigger.o: $(OUTPUT)/trigger_bench.skel.h $(OUTPUT)/bench_ringbufs.o: $(OUTPUT)/ringbuf_bench.skel.h \ diff --git a/tools/testing/selftests/bpf/benchs/bench_trigger.c b/tools/testing/selftests/bpf/benchs/bench_trigger.c index ace0d1011a8e..3aecc3ef74e9 100644 --- a/tools/testing/selftests/bpf/benchs/bench_trigger.c +++ b/tools/testing/selftests/bpf/benchs/bench_trigger.c @@ -137,7 +137,7 @@ static void trigger_fmodret_setup(void) * GCC doesn't generate stack setup preample for these functions due to them * having no input arguments and doing nothing in the body. 
*/ -__weak void uprobe_target_nop(void) +__nocfcheck __weak void uprobe_target_nop(void) { asm volatile ("nop"); } @@ -146,7 +146,7 @@ __weak void opaque_noop_func(void) { } -__weak int uprobe_target_push(void) +__nocfcheck __weak int uprobe_target_push(void) { /* overhead of function call is negligible compared to uprobe * triggering, so this shouldn't affect benchmark results much @@ -155,7 +155,7 @@ __weak int uprobe_target_push(void) return 1; } -__weak void uprobe_target_ret(void) +__nocfcheck __weak void uprobe_target_ret(void) { asm volatile (""); } From patchwork Tue Mar 19 10:25:24 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oleg Nesterov X-Patchwork-Id: 13596531 X-Patchwork-Delegate: bpf@iogearbox.net Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7DD3E7D401 for ; Tue, 19 Mar 2024 10:27:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1710844023; cv=none; b=AJCfE26xlD9y+s+cCe5ndiDkSxnSF+o3ZKcC98+NeR6BNB0Jf+W1ut5jnJOsYaaQDlE6sqxTRCA/4Ug9tWCi0q8/vAx+M14FWBGH37Kpj7aPZ9JnA6uhjv0TOKC2Y+b4zRDZvoIc+w2JBxjBSB+WyD8yDF3IwX0SB2logkO4NYQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1710844023; c=relaxed/simple; bh=JDBEgWxmv3lx9ZUJtA88SrYQaM6/7zfXZOtcpSZ+qjQ=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=iCmZC0VGy8zDR8OSD9PMFID7EnHpjVh/NtactKmaVyUfob6N1Jqaycc8cCt8L5A+hCGedAw5WWlyHxGSRefmgWZHjIwtzsHthj97at/BiTWkeu1ZhxKnqs425/6u73S1RUE8gJ71LQlUysdJ5ZO/8RSjG4YRdm2oJ/7S2yIKhBo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) 
header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=NNrCw8wq; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="NNrCw8wq" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1710844020; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=WHrOefhxxy4qfibpmG156d30i339Q4R83fPgqzMYdFw=; b=NNrCw8wqrw3VJlpEe5KvO4GUsiNRWzkLu8LefBkUQ6Qz0bGnpi1lY5ibd0yx44sAugaTin EifBBoAnzY5FasI084ZHl0b0Ochp6uHxgFPkuenEvMXAbruzIRK44HsQIn8bSQBqf1IIi8 hut0eGY3xzOtb3n4bXWopDp7DlKbb6I= Received: from mimecast-mx02.redhat.com (mx-ext.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-557-cYTWm71CPw-NzIhIFZPUiA-1; Tue, 19 Mar 2024 06:26:54 -0400 X-MC-Unique: cYTWm71CPw-NzIhIFZPUiA-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id F413D29AB3E5; Tue, 19 Mar 2024 10:26:53 +0000 (UTC) Received: from dhcp-27-174.brq.redhat.com (unknown [10.45.224.50]) by smtp.corp.redhat.com (Postfix) with SMTP id 1BA9417A90; Tue, 19 Mar 2024 10:26:50 +0000 (UTC) Received: by dhcp-27-174.brq.redhat.com (nbSMTP-1.00) for uid 1000 oleg@redhat.com; Tue, 19 Mar 2024 11:25:31 +0100 (CET) Date: Tue, 19 Mar 2024 11:25:24 +0100 
From: Oleg Nesterov
To: Jiri Olsa
Cc: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , bpf@vger.kernel.org, Song Liu , Yonghong Song , John Fastabend , Peter Zijlstra , Thomas Gleixner , "Borislav Petkov (AMD)" , x86@kernel.org
Subject: [PATCH RFC bpf-next 4/3] uprobe: ensure sys_uretprobe uses sysret
Message-ID: <20240319102523.GC20287@redhat.com>
References: <20240318093139.293497-1-jolsa@kernel.org>
Precedence: bulk
X-Mailing-List: bpf@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <20240318093139.293497-1-jolsa@kernel.org>
User-Agent: Mutt/1.5.24 (2015-08-30)
X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.5
X-Patchwork-Delegate: bpf@iogearbox.net
X-Patchwork-State: RFC

Obviously not for inclusion yet ;) untested, lacks the comments, and I am not
sure it makes sense. But I am wondering if this change can speed up uretprobes
a bit more. Any chance you can test it?

With 1/3, sys_uretprobe() changes regs->r11/cx; this is correct but implies
iret. See the /* SYSRET requires RCX == RIP and R11 == EFLAGS */ code in
do_syscall_64().

With this patch, uretprobe_syscall_entry restores rcx/r11 itself and does retq;
sys_uretprobe() needs to hijack regs->ip after uprobe_handle_trampoline() to
make it possible.

Comments?

Oleg.
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c index 069371e86180..b99f1d80a8c8 100644 --- a/arch/x86/kernel/uprobes.c +++ b/arch/x86/kernel/uprobes.c @@ -319,6 +319,9 @@ asm ( "pushq %r11\n" "movq $462, %rax\n" "syscall\n" + "popq %r11\n" + "popq %rcx\n" + "retq\n" ".global uretprobe_syscall_end\n" "uretprobe_syscall_end:\n" ".popsection\n" @@ -336,23 +339,20 @@ void *arch_uprobe_trampoline(unsigned long *psize) SYSCALL_DEFINE0(uretprobe) { struct pt_regs *regs = task_pt_regs(current); - unsigned long sregs[3], err; + unsigned long __user *ax_and_ret = (unsigned long __user *)regs->sp + 2; + unsigned long ip, err; - /* - * We set rax and syscall itself changes rcx and r11, so the syscall - * trampoline saves their original values on stack. We need to read - * them and set original register values and fix the rsp pointer back. - */ - err = copy_from_user((void *) &sregs, (void *) regs->sp, sizeof(sregs)); - WARN_ON_ONCE(err); - - regs->r11 = sregs[0]; - regs->cx = sregs[1]; - regs->ax = sregs[2]; + ip = regs->ip; regs->orig_ax = -1; - regs->sp += sizeof(sregs); + err = get_user(regs->ax, ax_and_ret); + WARN_ON_ONCE(err); uprobe_handle_trampoline(regs); + + err = put_user(regs->ip, ax_and_ret); + WARN_ON_ONCE(err); + regs->ip = ip; + return regs->ax; }
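
For illustration only, and not part of the patch: the stack-slot handling in the diff above can be modelled in plain user-space C. The sketch below assumes the trampoline pushes rax, rcx and r11 in that order, so the saved rax sits at sp + 2, and that sys_uretprobe() reuses that same slot to hand the original return address to the trailing retq. The names model_regs, model_uretprobe_return, STACK_WORDS, orig_ret and tramp_resume are hypothetical, and the rflags value is arbitrary; no real syscall is executed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the registers the trampoline cares about. */
struct model_regs {
	uint64_t ax, cx, r11, ip;
	size_t sp;	/* index into the model stack; the stack grows down */
};

enum { STACK_WORDS = 8 };

/*
 * Model one pass through the trampoline.  On entry, regs holds the
 * register values at the moment the probed function has returned into
 * the trampoline.  orig_ret stands for the return address that
 * uprobe_handle_trampoline() would put into regs->ip, and tramp_resume
 * for the address of the instruction following "syscall".
 */
void model_uretprobe_return(struct model_regs *regs,
			    uint64_t orig_ret, uint64_t tramp_resume)
{
	uint64_t stack[STACK_WORDS] = { 0 };
	size_t slot;

	regs->sp = STACK_WORDS;

	/* Trampoline: pushq %rax; pushq %rcx; pushq %r11 */
	stack[--regs->sp] = regs->ax;
	stack[--regs->sp] = regs->cx;
	stack[--regs->sp] = regs->r11;

	/* movq $462, %rax; syscall: rax, rcx and r11 are clobbered */
	regs->ax = 462;
	regs->cx = tramp_resume;
	regs->r11 = 0x246;	/* arbitrary rflags value for the model */
	regs->ip = tramp_resume;

	/* sys_uretprobe(): ax_and_ret = (unsigned long *)regs->sp + 2 */
	slot = regs->sp + 2;
	regs->ax = stack[slot];		/* get_user(regs->ax, ax_and_ret) */
	regs->ip = orig_ret;		/* effect of uprobe_handle_trampoline() */
	stack[slot] = regs->ip;		/* put_user(regs->ip, ax_and_ret) */
	regs->ip = tramp_resume;	/* regs->ip = ip: resume in the trampoline */

	/* Trampoline after the syscall: popq %r11; popq %rcx; retq */
	regs->r11 = stack[regs->sp++];
	regs->cx = stack[regs->sp++];
	regs->ip = stack[regs->sp++];
}
```

In this model, rax, rcx and r11 come back with their pre-trampoline values, ip ends up at orig_ret, and the stack pointer is back where it started. Because regs->cx, regs->r11 and regs->ip are left untouched by the syscall itself, the condition quoted above for SYSRET would still hold at syscall exit.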