Message ID | 20250117005539.325887-1-eyal.birger@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | seccomp: passthrough uretprobe systemcall without filtering | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Not a local patch |
On 01/16, Eyal Birger wrote: > > Fixes: ff474a78cef5 ("uprobe: Add uretprobe syscall to speed up return probe") > Reported-by: Rafael Buchbinder <rafi@rbk.io> > Link: https://lore.kernel.org/lkml/CAHsH6Gs3Eh8DFU0wq58c_LF8A4_+o6z456J7BidmcVY2AqOnHQ@mail.gmail.com/ > Cc: stable@vger.kernel.org ... > @@ -1359,6 +1359,11 @@ int __secure_computing(const struct seccomp_data *sd) > this_syscall = sd ? sd->nr : > syscall_get_nr(current, current_pt_regs()); > > +#ifdef CONFIG_X86_64 > + if (unlikely(this_syscall == __NR_uretprobe) && !in_ia32_syscall()) > + return 0; > +#endif Acked-by: Oleg Nesterov <oleg@redhat.com> A note for the seccomp maintainers... I don't know what do you think, but I agree in advance that the very fact this patch adds "#ifdef CONFIG_X86_64" into __secure_computing() doesn't look nice. The problem is that we need a simple patch for -stable which fixes the real problem. We can cleanup this logic later, I think. Oleg.
On Fri, 17 Jan 2025 02:39:28 +0100 Oleg Nesterov <oleg@redhat.com> wrote: > On 01/16, Eyal Birger wrote: > > > > Fixes: ff474a78cef5 ("uprobe: Add uretprobe syscall to speed up return probe") > > Reported-by: Rafael Buchbinder <rafi@rbk.io> > > Link: https://lore.kernel.org/lkml/CAHsH6Gs3Eh8DFU0wq58c_LF8A4_+o6z456J7BidmcVY2AqOnHQ@mail.gmail.com/ > > Cc: stable@vger.kernel.org > ... > > @@ -1359,6 +1359,11 @@ int __secure_computing(const struct seccomp_data *sd) > > this_syscall = sd ? sd->nr : > > syscall_get_nr(current, current_pt_regs()); > > > > +#ifdef CONFIG_X86_64 > > + if (unlikely(this_syscall == __NR_uretprobe) && !in_ia32_syscall()) > > + return 0; > > +#endif > > Acked-by: Oleg Nesterov <oleg@redhat.com> > > > A note for the seccomp maintainers... > > I don't know what do you think, but I agree in advance that the very fact this > patch adds "#ifdef CONFIG_X86_64" into __secure_computing() doesn't look nice. > Indeed. in_ia32_syscall() depends arch/x86 too. We can add an inline function like; ``` uprobes.h static inline bool is_uprobe_syscall(int syscall) { // arch_is_uprobe_syscall check can be replaced by Kconfig, // something like CONFIG_ARCH_URETPROBE_SYSCALL. #ifdef arch_is_uprobe_syscall return arch_is_uprobe_syscall(syscall) #else return false; #endif } ``` and ``` arch/x86/include/asm/uprobes.h #define arch_is_uprobe_syscall(syscall) \ (IS_ENABLED(CONFIG_X86_64) && syscall == __NR_uretprobe && !in_ia32_syscall()) ``` > The problem is that we need a simple patch for -stable which fixes the real > problem. We can cleanup this logic later, I think. Hmm, at least we should make it is_uprobe_syscall() in uprobes.h so that do not pollute the seccomp subsystem with #ifdef. Thank you, > > Oleg. >
On Fri, Jan 17, 2025 at 12:02 AM Masami Hiramatsu <mhiramat@kernel.org> wrote: > > On Fri, 17 Jan 2025 02:39:28 +0100 > Oleg Nesterov <oleg@redhat.com> wrote: > > > On 01/16, Eyal Birger wrote: > > > > > > Fixes: ff474a78cef5 ("uprobe: Add uretprobe syscall to speed up return probe") > > > Reported-by: Rafael Buchbinder <rafi@rbk.io> > > > Link: https://lore.kernel.org/lkml/CAHsH6Gs3Eh8DFU0wq58c_LF8A4_+o6z456J7BidmcVY2AqOnHQ@mail.gmail.com/ > > > Cc: stable@vger.kernel.org > > ... > > > @@ -1359,6 +1359,11 @@ int __secure_computing(const struct seccomp_data *sd) > > > this_syscall = sd ? sd->nr : > > > syscall_get_nr(current, current_pt_regs()); > > > > > > +#ifdef CONFIG_X86_64 > > > + if (unlikely(this_syscall == __NR_uretprobe) && !in_ia32_syscall()) > > > + return 0; > > > +#endif > > > > Acked-by: Oleg Nesterov <oleg@redhat.com> > > > > > > A note for the seccomp maintainers... > > > > I don't know what do you think, but I agree in advance that the very fact this > > patch adds "#ifdef CONFIG_X86_64" into __secure_computing() doesn't look nice. > > > > Indeed. in_ia32_syscall() depends arch/x86 too. > We can add an inline function like; > > ``` uprobes.h > static inline bool is_uprobe_syscall(int syscall) > { > // arch_is_uprobe_syscall check can be replaced by Kconfig, > // something like CONFIG_ARCH_URETPROBE_SYSCALL. > #ifdef arch_is_uprobe_syscall > return arch_is_uprobe_syscall(syscall) > #else > return false; > #endif > } > ``` > and > ``` arch/x86/include/asm/uprobes.h > #define arch_is_uprobe_syscall(syscall) \ > (IS_ENABLED(CONFIG_X86_64) && syscall == __NR_uretprobe && !in_ia32_syscall()) > ``` Notice it'll need to be enclosed in ifdef CONFIG_X86_64 as __NR_uretprobe isn't defined otherwise so the IS_ENABLED isn't needed. > > > The problem is that we need a simple patch for -stable which fixes the real > > problem. We can cleanup this logic later, I think. > > Hmm, at least we should make it is_uprobe_syscall() in uprobes.h so that > do not pollute the seccomp subsystem with #ifdef. I like this approach. Notice it does add a couple of includes that weren't there before: - arch/x86/include/asm/uprobes.h would include asm/unistd.h - seccomp.c would include linux/uprobes.h So it's a less "trivial" patch... If that's ok I can repost with these changes. Eyal.
diff --git a/kernel/seccomp.c b/kernel/seccomp.c index 385d48293a5f..10a55c9b5c18 100644 --- a/kernel/seccomp.c +++ b/kernel/seccomp.c @@ -1359,6 +1359,11 @@ int __secure_computing(const struct seccomp_data *sd) this_syscall = sd ? sd->nr : syscall_get_nr(current, current_pt_regs()); +#ifdef CONFIG_X86_64 + if (unlikely(this_syscall == __NR_uretprobe) && !in_ia32_syscall()) + return 0; +#endif + switch (mode) { case SECCOMP_MODE_STRICT: __secure_computing_strict(this_syscall); /* may call do_exit */
When attaching uretprobes to processes running inside docker, the attached process is segfaulted when encountering the retprobe. The reason is that now that uretprobe is a system call the default seccomp filters in docker block it as they only allow a specific set of known syscalls. This is true for other userspace applications which use seccomp to control their syscall surface. Since uretprobe is a "kernel implementation detail" system call which is not used by userspace application code directly, it is impractical and there's very little point in forcing all userspace applications to explicitly allow it in order to avoid crashing tracked processes. Pass this systemcall through seccomp without depending on configuration. Fixes: ff474a78cef5 ("uprobe: Add uretprobe syscall to speed up return probe") Reported-by: Rafael Buchbinder <rafi@rbk.io> Link: https://lore.kernel.org/lkml/CAHsH6Gs3Eh8DFU0wq58c_LF8A4_+o6z456J7BidmcVY2AqOnHQ@mail.gmail.com/ Cc: stable@vger.kernel.org Signed-off-by: Eyal Birger <eyal.birger@gmail.com> --- The following reproduction script synthetically demonstrates the problem: cat > /tmp/x.c << EOF char *syscalls[] = { "write", "exit_group", "fstat", }; __attribute__((noinline)) int probed(void) { printf("Probed\n"); return 1; } void apply_seccomp_filter(char **syscalls, int num_syscalls) { scmp_filter_ctx ctx; ctx = seccomp_init(SCMP_ACT_KILL); for (int i = 0; i < num_syscalls; i++) { seccomp_rule_add(ctx, SCMP_ACT_ALLOW, seccomp_syscall_resolve_name(syscalls[i]), 0); } seccomp_load(ctx); seccomp_release(ctx); } int main(int argc, char *argv[]) { int num_syscalls = sizeof(syscalls) / sizeof(syscalls[0]); apply_seccomp_filter(syscalls, num_syscalls); probed(); return 0; } EOF cat > /tmp/trace.bt << EOF uretprobe:/tmp/x:probed { printf("ret=%d\n", retval); } EOF gcc -o /tmp/x /tmp/x.c -lseccomp /usr/bin/bpftrace /tmp/trace.bt & sleep 5 # wait for uretprobe attach /tmp/x pkill bpftrace rm /tmp/x /tmp/x.c /tmp/trace.bt --- kernel/seccomp.c | 5 +++++ 1 file changed, 5 insertions(+)