Message ID | 20211210111918.4904-1-danieltimlee@gmail.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | BPF |
Headers | show |
Series | samples: bpf: fix tracex2 due to empty sys_write count argument | expand |
Context | Check | Description |
---|---|---|
bpf/vmtest-bpf-next-PR | success | PR summary |
bpf/vmtest-bpf-next | success | VM_Test |
netdev/tree_selection | success | Not a local patch |
Daniel T. Lee wrote: > Currently from syscall entry, argument can't be fetched correctly as a > result of register cleanup. > > commit 6b8cf5cc9965 ("x86/entry/64/compat: Clear registers for compat syscalls, to reduce speculation attack surface") > > For example in upper commit, registers are cleaned prior to syscall. > To be more specific, sys_write syscall has count size as a third argument. > But this can't be fetched from __x64_sys_enter/__s390x_sys_enter due to > register cleanup. (e.g. [x86] xorl %r8d, %r8d / [s390x] xgr %r7, %r7) > > This commit fix this problem by modifying the trace event to ksys_write > instead of sys_write syscall entry. > > # Wrong example of 'write()' syscall argument fetching > # ./tracex2 > ... > pid 50909 cmd dd uid 0 > syscall write() stats > byte_size : count distribution > 1 -> 1 : 4968837 |************************************* | > > # Successful example of 'write()' syscall argument fetching > # (dd's write bytes at a time defaults to 512) > # ./tracex2 > ... > pid 3095 cmd dd uid 0 > syscall write() stats > byte_size : count distribution > ... > 256 -> 511 : 0 | | > 512 -> 1023 : 4968844 |************************************* | > > Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> > --- > samples/bpf/tracex2_kern.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/samples/bpf/tracex2_kern.c b/samples/bpf/tracex2_kern.c > index 5bc696bac27d..96dff3bea227 100644 > --- a/samples/bpf/tracex2_kern.c > +++ b/samples/bpf/tracex2_kern.c > @@ -78,7 +78,7 @@ struct { > __uint(max_entries, 1024); > } my_hist_map SEC(".maps"); > > -SEC("kprobe/" SYSCALL(sys_write)) > +SEC("kprobe/ksys_write") > int bpf_prog3(struct pt_regs *ctx) > { > long write_size = PT_REGS_PARM3(ctx); > -- > 2.32.0 > LGTM Acked-by: John Fastabend <john.fastabend@gmail.com>
On 12/10/21 3:19 AM, Daniel T. Lee wrote: > Currently from syscall entry, argument can't be fetched correctly as a > result of register cleanup. > > commit 6b8cf5cc9965 ("x86/entry/64/compat: Clear registers for compat syscalls, to reduce speculation attack surface") > > For example in upper commit, registers are cleaned prior to syscall. > To be more specific, sys_write syscall has count size as a third argument. > But this can't be fetched from __x64_sys_enter/__s390x_sys_enter due to > register cleanup. (e.g. [x86] xorl %r8d, %r8d / [s390x] xgr %r7, %r7) is this the real reason? Did you build 32-bit user space application? Note that the above commit is for compat syscalls. > > This commit fix this problem by modifying the trace event to ksys_write > instead of sys_write syscall entry. > > # Wrong example of 'write()' syscall argument fetching > # ./tracex2 > ... > pid 50909 cmd dd uid 0 > syscall write() stats > byte_size : count distribution > 1 -> 1 : 4968837 |************************************* | > > # Successful example of 'write()' syscall argument fetching > # (dd's write bytes at a time defaults to 512) > # ./tracex2 > ... > pid 3095 cmd dd uid 0 > syscall write() stats > byte_size : count distribution > ... > 256 -> 511 : 0 | | > 512 -> 1023 : 4968844 |************************************* | > > Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> > --- > samples/bpf/tracex2_kern.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/samples/bpf/tracex2_kern.c b/samples/bpf/tracex2_kern.c > index 5bc696bac27d..96dff3bea227 100644 > --- a/samples/bpf/tracex2_kern.c > +++ b/samples/bpf/tracex2_kern.c > @@ -78,7 +78,7 @@ struct { > __uint(max_entries, 1024); > } my_hist_map SEC(".maps"); > > -SEC("kprobe/" SYSCALL(sys_write)) > +SEC("kprobe/ksys_write") > int bpf_prog3(struct pt_regs *ctx) > { > long write_size = PT_REGS_PARM3(ctx); I think the real reason of the failure is due to SYSCALL_WRAPPER. Please take a look at test_probe_write_user_kern.c. The issue with ksys_write() is that it can easily be inlined. For example, the source code, ssize_t ksys_write(unsigned int fd, const char __user *buf, size_t count) { ... } SYSCALL_DEFINE3(write, unsigned int, fd, const char __user *, buf, size_t, count) { return ksys_write(fd, buf, count); } In my 5.12 kernel, I have ffffffff81053b00 <__x64_sys_write>: ffffffff81053b00: 0f 1f 44 00 00 nopl (%rax,%rax) ffffffff81053b05: 41 57 pushq %r15 ffffffff81053b07: 41 56 pushq %r14 ffffffff81053b09: 41 55 pushq %r13 ffffffff81053b0b: 41 54 pushq %r12 ffffffff81053b0d: 53 pushq %rbx ffffffff81053b0e: 48 83 ec 10 subq $16, %rsp ffffffff81053b12: 65 48 8b 04 25 28 00 00 00 movq %gs:40, %rax ffffffff81053b1b: 48 89 44 24 08 movq %rax, 8(%rsp) ffffffff81053b20: 8b 47 70 movl 112(%rdi), %eax ffffffff81053b23: 4c 8b 7f 60 movq 96(%rdi), %r15 ffffffff81053b27: 4c 8b 67 68 movq 104(%rdi), %r12 ffffffff81053b2b: 89 c7 movl %eax, %edi ffffffff81053b2d: e8 6e a3 00 00 callq 0xffffffff8105dea0 <__fdget_pos> ... The ksys_write() is inlined.
diff --git a/samples/bpf/tracex2_kern.c b/samples/bpf/tracex2_kern.c index 5bc696bac27d..96dff3bea227 100644 --- a/samples/bpf/tracex2_kern.c +++ b/samples/bpf/tracex2_kern.c @@ -78,7 +78,7 @@ struct { __uint(max_entries, 1024); } my_hist_map SEC(".maps"); -SEC("kprobe/" SYSCALL(sys_write)) +SEC("kprobe/ksys_write") int bpf_prog3(struct pt_regs *ctx) { long write_size = PT_REGS_PARM3(ctx);
Currently from syscall entry, argument can't be fetched correctly as a result of register cleanup. commit 6b8cf5cc9965 ("x86/entry/64/compat: Clear registers for compat syscalls, to reduce speculation attack surface") For example in upper commit, registers are cleaned prior to syscall. To be more specific, sys_write syscall has count size as a third argument. But this can't be fetched from __x64_sys_enter/__s390x_sys_enter due to register cleanup. (e.g. [x86] xorl %r8d, %r8d / [s390x] xgr %r7, %r7) This commit fix this problem by modifying the trace event to ksys_write instead of sys_write syscall entry. # Wrong example of 'write()' syscall argument fetching # ./tracex2 ... pid 50909 cmd dd uid 0 syscall write() stats byte_size : count distribution 1 -> 1 : 4968837 |************************************* | # Successful example of 'write()' syscall argument fetching # (dd's write bytes at a time defaults to 512) # ./tracex2 ... pid 3095 cmd dd uid 0 syscall write() stats byte_size : count distribution ... 256 -> 511 : 0 | | 512 -> 1023 : 4968844 |************************************* | Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> --- samples/bpf/tracex2_kern.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)