diff mbox series

RISC-V/bpf: Enable bpf_probe_read{, str}()

Message ID 20220703130924.57240-1-dlan@gentoo.org (mailing list archive)
State New, archived
Headers show
Series RISC-V/bpf: Enable bpf_probe_read{, str}() | expand

Commit Message

Yixun Lan July 3, 2022, 1:09 p.m. UTC
Enable this option to fix a bcc error in RISC-V platform

And, the error shows as follows:

~ # runqlen
WARNING: This target JIT is not designed for the host you are running. \
If bad things happen, please choose a different -march switch.
bpf: Failed to load program: Invalid argument
0: R1=ctx(off=0,imm=0) R10=fp0
0: (85) call bpf_get_current_task#35          ; R0_w=scalar()
1: (b7) r6 = 0                        ; R6_w=0
2: (7b) *(u64 *)(r10 -8) = r6         ; R6_w=P0 R10=fp0 fp-8_w=00000000
3: (07) r0 += 312                     ; R0_w=scalar()
4: (bf) r1 = r10                      ; R1_w=fp0 R10=fp0
5: (07) r1 += -8                      ; R1_w=fp-8
6: (b7) r2 = 8                        ; R2_w=8
7: (bf) r3 = r0                       ; R0_w=scalar(id=1) R3_w=scalar(id=1)
8: (85) call bpf_probe_read#4
unknown func bpf_probe_read#4
processed 9 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0

Traceback (most recent call last):
  File "/usr/lib/python-exec/python3.9/runqlen", line 187, in <module>
    b.attach_perf_event(ev_type=PerfType.SOFTWARE,
  File "/usr/lib/python3.9/site-packages/bcc/__init__.py", line 1228, in attach_perf_event
    fn = self.load_func(fn_name, BPF.PERF_EVENT)
  File "/usr/lib/python3.9/site-packages/bcc/__init__.py", line 522, in load_func
    raise Exception("Failed to load BPF program %s: %s" %
Exception: Failed to load BPF program b'do_perf_event': Invalid argument

Signed-off-by: Yixun Lan <dlan@gentoo.org>
---
 arch/riscv/Kconfig | 1 +
 1 file changed, 1 insertion(+)

Comments

Conor Dooley July 3, 2022, 1:12 p.m. UTC | #1
On 03/07/2022 14:09, Yixun Lan wrote:
> Enable this option to fix a bcc error in RISC-V platform
> 
> And, the error shows as follows:
> 
> ~ # runqlen
> WARNING: This target JIT is not designed for the host you are running. \
> If bad things happen, please choose a different -march switch.
> bpf: Failed to load program: Invalid argument
> 0: R1=ctx(off=0,imm=0) R10=fp0
> 0: (85) call bpf_get_current_task#35          ; R0_w=scalar()
> 1: (b7) r6 = 0                        ; R6_w=0
> 2: (7b) *(u64 *)(r10 -8) = r6         ; R6_w=P0 R10=fp0 fp-8_w=00000000
> 3: (07) r0 += 312                     ; R0_w=scalar()
> 4: (bf) r1 = r10                      ; R1_w=fp0 R10=fp0
> 5: (07) r1 += -8                      ; R1_w=fp-8
> 6: (b7) r2 = 8                        ; R2_w=8
> 7: (bf) r3 = r0                       ; R0_w=scalar(id=1) R3_w=scalar(id=1)
> 8: (85) call bpf_probe_read#4
> unknown func bpf_probe_read#4
> processed 9 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
> 
> Traceback (most recent call last):
>   File "/usr/lib/python-exec/python3.9/runqlen", line 187, in <module>
>     b.attach_perf_event(ev_type=PerfType.SOFTWARE,
>   File "/usr/lib/python3.9/site-packages/bcc/__init__.py", line 1228, in attach_perf_event
>     fn = self.load_func(fn_name, BPF.PERF_EVENT)
>   File "/usr/lib/python3.9/site-packages/bcc/__init__.py", line 522, in load_func
>     raise Exception("Failed to load BPF program %s: %s" %
> Exception: Failed to load BPF program b'do_perf_event': Invalid argument
> 
> Signed-off-by: Yixun Lan <dlan@gentoo.org>

Do you know what commit this fixes?
Thanks,
Conor.

> ---
>  arch/riscv/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 32ffef9f6e5b4..da0016f1be6ce 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -25,6 +25,7 @@ config RISCV
>  	select ARCH_HAS_GIGANTIC_PAGE
>  	select ARCH_HAS_KCOV
>  	select ARCH_HAS_MMIOWB
> +	select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
>  	select ARCH_HAS_PTE_SPECIAL
>  	select ARCH_HAS_SET_DIRECT_MAP if MMU
>  	select ARCH_HAS_SET_MEMORY if MMU
Yixun Lan July 4, 2022, 2:20 a.m. UTC | #2
Hi Conor Dooley:

On 14:12 Sun 03 Jul     , Conor Dooley wrote:
> On 03/07/2022 14:09, Yixun Lan wrote:
> > Enable this option to fix a bcc error in RISC-V platform
> > 
> > And, the error shows as follows:
> > 
> > ~ # runqlen
> > WARNING: This target JIT is not designed for the host you are running. \
> > If bad things happen, please choose a different -march switch.
> > bpf: Failed to load program: Invalid argument
> > 0: R1=ctx(off=0,imm=0) R10=fp0
> > 0: (85) call bpf_get_current_task#35          ; R0_w=scalar()
> > 1: (b7) r6 = 0                        ; R6_w=0
> > 2: (7b) *(u64 *)(r10 -8) = r6         ; R6_w=P0 R10=fp0 fp-8_w=00000000
> > 3: (07) r0 += 312                     ; R0_w=scalar()
> > 4: (bf) r1 = r10                      ; R1_w=fp0 R10=fp0
> > 5: (07) r1 += -8                      ; R1_w=fp-8
> > 6: (b7) r2 = 8                        ; R2_w=8
> > 7: (bf) r3 = r0                       ; R0_w=scalar(id=1) R3_w=scalar(id=1)
> > 8: (85) call bpf_probe_read#4
> > unknown func bpf_probe_read#4
> > processed 9 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
> > 
> > Traceback (most recent call last):
> >   File "/usr/lib/python-exec/python3.9/runqlen", line 187, in <module>
> >     b.attach_perf_event(ev_type=PerfType.SOFTWARE,
> >   File "/usr/lib/python3.9/site-packages/bcc/__init__.py", line 1228, in attach_perf_event
> >     fn = self.load_func(fn_name, BPF.PERF_EVENT)
> >   File "/usr/lib/python3.9/site-packages/bcc/__init__.py", line 522, in load_func
> >     raise Exception("Failed to load BPF program %s: %s" %
> > Exception: Failed to load BPF program b'do_perf_event': Invalid argument
> > 
> > Signed-off-by: Yixun Lan <dlan@gentoo.org>
> 
> Do you know what commit this fixes?
> Thanks,
> Conor.
> 

I think this is effectively broken for RISC-V 64 at the commit:
 0ebeea8ca8a4: bpf: Restrict bpf_probe_read{, str}() only to archs where they work

However, the bcc tools haven't got BPF support for RISC-V at that time,
so no one noticed it

I can add a Fixes tag if you think it's a proper way

> > ---
> >  arch/riscv/Kconfig | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index 32ffef9f6e5b4..da0016f1be6ce 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -25,6 +25,7 @@ config RISCV
> >  	select ARCH_HAS_GIGANTIC_PAGE
> >  	select ARCH_HAS_KCOV
> >  	select ARCH_HAS_MMIOWB
> > +	select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
> >  	select ARCH_HAS_PTE_SPECIAL
> >  	select ARCH_HAS_SET_DIRECT_MAP if MMU
> >  	select ARCH_HAS_SET_MEMORY if MMU
Christoph Hellwig July 4, 2022, 5:53 a.m. UTC | #3
On Sun, Jul 03, 2022 at 09:09:24PM +0800, Yixun Lan wrote:
> Enable this option to fix a bcc error in RISC-V platform
> 
> And, the error shows as follows:

These should not be enabled on new platforms.  Use the proper helpers
to probe kernel vs user pointers instead.
Conor Dooley July 4, 2022, 6:29 a.m. UTC | #4
On 04/07/2022 03:20, Yixun Lan wrote:
> [You don't often get email from dlan@gentoo.org. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ]
> 
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> Hi Conor Dooley:
> 
> On 14:12 Sun 03 Jul     , Conor Dooley wrote:
>> On 03/07/2022 14:09, Yixun Lan wrote:
>>> Enable this option to fix a bcc error in RISC-V platform
>>>
>>> And, the error shows as follows:
>>>
>>> ~ # runqlen
>>> WARNING: This target JIT is not designed for the host you are running. \
>>> If bad things happen, please choose a different -march switch.
>>> bpf: Failed to load program: Invalid argument
>>> 0: R1=ctx(off=0,imm=0) R10=fp0
>>> 0: (85) call bpf_get_current_task#35          ; R0_w=scalar()
>>> 1: (b7) r6 = 0                        ; R6_w=0
>>> 2: (7b) *(u64 *)(r10 -8) = r6         ; R6_w=P0 R10=fp0 fp-8_w=00000000
>>> 3: (07) r0 += 312                     ; R0_w=scalar()
>>> 4: (bf) r1 = r10                      ; R1_w=fp0 R10=fp0
>>> 5: (07) r1 += -8                      ; R1_w=fp-8
>>> 6: (b7) r2 = 8                        ; R2_w=8
>>> 7: (bf) r3 = r0                       ; R0_w=scalar(id=1) R3_w=scalar(id=1)
>>> 8: (85) call bpf_probe_read#4
>>> unknown func bpf_probe_read#4
>>> processed 9 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
>>>
>>> Traceback (most recent call last):
>>>    File "/usr/lib/python-exec/python3.9/runqlen", line 187, in <module>
>>>      b.attach_perf_event(ev_type=PerfType.SOFTWARE,
>>>    File "/usr/lib/python3.9/site-packages/bcc/__init__.py", line 1228, in attach_perf_event
>>>      fn = self.load_func(fn_name, BPF.PERF_EVENT)
>>>    File "/usr/lib/python3.9/site-packages/bcc/__init__.py", line 522, in load_func
>>>      raise Exception("Failed to load BPF program %s: %s" %
>>> Exception: Failed to load BPF program b'do_perf_event': Invalid argument
>>>
>>> Signed-off-by: Yixun Lan <dlan@gentoo.org>
>>
>> Do you know what commit this fixes?
>> Thanks,
>> Conor.
>>
> 
> I think this is effectively broken for RISC-V 64 at the commit:
>   0ebeea8ca8a4: bpf: Restrict bpf_probe_read{, str}() only to archs where they work
> 
> However, the bcc tools haven't got BPF support for RISC-V at that time,
> so no one noticed it
> 
> I can add a Fixes tag if you think it's a proper way

Hmm, I had a read of that commit and the breakage sounded intentional.
I ran a git log --grep "0ebeea8ca8a4" & it seems like a mixed bag of
fixes tags. Whether or not it deserves the explicit tag, mentioning
the commit would be useful. The tag would be:
Fixes: 0ebeea8ca8a4 ("bpf: Restrict bpf_probe_read{, str}() only to archs where they work")

Thanks,
Conor.

> 
>>> ---
>>>   arch/riscv/Kconfig | 1 +
>>>   1 file changed, 1 insertion(+)
>>>
>>> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
>>> index 32ffef9f6e5b4..da0016f1be6ce 100644
>>> --- a/arch/riscv/Kconfig
>>> +++ b/arch/riscv/Kconfig
>>> @@ -25,6 +25,7 @@ config RISCV
>>>      select ARCH_HAS_GIGANTIC_PAGE
>>>      select ARCH_HAS_KCOV
>>>      select ARCH_HAS_MMIOWB
>>> +   select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
>>>      select ARCH_HAS_PTE_SPECIAL
>>>      select ARCH_HAS_SET_DIRECT_MAP if MMU
>>>      select ARCH_HAS_SET_MEMORY if MMU
> 
> --
> Yixun Lan (dlan)
> Gentoo Linux Developer
> GPG Key ID AABEFD55
> 
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
Andrii Nakryiko July 6, 2022, 5 a.m. UTC | #5
On Sun, Jul 3, 2022 at 10:53 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Sun, Jul 03, 2022 at 09:09:24PM +0800, Yixun Lan wrote:
> > Enable this option to fix a bcc error in RISC-V platform
> >
> > And, the error shows as follows:
>
> These should not be enabled on new platforms.  Use the proper helpers
> to probe kernel vs user pointers instead.

riscv existed as of [0], so I'd argue it is a proper bug fix, as
corresponding select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE should
have been added back then.

But I also agree that BCC tools should be updated to use proper
bpf_probe_read_{kernel,user}[_str()] helpers, please contribute such
fixes to BCC tools and BCC itself as well. Cc'ed Alan as his ksnoop in
libbpf-tools seems to be using bpf_probe_read() as well and needs to
be fixed.

  [0] 0ebeea8ca8a4 ("bpf: Restrict bpf_probe_read{, str}() only to
archs where they work")
Yonghong Song July 6, 2022, 6:41 a.m. UTC | #6
On 7/5/22 10:00 PM, Andrii Nakryiko wrote:
> On Sun, Jul 3, 2022 at 10:53 PM Christoph Hellwig <hch@infradead.org> wrote:
>>
>> On Sun, Jul 03, 2022 at 09:09:24PM +0800, Yixun Lan wrote:
>>> Enable this option to fix a bcc error in RISC-V platform
>>>
>>> And, the error shows as follows:
>>
>> These should not be enabled on new platforms.  Use the proper helpers
>> to probe kernel vs user pointers instead.
> 
> riscv existed as of [0], so I'd argue it is a proper bug fix, as
> corresponding select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE should
> have been added back then.
> 
> But I also agree that BCC tools should be updated to use proper
> bpf_probe_read_{kernel,user}[_str()] helpers, please contribute such
> fixes to BCC tools and BCC itself as well. Cc'ed Alan as his ksnoop in
> libbpf-tools seems to be using bpf_probe_read() as well and needs to
> be fixed.

Yixun, the bcc change looks like below:

--- a/src/cc/frontends/clang/b_frontend_action.cc
+++ b/src/cc/frontends/clang/b_frontend_action.cc
@@ -132,7 +132,7 @@ static std::string 
check_bpf_probe_read_user(llvm::StringRef probe,

      /* For arch with overlapping address space, dont use 
bpf_probe_read for
       * user read. Just error out */
-#if defined(__s390x__)
+#if defined(__s390x__) || defined(__riscv_)
      overlap_addr = true;
      return "";
  #endif


Basically, prevent using bpf_probe_read() helper, so it will force user
to use bpf_probe_read_user() or bpf_probe_read_kernel(). and this should
make it work for old kernels.

> 
>    [0] 0ebeea8ca8a4 ("bpf: Restrict bpf_probe_read{, str}() only to
> archs where they work")
Christoph Hellwig July 6, 2022, 7 a.m. UTC | #7
On Tue, Jul 05, 2022 at 10:00:42PM -0700, Andrii Nakryiko wrote:
> riscv existed as of [0], so I'd argue it is a proper bug fix, as
> corresponding select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE should
> have been added back then.

How much of an eBPF ecosystem was there on RISC-V at the point?
Christoph Hellwig July 6, 2022, 7:01 a.m. UTC | #8
On Tue, Jul 05, 2022 at 11:41:30PM -0700, Yonghong Song wrote:
> 
> 
> On 7/5/22 10:00 PM, Andrii Nakryiko wrote:
> > On Sun, Jul 3, 2022 at 10:53 PM Christoph Hellwig <hch@infradead.org> wrote:
> > > 
> > > On Sun, Jul 03, 2022 at 09:09:24PM +0800, Yixun Lan wrote:
> > > > Enable this option to fix a bcc error in RISC-V platform
> > > > 
> > > > And, the error shows as follows:
> > > 
> > > These should not be enabled on new platforms.  Use the proper helpers
> > > to probe kernel vs user pointers instead.
> > 
> > riscv existed as of [0], so I'd argue it is a proper bug fix, as
> > corresponding select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE should
> > have been added back then.
> > 
> > But I also agree that BCC tools should be updated to use proper
> > bpf_probe_read_{kernel,user}[_str()] helpers, please contribute such
> > fixes to BCC tools and BCC itself as well. Cc'ed Alan as his ksnoop in
> > libbpf-tools seems to be using bpf_probe_read() as well and needs to
> > be fixed.
> 
> Yixun, the bcc change looks like below:

No, this is broken.  bcc needs to stop using bpf_probe_read entirely
for user addresses and unconditionally use bpf_probe_read_user first
and only fall back to bpf_probe_read if not supported.
Andrii Nakryiko July 8, 2022, 10:22 p.m. UTC | #9
On Wed, Jul 6, 2022 at 12:00 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Tue, Jul 05, 2022 at 10:00:42PM -0700, Andrii Nakryiko wrote:
> > riscv existed as of [0], so I'd argue it is a proper bug fix, as
> > corresponding select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE should
> > have been added back then.
>
> How much of an eBPF ecosystem was there on RISC-V at the point?

No idea, never used RISC-V and didn't pay much attention. But why does
it matter?
Yixun Lan July 9, 2022, 1:01 a.m. UTC | #10
Hi Christoph, YongHong

On 00:01 Wed 06 Jul     , Christoph Hellwig wrote:
> On Tue, Jul 05, 2022 at 11:41:30PM -0700, Yonghong Song wrote:
> > 
> > 
> > On 7/5/22 10:00 PM, Andrii Nakryiko wrote:
> > > On Sun, Jul 3, 2022 at 10:53 PM Christoph Hellwig <hch@infradead.org> wrote:
> > > > 
> > > > On Sun, Jul 03, 2022 at 09:09:24PM +0800, Yixun Lan wrote:
> > > > > Enable this option to fix a bcc error in RISC-V platform
> > > > > 
> > > > > And, the error shows as follows:
> > > > 
> > > > These should not be enabled on new platforms.  Use the proper helpers
> > > > to probe kernel vs user pointers instead.
> > > 
> > > riscv existed as of [0], so I'd argue it is a proper bug fix, as
> > > corresponding select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE should
> > > have been added back then.
> > > 
> > > But I also agree that BCC tools should be updated to use proper
> > > bpf_probe_read_{kernel,user}[_str()] helpers, please contribute such
> > > fixes to BCC tools and BCC itself as well. Cc'ed Alan as his ksnoop in
> > > libbpf-tools seems to be using bpf_probe_read() as well and needs to
> > > be fixed.
> > 
> > Yixun, the bcc change looks like below:
> 
> No, this is broken.  bcc needs to stop using bpf_probe_read entirely
> for user addresses and unconditionally use bpf_probe_read_user first
> and only fall back to bpf_probe_read if not supported.
I agree with Christoph, there is something in the bcc tools that
need to adjust in order to use new bpf_probe_read_{kernel,user}

Please check the ongoing discussion [0] in the bcc tools if you're
interested in, advice and comments are welcome

[0] https://github.com/iovisor/bcc/pull/4085#issuecomment-1179446738
Christoph Hellwig July 9, 2022, 6:24 a.m. UTC | #11
On Sat, Jul 09, 2022 at 09:01:10AM +0800, Yixun Lan wrote:
> Please check the ongoing discussion [0] in the bcc tools if you're
> interested in, advice and comments are welcome
> 
> [0] https://github.com/iovisor/bcc/pull/4085#issuecomment-1179446738

I can't find a way to post there, as replying eems to require a login.
Is there a mailing list discussion somewhere that is broadly accessible?
Christoph Hellwig July 9, 2022, 6:25 a.m. UTC | #12
On Fri, Jul 08, 2022 at 03:22:51PM -0700, Andrii Nakryiko wrote:
> On Wed, Jul 6, 2022 at 12:00 AM Christoph Hellwig <hch@infradead.org> wrote:
> >
> > On Tue, Jul 05, 2022 at 10:00:42PM -0700, Andrii Nakryiko wrote:
> > > riscv existed as of [0], so I'd argue it is a proper bug fix, as
> > > corresponding select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE should
> > > have been added back then.
> >
> > How much of an eBPF ecosystem was there on RISC-V at the point?
> 
> No idea, never used RISC-V and didn't pay much attention. But why does
> it matter?

It matters because we should not spread broken legacy interfaces.
Yixun Lan July 9, 2022, 8:48 a.m. UTC | #13
Hi Christoph:

On 23:24 Fri 08 Jul     , Christoph Hellwig wrote:
> On Sat, Jul 09, 2022 at 09:01:10AM +0800, Yixun Lan wrote:
> > Please check the ongoing discussion [0] in the bcc tools if you're
> > interested in, advice and comments are welcome
> > 
> > [0] https://github.com/iovisor/bcc/pull/4085#issuecomment-1179446738
> 
> I can't find a way to post there, as replying eems to require a login.
> Is there a mailing list discussion somewhere that is broadly accessible?

never mind, I think the logic is quite clear, we can do something in bcc:

1) adopt new _{kernel,user} interface whenever possible, this will
work fine for all arch with new kernel versions

2) for old kernel versions which lack the _{kernel,user} support,
fall back to old bpf_probe_read(), but take care of the Archs which
have overlaping address space - like s390, and just error out for it
Christoph Hellwig July 11, 2022, 3:58 a.m. UTC | #14
On Sat, Jul 09, 2022 at 04:48:05PM +0800, Yixun Lan wrote:
> never mind, I think the logic is quite clear, we can do something in bcc:
> 
> 1) adopt new _{kernel,user} interface whenever possible, this will
> work fine for all arch with new kernel versions
> 
> 2) for old kernel versions which lack the _{kernel,user} support,
> fall back to old bpf_probe_read(), but take care of the Archs which
> have overlaping address space - like s390, and just error out for it

Yes, that is the right thing to do.
diff mbox series

Patch

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 32ffef9f6e5b4..da0016f1be6ce 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -25,6 +25,7 @@  config RISCV
 	select ARCH_HAS_GIGANTIC_PAGE
 	select ARCH_HAS_KCOV
 	select ARCH_HAS_MMIOWB
+	select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
 	select ARCH_HAS_PTE_SPECIAL
 	select ARCH_HAS_SET_DIRECT_MAP if MMU
 	select ARCH_HAS_SET_MEMORY if MMU