Message ID | 20240411131127.73098-1-laoar.shao@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | bpf: Add a generic bits iterator | expand |
On Thu, Apr 11, 2024 at 9:11 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > Three new kfuncs, namely bpf_iter_bits_{new,next,destroy}, have been > added for the new bpf_iter_bits functionality. These kfuncs enable the > iteration of the bits from a given address and a given number of bits. > > - bpf_iter_bits_new > Initialize a new bits iterator for a given memory area. Due to the > limitation of bpf memalloc, the max number of bits to be iterated > over is (4096 * 8). > - bpf_iter_bits_next > Get the next bit in a bpf_iter_bits > - bpf_iter_bits_destroy > Destroy a bpf_iter_bits > > The bits iterator can be used in any context and on any address. > > Changes: > - v5->v6: > - Add positive tests (Andrii) > - v4->v5: > - Simplify test cases (Andrii) > - v3->v4: > - Fix endianness error on s390x (Andrii) > - zero-initialize kit->bits_copy and zero out nr_bits (Andrii) > - v2->v3: > - Optimization for u64/u32 mask (Andrii) > - v1->v2: > - Simplify the CPU number verification code to avoid the failure on s390x > (Eduard) > - bpf: Add bpf_iter_cpumask > https://lwn.net/Articles/961104/ > - bpf: Add new bpf helper bpf_for_each_cpu > https://lwn.net/Articles/939939/ > > Yafang Shao (2): > bpf: Add bits iterator > selftests/bpf: Add selftest for bits iter > > kernel/bpf/helpers.c | 120 +++++++++++++++++ > .../selftests/bpf/prog_tests/verifier.c | 2 + > .../selftests/bpf/progs/verifier_bits_iter.c | 127 ++++++++++++++++++ > 3 files changed, 249 insertions(+) > create mode 100644 tools/testing/selftests/bpf/progs/verifier_bits_iter.c > > -- > 2.39.1 > It appears that the test case failed on s390x when the data is a u32 value because we need to set the higher 32 bits. will analyze it. -- Regards Yafang
On Thu, Apr 11, 2024 at 6:51 AM Yafang Shao <laoar.shao@gmail.com> wrote: > > On Thu, Apr 11, 2024 at 9:11 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > Three new kfuncs, namely bpf_iter_bits_{new,next,destroy}, have been > > added for the new bpf_iter_bits functionality. These kfuncs enable the > > iteration of the bits from a given address and a given number of bits. > > > > - bpf_iter_bits_new > > Initialize a new bits iterator for a given memory area. Due to the > > limitation of bpf memalloc, the max number of bits to be iterated > > over is (4096 * 8). > > - bpf_iter_bits_next > > Get the next bit in a bpf_iter_bits > > - bpf_iter_bits_destroy > > Destroy a bpf_iter_bits > > > > The bits iterator can be used in any context and on any address. > > > > Changes: > > - v5->v6: > > - Add positive tests (Andrii) > > - v4->v5: > > - Simplify test cases (Andrii) > > - v3->v4: > > - Fix endianness error on s390x (Andrii) > > - zero-initialize kit->bits_copy and zero out nr_bits (Andrii) > > - v2->v3: > > - Optimization for u64/u32 mask (Andrii) > > - v1->v2: > > - Simplify the CPU number verification code to avoid the failure on s390x > > (Eduard) > > - bpf: Add bpf_iter_cpumask > > https://lwn.net/Articles/961104/ > > - bpf: Add new bpf helper bpf_for_each_cpu > > https://lwn.net/Articles/939939/ > > > > Yafang Shao (2): > > bpf: Add bits iterator > > selftests/bpf: Add selftest for bits iter > > > > kernel/bpf/helpers.c | 120 +++++++++++++++++ > > .../selftests/bpf/prog_tests/verifier.c | 2 + > > .../selftests/bpf/progs/verifier_bits_iter.c | 127 ++++++++++++++++++ > > 3 files changed, 249 insertions(+) > > create mode 100644 tools/testing/selftests/bpf/progs/verifier_bits_iter.c > > > > -- > > 2.39.1 > > > > It appears that the test case failed on s390x when the data is > a u32 value because we need to set the higher 32 bits. > will analyze it. > Hey Yafang, did you get a chance to debug and fix the issue? > > -- > Regards > Yafang
On Thu, Apr 25, 2024 at 8:34 AM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote: > > On Thu, Apr 11, 2024 at 6:51 AM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > On Thu, Apr 11, 2024 at 9:11 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > Three new kfuncs, namely bpf_iter_bits_{new,next,destroy}, have been > > > added for the new bpf_iter_bits functionality. These kfuncs enable the > > > iteration of the bits from a given address and a given number of bits. > > > > > > - bpf_iter_bits_new > > > Initialize a new bits iterator for a given memory area. Due to the > > > limitation of bpf memalloc, the max number of bits to be iterated > > > over is (4096 * 8). > > > - bpf_iter_bits_next > > > Get the next bit in a bpf_iter_bits > > > - bpf_iter_bits_destroy > > > Destroy a bpf_iter_bits > > > > > > The bits iterator can be used in any context and on any address. > > > > > > Changes: > > > - v5->v6: > > > - Add positive tests (Andrii) > > > - v4->v5: > > > - Simplify test cases (Andrii) > > > - v3->v4: > > > - Fix endianness error on s390x (Andrii) > > > - zero-initialize kit->bits_copy and zero out nr_bits (Andrii) > > > - v2->v3: > > > - Optimization for u64/u32 mask (Andrii) > > > - v1->v2: > > > - Simplify the CPU number verification code to avoid the failure on s390x > > > (Eduard) > > > - bpf: Add bpf_iter_cpumask > > > https://lwn.net/Articles/961104/ > > > - bpf: Add new bpf helper bpf_for_each_cpu > > > https://lwn.net/Articles/939939/ > > > > > > Yafang Shao (2): > > > bpf: Add bits iterator > > > selftests/bpf: Add selftest for bits iter > > > > > > kernel/bpf/helpers.c | 120 +++++++++++++++++ > > > .../selftests/bpf/prog_tests/verifier.c | 2 + > > > .../selftests/bpf/progs/verifier_bits_iter.c | 127 ++++++++++++++++++ > > > 3 files changed, 249 insertions(+) > > > create mode 100644 tools/testing/selftests/bpf/progs/verifier_bits_iter.c > > > > > > -- > > > 2.39.1 > > > > > > > It appears that the test case failed on s390x when the data is > > a u32 value because we need to set the higher 32 bits. > > will analyze it. > > > > Hey Yafang, did you get a chance to debug and fix the issue? > Hi Andrii, Apologies for the delay; I recently returned from an extended holiday. The issue stems from the limitations of bpf_probe_read_kernel() on s390 architecture. The attachment provides a straightforward example to illustrate this issue. The observed results are as follows: Error: #463/1 verifier_probe_read/probe read 4 bytes 8897 run_subtest:PASS:obj_open_mem 0 nsec 8898 run_subtest:PASS:unexpected_load_failure 0 nsec 8899 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec 8900 run_subtest:FAIL:659 Unexpected retval: 2817064 != 512 Error: #463/2 verifier_probe_read/probe read 8 bytes 8903 run_subtest:PASS:obj_open_mem 0 nsec 8904 run_subtest:PASS:unexpected_load_failure 0 nsec 8905 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec 8906 run_subtest:FAIL:659 Unexpected retval: 0 != 512 More details can be found at: https://github.com/kernel-patches/bpf/pull/6872 Should we consider this behavior of bpf_probe_read_kernel() as expected, or is it something that requires fixing?
On 4/24/24 10:36 PM, Yafang Shao wrote: > On Thu, Apr 25, 2024 at 8:34 AM Andrii Nakryiko > <andrii.nakryiko@gmail.com> wrote: >> On Thu, Apr 11, 2024 at 6:51 AM Yafang Shao <laoar.shao@gmail.com> wrote: >>> On Thu, Apr 11, 2024 at 9:11 PM Yafang Shao <laoar.shao@gmail.com> wrote: >>>> Three new kfuncs, namely bpf_iter_bits_{new,next,destroy}, have been >>>> added for the new bpf_iter_bits functionality. These kfuncs enable the >>>> iteration of the bits from a given address and a given number of bits. >>>> >>>> - bpf_iter_bits_new >>>> Initialize a new bits iterator for a given memory area. Due to the >>>> limitation of bpf memalloc, the max number of bits to be iterated >>>> over is (4096 * 8). >>>> - bpf_iter_bits_next >>>> Get the next bit in a bpf_iter_bits >>>> - bpf_iter_bits_destroy >>>> Destroy a bpf_iter_bits >>>> >>>> The bits iterator can be used in any context and on any address. >>>> >>>> Changes: >>>> - v5->v6: >>>> - Add positive tests (Andrii) >>>> - v4->v5: >>>> - Simplify test cases (Andrii) >>>> - v3->v4: >>>> - Fix endianness error on s390x (Andrii) >>>> - zero-initialize kit->bits_copy and zero out nr_bits (Andrii) >>>> - v2->v3: >>>> - Optimization for u64/u32 mask (Andrii) >>>> - v1->v2: >>>> - Simplify the CPU number verification code to avoid the failure on s390x >>>> (Eduard) >>>> - bpf: Add bpf_iter_cpumask >>>> https://lwn.net/Articles/961104/ >>>> - bpf: Add new bpf helper bpf_for_each_cpu >>>> https://lwn.net/Articles/939939/ >>>> >>>> Yafang Shao (2): >>>> bpf: Add bits iterator >>>> selftests/bpf: Add selftest for bits iter >>>> >>>> kernel/bpf/helpers.c | 120 +++++++++++++++++ >>>> .../selftests/bpf/prog_tests/verifier.c | 2 + >>>> .../selftests/bpf/progs/verifier_bits_iter.c | 127 ++++++++++++++++++ >>>> 3 files changed, 249 insertions(+) >>>> create mode 100644 tools/testing/selftests/bpf/progs/verifier_bits_iter.c >>>> >>>> -- >>>> 2.39.1 >>>> >>> It appears that the test case failed on s390x when the data is >>> a u32 value because we need to set the higher 32 bits. >>> will analyze it. >>> >> Hey Yafang, did you get a chance to debug and fix the issue? >> > Hi Andrii, > > Apologies for the delay; I recently returned from an extended holiday. > > The issue stems from the limitations of bpf_probe_read_kernel() on > s390 architecture. The attachment provides a straightforward example > to illustrate this issue. The observed results are as follows: > > Error: #463/1 verifier_probe_read/probe read 4 bytes > 8897 run_subtest:PASS:obj_open_mem 0 nsec > 8898 run_subtest:PASS:unexpected_load_failure 0 nsec > 8899 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec > 8900 run_subtest:FAIL:659 Unexpected retval: 2817064 != 512 > > Error: #463/2 verifier_probe_read/probe read 8 bytes > 8903 run_subtest:PASS:obj_open_mem 0 nsec > 8904 run_subtest:PASS:unexpected_load_failure 0 nsec > 8905 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec > 8906 run_subtest:FAIL:659 Unexpected retval: 0 != 512 > > More details can be found at: https://github.com/kernel-patches/bpf/pull/6872 > > Should we consider this behavior of bpf_probe_read_kernel() as > expected, or is it something that requires fixing? Maybe to guard the result with macros like __s390x__ to differentiate s390 vs. arm64/x86_64? There are some examples in prog_tests/* having such guards. probe_user.c:#if defined(__s390x__) test_bpf_syscall_macro.c:#if defined(__aarch64__) || defined(__s390__) xdp_adjust_tail.c:#if defined(__s390x__) xdp_do_redirect.c:#if defined(__s390x__)
On Thu, Apr 25, 2024 at 2:05 PM Yonghong Song <yonghong.song@linux.dev> wrote: > > > On 4/24/24 10:36 PM, Yafang Shao wrote: > > On Thu, Apr 25, 2024 at 8:34 AM Andrii Nakryiko > > <andrii.nakryiko@gmail.com> wrote: > >> On Thu, Apr 11, 2024 at 6:51 AM Yafang Shao <laoar.shao@gmail.com> wrote: > >>> On Thu, Apr 11, 2024 at 9:11 PM Yafang Shao <laoar.shao@gmail.com> wrote: > >>>> Three new kfuncs, namely bpf_iter_bits_{new,next,destroy}, have been > >>>> added for the new bpf_iter_bits functionality. These kfuncs enable the > >>>> iteration of the bits from a given address and a given number of bits. > >>>> > >>>> - bpf_iter_bits_new > >>>> Initialize a new bits iterator for a given memory area. Due to the > >>>> limitation of bpf memalloc, the max number of bits to be iterated > >>>> over is (4096 * 8). > >>>> - bpf_iter_bits_next > >>>> Get the next bit in a bpf_iter_bits > >>>> - bpf_iter_bits_destroy > >>>> Destroy a bpf_iter_bits > >>>> > >>>> The bits iterator can be used in any context and on any address. > >>>> > >>>> Changes: > >>>> - v5->v6: > >>>> - Add positive tests (Andrii) > >>>> - v4->v5: > >>>> - Simplify test cases (Andrii) > >>>> - v3->v4: > >>>> - Fix endianness error on s390x (Andrii) > >>>> - zero-initialize kit->bits_copy and zero out nr_bits (Andrii) > >>>> - v2->v3: > >>>> - Optimization for u64/u32 mask (Andrii) > >>>> - v1->v2: > >>>> - Simplify the CPU number verification code to avoid the failure on s390x > >>>> (Eduard) > >>>> - bpf: Add bpf_iter_cpumask > >>>> https://lwn.net/Articles/961104/ > >>>> - bpf: Add new bpf helper bpf_for_each_cpu > >>>> https://lwn.net/Articles/939939/ > >>>> > >>>> Yafang Shao (2): > >>>> bpf: Add bits iterator > >>>> selftests/bpf: Add selftest for bits iter > >>>> > >>>> kernel/bpf/helpers.c | 120 +++++++++++++++++ > >>>> .../selftests/bpf/prog_tests/verifier.c | 2 + > >>>> .../selftests/bpf/progs/verifier_bits_iter.c | 127 ++++++++++++++++++ > >>>> 3 files changed, 249 insertions(+) > >>>> create mode 100644 tools/testing/selftests/bpf/progs/verifier_bits_iter.c > >>>> > >>>> -- > >>>> 2.39.1 > >>>> > >>> It appears that the test case failed on s390x when the data is > >>> a u32 value because we need to set the higher 32 bits. > >>> will analyze it. > >>> > >> Hey Yafang, did you get a chance to debug and fix the issue? > >> > > Hi Andrii, > > > > Apologies for the delay; I recently returned from an extended holiday. > > > > The issue stems from the limitations of bpf_probe_read_kernel() on > > s390 architecture. The attachment provides a straightforward example > > to illustrate this issue. The observed results are as follows: > > > > Error: #463/1 verifier_probe_read/probe read 4 bytes > > 8897 run_subtest:PASS:obj_open_mem 0 nsec > > 8898 run_subtest:PASS:unexpected_load_failure 0 nsec > > 8899 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec > > 8900 run_subtest:FAIL:659 Unexpected retval: 2817064 != 512 > > > > Error: #463/2 verifier_probe_read/probe read 8 bytes > > 8903 run_subtest:PASS:obj_open_mem 0 nsec > > 8904 run_subtest:PASS:unexpected_load_failure 0 nsec > > 8905 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec > > 8906 run_subtest:FAIL:659 Unexpected retval: 0 != 512 > > > > More details can be found at: https://github.com/kernel-patches/bpf/pull/6872 > > > > Should we consider this behavior of bpf_probe_read_kernel() as > > expected, or is it something that requires fixing? > > Maybe to guard the result with macros like __s390x__ to differentiate > s390 vs. arm64/x86_64? There are some examples in prog_tests/* having > such guards. probe_user.c:#if defined(__s390x__) > test_bpf_syscall_macro.c:#if defined(__aarch64__) || defined(__s390__) > xdp_adjust_tail.c:#if defined(__s390x__) xdp_do_redirect.c:#if > defined(__s390x__) > That's feasible. Thank you for your suggestion. I'll make the necessary changes.
On Wed, Apr 24, 2024 at 10:37 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > On Thu, Apr 25, 2024 at 8:34 AM Andrii Nakryiko > <andrii.nakryiko@gmail.com> wrote: > > > > On Thu, Apr 11, 2024 at 6:51 AM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > On Thu, Apr 11, 2024 at 9:11 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > > > Three new kfuncs, namely bpf_iter_bits_{new,next,destroy}, have been > > > > added for the new bpf_iter_bits functionality. These kfuncs enable the > > > > iteration of the bits from a given address and a given number of bits. > > > > > > > > - bpf_iter_bits_new > > > > Initialize a new bits iterator for a given memory area. Due to the > > > > limitation of bpf memalloc, the max number of bits to be iterated > > > > over is (4096 * 8). > > > > - bpf_iter_bits_next > > > > Get the next bit in a bpf_iter_bits > > > > - bpf_iter_bits_destroy > > > > Destroy a bpf_iter_bits > > > > > > > > The bits iterator can be used in any context and on any address. > > > > > > > > Changes: > > > > - v5->v6: > > > > - Add positive tests (Andrii) > > > > - v4->v5: > > > > - Simplify test cases (Andrii) > > > > - v3->v4: > > > > - Fix endianness error on s390x (Andrii) > > > > - zero-initialize kit->bits_copy and zero out nr_bits (Andrii) > > > > - v2->v3: > > > > - Optimization for u64/u32 mask (Andrii) > > > > - v1->v2: > > > > - Simplify the CPU number verification code to avoid the failure on s390x > > > > (Eduard) > > > > - bpf: Add bpf_iter_cpumask > > > > https://lwn.net/Articles/961104/ > > > > - bpf: Add new bpf helper bpf_for_each_cpu > > > > https://lwn.net/Articles/939939/ > > > > > > > > Yafang Shao (2): > > > > bpf: Add bits iterator > > > > selftests/bpf: Add selftest for bits iter > > > > > > > > kernel/bpf/helpers.c | 120 +++++++++++++++++ > > > > .../selftests/bpf/prog_tests/verifier.c | 2 + > > > > .../selftests/bpf/progs/verifier_bits_iter.c | 127 ++++++++++++++++++ > > > > 3 files changed, 249 insertions(+) > > > > create mode 100644 tools/testing/selftests/bpf/progs/verifier_bits_iter.c > > > > > > > > -- > > > > 2.39.1 > > > > > > > > > > It appears that the test case failed on s390x when the data is > > > a u32 value because we need to set the higher 32 bits. > > > will analyze it. > > > > > > > Hey Yafang, did you get a chance to debug and fix the issue? > > > > Hi Andrii, > > Apologies for the delay; I recently returned from an extended holiday. > > The issue stems from the limitations of bpf_probe_read_kernel() on > s390 architecture. The attachment provides a straightforward example > to illustrate this issue. The observed results are as follows: > > Error: #463/1 verifier_probe_read/probe read 4 bytes > 8897 run_subtest:PASS:obj_open_mem 0 nsec > 8898 run_subtest:PASS:unexpected_load_failure 0 nsec > 8899 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec > 8900 run_subtest:FAIL:659 Unexpected retval: 2817064 != 512 > > Error: #463/2 verifier_probe_read/probe read 8 bytes > 8903 run_subtest:PASS:obj_open_mem 0 nsec > 8904 run_subtest:PASS:unexpected_load_failure 0 nsec > 8905 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec > 8906 run_subtest:FAIL:659 Unexpected retval: 0 != 512 > > More details can be found at: https://github.com/kernel-patches/bpf/pull/6872 > > Should we consider this behavior of bpf_probe_read_kernel() as > expected, or is it something that requires fixing? > I might be missing something, but there is nothing wrong with bpf_probe_read_kernel() behavior. In "read 4" case you are overwriting only upper 4 bytes of u64, so lower 4 bytes are garbage. In "read 8" you are reading (upper) 4 bytes of garbage from uninitialized data_dst. So getting back to iter implementation. Make sure you are zero-initializing that u64 value you are reading into? > -- > Regards > Yafang
On Fri, Apr 26, 2024 at 2:15 AM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote: > > On Wed, Apr 24, 2024 at 10:37 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > On Thu, Apr 25, 2024 at 8:34 AM Andrii Nakryiko > > <andrii.nakryiko@gmail.com> wrote: > > > > > > On Thu, Apr 11, 2024 at 6:51 AM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > > > On Thu, Apr 11, 2024 at 9:11 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > > > > > Three new kfuncs, namely bpf_iter_bits_{new,next,destroy}, have been > > > > > added for the new bpf_iter_bits functionality. These kfuncs enable the > > > > > iteration of the bits from a given address and a given number of bits. > > > > > > > > > > - bpf_iter_bits_new > > > > > Initialize a new bits iterator for a given memory area. Due to the > > > > > limitation of bpf memalloc, the max number of bits to be iterated > > > > > over is (4096 * 8). > > > > > - bpf_iter_bits_next > > > > > Get the next bit in a bpf_iter_bits > > > > > - bpf_iter_bits_destroy > > > > > Destroy a bpf_iter_bits > > > > > > > > > > The bits iterator can be used in any context and on any address. > > > > > > > > > > Changes: > > > > > - v5->v6: > > > > > - Add positive tests (Andrii) > > > > > - v4->v5: > > > > > - Simplify test cases (Andrii) > > > > > - v3->v4: > > > > > - Fix endianness error on s390x (Andrii) > > > > > - zero-initialize kit->bits_copy and zero out nr_bits (Andrii) > > > > > - v2->v3: > > > > > - Optimization for u64/u32 mask (Andrii) > > > > > - v1->v2: > > > > > - Simplify the CPU number verification code to avoid the failure on s390x > > > > > (Eduard) > > > > > - bpf: Add bpf_iter_cpumask > > > > > https://lwn.net/Articles/961104/ > > > > > - bpf: Add new bpf helper bpf_for_each_cpu > > > > > https://lwn.net/Articles/939939/ > > > > > > > > > > Yafang Shao (2): > > > > > bpf: Add bits iterator > > > > > selftests/bpf: Add selftest for bits iter > > > > > > > > > > kernel/bpf/helpers.c | 120 +++++++++++++++++ > > > > > .../selftests/bpf/prog_tests/verifier.c | 2 + > > > > > .../selftests/bpf/progs/verifier_bits_iter.c | 127 ++++++++++++++++++ > > > > > 3 files changed, 249 insertions(+) > > > > > create mode 100644 tools/testing/selftests/bpf/progs/verifier_bits_iter.c > > > > > > > > > > -- > > > > > 2.39.1 > > > > > > > > > > > > > It appears that the test case failed on s390x when the data is > > > > a u32 value because we need to set the higher 32 bits. > > > > will analyze it. > > > > > > > > > > Hey Yafang, did you get a chance to debug and fix the issue? > > > > > > > Hi Andrii, > > > > Apologies for the delay; I recently returned from an extended holiday. > > > > The issue stems from the limitations of bpf_probe_read_kernel() on > > s390 architecture. The attachment provides a straightforward example > > to illustrate this issue. The observed results are as follows: > > > > Error: #463/1 verifier_probe_read/probe read 4 bytes > > 8897 run_subtest:PASS:obj_open_mem 0 nsec > > 8898 run_subtest:PASS:unexpected_load_failure 0 nsec > > 8899 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec > > 8900 run_subtest:FAIL:659 Unexpected retval: 2817064 != 512 > > > > Error: #463/2 verifier_probe_read/probe read 8 bytes > > 8903 run_subtest:PASS:obj_open_mem 0 nsec > > 8904 run_subtest:PASS:unexpected_load_failure 0 nsec > > 8905 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec > > 8906 run_subtest:FAIL:659 Unexpected retval: 0 != 512 > > > > More details can be found at: https://github.com/kernel-patches/bpf/pull/6872 > > > > Should we consider this behavior of bpf_probe_read_kernel() as > > expected, or is it something that requires fixing? > > > > I might be missing something, but there is nothing wrong with > bpf_probe_read_kernel() behavior. In "read 4" case you are overwriting > only upper 4 bytes of u64, so lower 4 bytes are garbage. In "read 8" > you are reading (upper) 4 bytes of garbage from uninitialized > data_dst. The issue doesn't lie with the dst but rather with the src. Even after initializing the destination, the operation still fails. You can find more details in the following link: https://github.com/kernel-patches/bpf/pull/6882. It appears that bpf_probe_read_kernel() encounters difficulties when dealing with non-long-aligned source addresses. > > So getting back to iter implementation. Make sure you are > zero-initializing that u64 value you are reading into? > It has been zero-initialized: + kit->nr_bits = 0; + kit->bits_copy = 0;
On Thu, Apr 25, 2024 at 10:05 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > On Fri, Apr 26, 2024 at 2:15 AM Andrii Nakryiko > <andrii.nakryiko@gmail.com> wrote: > > > > On Wed, Apr 24, 2024 at 10:37 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > On Thu, Apr 25, 2024 at 8:34 AM Andrii Nakryiko > > > <andrii.nakryiko@gmail.com> wrote: > > > > > > > > On Thu, Apr 11, 2024 at 6:51 AM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > > > > > On Thu, Apr 11, 2024 at 9:11 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > > > > > > > Three new kfuncs, namely bpf_iter_bits_{new,next,destroy}, have been > > > > > > added for the new bpf_iter_bits functionality. These kfuncs enable the > > > > > > iteration of the bits from a given address and a given number of bits. > > > > > > > > > > > > - bpf_iter_bits_new > > > > > > Initialize a new bits iterator for a given memory area. Due to the > > > > > > limitation of bpf memalloc, the max number of bits to be iterated > > > > > > over is (4096 * 8). > > > > > > - bpf_iter_bits_next > > > > > > Get the next bit in a bpf_iter_bits > > > > > > - bpf_iter_bits_destroy > > > > > > Destroy a bpf_iter_bits > > > > > > > > > > > > The bits iterator can be used in any context and on any address. > > > > > > > > > > > > Changes: > > > > > > - v5->v6: > > > > > > - Add positive tests (Andrii) > > > > > > - v4->v5: > > > > > > - Simplify test cases (Andrii) > > > > > > - v3->v4: > > > > > > - Fix endianness error on s390x (Andrii) > > > > > > - zero-initialize kit->bits_copy and zero out nr_bits (Andrii) > > > > > > - v2->v3: > > > > > > - Optimization for u64/u32 mask (Andrii) > > > > > > - v1->v2: > > > > > > - Simplify the CPU number verification code to avoid the failure on s390x > > > > > > (Eduard) > > > > > > - bpf: Add bpf_iter_cpumask > > > > > > https://lwn.net/Articles/961104/ > > > > > > - bpf: Add new bpf helper bpf_for_each_cpu > > > > > > https://lwn.net/Articles/939939/ > > > > > > > > > > > > Yafang Shao (2): > > > > > > bpf: Add bits iterator > > > > > > selftests/bpf: Add selftest for bits iter > > > > > > > > > > > > kernel/bpf/helpers.c | 120 +++++++++++++++++ > > > > > > .../selftests/bpf/prog_tests/verifier.c | 2 + > > > > > > .../selftests/bpf/progs/verifier_bits_iter.c | 127 ++++++++++++++++++ > > > > > > 3 files changed, 249 insertions(+) > > > > > > create mode 100644 tools/testing/selftests/bpf/progs/verifier_bits_iter.c > > > > > > > > > > > > -- > > > > > > 2.39.1 > > > > > > > > > > > > > > > > It appears that the test case failed on s390x when the data is > > > > > a u32 value because we need to set the higher 32 bits. > > > > > will analyze it. > > > > > > > > > > > > > Hey Yafang, did you get a chance to debug and fix the issue? > > > > > > > > > > Hi Andrii, > > > > > > Apologies for the delay; I recently returned from an extended holiday. > > > > > > The issue stems from the limitations of bpf_probe_read_kernel() on > > > s390 architecture. The attachment provides a straightforward example > > > to illustrate this issue. The observed results are as follows: > > > > > > Error: #463/1 verifier_probe_read/probe read 4 bytes > > > 8897 run_subtest:PASS:obj_open_mem 0 nsec > > > 8898 run_subtest:PASS:unexpected_load_failure 0 nsec > > > 8899 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec > > > 8900 run_subtest:FAIL:659 Unexpected retval: 2817064 != 512 > > > > > > Error: #463/2 verifier_probe_read/probe read 8 bytes > > > 8903 run_subtest:PASS:obj_open_mem 0 nsec > > > 8904 run_subtest:PASS:unexpected_load_failure 0 nsec > > > 8905 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec > > > 8906 run_subtest:FAIL:659 Unexpected retval: 0 != 512 > > > > > > More details can be found at: https://github.com/kernel-patches/bpf/pull/6872 > > > > > > Should we consider this behavior of bpf_probe_read_kernel() as > > > expected, or is it something that requires fixing? > > > > > > > I might be missing something, but there is nothing wrong with > > bpf_probe_read_kernel() behavior. In "read 4" case you are overwriting > > only upper 4 bytes of u64, so lower 4 bytes are garbage. In "read 8" > > you are reading (upper) 4 bytes of garbage from uninitialized > > data_dst. > > The issue doesn't lie with the dst but rather with the src. Even after > initializing the destination, the operation still fails. You can find Are you sure the operation "fails"? If it would fail, you'd get a negative error code, but you are getting zero. Which actually makes sense. I think you are just getting confused by big endianness of s390x, and there is nothing wrong with bpf_probe_read_kernel(). In both of your tests (I pasted your code below, it would be better if you did it in your initial emails) you end up with 0x200 in *upper* 32 bits (on big endian) and lower bits are zeros. And __retval thing is 32-bit (despite BPF program returning long), so this return value is truncated to *lower* 32-bits, which are, expectedly, zeroes. So I think everything works as expected, but your tests (at least) don't handle the big-endian arch well. __description("probe read 4 bytes") __success __retval(0x200) long probe_read_4(void) { int data = 0x200; long data_dst = 0; int err; err = bpf_probe_read_kernel(&data_dst, 4, &data); if (err) return err; return data_dst; } SEC("syscall") __description("probe read 8 bytes") __success __retval(0x200) long probe_read_8(void) { int data = 0x200; long data_dst = 0; int err; err = bpf_probe_read_kernel(&data_dst, 8, &data); if (err) return err; return data_dst; } > more details in the following link: > https://github.com/kernel-patches/bpf/pull/6882. It appears that > bpf_probe_read_kernel() encounters difficulties when dealing with > non-long-aligned source addresses. > > > > > So getting back to iter implementation. Make sure you are > > zero-initializing that u64 value you are reading into? > > > > It has been zero-initialized: > > + kit->nr_bits = 0; > + kit->bits_copy = 0; > ok, then the problem is somewhere else, but it doesn't seem to be in bpf_probe_read_kernel(). I'm forgetting what was the original test failure for your patch set, but please double check again, taking into account the big endianness of s390x. > -- > Regards > Yafang
On Sat, Apr 27, 2024 at 12:51 AM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote: > > On Thu, Apr 25, 2024 at 10:05 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > On Fri, Apr 26, 2024 at 2:15 AM Andrii Nakryiko > > <andrii.nakryiko@gmail.com> wrote: > > > > > > On Wed, Apr 24, 2024 at 10:37 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > > > On Thu, Apr 25, 2024 at 8:34 AM Andrii Nakryiko > > > > <andrii.nakryiko@gmail.com> wrote: > > > > > > > > > > On Thu, Apr 11, 2024 at 6:51 AM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > > > > > > > On Thu, Apr 11, 2024 at 9:11 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > > > > > > > > > Three new kfuncs, namely bpf_iter_bits_{new,next,destroy}, have been > > > > > > > added for the new bpf_iter_bits functionality. These kfuncs enable the > > > > > > > iteration of the bits from a given address and a given number of bits. > > > > > > > > > > > > > > - bpf_iter_bits_new > > > > > > > Initialize a new bits iterator for a given memory area. Due to the > > > > > > > limitation of bpf memalloc, the max number of bits to be iterated > > > > > > > over is (4096 * 8). > > > > > > > - bpf_iter_bits_next > > > > > > > Get the next bit in a bpf_iter_bits > > > > > > > - bpf_iter_bits_destroy > > > > > > > Destroy a bpf_iter_bits > > > > > > > > > > > > > > The bits iterator can be used in any context and on any address. > > > > > > > > > > > > > > Changes: > > > > > > > - v5->v6: > > > > > > > - Add positive tests (Andrii) > > > > > > > - v4->v5: > > > > > > > - Simplify test cases (Andrii) > > > > > > > - v3->v4: > > > > > > > - Fix endianness error on s390x (Andrii) > > > > > > > - zero-initialize kit->bits_copy and zero out nr_bits (Andrii) > > > > > > > - v2->v3: > > > > > > > - Optimization for u64/u32 mask (Andrii) > > > > > > > - v1->v2: > > > > > > > - Simplify the CPU number verification code to avoid the failure on s390x > > > > > > > (Eduard) > > > > > > > - bpf: Add bpf_iter_cpumask > > > > > > > https://lwn.net/Articles/961104/ > > > > > > > - bpf: Add new bpf helper bpf_for_each_cpu > > > > > > > https://lwn.net/Articles/939939/ > > > > > > > > > > > > > > Yafang Shao (2): > > > > > > > bpf: Add bits iterator > > > > > > > selftests/bpf: Add selftest for bits iter > > > > > > > > > > > > > > kernel/bpf/helpers.c | 120 +++++++++++++++++ > > > > > > > .../selftests/bpf/prog_tests/verifier.c | 2 + > > > > > > > .../selftests/bpf/progs/verifier_bits_iter.c | 127 ++++++++++++++++++ > > > > > > > 3 files changed, 249 insertions(+) > > > > > > > create mode 100644 tools/testing/selftests/bpf/progs/verifier_bits_iter.c > > > > > > > > > > > > > > -- > > > > > > > 2.39.1 > > > > > > > > > > > > > > > > > > > It appears that the test case failed on s390x when the data is > > > > > > a u32 value because we need to set the higher 32 bits. > > > > > > will analyze it. > > > > > > > > > > > > > > > > Hey Yafang, did you get a chance to debug and fix the issue? > > > > > > > > > > > > > Hi Andrii, > > > > > > > > Apologies for the delay; I recently returned from an extended holiday. > > > > > > > > The issue stems from the limitations of bpf_probe_read_kernel() on > > > > s390 architecture. The attachment provides a straightforward example > > > > to illustrate this issue. The observed results are as follows: > > > > > > > > Error: #463/1 verifier_probe_read/probe read 4 bytes > > > > 8897 run_subtest:PASS:obj_open_mem 0 nsec > > > > 8898 run_subtest:PASS:unexpected_load_failure 0 nsec > > > > 8899 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec > > > > 8900 run_subtest:FAIL:659 Unexpected retval: 2817064 != 512 > > > > > > > > Error: #463/2 verifier_probe_read/probe read 8 bytes > > > > 8903 run_subtest:PASS:obj_open_mem 0 nsec > > > > 8904 run_subtest:PASS:unexpected_load_failure 0 nsec > > > > 8905 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec > > > > 8906 run_subtest:FAIL:659 Unexpected retval: 0 != 512 > > > > > > > > More details can be found at: https://github.com/kernel-patches/bpf/pull/6872 > > > > > > > > Should we consider this behavior of bpf_probe_read_kernel() as > > > > expected, or is it something that requires fixing? > > > > > > > > > > I might be missing something, but there is nothing wrong with > > > bpf_probe_read_kernel() behavior. In "read 4" case you are overwriting > > > only upper 4 bytes of u64, so lower 4 bytes are garbage. In "read 8" > > > you are reading (upper) 4 bytes of garbage from uninitialized > > > data_dst. > > > > The issue doesn't lie with the dst but rather with the src. Even after > > initializing the destination, the operation still fails. You can find > > Are you sure the operation "fails"? If it would fail, you'd get a > negative error code, but you are getting zero. Which actually makes > sense. > > I think you are just getting confused by big endianness of s390x, and > there is nothing wrong with bpf_probe_read_kernel(). > > In both of your tests (I pasted your code below, it would be better if > you did it in your initial emails) you end up with 0x200 in *upper* 32 > bits (on big endian) and lower bits are zeros. And __retval thing is > 32-bit (despite BPF program returning long), so this return value is > truncated to *lower* 32-bits, which are, expectedly, zeroes. Thank you for clarifying. The presence of the 32-bit __retval led to my misunderstanding :( > > So I think everything works as expected, but your tests (at least) > don't handle the big-endian arch well. The issue arises when the dst and src have different sizes, causing bpf_probe_read_kernel_common() to handle them poorly on big-endian machines. To address this, we need to calculate the offset for copying, as demonstrated by the following bpf_probe_read_kernel_common(&kit->bits_copy + offset, size, unsafe_ptr__ign); One might wonder why this calculation is not incorporated directly into the implementation of bpf_probe_read_kernel_common() ? > > __description("probe read 4 bytes") > __success __retval(0x200) > long probe_read_4(void) > { > int data = 0x200; > long data_dst = 0; > int err; > > err = bpf_probe_read_kernel(&data_dst, 4, &data); > if (err) > return err; > > return data_dst; > } > > SEC("syscall") > __description("probe read 8 bytes") > __success __retval(0x200) > long probe_read_8(void) > { > int data = 0x200; > long data_dst = 0; > int err; > > err = bpf_probe_read_kernel(&data_dst, 8, &data); > if (err) > return err; > > return data_dst; > > } > > > more details in the following link: > > https://github.com/kernel-patches/bpf/pull/6882. It appears that > > bpf_probe_read_kernel() encounters difficulties when dealing with > > non-long-aligned source addresses. > > > > > > > > So getting back to iter implementation. Make sure you are > > > zero-initializing that u64 value you are reading into? > > > > > > > It has been zero-initialized: > > > > + kit->nr_bits = 0; > > + kit->bits_copy = 0; > > > > ok, then the problem is somewhere else, but it doesn't seem to be in > bpf_probe_read_kernel(). I'm forgetting what was the original test > failure for your patch set, but please double check again, taking into > account the big endianness of s390x. > If we aim to make it compatible with s390, we need to introduce some constraints regarding the bits iteration. 1. We must replace nr_bits with size: bpf_iter_bits_new(struct bpf_iter_bits *it, const void *unsafe_ptr__ign, u32 size) 2. The size must adhere to alignment requirements: if (size <= sizeof(u64)) { int offset = IS_ENABLED(CONFIG_S390) ? sizeof(u64) - size : 0; switch (size) { case 1: case 2: case 4: case 8: break; default: return -EINVAL; } err = bpf_probe_read_kernel_common(((char *)&kit->bits_copy) + offset, size, unsafe_ptr__ign); if (err) return -EFAULT; kit->size = size; kit->bit = -1; return 0; } /* Not long-aligned */ if (size & (sizeof(unsigned long) - 1)) return -EINVAL; .... Does this meet your expectations? -- Regards Yafang
On Sun, Apr 28, 2024 at 6:47 AM Yafang Shao <laoar.shao@gmail.com> wrote: > > On Sat, Apr 27, 2024 at 12:51 AM Andrii Nakryiko > <andrii.nakryiko@gmail.com> wrote: > > > > On Thu, Apr 25, 2024 at 10:05 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > On Fri, Apr 26, 2024 at 2:15 AM Andrii Nakryiko > > > <andrii.nakryiko@gmail.com> wrote: > > > > > > > > On Wed, Apr 24, 2024 at 10:37 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > > > > > On Thu, Apr 25, 2024 at 8:34 AM Andrii Nakryiko > > > > > <andrii.nakryiko@gmail.com> wrote: > > > > > > > > > > > > On Thu, Apr 11, 2024 at 6:51 AM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > > > > > > > > > On Thu, Apr 11, 2024 at 9:11 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > > > > > > > > > > > Three new kfuncs, namely bpf_iter_bits_{new,next,destroy}, have been > > > > > > > > added for the new bpf_iter_bits functionality. These kfuncs enable the > > > > > > > > iteration of the bits from a given address and a given number of bits. > > > > > > > > > > > > > > > > - bpf_iter_bits_new > > > > > > > > Initialize a new bits iterator for a given memory area. Due to the > > > > > > > > limitation of bpf memalloc, the max number of bits to be iterated > > > > > > > > over is (4096 * 8). > > > > > > > > - bpf_iter_bits_next > > > > > > > > Get the next bit in a bpf_iter_bits > > > > > > > > - bpf_iter_bits_destroy > > > > > > > > Destroy a bpf_iter_bits > > > > > > > > > > > > > > > > The bits iterator can be used in any context and on any address. > > > > > > > > > > > > > > > > Changes: > > > > > > > > - v5->v6: > > > > > > > > - Add positive tests (Andrii) > > > > > > > > - v4->v5: > > > > > > > > - Simplify test cases (Andrii) > > > > > > > > - v3->v4: > > > > > > > > - Fix endianness error on s390x (Andrii) > > > > > > > > - zero-initialize kit->bits_copy and zero out nr_bits (Andrii) > > > > > > > > - v2->v3: > > > > > > > > - Optimization for u64/u32 mask (Andrii) > > > > > > > > - v1->v2: > > > > > > > > - Simplify the CPU number verification code to avoid the failure on s390x > > > > > > > > (Eduard) > > > > > > > > - bpf: Add bpf_iter_cpumask > > > > > > > > https://lwn.net/Articles/961104/ > > > > > > > > - bpf: Add new bpf helper bpf_for_each_cpu > > > > > > > > https://lwn.net/Articles/939939/ > > > > > > > > > > > > > > > > Yafang Shao (2): > > > > > > > > bpf: Add bits iterator > > > > > > > > selftests/bpf: Add selftest for bits iter > > > > > > > > > > > > > > > > kernel/bpf/helpers.c | 120 +++++++++++++++++ > > > > > > > > .../selftests/bpf/prog_tests/verifier.c | 2 + > > > > > > > > .../selftests/bpf/progs/verifier_bits_iter.c | 127 ++++++++++++++++++ > > > > > > > > 3 files changed, 249 insertions(+) > > > > > > > > create mode 100644 tools/testing/selftests/bpf/progs/verifier_bits_iter.c > > > > > > > > > > > > > > > > -- > > > > > > > > 2.39.1 > > > > > > > > > > > > > > > > > > > > > > It appears that the test case failed on s390x when the data is > > > > > > > a u32 value because we need to set the higher 32 bits. > > > > > > > will analyze it. > > > > > > > > > > > > > > > > > > > Hey Yafang, did you get a chance to debug and fix the issue? > > > > > > > > > > > > > > > > Hi Andrii, > > > > > > > > > > Apologies for the delay; I recently returned from an extended holiday. > > > > > > > > > > The issue stems from the limitations of bpf_probe_read_kernel() on > > > > > s390 architecture. The attachment provides a straightforward example > > > > > to illustrate this issue. The observed results are as follows: > > > > > > > > > > Error: #463/1 verifier_probe_read/probe read 4 bytes > > > > > 8897 run_subtest:PASS:obj_open_mem 0 nsec > > > > > 8898 run_subtest:PASS:unexpected_load_failure 0 nsec > > > > > 8899 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec > > > > > 8900 run_subtest:FAIL:659 Unexpected retval: 2817064 != 512 > > > > > > > > > > Error: #463/2 verifier_probe_read/probe read 8 bytes > > > > > 8903 run_subtest:PASS:obj_open_mem 0 nsec > > > > > 8904 run_subtest:PASS:unexpected_load_failure 0 nsec > > > > > 8905 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec > > > > > 8906 run_subtest:FAIL:659 Unexpected retval: 0 != 512 > > > > > > > > > > More details can be found at: https://github.com/kernel-patches/bpf/pull/6872 > > > > > > > > > > Should we consider this behavior of bpf_probe_read_kernel() as > > > > > expected, or is it something that requires fixing? > > > > > > > > > > > > > I might be missing something, but there is nothing wrong with > > > > bpf_probe_read_kernel() behavior. In "read 4" case you are overwriting > > > > only upper 4 bytes of u64, so lower 4 bytes are garbage. In "read 8" > > > > you are reading (upper) 4 bytes of garbage from uninitialized > > > > data_dst. > > > > > > The issue doesn't lie with the dst but rather with the src. Even after > > > initializing the destination, the operation still fails. You can find > > > > Are you sure the operation "fails"? If it would fail, you'd get a > > negative error code, but you are getting zero. Which actually makes > > sense. > > > > I think you are just getting confused by big endianness of s390x, and > > there is nothing wrong with bpf_probe_read_kernel(). > > > > In both of your tests (I pasted your code below, it would be better if > > you did it in your initial emails) you end up with 0x200 in *upper* 32 > > bits (on big endian) and lower bits are zeros. And __retval thing is > > 32-bit (despite BPF program returning long), so this return value is > > truncated to *lower* 32-bits, which are, expectedly, zeroes. > > Thank you for clarifying. The presence of the 32-bit __retval led to > my misunderstanding :( > > > > > So I think everything works as expected, but your tests (at least) > > don't handle the big-endian arch well. > > The issue arises when the dst and src have different sizes, causing > bpf_probe_read_kernel_common() to handle them poorly on big-endian > machines. To address this, we need to calculate the offset for > copying, as demonstrated by the following > > bpf_probe_read_kernel_common(&kit->bits_copy + offset, size, > unsafe_ptr__ign); > > One might wonder why this calculation is not incorporated directly > into the implementation of bpf_probe_read_kernel_common() ? Let's stop talking about bpf_probe_read_kernel/bpf_probe_read_kernel_common doing anything wrong or not handling anything wrong. It's not, it's correct. Your code is *using* it wrong, and that's what we'll have to fix. The contract of bpf_probe_read_kernel is in terms of memory addresses of source/destination *bytes* and the *common* size of both source and destination. bpf_probe_read_kernel() cannot know that you are passing a pointer to int or long or whatever, it's just a void * pointer. So if you are writing bytes over long, you need to take care of adjusting offsets to accommodate big-endian. It's a distraction to talk about bpf_probe_read_kernel() here (but it's useful to understand its contract). > > > > > __description("probe read 4 bytes") > > __success __retval(0x200) > > long probe_read_4(void) > > { > > int data = 0x200; > > long data_dst = 0; > > int err; > > > > err = bpf_probe_read_kernel(&data_dst, 4, &data); > > if (err) > > return err; > > > > return data_dst; > > } > > > > SEC("syscall") > > __description("probe read 8 bytes") > > __success __retval(0x200) > > long probe_read_8(void) > > { > > int data = 0x200; > > long data_dst = 0; > > int err; > > > > err = bpf_probe_read_kernel(&data_dst, 8, &data); > > if (err) > > return err; > > > > return data_dst; > > > > } > > > > > more details in the following link: > > > https://github.com/kernel-patches/bpf/pull/6882. It appears that > > > bpf_probe_read_kernel() encounters difficulties when dealing with > > > non-long-aligned source addresses. > > > > > > > > > > > So getting back to iter implementation. Make sure you are > > > > zero-initializing that u64 value you are reading into? > > > > > > > > > > It has been zero-initialized: > > > > > > + kit->nr_bits = 0; > > > + kit->bits_copy = 0; > > > > > > > ok, then the problem is somewhere else, but it doesn't seem to be in > > bpf_probe_read_kernel(). I'm forgetting what was the original test > > failure for your patch set, but please double check again, taking into > > account the big endianness of s390x. > > > > If we aim to make it compatible with s390, we need to introduce some > constraints regarding the bits iteration. > > 1. We must replace nr_bits with size: > > bpf_iter_bits_new(struct bpf_iter_bits *it, const void > *unsafe_ptr__ign, u32 size) > > 2. The size must adhere to alignment requirements: > > if (size <= sizeof(u64)) { > int offset = IS_ENABLED(CONFIG_S390) ? sizeof(u64) - size : 0; > > switch (size) { > case 1: > case 2: > case 4: > case 8: > break; > default: > return -EINVAL; > } > > err = bpf_probe_read_kernel_common(((char > *)&kit->bits_copy) + offset, size, unsafe_ptr__ign); > if (err) > return -EFAULT; > > kit->size = size; > kit->bit = -1; > return 0; > } > > /* Not long-aligned */ > if (size & (sizeof(unsigned long) - 1)) > return -EINVAL; > > .... > > Does this meet your expectations? > I don't think you need to add any restrictions or change anything to be byte-sized. You just need to calculate byte offset and do a bit shift to place N bits from the mask into upper N bits of long on big-endian. You might need to do some masking even for little endian as well. Which is why I suggested investing in simple but thorough tests. Write down a few bit mask patterns of variable length (not just multiple-of-8 sizes) and think about the sequence of bits that should be returned. And then codify that. Then check that both little- and big-endian arches work correctly. This is all a bit tricky because kernel's find_next_bit() makes the assumption that bit masks are long-based, while this bits iterator protocol is based on bit sizes, which are not necessarily multiples of 8 bits. So after you copy memory as bytes, you might need to clear (and shift, for big endian) some bits. Use simple but non-symmetrical bit masks to test everything. > -- > Regards > > > > Yafang
On Tue, Apr 30, 2024 at 12:28 AM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote: > > On Sun, Apr 28, 2024 at 6:47 AM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > On Sat, Apr 27, 2024 at 12:51 AM Andrii Nakryiko > > <andrii.nakryiko@gmail.com> wrote: > > > > > > On Thu, Apr 25, 2024 at 10:05 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > > > On Fri, Apr 26, 2024 at 2:15 AM Andrii Nakryiko > > > > <andrii.nakryiko@gmail.com> wrote: > > > > > > > > > > On Wed, Apr 24, 2024 at 10:37 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > > > > > > > On Thu, Apr 25, 2024 at 8:34 AM Andrii Nakryiko > > > > > > <andrii.nakryiko@gmail.com> wrote: > > > > > > > > > > > > > > On Thu, Apr 11, 2024 at 6:51 AM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > > > > > > > > > > > On Thu, Apr 11, 2024 at 9:11 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > > > > > > > > > > > > > Three new kfuncs, namely bpf_iter_bits_{new,next,destroy}, have been > > > > > > > > > added for the new bpf_iter_bits functionality. These kfuncs enable the > > > > > > > > > iteration of the bits from a given address and a given number of bits. > > > > > > > > > > > > > > > > > > - bpf_iter_bits_new > > > > > > > > > Initialize a new bits iterator for a given memory area. Due to the > > > > > > > > > limitation of bpf memalloc, the max number of bits to be iterated > > > > > > > > > over is (4096 * 8). > > > > > > > > > - bpf_iter_bits_next > > > > > > > > > Get the next bit in a bpf_iter_bits > > > > > > > > > - bpf_iter_bits_destroy > > > > > > > > > Destroy a bpf_iter_bits > > > > > > > > > > > > > > > > > > The bits iterator can be used in any context and on any address. > > > > > > > > > > > > > > > > > > Changes: > > > > > > > > > - v5->v6: > > > > > > > > > - Add positive tests (Andrii) > > > > > > > > > - v4->v5: > > > > > > > > > - Simplify test cases (Andrii) > > > > > > > > > - v3->v4: > > > > > > > > > - Fix endianness error on s390x (Andrii) > > > > > > > > > - zero-initialize kit->bits_copy and zero out nr_bits (Andrii) > > > > > > > > > - v2->v3: > > > > > > > > > - Optimization for u64/u32 mask (Andrii) > > > > > > > > > - v1->v2: > > > > > > > > > - Simplify the CPU number verification code to avoid the failure on s390x > > > > > > > > > (Eduard) > > > > > > > > > - bpf: Add bpf_iter_cpumask > > > > > > > > > https://lwn.net/Articles/961104/ > > > > > > > > > - bpf: Add new bpf helper bpf_for_each_cpu > > > > > > > > > https://lwn.net/Articles/939939/ > > > > > > > > > > > > > > > > > > Yafang Shao (2): > > > > > > > > > bpf: Add bits iterator > > > > > > > > > selftests/bpf: Add selftest for bits iter > > > > > > > > > > > > > > > > > > kernel/bpf/helpers.c | 120 +++++++++++++++++ > > > > > > > > > .../selftests/bpf/prog_tests/verifier.c | 2 + > > > > > > > > > .../selftests/bpf/progs/verifier_bits_iter.c | 127 ++++++++++++++++++ > > > > > > > > > 3 files changed, 249 insertions(+) > > > > > > > > > create mode 100644 tools/testing/selftests/bpf/progs/verifier_bits_iter.c > > > > > > > > > > > > > > > > > > -- > > > > > > > > > 2.39.1 > > > > > > > > > > > > > > > > > > > > > > > > > It appears that the test case failed on s390x when the data is > > > > > > > > a u32 value because we need to set the higher 32 bits. > > > > > > > > will analyze it. > > > > > > > > > > > > > > > > > > > > > > Hey Yafang, did you get a chance to debug and fix the issue? > > > > > > > > > > > > > > > > > > > Hi Andrii, > > > > > > > > > > > > Apologies for the delay; I recently returned from an extended holiday. > > > > > > > > > > > > The issue stems from the limitations of bpf_probe_read_kernel() on > > > > > > s390 architecture. The attachment provides a straightforward example > > > > > > to illustrate this issue. The observed results are as follows: > > > > > > > > > > > > Error: #463/1 verifier_probe_read/probe read 4 bytes > > > > > > 8897 run_subtest:PASS:obj_open_mem 0 nsec > > > > > > 8898 run_subtest:PASS:unexpected_load_failure 0 nsec > > > > > > 8899 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec > > > > > > 8900 run_subtest:FAIL:659 Unexpected retval: 2817064 != 512 > > > > > > > > > > > > Error: #463/2 verifier_probe_read/probe read 8 bytes > > > > > > 8903 run_subtest:PASS:obj_open_mem 0 nsec > > > > > > 8904 run_subtest:PASS:unexpected_load_failure 0 nsec > > > > > > 8905 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec > > > > > > 8906 run_subtest:FAIL:659 Unexpected retval: 0 != 512 > > > > > > > > > > > > More details can be found at: https://github.com/kernel-patches/bpf/pull/6872 > > > > > > > > > > > > Should we consider this behavior of bpf_probe_read_kernel() as > > > > > > expected, or is it something that requires fixing? > > > > > > > > > > > > > > > > I might be missing something, but there is nothing wrong with > > > > > bpf_probe_read_kernel() behavior. In "read 4" case you are overwriting > > > > > only upper 4 bytes of u64, so lower 4 bytes are garbage. In "read 8" > > > > > you are reading (upper) 4 bytes of garbage from uninitialized > > > > > data_dst. > > > > > > > > The issue doesn't lie with the dst but rather with the src. Even after > > > > initializing the destination, the operation still fails. You can find > > > > > > Are you sure the operation "fails"? If it would fail, you'd get a > > > negative error code, but you are getting zero. Which actually makes > > > sense. > > > > > > I think you are just getting confused by big endianness of s390x, and > > > there is nothing wrong with bpf_probe_read_kernel(). > > > > > > In both of your tests (I pasted your code below, it would be better if > > > you did it in your initial emails) you end up with 0x200 in *upper* 32 > > > bits (on big endian) and lower bits are zeros. And __retval thing is > > > 32-bit (despite BPF program returning long), so this return value is > > > truncated to *lower* 32-bits, which are, expectedly, zeroes. > > > > Thank you for clarifying. The presence of the 32-bit __retval led to > > my misunderstanding :( > > > > > > > > So I think everything works as expected, but your tests (at least) > > > don't handle the big-endian arch well. > > > > The issue arises when the dst and src have different sizes, causing > > bpf_probe_read_kernel_common() to handle them poorly on big-endian > > machines. To address this, we need to calculate the offset for > > copying, as demonstrated by the following > > > > bpf_probe_read_kernel_common(&kit->bits_copy + offset, size, > > unsafe_ptr__ign); > > > > One might wonder why this calculation is not incorporated directly > > into the implementation of bpf_probe_read_kernel_common() ? > > Let's stop talking about > bpf_probe_read_kernel/bpf_probe_read_kernel_common doing anything > wrong or not handling anything wrong. It's not, it's correct. Your > code is *using* it wrong, and that's what we'll have to fix. The > contract of bpf_probe_read_kernel is in terms of memory addresses of > source/destination *bytes* and the *common* size of both source and > destination. bpf_probe_read_kernel() cannot know that you are passing > a pointer to int or long or whatever, it's just a void * pointer. So > if you are writing bytes over long, you need to take care of adjusting > offsets to accommodate big-endian. > > It's a distraction to talk about bpf_probe_read_kernel() here (but > it's useful to understand its contract). > > > > > > > > > __description("probe read 4 bytes") > > > __success __retval(0x200) > > > long probe_read_4(void) > > > { > > > int data = 0x200; > > > long data_dst = 0; > > > int err; > > > > > > err = bpf_probe_read_kernel(&data_dst, 4, &data); > > > if (err) > > > return err; > > > > > > return data_dst; > > > } > > > > > > SEC("syscall") > > > __description("probe read 8 bytes") > > > __success __retval(0x200) > > > long probe_read_8(void) > > > { > > > int data = 0x200; > > > long data_dst = 0; > > > int err; > > > > > > err = bpf_probe_read_kernel(&data_dst, 8, &data); > > > if (err) > > > return err; > > > > > > return data_dst; > > > > > > } > > > > > > > more details in the following link: > > > > https://github.com/kernel-patches/bpf/pull/6882. It appears that > > > > bpf_probe_read_kernel() encounters difficulties when dealing with > > > > non-long-aligned source addresses. > > > > > > > > > > > > > > So getting back to iter implementation. Make sure you are > > > > > zero-initializing that u64 value you are reading into? > > > > > > > > > > > > > It has been zero-initialized: > > > > > > > > + kit->nr_bits = 0; > > > > + kit->bits_copy = 0; > > > > > > > > > > ok, then the problem is somewhere else, but it doesn't seem to be in > > > bpf_probe_read_kernel(). I'm forgetting what was the original test > > > failure for your patch set, but please double check again, taking into > > > account the big endianness of s390x. > > > > > > > If we aim to make it compatible with s390, we need to introduce some > > constraints regarding the bits iteration. > > > > 1. We must replace nr_bits with size: > > > > bpf_iter_bits_new(struct bpf_iter_bits *it, const void > > *unsafe_ptr__ign, u32 size) > > > > 2. The size must adhere to alignment requirements: > > > > if (size <= sizeof(u64)) { > > int offset = IS_ENABLED(CONFIG_S390) ? sizeof(u64) - size : 0; > > > > switch (size) { > > case 1: > > case 2: > > case 4: > > case 8: > > break; > > default: > > return -EINVAL; > > } > > > > err = bpf_probe_read_kernel_common(((char > > *)&kit->bits_copy) + offset, size, unsafe_ptr__ign); > > if (err) > > return -EFAULT; > > > > kit->size = size; > > kit->bit = -1; > > return 0; > > } > > > > /* Not long-aligned */ > > if (size & (sizeof(unsigned long) - 1)) > > return -EINVAL; > > > > .... > > > > Does this meet your expectations? > > > > I don't think you need to add any restrictions or change anything to > be byte-sized. You just need to calculate byte offset and do a bit > shift to place N bits from the mask into upper N bits of long on > big-endian. You might need to do some masking even for little endian > as well. > > Which is why I suggested investing in simple but thorough tests. Write > down a few bit mask patterns of variable length (not just > multiple-of-8 sizes) and think about the sequence of bits that should > be returned. And then codify that. > > Then check that both little- and big-endian arches work correctly. > > This is all a bit tricky because kernel's find_next_bit() makes the > assumption that bit masks are long-based, while this bits iterator > protocol is based on bit sizes, which are not necessarily multiples of > 8 bits. So after you copy memory as bytes, you might need to clear > (and shift, for big endian) some bits. Use simple but non-symmetrical > bit masks to test everything. > The reason for using 'size' instead of 'nr_bits' lies in the nature of the pointer 'unsafe_ptr__ign', which is of type void *. Due to this ambiguity in the type, the 'size' parameter serves to indicate the size of the data being accessed. For instance, if the type is u32, then the 'size' parameter corresponds to sizeof(u32) u32 src = 0x100; bpf_for_each(bits, bit, &src, sizeof(u32)); This approach shields the user of the bits iterator from concerns about endianness, simplifying usage. Conversely, if we were to use 'nr_bits', the user would need to account for endianness themselves. In other words, the user would be responsible for calculating the offset of 'src'.For instance, on a big-endian machine, if the 'src' is of type u64 and the user only want to iterate over 32 bits, the code would appear as follows: u64 src = 0x100; bpf_for_each(bits, bit, ((void *)&src) + (sizeof(u64) - (32 + 7) / 8), 32); It may indeed be less user-friendly. However, I can make the adjustment as you suggested, given that it aligns with the pattern observed in bpf_probe_read_kernel_common(). -- Regards Yafang
On Tue, Apr 30, 2024 at 7:12 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > On Tue, Apr 30, 2024 at 12:28 AM Andrii Nakryiko > <andrii.nakryiko@gmail.com> wrote: > > > > On Sun, Apr 28, 2024 at 6:47 AM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > On Sat, Apr 27, 2024 at 12:51 AM Andrii Nakryiko > > > <andrii.nakryiko@gmail.com> wrote: > > > > > > > > On Thu, Apr 25, 2024 at 10:05 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > > > > > On Fri, Apr 26, 2024 at 2:15 AM Andrii Nakryiko > > > > > <andrii.nakryiko@gmail.com> wrote: > > > > > > > > > > > > On Wed, Apr 24, 2024 at 10:37 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > > > > > > > > > On Thu, Apr 25, 2024 at 8:34 AM Andrii Nakryiko > > > > > > > <andrii.nakryiko@gmail.com> wrote: > > > > > > > > > > > > > > > > On Thu, Apr 11, 2024 at 6:51 AM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > > > > > > > > > > > > > On Thu, Apr 11, 2024 at 9:11 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > > > > > > > > > > > > > > > Three new kfuncs, namely bpf_iter_bits_{new,next,destroy}, have been > > > > > > > > > > added for the new bpf_iter_bits functionality. These kfuncs enable the > > > > > > > > > > iteration of the bits from a given address and a given number of bits. > > > > > > > > > > > > > > > > > > > > - bpf_iter_bits_new > > > > > > > > > > Initialize a new bits iterator for a given memory area. Due to the > > > > > > > > > > limitation of bpf memalloc, the max number of bits to be iterated > > > > > > > > > > over is (4096 * 8). > > > > > > > > > > - bpf_iter_bits_next > > > > > > > > > > Get the next bit in a bpf_iter_bits > > > > > > > > > > - bpf_iter_bits_destroy > > > > > > > > > > Destroy a bpf_iter_bits > > > > > > > > > > > > > > > > > > > > The bits iterator can be used in any context and on any address. > > > > > > > > > > > > > > > > > > > > Changes: > > > > > > > > > > - v5->v6: > > > > > > > > > > - Add positive tests (Andrii) > > > > > > > > > > - v4->v5: > > > > > > > > > > - Simplify test cases (Andrii) > > > > > > > > > > - v3->v4: > > > > > > > > > > - Fix endianness error on s390x (Andrii) > > > > > > > > > > - zero-initialize kit->bits_copy and zero out nr_bits (Andrii) > > > > > > > > > > - v2->v3: > > > > > > > > > > - Optimization for u64/u32 mask (Andrii) > > > > > > > > > > - v1->v2: > > > > > > > > > > - Simplify the CPU number verification code to avoid the failure on s390x > > > > > > > > > > (Eduard) > > > > > > > > > > - bpf: Add bpf_iter_cpumask > > > > > > > > > > https://lwn.net/Articles/961104/ > > > > > > > > > > - bpf: Add new bpf helper bpf_for_each_cpu > > > > > > > > > > https://lwn.net/Articles/939939/ > > > > > > > > > > > > > > > > > > > > Yafang Shao (2): > > > > > > > > > > bpf: Add bits iterator > > > > > > > > > > selftests/bpf: Add selftest for bits iter > > > > > > > > > > > > > > > > > > > > kernel/bpf/helpers.c | 120 +++++++++++++++++ > > > > > > > > > > .../selftests/bpf/prog_tests/verifier.c | 2 + > > > > > > > > > > .../selftests/bpf/progs/verifier_bits_iter.c | 127 ++++++++++++++++++ > > > > > > > > > > 3 files changed, 249 insertions(+) > > > > > > > > > > create mode 100644 tools/testing/selftests/bpf/progs/verifier_bits_iter.c > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > 2.39.1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > It appears that the test case failed on s390x when the data is > > > > > > > > > a u32 value because we need to set the higher 32 bits. > > > > > > > > > will analyze it. > > > > > > > > > > > > > > > > > > > > > > > > > Hey Yafang, did you get a chance to debug and fix the issue? > > > > > > > > > > > > > > > > > > > > > > Hi Andrii, > > > > > > > > > > > > > > Apologies for the delay; I recently returned from an extended holiday. > > > > > > > > > > > > > > The issue stems from the limitations of bpf_probe_read_kernel() on > > > > > > > s390 architecture. The attachment provides a straightforward example > > > > > > > to illustrate this issue. The observed results are as follows: > > > > > > > > > > > > > > Error: #463/1 verifier_probe_read/probe read 4 bytes > > > > > > > 8897 run_subtest:PASS:obj_open_mem 0 nsec > > > > > > > 8898 run_subtest:PASS:unexpected_load_failure 0 nsec > > > > > > > 8899 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec > > > > > > > 8900 run_subtest:FAIL:659 Unexpected retval: 2817064 != 512 > > > > > > > > > > > > > > Error: #463/2 verifier_probe_read/probe read 8 bytes > > > > > > > 8903 run_subtest:PASS:obj_open_mem 0 nsec > > > > > > > 8904 run_subtest:PASS:unexpected_load_failure 0 nsec > > > > > > > 8905 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec > > > > > > > 8906 run_subtest:FAIL:659 Unexpected retval: 0 != 512 > > > > > > > > > > > > > > More details can be found at: https://github.com/kernel-patches/bpf/pull/6872 > > > > > > > > > > > > > > Should we consider this behavior of bpf_probe_read_kernel() as > > > > > > > expected, or is it something that requires fixing? > > > > > > > > > > > > > > > > > > > I might be missing something, but there is nothing wrong with > > > > > > bpf_probe_read_kernel() behavior. In "read 4" case you are overwriting > > > > > > only upper 4 bytes of u64, so lower 4 bytes are garbage. In "read 8" > > > > > > you are reading (upper) 4 bytes of garbage from uninitialized > > > > > > data_dst. > > > > > > > > > > The issue doesn't lie with the dst but rather with the src. Even after > > > > > initializing the destination, the operation still fails. You can find > > > > > > > > Are you sure the operation "fails"? If it would fail, you'd get a > > > > negative error code, but you are getting zero. Which actually makes > > > > sense. > > > > > > > > I think you are just getting confused by big endianness of s390x, and > > > > there is nothing wrong with bpf_probe_read_kernel(). > > > > > > > > In both of your tests (I pasted your code below, it would be better if > > > > you did it in your initial emails) you end up with 0x200 in *upper* 32 > > > > bits (on big endian) and lower bits are zeros. And __retval thing is > > > > 32-bit (despite BPF program returning long), so this return value is > > > > truncated to *lower* 32-bits, which are, expectedly, zeroes. > > > > > > Thank you for clarifying. The presence of the 32-bit __retval led to > > > my misunderstanding :( > > > > > > > > > > > So I think everything works as expected, but your tests (at least) > > > > don't handle the big-endian arch well. > > > > > > The issue arises when the dst and src have different sizes, causing > > > bpf_probe_read_kernel_common() to handle them poorly on big-endian > > > machines. To address this, we need to calculate the offset for > > > copying, as demonstrated by the following > > > > > > bpf_probe_read_kernel_common(&kit->bits_copy + offset, size, > > > unsafe_ptr__ign); > > > > > > One might wonder why this calculation is not incorporated directly > > > into the implementation of bpf_probe_read_kernel_common() ? > > > > Let's stop talking about > > bpf_probe_read_kernel/bpf_probe_read_kernel_common doing anything > > wrong or not handling anything wrong. It's not, it's correct. Your > > code is *using* it wrong, and that's what we'll have to fix. The > > contract of bpf_probe_read_kernel is in terms of memory addresses of > > source/destination *bytes* and the *common* size of both source and > > destination. bpf_probe_read_kernel() cannot know that you are passing > > a pointer to int or long or whatever, it's just a void * pointer. So > > if you are writing bytes over long, you need to take care of adjusting > > offsets to accommodate big-endian. > > > > It's a distraction to talk about bpf_probe_read_kernel() here (but > > it's useful to understand its contract). > > > > > > > > > > > > > __description("probe read 4 bytes") > > > > __success __retval(0x200) > > > > long probe_read_4(void) > > > > { > > > > int data = 0x200; > > > > long data_dst = 0; > > > > int err; > > > > > > > > err = bpf_probe_read_kernel(&data_dst, 4, &data); > > > > if (err) > > > > return err; > > > > > > > > return data_dst; > > > > } > > > > > > > > SEC("syscall") > > > > __description("probe read 8 bytes") > > > > __success __retval(0x200) > > > > long probe_read_8(void) > > > > { > > > > int data = 0x200; > > > > long data_dst = 0; > > > > int err; > > > > > > > > err = bpf_probe_read_kernel(&data_dst, 8, &data); > > > > if (err) > > > > return err; > > > > > > > > return data_dst; > > > > > > > > } > > > > > > > > > more details in the following link: > > > > > https://github.com/kernel-patches/bpf/pull/6882. It appears that > > > > > bpf_probe_read_kernel() encounters difficulties when dealing with > > > > > non-long-aligned source addresses. > > > > > > > > > > > > > > > > > So getting back to iter implementation. Make sure you are > > > > > > zero-initializing that u64 value you are reading into? > > > > > > > > > > > > > > > > It has been zero-initialized: > > > > > > > > > > + kit->nr_bits = 0; > > > > > + kit->bits_copy = 0; > > > > > > > > > > > > > ok, then the problem is somewhere else, but it doesn't seem to be in > > > > bpf_probe_read_kernel(). I'm forgetting what was the original test > > > > failure for your patch set, but please double check again, taking into > > > > account the big endianness of s390x. > > > > > > > > > > If we aim to make it compatible with s390, we need to introduce some > > > constraints regarding the bits iteration. > > > > > > 1. We must replace nr_bits with size: > > > > > > bpf_iter_bits_new(struct bpf_iter_bits *it, const void > > > *unsafe_ptr__ign, u32 size) > > > > > > 2. The size must adhere to alignment requirements: > > > > > > if (size <= sizeof(u64)) { > > > int offset = IS_ENABLED(CONFIG_S390) ? sizeof(u64) - size : 0; > > > > > > switch (size) { > > > case 1: > > > case 2: > > > case 4: > > > case 8: > > > break; > > > default: > > > return -EINVAL; > > > } > > > > > > err = bpf_probe_read_kernel_common(((char > > > *)&kit->bits_copy) + offset, size, unsafe_ptr__ign); > > > if (err) > > > return -EFAULT; > > > > > > kit->size = size; > > > kit->bit = -1; > > > return 0; > > > } > > > > > > /* Not long-aligned */ > > > if (size & (sizeof(unsigned long) - 1)) > > > return -EINVAL; > > > > > > .... > > > > > > Does this meet your expectations? > > > > > > > I don't think you need to add any restrictions or change anything to > > be byte-sized. You just need to calculate byte offset and do a bit > > shift to place N bits from the mask into upper N bits of long on > > big-endian. You might need to do some masking even for little endian > > as well. > > > > Which is why I suggested investing in simple but thorough tests. Write > > down a few bit mask patterns of variable length (not just > > multiple-of-8 sizes) and think about the sequence of bits that should > > be returned. And then codify that. > > > > Then check that both little- and big-endian arches work correctly. > > > > This is all a bit tricky because kernel's find_next_bit() makes the > > assumption that bit masks are long-based, while this bits iterator > > protocol is based on bit sizes, which are not necessarily multiples of > > 8 bits. So after you copy memory as bytes, you might need to clear > > (and shift, for big endian) some bits. Use simple but non-symmetrical > > bit masks to test everything. > > > > The reason for using 'size' instead of 'nr_bits' lies in the nature of > the pointer 'unsafe_ptr__ign', which is of type void *. Due to this > ambiguity in the type, the 'size' parameter serves to indicate the > size of the data being accessed. For instance, if the type is u32, > then the 'size' parameter corresponds to sizeof(u32) > > u32 src = 0x100; > bpf_for_each(bits, bit, &src, sizeof(u32)); > > This approach shields the user of the bits iterator from concerns > about endianness, simplifying usage. > > Conversely, if we were to use 'nr_bits', the user would need to > account for endianness themselves. In other words, the user would be Well, not exactly. Only if they try to cleverly use int/long for bitmask instead of sticking to an array of bytes. All this endianness comes into effect only when you are dealing with something else than pure bytes. And if they do, I think it's ok to expect them to deal with endianness correctly. (But also for little endian it will just work even if they don't think much about it). So I'd keep it as nr_bits and define an interface based on `char *` or `void *` to make it clear we are dealing with bits in arbitrary-sized byte arrays. But if someone has strong opinions otherwise, I'd be happy to hear some other arguments, of course. I just think it's confusing to be dealing with *bit*masks but in terms of byte sizes, tbh. And also, consider that in practice most users will be dealing with sizeof(u32) or sizeof(u64) (or larger, but multiple-of-4-or-8-bytes) bit masks. But the implementation has to be correct for all valid inputs and use cases. > responsible for calculating the offset of 'src'.For instance, on a > big-endian machine, if the 'src' is of type u64 and the user only want > to iterate over 32 bits, the code would appear as follows: > > u64 src = 0x100; > bpf_for_each(bits, bit, ((void *)&src) + (sizeof(u64) - (32 + 7) / 8), 32); > > It may indeed be less user-friendly. However, I can make the > adjustment as you suggested, given that it aligns with the pattern > observed in bpf_probe_read_kernel_common(). > > -- > Regards > Yafang