
[bpf-next] RFC: libbpf: resolve rodata lookups

Message ID 20220718190748.2988882-1-sdf@google.com (mailing list archive)
State Changes Requested
Delegated to: BPF
Series [bpf-next] RFC: libbpf: resolve rodata lookups

Checks

Context Check Description
bpf/vmtest-bpf-next-PR fail PR summary
netdev/tree_selection success Clearly marked for bpf-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers warning 1 maintainers not CCed: quentin@isovalent.com
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch fail ERROR: code indent should use tabs where possible WARNING: Too many leading tabs - consider code refactoring WARNING: else is not generally useful after a break or return WARNING: line length of 101 exceeds 80 columns WARNING: line length of 107 exceeds 80 columns WARNING: line length of 122 exceeds 80 columns WARNING: line length of 125 exceeds 80 columns WARNING: line length of 137 exceeds 80 columns WARNING: line length of 81 exceeds 80 columns WARNING: line length of 82 exceeds 80 columns WARNING: line length of 84 exceeds 80 columns WARNING: line length of 86 exceeds 80 columns WARNING: line length of 87 exceeds 80 columns WARNING: line length of 94 exceeds 80 columns WARNING: line length of 95 exceeds 80 columns WARNING: line length of 97 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-VM_Test-1 fail Logs for Kernel LATEST on ubuntu-latest with gcc
bpf/vmtest-bpf-next-VM_Test-2 fail Logs for Kernel LATEST on ubuntu-latest with llvm-15
bpf/vmtest-bpf-next-VM_Test-3 fail Logs for Kernel LATEST on z15 with gcc

Commit Message

Stanislav Fomichev July 18, 2022, 7:07 p.m. UTC
Motivation:

Our bpf programs have a bunch of options which are set at load
time. After loading, they don't change. We currently use an array map
to store them, and the bpf program does the following:

val = bpf_map_lookup_elem(&config_map, &key);
if (likely(val && *val)) {
  // do some optional feature
}
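For context, the config_map in that pattern would typically be declared
along these lines (this declaration is an illustrative sketch, not taken
from the patch; the entry count is arbitrary):

```c
/* Illustrative only: a read-mostly array map of feature flags,
 * one __u32 slot per optional feature. */
struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 64);
	__type(key, __u32);
	__type(value, __u32);
} config_map SEC(".maps");
```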

Since the configuration is static and we have a lot of those features,
I feel like we're wasting precious cycles doing dynamic lookups
(and stalling on memory loads).

I was assuming that converting those to some fake kconfig options
would solve this, but it seems the kconfig values are still stored
in a global map and resolved dynamically.

Proposal:

Resolve kconfig options statically upon loading. Basically rewrite
ld+ldx to two nops and 'mov val, x'.

I'm also trying to rewrite the conditional jump when the condition
is !imm. This seems to catch all the cases in my program, but
it's probably too hacky.

I've attached a very raw RFC patch to demonstrate the idea. Is there
anything I'm missing? Any potential problems with this approach?
Note, I don't intend to submit this as is; it's mostly to start a discussion.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 tools/bpf/bpftool/prog.c |  3 +++
 tools/lib/bpf/libbpf.c   | 43 +++++++++++++++++++++++++++++++++++++---
 2 files changed, 43 insertions(+), 3 deletions(-)

Comments

Alexei Starovoitov July 19, 2022, 8:20 p.m. UTC | #1
On Mon, Jul 18, 2022 at 12:07 PM Stanislav Fomichev <sdf@google.com> wrote:
>
> Motivation:
>
> Our bpf programs have a bunch of options which are set at the loading
> time. After loading, they don't change. We currently use array map
> to store them and bpf program does the following:
>
> val = bpf_map_lookup_elem(&config_map, &key);
> if (likely(val && *val)) {
>   // do some optional feature
> }
>
> Since the configuration is static and we have a lot of those features,
> I feel like we're wasting precious cycles doing dynamic lookups
> (and stalling on memory loads).
>
> I was assuming that converting those to some fake kconfig options
> would solve it, but it still seems like kconfig is stored in the
> global map and kconfig entries are resolved dynamically.
>
> Proposal:
>
> Resolve kconfig options statically upon loading. Basically rewrite
> ld+ldx to two nops and 'mov val, x'.
>
> I'm also trying to rewrite conditional jump when the condition is
> !imm. This seems to be catching all the cases in my program, but
> it's probably too hacky.
>
> I've attached very raw RFC patch to demonstrate the idea. Anything
> I'm missing? Any potential problems with this approach?

Have you considered using global variables for that?
With a skeleton, user space has a natural way to set
all of these knobs after skel_open() and before skel_load().
The verifier then sees them as read-only vars and
automatically converts LDX into fixed constants, and if the code
looks like if (my_config_var), the verifier will remove
all the dead code too.
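For reference, that open/set/load flow looks roughly like this with a
generated skeleton (the skeleton header, type, and variable names below
are hypothetical; this is a usage sketch, not code from the thread):

```c
#include "my_prog.skel.h"  /* hypothetical generated skeleton header */

int load_with_config(void)
{
	struct my_prog *skel = my_prog__open();

	if (!skel)
		return -1;

	/* Set the read-only global between open and load; the verifier
	 * later sees it as a frozen constant. */
	skel->rodata->my_config_var = 1;

	if (my_prog__load(skel)) {
		my_prog__destroy(skel);
		return -1;
	}
	return 0;
}
```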
Stanislav Fomichev July 19, 2022, 8:33 p.m. UTC | #2
On Tue, Jul 19, 2022 at 1:21 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, Jul 18, 2022 at 12:07 PM Stanislav Fomichev <sdf@google.com> wrote:
> >
> > Motivation:
> >
> > Our bpf programs have a bunch of options which are set at the loading
> > time. After loading, they don't change. We currently use array map
> > to store them and bpf program does the following:
> >
> > val = bpf_map_lookup_elem(&config_map, &key);
> > if (likely(val && *val)) {
> >   // do some optional feature
> > }
> >
> > Since the configuration is static and we have a lot of those features,
> > I feel like we're wasting precious cycles doing dynamic lookups
> > (and stalling on memory loads).
> >
> > I was assuming that converting those to some fake kconfig options
> > would solve it, but it still seems like kconfig is stored in the
> > global map and kconfig entries are resolved dynamically.
> >
> > Proposal:
> >
> > Resolve kconfig options statically upon loading. Basically rewrite
> > ld+ldx to two nops and 'mov val, x'.
> >
> > I'm also trying to rewrite conditional jump when the condition is
> > !imm. This seems to be catching all the cases in my program, but
> > it's probably too hacky.
> >
> > I've attached very raw RFC patch to demonstrate the idea. Anything
> > I'm missing? Any potential problems with this approach?
>
> Have you considered using global variables for that?
> With skeleton the user space has a natural way to set
> all of these knobs after doing skel_open and before skel_load.
> Then the verifier sees them as readonly vars and
> automatically converts LDX into fixed constants and if the code
> looks like if (my_config_var) then the verifier will remove
> all the dead code too.

Hm, that's a good alternative, let me try it out. Thanks!
Stanislav Fomichev July 19, 2022, 9:41 p.m. UTC | #3
On Tue, Jul 19, 2022 at 1:33 PM Stanislav Fomichev <sdf@google.com> wrote:
>
> On Tue, Jul 19, 2022 at 1:21 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Mon, Jul 18, 2022 at 12:07 PM Stanislav Fomichev <sdf@google.com> wrote:
> > >
> > > Motivation:
> > >
> > > Our bpf programs have a bunch of options which are set at the loading
> > > time. After loading, they don't change. We currently use array map
> > > to store them and bpf program does the following:
> > >
> > > val = bpf_map_lookup_elem(&config_map, &key);
> > > if (likely(val && *val)) {
> > >   // do some optional feature
> > > }
> > >
> > > Since the configuration is static and we have a lot of those features,
> > > I feel like we're wasting precious cycles doing dynamic lookups
> > > (and stalling on memory loads).
> > >
> > > I was assuming that converting those to some fake kconfig options
> > > would solve it, but it still seems like kconfig is stored in the
> > > global map and kconfig entries are resolved dynamically.
> > >
> > > Proposal:
> > >
> > > Resolve kconfig options statically upon loading. Basically rewrite
> > > ld+ldx to two nops and 'mov val, x'.
> > >
> > > I'm also trying to rewrite conditional jump when the condition is
> > > !imm. This seems to be catching all the cases in my program, but
> > > it's probably too hacky.
> > >
> > > I've attached very raw RFC patch to demonstrate the idea. Anything
> > > I'm missing? Any potential problems with this approach?
> >
> > Have you considered using global variables for that?
> > With skeleton the user space has a natural way to set
> > all of these knobs after doing skel_open and before skel_load.
> > Then the verifier sees them as readonly vars and
> > automatically converts LDX into fixed constants and if the code
> > looks like if (my_config_var) then the verifier will remove
> > all the dead code too.
>
> Hm, that's a good alternative, let me try it out. Thanks!

Turns out we already freeze the kconfig map in libbpf:
if (map_type == LIBBPF_MAP_RODATA || map_type == LIBBPF_MAP_KCONFIG) {
        err = bpf_map_freeze(map->fd);

And I've verified that I do hit bpf_map_direct_read in the verifier.

But the code still stays the same (bpftool dump xlated):
  72: (18) r1 = map[id:24][0]+20
  74: (61) r1 = *(u32 *)(r1 +0)
  75: (bf) r2 = r9
  76: (b7) r0 = 0
  77: (15) if r1 == 0x0 goto pc+9

I guess there is nothing for sanitize_dead_code to do because my
conditional is "if (likely(some_condition)) { do something }" and the
branch instruction itself is '.seen' by the verifier.

And, most annoyingly for me, the loads at 72 and 74 still stay :-(

Maybe I can try to unwrap/resolve these on the verifier side instead of in libbpf..
Alexei Starovoitov July 19, 2022, 10:14 p.m. UTC | #4
On Tue, Jul 19, 2022 at 2:41 PM Stanislav Fomichev <sdf@google.com> wrote:
>
> On Tue, Jul 19, 2022 at 1:33 PM Stanislav Fomichev <sdf@google.com> wrote:
> >
> > On Tue, Jul 19, 2022 at 1:21 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Mon, Jul 18, 2022 at 12:07 PM Stanislav Fomichev <sdf@google.com> wrote:
> > > >
> > > > Motivation:
> > > >
> > > > Our bpf programs have a bunch of options which are set at the loading
> > > > time. After loading, they don't change. We currently use array map
> > > > to store them and bpf program does the following:
> > > >
> > > > val = bpf_map_lookup_elem(&config_map, &key);
> > > > if (likely(val && *val)) {
> > > >   // do some optional feature
> > > > }
> > > >
> > > > Since the configuration is static and we have a lot of those features,
> > > > I feel like we're wasting precious cycles doing dynamic lookups
> > > > (and stalling on memory loads).
> > > >
> > > > I was assuming that converting those to some fake kconfig options
> > > > would solve it, but it still seems like kconfig is stored in the
> > > > global map and kconfig entries are resolved dynamically.
> > > >
> > > > Proposal:
> > > >
> > > > Resolve kconfig options statically upon loading. Basically rewrite
> > > > ld+ldx to two nops and 'mov val, x'.
> > > >
> > > > I'm also trying to rewrite conditional jump when the condition is
> > > > !imm. This seems to be catching all the cases in my program, but
> > > > it's probably too hacky.
> > > >
> > > > I've attached very raw RFC patch to demonstrate the idea. Anything
> > > > I'm missing? Any potential problems with this approach?
> > >
> > > Have you considered using global variables for that?
> > > With skeleton the user space has a natural way to set
> > > all of these knobs after doing skel_open and before skel_load.
> > > Then the verifier sees them as readonly vars and
> > > automatically converts LDX into fixed constants and if the code
> > > looks like if (my_config_var) then the verifier will remove
> > > all the dead code too.
> >
> > Hm, that's a good alternative, let me try it out. Thanks!
>
> Turns out we already freeze kconfig map in libbpf:
> if (map_type == LIBBPF_MAP_RODATA || map_type == LIBBPF_MAP_KCONFIG) {
>         err = bpf_map_freeze(map->fd);
>
> And I've verified that I do hit bpf_map_direct_read in the verifier.
>
> But the code still stays the same (bpftool dump xlated):
>   72: (18) r1 = map[id:24][0]+20
>   74: (61) r1 = *(u32 *)(r1 +0)
>   75: (bf) r2 = r9
>   76: (b7) r0 = 0
>   77: (15) if r1 == 0x0 goto pc+9
>
> I guess there is nothing for sanitize_dead_code to do because my
> conditional is "if (likely(some_condition)) { do something }" and the
> branch instruction itself is '.seen' by the verifier.

I bet your variable is not 'const'.
Please see any of the progs in selftests that do:
const volatile int var = 123;
to express configs.
Stanislav Fomichev July 19, 2022, 11:20 p.m. UTC | #5
On 07/19, Alexei Starovoitov wrote:
> On Tue, Jul 19, 2022 at 2:41 PM Stanislav Fomichev <sdf@google.com> wrote:
> >
> > On Tue, Jul 19, 2022 at 1:33 PM Stanislav Fomichev <sdf@google.com>  
> wrote:
> > >
> > > On Tue, Jul 19, 2022 at 1:21 PM Alexei Starovoitov
> > > <alexei.starovoitov@gmail.com> wrote:
> > > >
> > > > On Mon, Jul 18, 2022 at 12:07 PM Stanislav Fomichev  
> <sdf@google.com> wrote:
> > > > >
> > > > > Motivation:
> > > > >
> > > > > Our bpf programs have a bunch of options which are set at the  
> loading
> > > > > time. After loading, they don't change. We currently use array map
> > > > > to store them and bpf program does the following:
> > > > >
> > > > > val = bpf_map_lookup_elem(&config_map, &key);
> > > > > if (likely(val && *val)) {
> > > > >   // do some optional feature
> > > > > }
> > > > >
> > > > > Since the configuration is static and we have a lot of those  
> features,
> > > > > I feel like we're wasting precious cycles doing dynamic lookups
> > > > > (and stalling on memory loads).
> > > > >
> > > > > I was assuming that converting those to some fake kconfig options
> > > > > would solve it, but it still seems like kconfig is stored in the
> > > > > global map and kconfig entries are resolved dynamically.
> > > > >
> > > > > Proposal:
> > > > >
> > > > > Resolve kconfig options statically upon loading. Basically rewrite
> > > > > ld+ldx to two nops and 'mov val, x'.
> > > > >
> > > > > I'm also trying to rewrite conditional jump when the condition is
> > > > > !imm. This seems to be catching all the cases in my program, but
> > > > > it's probably too hacky.
> > > > >
> > > > > I've attached very raw RFC patch to demonstrate the idea. Anything
> > > > > I'm missing? Any potential problems with this approach?
> > > >
> > > > Have you considered using global variables for that?
> > > > With skeleton the user space has a natural way to set
> > > > all of these knobs after doing skel_open and before skel_load.
> > > > Then the verifier sees them as readonly vars and
> > > > automatically converts LDX into fixed constants and if the code
> > > > looks like if (my_config_var) then the verifier will remove
> > > > all the dead code too.
> > >
> > > Hm, that's a good alternative, let me try it out. Thanks!
> >
> > Turns out we already freeze kconfig map in libbpf:
> > if (map_type == LIBBPF_MAP_RODATA || map_type == LIBBPF_MAP_KCONFIG) {
> >         err = bpf_map_freeze(map->fd);
> >
> > And I've verified that I do hit bpf_map_direct_read in the verifier.
> >
> > But the code still stays the same (bpftool dump xlated):
> >   72: (18) r1 = map[id:24][0]+20
> >   74: (61) r1 = *(u32 *)(r1 +0)
> >   75: (bf) r2 = r9
> >   76: (b7) r0 = 0
> >   77: (15) if r1 == 0x0 goto pc+9
> >
> > I guess there is nothing for sanitize_dead_code to do because my
> > conditional is "if (likely(some_condition)) { do something }" and the
> > branch instruction itself is '.seen' by the verifier.

> I bet your variable is not 'const'.
> Please see any of the progs in selftests that do:
> const volatile int var = 123;
> to express configs.

Yeah, I was testing against the following:

	extern int CONFIG_XYZ __kconfig __weak;

But ended up writing this small reproducer:

	struct __sk_buff;

	const volatile int CONFIG_DROP = 1; // volatile so it's not
					    // clang-optimized

	__attribute__((section("tc"), used))
	int my_config(struct __sk_buff *skb)
	{
		int ret = 0; /*TC_ACT_OK*/

		if (CONFIG_DROP)
			ret = 2 /*TC_ACT_SHOT*/;

		return ret;
	}

$ bpftool map dump name my_confi.rodata

[{
         "value": {
             ".rodata": [{
                     "CONFIG_DROP": 1
                 }
             ]
         }
     }
]

$ bpftool prog dump xlated name my_config

int my_config(struct __sk_buff * skb):
; if (CONFIG_DROP)
    0: (18) r1 = map[id:3][0]+0
    2: (61) r1 = *(u32 *)(r1 +0)
    3: (b4) w0 = 1
; if (CONFIG_DROP)
    4: (64) w0 <<= 1
; return ret;
    5: (95) exit

The branch is gone, but the map lookup is still there :-(
Stanislav Fomichev July 20, 2022, 6:04 p.m. UTC | #6
On 07/19, sdf@google.com wrote:
> On 07/19, Alexei Starovoitov wrote:
> > On Tue, Jul 19, 2022 at 2:41 PM Stanislav Fomichev <sdf@google.com>  
> wrote:
> > >
> > > On Tue, Jul 19, 2022 at 1:33 PM Stanislav Fomichev <sdf@google.com>
> > wrote:
> > > >
> > > > On Tue, Jul 19, 2022 at 1:21 PM Alexei Starovoitov
> > > > <alexei.starovoitov@gmail.com> wrote:
> > > > >
> > > > > On Mon, Jul 18, 2022 at 12:07 PM Stanislav Fomichev
> > <sdf@google.com> wrote:
> > > > > >
> > > > > > Motivation:
> > > > > >
> > > > > > Our bpf programs have a bunch of options which are set at the
> > loading
> > > > > > time. After loading, they don't change. We currently use array  
> map
> > > > > > to store them and bpf program does the following:
> > > > > >
> > > > > > val = bpf_map_lookup_elem(&config_map, &key);
> > > > > > if (likely(val && *val)) {
> > > > > >   // do some optional feature
> > > > > > }
> > > > > >
> > > > > > Since the configuration is static and we have a lot of those
> > features,
> > > > > > I feel like we're wasting precious cycles doing dynamic lookups
> > > > > > (and stalling on memory loads).
> > > > > >
> > > > > > I was assuming that converting those to some fake kconfig  
> options
> > > > > > would solve it, but it still seems like kconfig is stored in the
> > > > > > global map and kconfig entries are resolved dynamically.
> > > > > >
> > > > > > Proposal:
> > > > > >
> > > > > > Resolve kconfig options statically upon loading. Basically  
> rewrite
> > > > > > ld+ldx to two nops and 'mov val, x'.
> > > > > >
> > > > > > I'm also trying to rewrite conditional jump when the condition  
> is
> > > > > > !imm. This seems to be catching all the cases in my program, but
> > > > > > it's probably too hacky.
> > > > > >
> > > > > > I've attached very raw RFC patch to demonstrate the idea.  
> Anything
> > > > > > I'm missing? Any potential problems with this approach?
> > > > >
> > > > > Have you considered using global variables for that?
> > > > > With skeleton the user space has a natural way to set
> > > > > all of these knobs after doing skel_open and before skel_load.
> > > > > Then the verifier sees them as readonly vars and
> > > > > automatically converts LDX into fixed constants and if the code
> > > > > looks like if (my_config_var) then the verifier will remove
> > > > > all the dead code too.
> > > >
> > > > Hm, that's a good alternative, let me try it out. Thanks!
> > >
> > > Turns out we already freeze kconfig map in libbpf:
> > > if (map_type == LIBBPF_MAP_RODATA || map_type == LIBBPF_MAP_KCONFIG) {
> > >         err = bpf_map_freeze(map->fd);
> > >
> > > And I've verified that I do hit bpf_map_direct_read in the verifier.
> > >
> > > But the code still stays the same (bpftool dump xlated):
> > >   72: (18) r1 = map[id:24][0]+20
> > >   74: (61) r1 = *(u32 *)(r1 +0)
> > >   75: (bf) r2 = r9
> > >   76: (b7) r0 = 0
> > >   77: (15) if r1 == 0x0 goto pc+9
> > >
> > > I guess there is nothing for sanitize_dead_code to do because my
> > > conditional is "if (likely(some_condition)) { do something }" and the
> > > branch instruction itself is '.seen' by the verifier.

> > I bet your variable is not 'const'.
> > Please see any of the progs in selftests that do:
> > const volatile int var = 123;
> > to express configs.

> Yeah, I was testing against the following:

> 	extern int CONFIG_XYZ __kconfig __weak;

> But ended up writing this small reproducer:

> 	struct __sk_buff;

> 	const volatile int CONFIG_DROP = 1; // volatile so it's not
> 					    // clang-optimized

> 	__attribute__((section("tc"), used))
> 	int my_config(struct __sk_buff *skb)
> 	{
> 		int ret = 0; /*TC_ACT_OK*/

> 		if (CONFIG_DROP)
> 			ret = 2 /*TC_ACT_SHOT*/;

> 		return ret;
> 	}

> $ bpftool map dump name my_confi.rodata

> [{
>          "value": {
>              ".rodata": [{
>                      "CONFIG_DROP": 1
>                  }
>              ]
>          }
>      }
> ]

> $ bpftool prog dump xlated name my_config

> int my_config(struct __sk_buff * skb):
> ; if (CONFIG_DROP)
>     0: (18) r1 = map[id:3][0]+0
>     2: (61) r1 = *(u32 *)(r1 +0)
>     3: (b4) w0 = 1
> ; if (CONFIG_DROP)
>     4: (64) w0 <<= 1
> ; return ret;
>     5: (95) exit

> The branch is gone, but the map lookup is still there :-(

Attached another RFC below which does the same thing but from the
verifier side. It seems we should be able to fully resolve LD+LDX when
their dst_reg is the same? If they are different, we should at least be
able to pre-resolve the LDX value. Would something like this work (I
haven't run the full verifier/test_progs yet)?

(note, in this case, with kconfig, I still see the branch)

  test_fold_ro_ldx:PASS:open 0 nsec
  test_fold_ro_ldx:PASS:load 0 nsec
  test_fold_ro_ldx:PASS:bpf_obj_get_info_by_fd 0 nsec
  int fold_ro_ldx(struct __sk_buff * skb):
  ; if (CONFIG_DROP)
     0: (b7) r1 = 1
     1: (b4) w0 = 1
  ; if (CONFIG_DROP)
     2: (16) if w1 == 0x0 goto pc+1
     3: (b4) w0 = 2
  ; return ret;
     4: (95) exit
  test_fold_ro_ldx:PASS:found BPF_LD 0 nsec
  test_fold_ro_ldx:PASS:found BPF_LDX 0 nsec
  test_fold_ro_ldx:PASS:found BPF_LD 0 nsec
  test_fold_ro_ldx:PASS:found BPF_LDX 0 nsec
  test_fold_ro_ldx:PASS:found BPF_LD 0 nsec
  test_fold_ro_ldx:PASS:found BPF_LDX 0 nsec
  test_fold_ro_ldx:PASS:found BPF_LD 0 nsec
  test_fold_ro_ldx:PASS:found BPF_LDX 0 nsec
  test_fold_ro_ldx:PASS:found BPF_LD 0 nsec
  test_fold_ro_ldx:PASS:found BPF_LDX 0 nsec
  #66      fold_ro_ldx:OK

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
  kernel/bpf/verifier.c                         | 74 ++++++++++++++++++-
  .../selftests/bpf/prog_tests/fold_ro_ldx.c    | 52 +++++++++++++
  .../testing/selftests/bpf/progs/fold_ro_ldx.c | 20 +++++
  3 files changed, 144 insertions(+), 2 deletions(-)
  create mode 100644 tools/testing/selftests/bpf/prog_tests/fold_ro_ldx.c
  create mode 100644 tools/testing/selftests/bpf/progs/fold_ro_ldx.c

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index c59c3df0fea6..ffedd8234288 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -12695,6 +12695,69 @@ static bool bpf_map_is_cgroup_storage(struct bpf_map *map)
  		map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE);
  }

+/* if the map is read-only, we can try to fully resolve the load */
+static bool fold_ro_pseudo_ldimm64(struct bpf_verifier_env *env,
+				   struct bpf_map *map,
+				   struct bpf_insn *insn)
+{
+	struct bpf_insn *ldx_insn = insn + 2;
+	int dst_reg = ldx_insn->dst_reg;
+	u64 val = 0;
+	int size;
+	int err;
+
+	if (!bpf_map_is_rdonly(map) || !map->ops->map_direct_value_addr)
+		return false;
+
+	/* 0: BPF_LD  r=MAP
+	 * 1: BPF_LD  r=MAP
+	 * 2: BPF_LDX r=MAP->VAL
+	 */
+
+	if (BPF_CLASS((insn+0)->code) != BPF_LD ||
+	    BPF_CLASS((insn+1)->code) != BPF_LD ||
+	    BPF_CLASS((insn+2)->code) != BPF_LDX)
+		return false;
+
+	if (BPF_MODE((insn+0)->code) != BPF_IMM ||
+	    BPF_MODE((insn+1)->code) != BPF_IMM ||
+	    BPF_MODE((insn+2)->code) != BPF_MEM)
+		return false;
+
+	if (insn->src_reg != BPF_PSEUDO_MAP_VALUE &&
+	    insn->src_reg != BPF_PSEUDO_MAP_IDX_VALUE)
+		return false;
+
+	if (insn->dst_reg != ldx_insn->src_reg)
+		return false;
+
+	if (ldx_insn->off != 0)
+		return false;
+
+	size = bpf_size_to_bytes(BPF_SIZE(ldx_insn->code));
+	if (size < 0 || size > 4)
+		return false;
+
+	err = bpf_map_direct_read(map, (insn+1)->imm, size, &val);
+	if (err)
+		return false;
+
+	if (insn->dst_reg == ldx_insn->dst_reg) {
+		/* LDX is using the same destination register as LD.
+		 * This means we are not interested in the map
+		 * pointer itself and can remove it.
+		 */
+		*(insn + 0) = BPF_JMP_A(0);
+		*(insn + 1) = BPF_JMP_A(0);
+		*(insn + 2) = BPF_ALU64_IMM(BPF_MOV, dst_reg, val);
+		return true;
+	}
+
+	*(insn + 2) = BPF_ALU64_IMM(BPF_MOV, dst_reg, val);
+	/* Only LDX can be resolved, we still have to resolve LD address. */
+	return false;
+}
+
  /* find and rewrite pseudo imm in ld_imm64 instructions:
   *
   * 1. if it accesses map FD, replace it with actual map pointer.
@@ -12713,6 +12776,8 @@ static int resolve_pseudo_ldimm64(struct bpf_verifier_env *env)
  		return err;

  	for (i = 0; i < insn_cnt; i++, insn++) {
+		bool folded = false;
+
  		if (BPF_CLASS(insn->code) == BPF_LDX &&
  		    (BPF_MODE(insn->code) != BPF_MEM || insn->imm != 0)) {
  			verbose(env, "BPF_LDX uses reserved fields\n");
@@ -12830,8 +12895,13 @@ static int resolve_pseudo_ldimm64(struct bpf_verifier_env *env)
  				addr += off;
  			}

-			insn[0].imm = (u32)addr;
-			insn[1].imm = addr >> 32;
+			if (i + 2 < insn_cnt)
+				folded = fold_ro_pseudo_ldimm64(env, map, insn);
+
+			if (!folded) {
+				insn[0].imm = (u32)addr;
+				insn[1].imm = addr >> 32;
+			}

  			/* check whether we recorded this map already */
  			for (j = 0; j < env->used_map_cnt; j++) {
diff --git a/tools/testing/selftests/bpf/prog_tests/fold_ro_ldx.c b/tools/testing/selftests/bpf/prog_tests/fold_ro_ldx.c
new file mode 100644
index 000000000000..faaf8423ebca
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/fold_ro_ldx.c
@@ -0,0 +1,52 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <test_progs.h>
+#include <bpf/btf.h>
+
+#include "fold_ro_ldx.skel.h"
+
+void test_fold_ro_ldx(void)
+{
+	DECLARE_LIBBPF_OPTS(bpf_object_open_opts, skel_opts,
+		.kconfig = "CONFIG_DROP=1",
+	);
+	struct fold_ro_ldx *skel = NULL;
+	struct bpf_prog_info info = {};
+	struct bpf_insn insn[16];
+	__u32 info_len;
+	FILE *bpftool;
+	char buf[256];
+	int err;
+	int i;
+
+	skel = fold_ro_ldx__open_opts(&skel_opts);
+	if (!ASSERT_OK_PTR(skel, "open"))
+		goto close_skel;
+
+	if (!ASSERT_OK(fold_ro_ldx__load(skel), "load"))
+		goto close_skel;
+
+	info.xlated_prog_len = sizeof(insn);
+	info.xlated_prog_insns = ptr_to_u64(insn);
+
+	info_len = sizeof(struct bpf_prog_info);
+	err = bpf_obj_get_info_by_fd(bpf_program__fd(skel->progs.fold_ro_ldx),
+				     &info, &info_len);
+	if (!ASSERT_GE(err, 0, "bpf_obj_get_info_by_fd"))
+		goto close_skel;
+
+	// Put xlated output into stdout in case verification below fails.
+	bpftool = popen("bpftool prog dump xlated name fold_ro_ldx", "r");
+	while (fgets(buf, sizeof(buf), bpftool))
+		fputs(buf, stdout);
+	pclose(bpftool);
+
+	for (i = 0; i < info.xlated_prog_len / sizeof(struct bpf_insn); i++) {
+		ASSERT_NEQ(BPF_CLASS(insn[i].code), BPF_LD, "found BPF_LD");
+		ASSERT_NEQ(BPF_CLASS(insn[i].code), BPF_LDX, "found BPF_LDX");
+	}
+
+close_skel:
+	fold_ro_ldx__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/fold_ro_ldx.c b/tools/testing/selftests/bpf/progs/fold_ro_ldx.c
new file mode 100644
index 000000000000..c83ac65e2dee
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/fold_ro_ldx.c
@@ -0,0 +1,20 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+
+char _license[] SEC("license") = "GPL";
+
+extern const int CONFIG_DROP __kconfig __weak;
+
+struct __sk_buff;
+
+SEC("tc")
+int fold_ro_ldx(struct __sk_buff *skb)
+{
+	int ret = 1;
+
+	if (CONFIG_DROP)
+		ret = 2;
+
+	return ret;
+}
Martin KaFai Lau July 20, 2022, 8:52 p.m. UTC | #7
On Wed, Jul 20, 2022 at 11:04:59AM -0700, sdf@google.com wrote:
> On 07/19, sdf@google.com wrote:
> > On 07/19, Alexei Starovoitov wrote:
> > > On Tue, Jul 19, 2022 at 2:41 PM Stanislav Fomichev <sdf@google.com>
> > wrote:
> > > >
> > > > On Tue, Jul 19, 2022 at 1:33 PM Stanislav Fomichev <sdf@google.com>
> > > wrote:
> > > > >
> > > > > On Tue, Jul 19, 2022 at 1:21 PM Alexei Starovoitov
> > > > > <alexei.starovoitov@gmail.com> wrote:
> > > > > >
> > > > > > On Mon, Jul 18, 2022 at 12:07 PM Stanislav Fomichev
> > > <sdf@google.com> wrote:
> > > > > > >
> > > > > > > Motivation:
> > > > > > >
> > > > > > > Our bpf programs have a bunch of options which are set at the
> > > loading
> > > > > > > time. After loading, they don't change. We currently use array
> > map
> > > > > > > to store them and bpf program does the following:
> > > > > > >
> > > > > > > val = bpf_map_lookup_elem(&config_map, &key);
> > > > > > > if (likely(val && *val)) {
> > > > > > >   // do some optional feature
> > > > > > > }
> > > > > > >
> > > > > > > Since the configuration is static and we have a lot of those
> > > features,
> > > > > > > I feel like we're wasting precious cycles doing dynamic lookups
> > > > > > > (and stalling on memory loads).
> > > > > > >
> > > > > > > I was assuming that converting those to some fake kconfig
> > options
> > > > > > > would solve it, but it still seems like kconfig is stored in the
> > > > > > > global map and kconfig entries are resolved dynamically.
> > > > > > >
> > > > > > > Proposal:
> > > > > > >
> > > > > > > Resolve kconfig options statically upon loading. Basically
> > rewrite
> > > > > > > ld+ldx to two nops and 'mov val, x'.
> > > > > > >
> > > > > > > I'm also trying to rewrite conditional jump when the condition
> > is
> > > > > > > !imm. This seems to be catching all the cases in my program, but
> > > > > > > it's probably too hacky.
> > > > > > >
> > > > > > > I've attached very raw RFC patch to demonstrate the idea.
> > Anything
> > > > > > > I'm missing? Any potential problems with this approach?
> > > > > >
> > > > > > Have you considered using global variables for that?
> > > > > > With skeleton the user space has a natural way to set
> > > > > > all of these knobs after doing skel_open and before skel_load.
> > > > > > Then the verifier sees them as readonly vars and
> > > > > > automatically converts LDX into fixed constants and if the code
> > > > > > looks like if (my_config_var) then the verifier will remove
> > > > > > all the dead code too.
> > > > >
> > > > > Hm, that's a good alternative, let me try it out. Thanks!
> > > >
> > > > Turns out we already freeze kconfig map in libbpf:
> > > > if (map_type == LIBBPF_MAP_RODATA || map_type == LIBBPF_MAP_KCONFIG) {
> > > >         err = bpf_map_freeze(map->fd);
> > > >
> > > > And I've verified that I do hit bpf_map_direct_read in the verifier.
> > > >
> > > > But the code still stays the same (bpftool dump xlated):
> > > >   72: (18) r1 = map[id:24][0]+20
> > > >   74: (61) r1 = *(u32 *)(r1 +0)
> > > >   75: (bf) r2 = r9
> > > >   76: (b7) r0 = 0
> > > >   77: (15) if r1 == 0x0 goto pc+9
> > > >
> > > > I guess there is nothing for sanitize_dead_code to do because my
> > > > conditional is "if (likely(some_condition)) { do something }" and the
> > > > branch instruction itself is '.seen' by the verifier.
> 
> > > I bet your variable is not 'const'.
> > > Please see any of the progs in selftests that do:
> > > const volatile int var = 123;
> > > to express configs.
> 
> > Yeah, I was testing against the following:
> 
> > 	extern int CONFIG_XYZ __kconfig __weak;
> 
> > But ended up writing this small reproducer:
> 
> > 	struct __sk_buff;
> 
> > 	const volatile int CONFIG_DROP = 1; // volatile so it's not
> > 					    // clang-optimized
> 
> > 	__attribute__((section("tc"), used))
> > 	int my_config(struct __sk_buff *skb)
> > 	{
> > 		int ret = 0; /*TC_ACT_OK*/
> 
> > 		if (CONFIG_DROP)
> > 			ret = 2 /*TC_ACT_SHOT*/;
> 
> > 		return ret;
> > 	}
> 
> > $ bpftool map dump name my_confi.rodata
> 
> > [{
> >          "value": {
> >              ".rodata": [{
> >                      "CONFIG_DROP": 1
> >                  }
> >              ]
> >          }
> >      }
> > ]
> 
> > $ bpftool prog dump xlated name my_config
> 
> > int my_config(struct __sk_buff * skb):
> > ; if (CONFIG_DROP)
> >     0: (18) r1 = map[id:3][0]+0
> >     2: (61) r1 = *(u32 *)(r1 +0)
> >     3: (b4) w0 = 1
> > ; if (CONFIG_DROP)
> >     4: (64) w0 <<= 1
> > ; return ret;
> >     5: (95) exit
> 
> > The branch is gone, but the map lookup is still there :-(
> 
> Attached another RFC below which is doing the same but from the verifier
> side. It seems we should be able to resolve LD+LDX if their dst_reg
> is the same? If they are different, we should be able to pre-lookup
> LDX value at least. Would something like this work (haven't run full
> verifier/test_progs yet)?
> 
> (note, in this case, with kconfig, I still see the branch)
> 
>  test_fold_ro_ldx:PASS:open 0 nsec
>  test_fold_ro_ldx:PASS:load 0 nsec
>  test_fold_ro_ldx:PASS:bpf_obj_get_info_by_fd 0 nsec
>  int fold_ro_ldx(struct __sk_buff * skb):
>  ; if (CONFIG_DROP)
>     0: (b7) r1 = 1
>     1: (b4) w0 = 1
>  ; if (CONFIG_DROP)
>     2: (16) if w1 == 0x0 goto pc+1
>     3: (b4) w0 = 2
>  ; return ret;
>     4: (95) exit
>  test_fold_ro_ldx:PASS:found BPF_LD 0 nsec
>  test_fold_ro_ldx:PASS:found BPF_LDX 0 nsec
>  test_fold_ro_ldx:PASS:found BPF_LD 0 nsec
>  test_fold_ro_ldx:PASS:found BPF_LDX 0 nsec
>  test_fold_ro_ldx:PASS:found BPF_LD 0 nsec
>  test_fold_ro_ldx:PASS:found BPF_LDX 0 nsec
>  test_fold_ro_ldx:PASS:found BPF_LD 0 nsec
>  test_fold_ro_ldx:PASS:found BPF_LDX 0 nsec
>  test_fold_ro_ldx:PASS:found BPF_LD 0 nsec
>  test_fold_ro_ldx:PASS:found BPF_LDX 0 nsec
>  #66      fold_ro_ldx:OK
> 
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>  kernel/bpf/verifier.c                         | 74 ++++++++++++++++++-
>  .../selftests/bpf/prog_tests/fold_ro_ldx.c    | 52 +++++++++++++
>  .../testing/selftests/bpf/progs/fold_ro_ldx.c | 20 +++++
>  3 files changed, 144 insertions(+), 2 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/fold_ro_ldx.c
>  create mode 100644 tools/testing/selftests/bpf/progs/fold_ro_ldx.c
> 
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index c59c3df0fea6..ffedd8234288 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -12695,6 +12695,69 @@ static bool bpf_map_is_cgroup_storage(struct
> bpf_map *map)
>  		map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE);
>  }
> 
> +/* if the map is read-only, we can try to fully resolve the load */
> +static bool fold_ro_pseudo_ldimm64(struct bpf_verifier_env *env,
> +				   struct bpf_map *map,
> +				   struct bpf_insn *insn)
> +{
> +	struct bpf_insn *ldx_insn = insn + 2;
> +	int dst_reg = ldx_insn->dst_reg;
> +	u64 val = 0;
> +	int size;
> +	int err;
> +
> +	if (!bpf_map_is_rdonly(map) || !map->ops->map_direct_value_addr)
> +		return false;
> +
> +	/* 0: BPF_LD  r=MAP
> +	 * 1: BPF_LD  r=MAP
> +	 * 2: BPF_LDX r=MAP->VAL
> +	 */
> +
> +	if (BPF_CLASS((insn+0)->code) != BPF_LD ||
> +	    BPF_CLASS((insn+1)->code) != BPF_LD ||
> +	    BPF_CLASS((insn+2)->code) != BPF_LDX)
> +		return false;
> +
> +	if (BPF_MODE((insn+0)->code) != BPF_IMM ||
> +	    BPF_MODE((insn+1)->code) != BPF_IMM ||
> +	    BPF_MODE((insn+2)->code) != BPF_MEM)
> +		return false;
> +
> +	if (insn->src_reg != BPF_PSEUDO_MAP_VALUE &&
> +	    insn->src_reg != BPF_PSEUDO_MAP_IDX_VALUE)
> +		return false;
> +
> +	if (insn->dst_reg != ldx_insn->src_reg)
> +		return false;
> +
> +	if (ldx_insn->off != 0)
> +		return false;
> +
> +	size = bpf_size_to_bytes(BPF_SIZE(ldx_insn->code));
> +	if (size < 0 || size > 4)
> +		return false;
> +
> +	err = bpf_map_direct_read(map, (insn+1)->imm, size, &val);
> +	if (err)
> +		return false;
> +
> +	if (insn->dst_reg == ldx_insn->dst_reg) {
> +		/* LDX is using the same destination register as LD.
> +		 * This means we are not interested in the map
> +		 * pointer itself and can remove it.
> +		 */
> +		*(insn + 0) = BPF_JMP_A(0);
> +		*(insn + 1) = BPF_JMP_A(0);
> +		*(insn + 2) = BPF_ALU64_IMM(BPF_MOV, dst_reg, val);
Have you figured out why the branch is not removed
with BPF_ALU64_IMM(BPF_MOV)?

Can it also support 8 bytes (BPF_DW)? Is it because there
is not enough space for ld_imm64? So I wonder whether this
patching can be done in do_misc_fixups() instead.

> +		return true;
> +	}
> +
> +	*(insn + 2) = BPF_ALU64_IMM(BPF_MOV, dst_reg, val);
> +	/* Only LDX can be resolved, we still have to resolve LD address. */
> +	return false;
> +}
Stanislav Fomichev July 20, 2022, 10:33 p.m. UTC | #8
On Wed, Jul 20, 2022 at 1:53 PM Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Wed, Jul 20, 2022 at 11:04:59AM -0700, sdf@google.com wrote:
> > On 07/19, sdf@google.com wrote:
> > > On 07/19, Alexei Starovoitov wrote:
> > > > On Tue, Jul 19, 2022 at 2:41 PM Stanislav Fomichev <sdf@google.com>
> > > wrote:
> > > > >
> > > > > On Tue, Jul 19, 2022 at 1:33 PM Stanislav Fomichev <sdf@google.com>
> > > > wrote:
> > > > > >
> > > > > > On Tue, Jul 19, 2022 at 1:21 PM Alexei Starovoitov
> > > > > > <alexei.starovoitov@gmail.com> wrote:
> > > > > > >
> > > > > > > On Mon, Jul 18, 2022 at 12:07 PM Stanislav Fomichev
> > > > <sdf@google.com> wrote:
> > > > > > > >
> > > > > > > > Motivation:
> > > > > > > >
> > > > > > > > Our bpf programs have a bunch of options which are set at the
> > > > loading
> > > > > > > > time. After loading, they don't change. We currently use array
> > > map
> > > > > > > > to store them and bpf program does the following:
> > > > > > > >
> > > > > > > > val = bpf_map_lookup_elem(&config_map, &key);
> > > > > > > > if (likely(val && *val)) {
> > > > > > > >   // do some optional feature
> > > > > > > > }
> > > > > > > >
> > > > > > > > Since the configuration is static and we have a lot of those
> > > > features,
> > > > > > > > I feel like we're wasting precious cycles doing dynamic lookups
> > > > > > > > (and stalling on memory loads).
> > > > > > > >
> > > > > > > > I was assuming that converting those to some fake kconfig
> > > options
> > > > > > > > would solve it, but it still seems like kconfig is stored in the
> > > > > > > > global map and kconfig entries are resolved dynamically.
> > > > > > > >
> > > > > > > > Proposal:
> > > > > > > >
> > > > > > > > Resolve kconfig options statically upon loading. Basically
> > > rewrite
> > > > > > > > ld+ldx to two nops and 'mov val, x'.
> > > > > > > >
> > > > > > > > I'm also trying to rewrite conditional jump when the condition
> > > is
> > > > > > > > !imm. This seems to be catching all the cases in my program, but
> > > > > > > > it's probably too hacky.
> > > > > > > >
> > > > > > > > I've attached very raw RFC patch to demonstrate the idea.
> > > Anything
> > > > > > > > I'm missing? Any potential problems with this approach?
> > > > > > >
> > > > > > > Have you considered using global variables for that?
> > > > > > > With skeleton the user space has a natural way to set
> > > > > > > all of these knobs after doing skel_open and before skel_load.
> > > > > > > Then the verifier sees them as readonly vars and
> > > > > > > automatically converts LDX into fixed constants and if the code
> > > > > > > looks like if (my_config_var) then the verifier will remove
> > > > > > > all the dead code too.
> > > > > >
> > > > > > Hm, that's a good alternative, let me try it out. Thanks!
> > > > >
> > > > > Turns out we already freeze kconfig map in libbpf:
> > > > > if (map_type == LIBBPF_MAP_RODATA || map_type == LIBBPF_MAP_KCONFIG) {
> > > > >         err = bpf_map_freeze(map->fd);
> > > > >
> > > > > And I've verified that I do hit bpf_map_direct_read in the verifier.
> > > > >
> > > > > But the code still stays the same (bpftool dump xlated):
> > > > >   72: (18) r1 = map[id:24][0]+20
> > > > >   74: (61) r1 = *(u32 *)(r1 +0)
> > > > >   75: (bf) r2 = r9
> > > > >   76: (b7) r0 = 0
> > > > >   77: (15) if r1 == 0x0 goto pc+9
> > > > >
> > > > > I guess there is nothing for sanitize_dead_code to do because my
> > > > > conditional is "if (likely(some_condition)) { do something }" and the
> > > > > branch instruction itself is '.seen' by the verifier.
> >
> > > > I bet your variable is not 'const'.
> > > > Please see any of the progs in selftests that do:
> > > > const volatile int var = 123;
> > > > to express configs.
> >
> > > Yeah, I was testing against the following:
> >
> > >     extern int CONFIG_XYZ __kconfig __weak;
> >
> > > But ended up writing this small reproducer:
> >
> > >     struct __sk_buff;
> >
> > >     const volatile int CONFIG_DROP = 1; // volatile so it's not
> > >                                         // clang-optimized
> >
> > >     __attribute__((section("tc"), used))
> > >     int my_config(struct __sk_buff *skb)
> > >     {
> > >             int ret = 0; /*TC_ACT_OK*/
> >
> > >             if (CONFIG_DROP)
> > >                     ret = 2 /*TC_ACT_SHOT*/;
> >
> > >             return ret;
> > >     }
> >
> > > $ bpftool map dump name my_confi.rodata
> >
> > > [{
> > >          "value": {
> > >              ".rodata": [{
> > >                      "CONFIG_DROP": 1
> > >                  }
> > >              ]
> > >          }
> > >      }
> > > ]
> >
> > > $ bpftool prog dump xlated name my_config
> >
> > > int my_config(struct __sk_buff * skb):
> > > ; if (CONFIG_DROP)
> > >     0: (18) r1 = map[id:3][0]+0
> > >     2: (61) r1 = *(u32 *)(r1 +0)
> > >     3: (b4) w0 = 1
> > > ; if (CONFIG_DROP)
> > >     4: (64) w0 <<= 1
> > > ; return ret;
> > >     5: (95) exit
> >
> > > The branch is gone, but the map lookup is still there :-(
> >
> > Attached another RFC below which is doing the same but from the verifier
> > side. It seems we should be able to resolve LD+LDX if their dst_reg
> > is the same? If they are different, we should be able to pre-lookup
> > LDX value at least. Would something like this work (haven't run full
> > verifier/test_progs yet)?
> >
> > (note, in this case, with kconfig, I still see the branch)
> >
> >  test_fold_ro_ldx:PASS:open 0 nsec
> >  test_fold_ro_ldx:PASS:load 0 nsec
> >  test_fold_ro_ldx:PASS:bpf_obj_get_info_by_fd 0 nsec
> >  int fold_ro_ldx(struct __sk_buff * skb):
> >  ; if (CONFIG_DROP)
> >     0: (b7) r1 = 1
> >     1: (b4) w0 = 1
> >  ; if (CONFIG_DROP)
> >     2: (16) if w1 == 0x0 goto pc+1
> >     3: (b4) w0 = 2
> >  ; return ret;
> >     4: (95) exit
> >  test_fold_ro_ldx:PASS:found BPF_LD 0 nsec
> >  test_fold_ro_ldx:PASS:found BPF_LDX 0 nsec
> >  test_fold_ro_ldx:PASS:found BPF_LD 0 nsec
> >  test_fold_ro_ldx:PASS:found BPF_LDX 0 nsec
> >  test_fold_ro_ldx:PASS:found BPF_LD 0 nsec
> >  test_fold_ro_ldx:PASS:found BPF_LDX 0 nsec
> >  test_fold_ro_ldx:PASS:found BPF_LD 0 nsec
> >  test_fold_ro_ldx:PASS:found BPF_LDX 0 nsec
> >  test_fold_ro_ldx:PASS:found BPF_LD 0 nsec
> >  test_fold_ro_ldx:PASS:found BPF_LDX 0 nsec
> >  #66      fold_ro_ldx:OK
> >
> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > ---
> >  kernel/bpf/verifier.c                         | 74 ++++++++++++++++++-
> >  .../selftests/bpf/prog_tests/fold_ro_ldx.c    | 52 +++++++++++++
> >  .../testing/selftests/bpf/progs/fold_ro_ldx.c | 20 +++++
> >  3 files changed, 144 insertions(+), 2 deletions(-)
> >  create mode 100644 tools/testing/selftests/bpf/prog_tests/fold_ro_ldx.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/fold_ro_ldx.c
> >
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index c59c3df0fea6..ffedd8234288 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -12695,6 +12695,69 @@ static bool bpf_map_is_cgroup_storage(struct
> > bpf_map *map)
> >               map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE);
> >  }
> >
> > +/* if the map is read-only, we can try to fully resolve the load */
> > +static bool fold_ro_pseudo_ldimm64(struct bpf_verifier_env *env,
> > +                                struct bpf_map *map,
> > +                                struct bpf_insn *insn)
> > +{
> > +     struct bpf_insn *ldx_insn = insn + 2;
> > +     int dst_reg = ldx_insn->dst_reg;
> > +     u64 val = 0;
> > +     int size;
> > +     int err;
> > +
> > +     if (!bpf_map_is_rdonly(map) || !map->ops->map_direct_value_addr)
> > +             return false;
> > +
> > +     /* 0: BPF_LD  r=MAP
> > +      * 1: BPF_LD  r=MAP
> > +      * 2: BPF_LDX r=MAP->VAL
> > +      */
> > +
> > +     if (BPF_CLASS((insn+0)->code) != BPF_LD ||
> > +         BPF_CLASS((insn+1)->code) != BPF_LD ||
> > +         BPF_CLASS((insn+2)->code) != BPF_LDX)
> > +             return false;
> > +
> > +     if (BPF_MODE((insn+0)->code) != BPF_IMM ||
> > +         BPF_MODE((insn+1)->code) != BPF_IMM ||
> > +         BPF_MODE((insn+2)->code) != BPF_MEM)
> > +             return false;
> > +
> > +     if (insn->src_reg != BPF_PSEUDO_MAP_VALUE &&
> > +         insn->src_reg != BPF_PSEUDO_MAP_IDX_VALUE)
> > +             return false;
> > +
> > +     if (insn->dst_reg != ldx_insn->src_reg)
> > +             return false;
> > +
> > +     if (ldx_insn->off != 0)
> > +             return false;
> > +
> > +     size = bpf_size_to_bytes(BPF_SIZE(ldx_insn->code));
> > +     if (size < 0 || size > 4)
> > +             return false;
> > +
> > +     err = bpf_map_direct_read(map, (insn+1)->imm, size, &val);
> > +     if (err)
> > +             return false;
> > +
> > +     if (insn->dst_reg == ldx_insn->dst_reg) {
> > +             /* LDX is using the same destination register as LD.
> > +              * This means we are not interested in the map
> > +              * pointer itself and can remove it.
> > +              */
> > +             *(insn + 0) = BPF_JMP_A(0);
> > +             *(insn + 1) = BPF_JMP_A(0);
> > +             *(insn + 2) = BPF_ALU64_IMM(BPF_MOV, dst_reg, val);
> Have you figured out why the branch is not removed
> with BPF_ALU64_IMM(BPF_MOV) ?

I do have an idea, yes, but I'm not 100% certain. The rewrite has
nothing to do with it.
If I change the default ret from 1 to 0, I get a different bytecode
where this branch is eventually removed by the verifier.

I think it comes down to the following snippet:
r1 = 0 ll
r1 = *(u32 *)(r1 + 0) <<< rodata, verifier is able to resolve to r1 = 1
w0 = 1
if w1 == 0 goto +1
w0 = 2
exit
Here, 'if w1 == 0' is never taken, but it has been 'seen' by the
verifier. There is no 'dead' code; the branch just jumps around
'w0 = 2', which has seen=true.

VS this one:
r1 = 0 ll
r1 = *(u32 *)(r1 + 0) <<< rodata, verifier is able to resolve to r1 = 1
w0 = 1
if w1 != 0 goto +1
w0 = 0
w0 <<= 1
exit
Here, 'if w1 != 0' is seen, but the next insn has 'seen=false', so
the whole thing is ripped out.

So basically, from my quick look, it seems like there has to be some
real dead code to trigger the removal. If you simply have 'if
condition_that_never_happens goto +1' that doesn't lead to any dead
code, that single jump over some seen code is never going to be removed.
So, IMO, that's something that should be addressed independently.

> Can it also support 8 bytes (BPF_DW) ?  Is it because there
> is not enough space for ld_imm64?  so wonder if this
> patching can be done in do_misc_fixups() instead.

Yeah, I've limited it to 4 bytes because of sizeof(imm) for now.
I think one complication might be that at the do_misc_fixups point,
the immediate args of the bpf_ld have already been rewritten, which
might make it harder to get the data offset.

But I guess I'm still at the point where I'm trying to understand
whether what I'm doing makes sense or not :-) And whether we should do
it at the verifier level, in libbpf, or not at all...




> > +             return true;
> > +     }
> > +
> > +     *(insn + 2) = BPF_ALU64_IMM(BPF_MOV, dst_reg, val);
> > +     /* Only LDX can be resolved, we still have to resolve LD address. */
> > +     return false;
> > +}
Yonghong Song July 21, 2022, 10:29 p.m. UTC | #9
On 7/20/22 3:33 PM, Stanislav Fomichev wrote:
> On Wed, Jul 20, 2022 at 1:53 PM Martin KaFai Lau <kafai@fb.com> wrote:
>>
>> On Wed, Jul 20, 2022 at 11:04:59AM -0700, sdf@google.com wrote:
>>> On 07/19, sdf@google.com wrote:
>>>> On 07/19, Alexei Starovoitov wrote:
>>>>> On Tue, Jul 19, 2022 at 2:41 PM Stanislav Fomichev <sdf@google.com>
>>>> wrote:
>>>>>>
>>>>>> On Tue, Jul 19, 2022 at 1:33 PM Stanislav Fomichev <sdf@google.com>
>>>>> wrote:
>>>>>>>
>>>>>>> On Tue, Jul 19, 2022 at 1:21 PM Alexei Starovoitov
>>>>>>> <alexei.starovoitov@gmail.com> wrote:
>>>>>>>>
>>>>>>>> On Mon, Jul 18, 2022 at 12:07 PM Stanislav Fomichev
>>>>> <sdf@google.com> wrote:
>>>>>>>>>
>>>>>>>>> Motivation:
>>>>>>>>>
>>>>>>>>> Our bpf programs have a bunch of options which are set at the
>>>>> loading
>>>>>>>>> time. After loading, they don't change. We currently use array
>>>> map
>>>>>>>>> to store them and bpf program does the following:
>>>>>>>>>
>>>>>>>>> val = bpf_map_lookup_elem(&config_map, &key);
>>>>>>>>> if (likely(val && *val)) {
>>>>>>>>>    // do some optional feature
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> Since the configuration is static and we have a lot of those
>>>>> features,
>>>>>>>>> I feel like we're wasting precious cycles doing dynamic lookups
>>>>>>>>> (and stalling on memory loads).
>>>>>>>>>
>>>>>>>>> I was assuming that converting those to some fake kconfig
>>>> options
>>>>>>>>> would solve it, but it still seems like kconfig is stored in the
>>>>>>>>> global map and kconfig entries are resolved dynamically.
>>>>>>>>>
>>>>>>>>> Proposal:
>>>>>>>>>
>>>>>>>>> Resolve kconfig options statically upon loading. Basically
>>>> rewrite
>>>>>>>>> ld+ldx to two nops and 'mov val, x'.
>>>>>>>>>
>>>>>>>>> I'm also trying to rewrite conditional jump when the condition
>>>> is
>>>>>>>>> !imm. This seems to be catching all the cases in my program, but
>>>>>>>>> it's probably too hacky.
>>>>>>>>>
>>>>>>>>> I've attached very raw RFC patch to demonstrate the idea.
>>>> Anything
>>>>>>>>> I'm missing? Any potential problems with this approach?
>>>>>>>>
>>>>>>>> Have you considered using global variables for that?
>>>>>>>> With skeleton the user space has a natural way to set
>>>>>>>> all of these knobs after doing skel_open and before skel_load.
>>>>>>>> Then the verifier sees them as readonly vars and
>>>>>>>> automatically converts LDX into fixed constants and if the code
>>>>>>>> looks like if (my_config_var) then the verifier will remove
>>>>>>>> all the dead code too.
>>>>>>>
>>>>>>> Hm, that's a good alternative, let me try it out. Thanks!
>>>>>>
>>>>>> Turns out we already freeze kconfig map in libbpf:
>>>>>> if (map_type == LIBBPF_MAP_RODATA || map_type == LIBBPF_MAP_KCONFIG) {
>>>>>>          err = bpf_map_freeze(map->fd);
>>>>>>
>>>>>> And I've verified that I do hit bpf_map_direct_read in the verifier.
>>>>>>
>>>>>> But the code still stays the same (bpftool dump xlated):
>>>>>>    72: (18) r1 = map[id:24][0]+20
>>>>>>    74: (61) r1 = *(u32 *)(r1 +0)
>>>>>>    75: (bf) r2 = r9
>>>>>>    76: (b7) r0 = 0
>>>>>>    77: (15) if r1 == 0x0 goto pc+9
>>>>>>
>>>>>> I guess there is nothing for sanitize_dead_code to do because my
>>>>>> conditional is "if (likely(some_condition)) { do something }" and the
>>>>>> branch instruction itself is '.seen' by the verifier.
>>>
>>>>> I bet your variable is not 'const'.
>>>>> Please see any of the progs in selftests that do:
>>>>> const volatile int var = 123;
>>>>> to express configs.
>>>
>>>> Yeah, I was testing against the following:
>>>
>>>>      extern int CONFIG_XYZ __kconfig __weak;
>>>
>>>> But ended up writing this small reproducer:
>>>
>>>>      struct __sk_buff;
>>>
>>>>      const volatile int CONFIG_DROP = 1; // volatile so it's not
>>>>                                          // clang-optimized
>>>
>>>>      __attribute__((section("tc"), used))
>>>>      int my_config(struct __sk_buff *skb)
>>>>      {
>>>>              int ret = 0; /*TC_ACT_OK*/
>>>
>>>>              if (CONFIG_DROP)
>>>>                      ret = 2 /*TC_ACT_SHOT*/;
>>>
>>>>              return ret;
>>>>      }
>>>
>>>> $ bpftool map dump name my_confi.rodata
>>>
>>>> [{
>>>>           "value": {
>>>>               ".rodata": [{
>>>>                       "CONFIG_DROP": 1
>>>>                   }
>>>>               ]
>>>>           }
>>>>       }
>>>> ]
>>>
>>>> $ bpftool prog dump xlated name my_config
>>>
>>>> int my_config(struct __sk_buff * skb):
>>>> ; if (CONFIG_DROP)
>>>>      0: (18) r1 = map[id:3][0]+0
>>>>      2: (61) r1 = *(u32 *)(r1 +0)
>>>>      3: (b4) w0 = 1
>>>> ; if (CONFIG_DROP)
>>>>      4: (64) w0 <<= 1
>>>> ; return ret;
>>>>      5: (95) exit
>>>
>>>> The branch is gone, but the map lookup is still there :-(
>>>
>>> Attached another RFC below which is doing the same but from the verifier
>>> side. It seems we should be able to resolve LD+LDX if their dst_reg
>>> is the same? If they are different, we should be able to pre-lookup
>>> LDX value at least. Would something like this work (haven't run full
>>> verifier/test_progs yet)?
>>>
>>> (note, in this case, with kconfig, I still see the branch)
>>>
>>>   test_fold_ro_ldx:PASS:open 0 nsec
>>>   test_fold_ro_ldx:PASS:load 0 nsec
>>>   test_fold_ro_ldx:PASS:bpf_obj_get_info_by_fd 0 nsec
>>>   int fold_ro_ldx(struct __sk_buff * skb):
>>>   ; if (CONFIG_DROP)
>>>      0: (b7) r1 = 1
>>>      1: (b4) w0 = 1
>>>   ; if (CONFIG_DROP)
>>>      2: (16) if w1 == 0x0 goto pc+1
>>>      3: (b4) w0 = 2
>>>   ; return ret;
>>>      4: (95) exit
>>>   test_fold_ro_ldx:PASS:found BPF_LD 0 nsec
>>>   test_fold_ro_ldx:PASS:found BPF_LDX 0 nsec
>>>   test_fold_ro_ldx:PASS:found BPF_LD 0 nsec
>>>   test_fold_ro_ldx:PASS:found BPF_LDX 0 nsec
>>>   test_fold_ro_ldx:PASS:found BPF_LD 0 nsec
>>>   test_fold_ro_ldx:PASS:found BPF_LDX 0 nsec
>>>   test_fold_ro_ldx:PASS:found BPF_LD 0 nsec
>>>   test_fold_ro_ldx:PASS:found BPF_LDX 0 nsec
>>>   test_fold_ro_ldx:PASS:found BPF_LD 0 nsec
>>>   test_fold_ro_ldx:PASS:found BPF_LDX 0 nsec
>>>   #66      fold_ro_ldx:OK
>>>
>>> Signed-off-by: Stanislav Fomichev <sdf@google.com>
>>> ---
>>>   kernel/bpf/verifier.c                         | 74 ++++++++++++++++++-
>>>   .../selftests/bpf/prog_tests/fold_ro_ldx.c    | 52 +++++++++++++
>>>   .../testing/selftests/bpf/progs/fold_ro_ldx.c | 20 +++++
>>>   3 files changed, 144 insertions(+), 2 deletions(-)
>>>   create mode 100644 tools/testing/selftests/bpf/prog_tests/fold_ro_ldx.c
>>>   create mode 100644 tools/testing/selftests/bpf/progs/fold_ro_ldx.c
>>>
>>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>>> index c59c3df0fea6..ffedd8234288 100644
>>> --- a/kernel/bpf/verifier.c
>>> +++ b/kernel/bpf/verifier.c
>>> @@ -12695,6 +12695,69 @@ static bool bpf_map_is_cgroup_storage(struct
>>> bpf_map *map)
>>>                map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE);
>>>   }
>>>
>>> +/* if the map is read-only, we can try to fully resolve the load */
>>> +static bool fold_ro_pseudo_ldimm64(struct bpf_verifier_env *env,
>>> +                                struct bpf_map *map,
>>> +                                struct bpf_insn *insn)
>>> +{
>>> +     struct bpf_insn *ldx_insn = insn + 2;
>>> +     int dst_reg = ldx_insn->dst_reg;
>>> +     u64 val = 0;
>>> +     int size;
>>> +     int err;
>>> +
>>> +     if (!bpf_map_is_rdonly(map) || !map->ops->map_direct_value_addr)
>>> +             return false;
>>> +
>>> +     /* 0: BPF_LD  r=MAP
>>> +      * 1: BPF_LD  r=MAP
>>> +      * 2: BPF_LDX r=MAP->VAL
>>> +      */
>>> +
>>> +     if (BPF_CLASS((insn+0)->code) != BPF_LD ||
>>> +         BPF_CLASS((insn+1)->code) != BPF_LD ||
>>> +         BPF_CLASS((insn+2)->code) != BPF_LDX)
>>> +             return false;
>>> +
>>> +     if (BPF_MODE((insn+0)->code) != BPF_IMM ||
>>> +         BPF_MODE((insn+1)->code) != BPF_IMM ||
>>> +         BPF_MODE((insn+2)->code) != BPF_MEM)
>>> +             return false;
>>> +
>>> +     if (insn->src_reg != BPF_PSEUDO_MAP_VALUE &&
>>> +         insn->src_reg != BPF_PSEUDO_MAP_IDX_VALUE)
>>> +             return false;
>>> +
>>> +     if (insn->dst_reg != ldx_insn->src_reg)
>>> +             return false;
>>> +
>>> +     if (ldx_insn->off != 0)
>>> +             return false;
>>> +
>>> +     size = bpf_size_to_bytes(BPF_SIZE(ldx_insn->code));
>>> +     if (size < 0 || size > 4)
>>> +             return false;
>>> +
>>> +     err = bpf_map_direct_read(map, (insn+1)->imm, size, &val);
>>> +     if (err)
>>> +             return false;
>>> +
>>> +     if (insn->dst_reg == ldx_insn->dst_reg) {
>>> +             /* LDX is using the same destination register as LD.
>>> +              * This means we are not interested in the map
>>> +              * pointer itself and can remove it.
>>> +              */
>>> +             *(insn + 0) = BPF_JMP_A(0);
>>> +             *(insn + 1) = BPF_JMP_A(0);
>>> +             *(insn + 2) = BPF_ALU64_IMM(BPF_MOV, dst_reg, val);
>> Have you figured out why the branch is not removed
>> with BPF_ALU64_IMM(BPF_MOV) ?
> 
> I do have an idea, yes, but I'm not 100% certain. The rewrite has
> nothing to do with it.
> If I change the default ret from 1 to 0, I get a different bytecode
> where this branch is eventually removed by the verifier.
> 
> I think it comes down to the following snippet:
> r1 = 0 ll
> r1 = *(u32 *)(r1 + 0) <<< rodata, verifier is able to resolve to r1 = 1
> w0 = 1
> if w1 == 0 goto +1
> w0 = 2
> exit
> Here, 'if w1 == 0' is never taken, but it has been 'seen' by the
> verifier. There is no 'dead' code, it just jumps around 'w0 = 2' which
> has seen=true.
> 
> VS this one:
> r1 = 0 ll
> r1 = *(u32 *)(r1 + 0) <<< rodata, verifier is able to resolve to r1 = 1
> w0 = 1
> if w1 != 0 goto +1
> w0 = 0
> w0 <<= 1
> exit
> Here, 'if w1 != 0' is seen, but the next insn has 'seen=false', so
> this whole thing is ripped out.
> 
> So basically, from my quick look, it seems like there should be some
> real dead code to trigger the removal. If you simply have 'if
> condition_that_never_happens goto +1' which doesn't lead to some dead
> code, this single jump-over-some-seen-code is never gonna be removed.
> So, IMO, that's something that should be addressed independently.
> 
>> Can it also support 8 bytes (BPF_DW) ?  Is it because there
>> is not enough space for ld_imm64?  so wonder if this
>> patching can be done in do_misc_fixups() instead.
> 
> Yeah, I've limited it to 4 bytes because of sizeof(imm) for now.
> I think one complication might be that at do_misc_fixups point, the
> immediate args of bpf_ld have been rewritten which might make it
> harder to get the data offset.
> 
> But I guess I'm still at the point where I'm trying to understand
> whether what I'm doing makes sense or not :-) And whether we should do
> it at the verifier level, in libbpf or if at all..

I think the approach above is a little bit fragile.
There is no guarantee that the ldx immediately follows the ld.
Also, after the rewrite, some redundant instructions are still
left (e.g., the map pointer load), although they can be rewritten
as nops.

In general, 'const volatile int var' should be good enough,
although it still has one extra load instruction. How much of a
performance hit did you see with this extra load?

If we truly want to generate the best code (no libbpf/verifier
rewrites with nops) robustly, we can implement this in LLVM as
a CO-RE relocation. In that case, the compiler will be
able to generate
      r = <patchable_value>
      ... using r ...

So in the above code, the CO-RE based approach will generate:
   r1 = <patchable value>  // we can make it always a 64-bit value
   w0 = 1
   if w1 != 0 goto +1
   w0 = 0
   w0 <<= 1
   exit

This will only work for int types of up to 64 bits.
The kconfig itself supports arrays as well, for which
CO-RE won't work.

> 
> 
> 
> 
>>> +             return true;
>>> +     }
>>> +
>>> +     *(insn + 2) = BPF_ALU64_IMM(BPF_MOV, dst_reg, val);
>>> +     /* Only LDX can be resolved, we still have to resolve LD address. */
>>> +     return false;
>>> +}
Stanislav Fomichev July 21, 2022, 11:16 p.m. UTC | #10
On Thu, Jul 21, 2022 at 3:30 PM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 7/20/22 3:33 PM, Stanislav Fomichev wrote:
> > On Wed, Jul 20, 2022 at 1:53 PM Martin KaFai Lau <kafai@fb.com> wrote:
> >>
> >> On Wed, Jul 20, 2022 at 11:04:59AM -0700, sdf@google.com wrote:
> >>> On 07/19, sdf@google.com wrote:
> >>>> On 07/19, Alexei Starovoitov wrote:
> >>>>> On Tue, Jul 19, 2022 at 2:41 PM Stanislav Fomichev <sdf@google.com>
> >>>> wrote:
> >>>>>>
> >>>>>> On Tue, Jul 19, 2022 at 1:33 PM Stanislav Fomichev <sdf@google.com>
> >>>>> wrote:
> >>>>>>>
> >>>>>>> On Tue, Jul 19, 2022 at 1:21 PM Alexei Starovoitov
> >>>>>>> <alexei.starovoitov@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> On Mon, Jul 18, 2022 at 12:07 PM Stanislav Fomichev
> >>>>> <sdf@google.com> wrote:
> >>>>>>>>>
> >>>>>>>>> Motivation:
> >>>>>>>>>
> >>>>>>>>> Our bpf programs have a bunch of options which are set at the
> >>>>> loading
> >>>>>>>>> time. After loading, they don't change. We currently use array
> >>>> map
> >>>>>>>>> to store them and bpf program does the following:
> >>>>>>>>>
> >>>>>>>>> val = bpf_map_lookup_elem(&config_map, &key);
> >>>>>>>>> if (likely(val && *val)) {
> >>>>>>>>>    // do some optional feature
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>> Since the configuration is static and we have a lot of those
> >>>>> features,
> >>>>>>>>> I feel like we're wasting precious cycles doing dynamic lookups
> >>>>>>>>> (and stalling on memory loads).
> >>>>>>>>>
> >>>>>>>>> I was assuming that converting those to some fake kconfig
> >>>> options
> >>>>>>>>> would solve it, but it still seems like kconfig is stored in the
> >>>>>>>>> global map and kconfig entries are resolved dynamically.
> >>>>>>>>>
> >>>>>>>>> Proposal:
> >>>>>>>>>
> >>>>>>>>> Resolve kconfig options statically upon loading. Basically
> >>>> rewrite
> >>>>>>>>> ld+ldx to two nops and 'mov val, x'.
> >>>>>>>>>
> >>>>>>>>> I'm also trying to rewrite conditional jump when the condition
> >>>> is
> >>>>>>>>> !imm. This seems to be catching all the cases in my program, but
> >>>>>>>>> it's probably too hacky.
> >>>>>>>>>
> >>>>>>>>> I've attached very raw RFC patch to demonstrate the idea.
> >>>> Anything
> >>>>>>>>> I'm missing? Any potential problems with this approach?
> >>>>>>>>
> >>>>>>>> Have you considered using global variables for that?
> >>>>>>>> With skeleton the user space has a natural way to set
> >>>>>>>> all of these knobs after doing skel_open and before skel_load.
> >>>>>>>> Then the verifier sees them as readonly vars and
> >>>>>>>> automatically converts LDX into fixed constants and if the code
> >>>>>>>> looks like if (my_config_var) then the verifier will remove
> >>>>>>>> all the dead code too.
> >>>>>>>
> >>>>>>> Hm, that's a good alternative, let me try it out. Thanks!
> >>>>>>
> >>>>>> Turns out we already freeze kconfig map in libbpf:
> >>>>>> if (map_type == LIBBPF_MAP_RODATA || map_type == LIBBPF_MAP_KCONFIG) {
> >>>>>>          err = bpf_map_freeze(map->fd);
> >>>>>>
> >>>>>> And I've verified that I do hit bpf_map_direct_read in the verifier.
> >>>>>>
> >>>>>> But the code still stays the same (bpftool dump xlated):
> >>>>>>    72: (18) r1 = map[id:24][0]+20
> >>>>>>    74: (61) r1 = *(u32 *)(r1 +0)
> >>>>>>    75: (bf) r2 = r9
> >>>>>>    76: (b7) r0 = 0
> >>>>>>    77: (15) if r1 == 0x0 goto pc+9
> >>>>>>
> >>>>>> I guess there is nothing for sanitize_dead_code to do because my
> >>>>>> conditional is "if (likely(some_condition)) { do something }" and the
> >>>>>> branch instruction itself is '.seen' by the verifier.
> >>>
> >>>>> I bet your variable is not 'const'.
> >>>>> Please see any of the progs in selftests that do:
> >>>>> const volatile int var = 123;
> >>>>> to express configs.
> >>>
> >>>> Yeah, I was testing against the following:
> >>>
> >>>>      extern int CONFIG_XYZ __kconfig __weak;
> >>>
> >>>> But ended up writing this small reproducer:
> >>>
> >>>>      struct __sk_buff;
> >>>
> >>>>      const volatile int CONFIG_DROP = 1; // volatile so it's not
> >>>>                                          // clang-optimized
> >>>
> >>>>      __attribute__((section("tc"), used))
> >>>>      int my_config(struct __sk_buff *skb)
> >>>>      {
> >>>>              int ret = 0; /*TC_ACT_OK*/
> >>>
> >>>>              if (CONFIG_DROP)
> >>>>                      ret = 2 /*TC_ACT_SHOT*/;
> >>>
> >>>>              return ret;
> >>>>      }
> >>>
> >>>> $ bpftool map dump name my_confi.rodata
> >>>
> >>>> [{
> >>>>           "value": {
> >>>>               ".rodata": [{
> >>>>                       "CONFIG_DROP": 1
> >>>>                   }
> >>>>               ]
> >>>>           }
> >>>>       }
> >>>> ]
> >>>
> >>>> $ bpftool prog dump xlated name my_config
> >>>
> >>>> int my_config(struct __sk_buff * skb):
> >>>> ; if (CONFIG_DROP)
> >>>>      0: (18) r1 = map[id:3][0]+0
> >>>>      2: (61) r1 = *(u32 *)(r1 +0)
> >>>>      3: (b4) w0 = 1
> >>>> ; if (CONFIG_DROP)
> >>>>      4: (64) w0 <<= 1
> >>>> ; return ret;
> >>>>      5: (95) exit
> >>>
> >>>> The branch is gone, but the map lookup is still there :-(
> >>>
> >>> Attached another RFC below which is doing the same but from the verifier
> >>> side. It seems we should be able to resolve LD+LDX if their dst_reg
> >>> is the same? If they are different, we should be able to pre-lookup
> >>> LDX value at least. Would something like this work (haven't run full
> >>> verifier/test_progs yet)?
> >>>
> >>> (note, in this case, with kconfig, I still see the branch)
> >>>
> >>>   test_fold_ro_ldx:PASS:open 0 nsec
> >>>   test_fold_ro_ldx:PASS:load 0 nsec
> >>>   test_fold_ro_ldx:PASS:bpf_obj_get_info_by_fd 0 nsec
> >>>   int fold_ro_ldx(struct __sk_buff * skb):
> >>>   ; if (CONFIG_DROP)
> >>>      0: (b7) r1 = 1
> >>>      1: (b4) w0 = 1
> >>>   ; if (CONFIG_DROP)
> >>>      2: (16) if w1 == 0x0 goto pc+1
> >>>      3: (b4) w0 = 2
> >>>   ; return ret;
> >>>      4: (95) exit
> >>>   test_fold_ro_ldx:PASS:found BPF_LD 0 nsec
> >>>   test_fold_ro_ldx:PASS:found BPF_LDX 0 nsec
> >>>   test_fold_ro_ldx:PASS:found BPF_LD 0 nsec
> >>>   test_fold_ro_ldx:PASS:found BPF_LDX 0 nsec
> >>>   test_fold_ro_ldx:PASS:found BPF_LD 0 nsec
> >>>   test_fold_ro_ldx:PASS:found BPF_LDX 0 nsec
> >>>   test_fold_ro_ldx:PASS:found BPF_LD 0 nsec
> >>>   test_fold_ro_ldx:PASS:found BPF_LDX 0 nsec
> >>>   test_fold_ro_ldx:PASS:found BPF_LD 0 nsec
> >>>   test_fold_ro_ldx:PASS:found BPF_LDX 0 nsec
> >>>   #66      fold_ro_ldx:OK
> >>>
> >>> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> >>> ---
> >>>   kernel/bpf/verifier.c                         | 74 ++++++++++++++++++-
> >>>   .../selftests/bpf/prog_tests/fold_ro_ldx.c    | 52 +++++++++++++
> >>>   .../testing/selftests/bpf/progs/fold_ro_ldx.c | 20 +++++
> >>>   3 files changed, 144 insertions(+), 2 deletions(-)
> >>>   create mode 100644 tools/testing/selftests/bpf/prog_tests/fold_ro_ldx.c
> >>>   create mode 100644 tools/testing/selftests/bpf/progs/fold_ro_ldx.c
> >>>
> >>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> >>> index c59c3df0fea6..ffedd8234288 100644
> >>> --- a/kernel/bpf/verifier.c
> >>> +++ b/kernel/bpf/verifier.c
> >>> @@ -12695,6 +12695,69 @@ static bool bpf_map_is_cgroup_storage(struct
> >>> bpf_map *map)
> >>>                map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE);
> >>>   }
> >>>
> >>> +/* if the map is read-only, we can try to fully resolve the load */
> >>> +static bool fold_ro_pseudo_ldimm64(struct bpf_verifier_env *env,
> >>> +                                struct bpf_map *map,
> >>> +                                struct bpf_insn *insn)
> >>> +{
> >>> +     struct bpf_insn *ldx_insn = insn + 2;
> >>> +     int dst_reg = ldx_insn->dst_reg;
> >>> +     u64 val = 0;
> >>> +     int size;
> >>> +     int err;
> >>> +
> >>> +     if (!bpf_map_is_rdonly(map) || !map->ops->map_direct_value_addr)
> >>> +             return false;
> >>> +
> >>> +     /* 0: BPF_LD  r=MAP
> >>> +      * 1: BPF_LD  r=MAP
> >>> +      * 2: BPF_LDX r=MAP->VAL
> >>> +      */
> >>> +
> >>> +     if (BPF_CLASS((insn+0)->code) != BPF_LD ||
> >>> +         BPF_CLASS((insn+1)->code) != BPF_LD ||
> >>> +         BPF_CLASS((insn+2)->code) != BPF_LDX)
> >>> +             return false;
> >>> +
> >>> +     if (BPF_MODE((insn+0)->code) != BPF_IMM ||
> >>> +         BPF_MODE((insn+1)->code) != BPF_IMM ||
> >>> +         BPF_MODE((insn+2)->code) != BPF_MEM)
> >>> +             return false;
> >>> +
> >>> +     if (insn->src_reg != BPF_PSEUDO_MAP_VALUE &&
> >>> +         insn->src_reg != BPF_PSEUDO_MAP_IDX_VALUE)
> >>> +             return false;
> >>> +
> >>> +     if (insn->dst_reg != ldx_insn->src_reg)
> >>> +             return false;
> >>> +
> >>> +     if (ldx_insn->off != 0)
> >>> +             return false;
> >>> +
> >>> +     size = bpf_size_to_bytes(BPF_SIZE(ldx_insn->code));
> >>> +     if (size < 0 || size > 4)
> >>> +             return false;
> >>> +
> >>> +     err = bpf_map_direct_read(map, (insn+1)->imm, size, &val);
> >>> +     if (err)
> >>> +             return false;
> >>> +
> >>> +     if (insn->dst_reg == ldx_insn->dst_reg) {
> >>> +             /* LDX is using the same destination register as LD.
> >>> +              * This means we are not interested in the map
> >>> +              * pointer itself and can remove it.
> >>> +              */
> >>> +             *(insn + 0) = BPF_JMP_A(0);
> >>> +             *(insn + 1) = BPF_JMP_A(0);
> >>> +             *(insn + 2) = BPF_ALU64_IMM(BPF_MOV, dst_reg, val);
> >> Have you figured out why the branch is not removed
> >> with BPF_ALU64_IMM(BPF_MOV) ?
> >
> > I do have an idea, yes, but I'm not 100% certain. The rewrite has
> > nothing to do with it.
> > If I change the default ret from 1 to 0, I get a different bytecode
> > where this branch is eventually removed by the verifier.
> >
> > I think it comes down to the following snippet:
> > r1 = 0 ll
> > r1 = *(u32 *)(r1 + 0) <<< rodata, verifier is able to resolve to r1 = 1
> > w0 = 1
> > if w1 == 0 goto +1
> > w0 = 2
> > exit
> > Here, 'if w1 == 0' is never taken, but it has been 'seen' by the
> > verifier. There is no 'dead' code, it just jumps around 'w0 = 2' which
> > has seen=true.
> >
> > VS this one:
> > r1 = 0 ll
> > r1 = *(u32 *)(r1 + 0) <<< rodata, verifier is able to resolve to r1 = 1
> > w0 = 1
> > if w1 != 0 goto +1
> > w0 = 0
> > w0 <<= 1
> > exit
> > Here, 'if w1 != 0' is seen, but the next insn has 'seen=false', so
> > this whole thing is ripped out.
> >
> > So basically, from my quick look, it seems like there should be some
> > real dead code to trigger the removal. If you simply have 'if
> > condition_that_never_happens goto +1' which doesn't lead to any dead
> > code, this single jump over some seen code is never going to be
> > removed. So, IMO, that's something that should be addressed independently.
> >
> >> Can it also support 8 bytes (BPF_DW) ?  Is it because there
> >> is not enough space for ld_imm64?  so wonder if this
> >> patching can be done in do_misc_fixups() instead.
> >
> > Yeah, I've limited it to 4 bytes because of sizeof(imm) for now.
> > I think one complication might be that at do_misc_fixups point, the
> > immediate args of bpf_ld have been rewritten which might make it
> > harder to get the data offset.
> >
> > But I guess I'm still at the point where I'm trying to understand
> > whether what I'm doing makes sense or not :-) And whether we should do
> > it at the verifier level, in libbpf or if at all..
>
> I think the approach above is a little bit fragile.
> There is no guarantee that the ldx immediately follows the ld.
> Also, after the rewrite, some redundant instructions are still
> left (e.g., the map pointer load), although they can be
> rewritten as nops.

Yeah, it's possible there is no ldx, or it's possible that the map's ptr
in the ld is used by some other instruction later on. I agree that all
this rewriting is still best-case and might be a bit ugly :-(
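To make the fragility concrete: the positional check the RFC performs can be reproduced in plain user-space C. This is a simplified sketch with locally defined opcode macros and a minimal stand-in struct (is_ld_ldx_pattern and struct insn are invented names for illustration, not kernel or libbpf API):

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified BPF opcode fields, mirroring include/uapi/linux/bpf.h. */
#define BPF_CLASS(code) ((code) & 0x07)
#define BPF_MODE(code)  ((code) & 0xe0)
#define BPF_LD   0x00
#define BPF_LDX  0x01
#define BPF_IMM  0x00
#define BPF_MEM  0x60

struct insn {            /* minimal stand-in for struct bpf_insn */
	uint8_t code;
	uint8_t dst_reg;
	uint8_t src_reg;
	int16_t off;
	int32_t imm;
};

/* Does insn[0..2] look like ld_imm64(map) followed by an ldx through
 * the just-loaded map pointer? Mirrors the shape checks in the RFC's
 * fold_ro_pseudo_ldimm64().
 */
static bool is_ld_ldx_pattern(const struct insn *insn)
{
	if (BPF_CLASS(insn[0].code) != BPF_LD ||
	    BPF_CLASS(insn[1].code) != BPF_LD ||
	    BPF_CLASS(insn[2].code) != BPF_LDX)
		return false;
	if (BPF_MODE(insn[0].code) != BPF_IMM ||
	    BPF_MODE(insn[1].code) != BPF_IMM ||
	    BPF_MODE(insn[2].code) != BPF_MEM)
		return false;
	/* the ldx must read through the register the ld_imm64 set */
	return insn[0].dst_reg == insn[2].src_reg && insn[2].off == 0;
}
```

Any compiler scheduling that separates the ldx from the ld_imm64 pair, or an ldx that reads through a copied register, defeats this purely positional match, which is exactly the fragility being pointed out.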

> In general, 'const volatile int var' should be good enough,
> although it still has one extra load instruction. How much
> of a performance hit did you see with this extra load?

I haven't tested anything yet. I think I'll try to convert my prog to
kconfig and then have:

#ifdef STATIC_CONFIG
const int CONFIG_XYZ = 1;
#else
extern const int CONFIG_XYZ __kconfig __weak;
#endif

And maybe roll that STATIC_CONFIG=true version somewhere for a week or
two to see whether there is any effect.
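As a side note on the '.seen' behavior discussed earlier in the thread, the effect can be modeled with a toy pass in user-space C (count_unseen and struct toy_insn are invented for illustration; the real logic in the verifier's sanitize_dead_code() is considerably more involved): once a branch outcome is statically known, only instructions that are never reached become removable, so a never-taken branch over otherwise-reachable code leaves nothing for dead code elimination to do.

```c
#include <stdbool.h>

/* Toy instruction: either a plain insn or a conditional branch whose
 * outcome is statically known (as when the tested register holds a
 * constant loaded from frozen rodata).
 */
struct toy_insn {
	bool is_branch;
	bool branch_taken;   /* statically known outcome */
	int  jmp_off;        /* relative offset when taken */
};

/* Walk the stream once, mimicking the verifier marking insns '.seen'.
 * Returns the number of instructions never reached, i.e. candidates
 * for sanitize_dead_code-style removal. Assumes len <= 64 and that
 * all branches jump forward.
 */
static int count_unseen(const struct toy_insn *prog, int len)
{
	bool seen[64] = { false };
	int pc = 0, dead = 0;

	while (pc < len) {
		seen[pc] = true;
		if (prog[pc].is_branch && prog[pc].branch_taken)
			pc += 1 + prog[pc].jmp_off;
		else
			pc += 1;
	}
	for (int i = 0; i < len; i++)
		if (!seen[i])
			dead++;
	return dead;
}
```

In the first xlated dump quoted above, every instruction is reached (the known-false branch simply falls through), so the redundant conditional jump survives; in the shifted-return variant, the jumped-over store is unreachable and the whole construct gets cleaned up.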

> If we truly want to generate the best code (no libbpf/verifier
> rewrites with nops) robustly, we can implement this in llvm as
> a CO-RE relocation. In that case, the compiler will be
> able to generate
>       r = <patchable_value>
>       ... using r ...
>
> So in the above code, the CO-RE based approach will generate:
>    r1 = <patchable value>  // we can make it always 64bit value
>    w0 = 1
>    if w1 != 0 goto +1
>    w0 = 0
>    w0 <<= 1
>    exit
>
> This will only work for up to 64-bit int types.
> The kconfig itself supports arrays as well, for which
> CO-RE won't work.

What would the criteria be for emitting a relocation in this case?
Something like a __kconfig tag?
In my initial attempt [1], I've been doing these rewrites on the libbpf
side, so maybe that's the way to go if we can emit proper relocations
for ld+ldx?

[1] https://lore.kernel.org/bpf/7b2e6553-5bcc-1c70-2c4b-78e95593755b@fb.com/T/#m5efdc6672ff3da98f806375381d5e055060cbe54

> >
> >
> >
> >
> >>> +             return true;
> >>> +     }
> >>> +
> >>> +     *(insn + 2) = BPF_ALU64_IMM(BPF_MOV, dst_reg, val);
> >>> +     /* Only LDX can be resolved, we still have to resolve LD address. */
> >>> +     return false;
> >>> +}
Andrii Nakryiko July 29, 2022, 6:29 p.m. UTC | #11
On Thu, Jul 21, 2022 at 4:16 PM Stanislav Fomichev <sdf@google.com> wrote:
>
> On Thu, Jul 21, 2022 at 3:30 PM Yonghong Song <yhs@fb.com> wrote:
> >
> >
> >
> > On 7/20/22 3:33 PM, Stanislav Fomichev wrote:
> > > On Wed, Jul 20, 2022 at 1:53 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > >>
> > >> On Wed, Jul 20, 2022 at 11:04:59AM -0700, sdf@google.com wrote:
> > >>> On 07/19, sdf@google.com wrote:
> > >>>> On 07/19, Alexei Starovoitov wrote:
> > >>>>> On Tue, Jul 19, 2022 at 2:41 PM Stanislav Fomichev <sdf@google.com>
> > >>>> wrote:
> > >>>>>>
> > >>>>>> On Tue, Jul 19, 2022 at 1:33 PM Stanislav Fomichev <sdf@google.com>
> > >>>>> wrote:
> > >>>>>>>
> > >>>>>>> On Tue, Jul 19, 2022 at 1:21 PM Alexei Starovoitov
> > >>>>>>> <alexei.starovoitov@gmail.com> wrote:
> > >>>>>>>>
> > >>>>>>>> On Mon, Jul 18, 2022 at 12:07 PM Stanislav Fomichev
> > >>>>> <sdf@google.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>> Motivation:
> > >>>>>>>>>
> > >>>>>>>>> Our bpf programs have a bunch of options which are set at the
> > >>>>> loading
> > >>>>>>>>> time. After loading, they don't change. We currently use array
> > >>>> map
> > >>>>>>>>> to store them and bpf program does the following:
> > >>>>>>>>>
> > >>>>>>>>> val = bpf_map_lookup_elem(&config_map, &key);
> > >>>>>>>>> if (likely(val && *val)) {
> > >>>>>>>>>    // do some optional feature
> > >>>>>>>>> }
> > >>>>>>>>>
> > >>>>>>>>> Since the configuration is static and we have a lot of those
> > >>>>> features,
> > >>>>>>>>> I feel like we're wasting precious cycles doing dynamic lookups
> > >>>>>>>>> (and stalling on memory loads).
> > >>>>>>>>>
> > >>>>>>>>> I was assuming that converting those to some fake kconfig
> > >>>> options
> > >>>>>>>>> would solve it, but it still seems like kconfig is stored in the
> > >>>>>>>>> global map and kconfig entries are resolved dynamically.
> > >>>>>>>>>
> > >>>>>>>>> Proposal:
> > >>>>>>>>>
> > >>>>>>>>> Resolve kconfig options statically upon loading. Basically
> > >>>> rewrite
> > >>>>>>>>> ld+ldx to two nops and 'mov val, x'.
> > [...]
> >
> > I think the approach above is a little bit fragile.
> > There is no guarantee that the ldx immediately follows the ld.
> > Also, after the rewrite, some redundant instructions are still
> > left (e.g., the map pointer load), although they can be
> > rewritten as nops.
>
> Yeah, it's possible there is no ldx, or it's possible that the map's ptr
> in the ld is used by some other instruction later on. I agree that all
> this rewriting is still best-case and might be a bit ugly :-(
>
> > In general, 'const volatile int var' should be good enough,
> > although it still has one extra load instruction. How much
> > of a performance hit did you see with this extra load?
>
> I haven't tested anything yet. I think I'll try to convert my prog to
> kconfig and then have:
>
> #ifdef STATIC_CONFIG
> const int CONFIG_XYZ = 1;

Make sure that this constant is really inlined in the code by the
compiler; otherwise it might be left as a global variable, which is
*exactly* what __kconfig is, and so you won't see any difference at
all.


In general, these "left over" register assignments after the dead code
elimination pass are more general than just __kconfig. So if we wanted
to actually solve this, we'd need to teach the BPF verifier to do some
additional flow analysis and eliminate instructions that write to
registers that are never used. This would work for __kconfig, any
.rodata, and more use cases.
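A minimal sketch of that kind of flow analysis, assuming straight-line code only (struct reg_insn and dead_writes are invented names; a real verifier pass would need a proper dataflow fixpoint across branches and care around calls, stores, and pointer state):

```c
#include <stdbool.h>

#define TOY_NREGS 11   /* r0..r10, as in BPF */

/* Toy straight-line instruction: writes 'dst', optionally reads 'src'.
 * Invented representation for illustration only.
 */
struct reg_insn {
	int  dst;
	int  src;              /* -1 if no source register */
	bool has_side_effect;  /* e.g. store/call/exit: never removable */
};

/* Backward liveness scan over straight-line code. An insn whose dst is
 * dead at that point (never read later) and that has no side effect is
 * removable. Returns the number of removable insns and marks them.
 */
static int dead_writes(const struct reg_insn *prog, int len, bool *removable)
{
	bool live[TOY_NREGS] = { false };
	int n = 0;

	live[0] = true; /* r0 carries the return value: live at exit */
	for (int i = len - 1; i >= 0; i--) {
		if (!prog[i].has_side_effect && !live[prog[i].dst]) {
			removable[i] = true;  /* dead write, can be dropped */
			n++;
			continue;             /* a dropped insn doesn't affect liveness */
		}
		removable[i] = false;
		live[prog[i].dst] = false;        /* write kills liveness of dst */
		if (prog[i].src >= 0)
			live[prog[i].src] = true; /* read makes src live */
	}
	return n;
}
```

Applied to the dumps quoted earlier in the thread, the leftover r1 assignment after the dead branch is removed is exactly such a write: r1 is never read afterwards, so a pass like this would flag it.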

But I'd start with measuring whether there is any real performance
benefit to that in practical BPF programs before investing time and
effort in optimizing this away.

Teaching libbpf to do this hacky pattern matching seems wrong. This is
not purely a __kconfig thing; it's not even only a read-only variable
thing. There are more cases where we leave unnecessary instructions:
e.g., stuff like if (bpf_core_type_exists(...)) will leave a useless
register assignment, because the verifier will know that one or the
other branch of the if is taken, but rX = 1 (or 0); will be left in
the BPF program code anyway.

But whether it's justified for the BPF verifier to do optimization
passes to eliminate such tiny inefficiencies is what I'm curious
about.


> [...]
diff mbox series

Patch

diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index f081de398b60..89d11b891735 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -1596,6 +1596,9 @@  static int load_with_options(int argc, char **argv, bool first_prog_only)
 		/* log_level1 + log_level2 + stats, but not stable UAPI */
 		open_opts.kernel_log_level = 1 + 2 + 4;
 
+	if (getenv("BPFTOOL_KCONFIG"))
+		open_opts.kconfig = getenv("BPFTOOL_KCONFIG");
+
 	obj = bpf_object__open_file(file, &open_opts);
 	if (libbpf_get_error(obj)) {
 		p_err("failed to open object file");
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 77ae83308199..58b45eb9a30a 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -5773,11 +5773,48 @@  bpf_object__relocate_data(struct bpf_object *obj, struct bpf_program *prog)
 				if (obj->gen_loader) {
 					insn[0].src_reg = BPF_PSEUDO_MAP_IDX_VALUE;
 					insn[0].imm = obj->kconfig_map_idx;
+					insn[1].imm = ext->kcfg.data_off;
 				} else {
-					insn[0].src_reg = BPF_PSEUDO_MAP_VALUE;
-					insn[0].imm = obj->maps[obj->kconfig_map_idx].fd;
+					bool is_set = ext->is_weak && ext->is_set;
+					bool is_ld_ldx = (BPF_CLASS(insn[0].code) == BPF_LD) &&
+						         insn[1].code == 0 &&
+							 (BPF_CLASS(insn[2].code) == BPF_LDX);
+
+					if (is_set && is_ld_ldx) {
+						__u32 dst_reg = insn[2].dst_reg; /* dst_reg is unchanged */
+						__s32 imm = 1; /* TODO: put real config value here */
+						int j;
+
+						/* Resolve dynamic kconfig lookups. */
+						insn[0] = BPF_JMP_A(0);
+						insn[1] = BPF_JMP_A(0);
+						insn[2] = BPF_ALU64_IMM(BPF_MOV, dst_reg, imm);
+
+						/* Look ahead at 7 insns. */
+						for (j = 3; j < 10; j++) {
+							/* SKETCHY?
+							 *
+							 * We can also replace conditional jump when it triggers with the opposite value.
+							 * Scan a bunch of insns and stop as long we hit a jump or some other
+							 * operation on the dst_reg.
+							 */
+							if (BPF_CLASS(insn[j].code) == BPF_JMP) {
+								/* next insn is jump */
+								if (insn[j].dst_reg == dst_reg && /* that operates on the same dst_reg */
+								    insn[j].imm == !imm) /* and the imm is the opposite */
+									insn[j] = BPF_JMP_A(0);
+								break;
+							} else {
+								if (insn[j].dst_reg == dst_reg)
+									break;
+							}
+						}
+					} else {
+						insn[0].src_reg = BPF_PSEUDO_MAP_VALUE;
+						insn[0].imm = obj->maps[obj->kconfig_map_idx].fd;
+						insn[1].imm = ext->kcfg.data_off;
+					}
 				}
-				insn[1].imm = ext->kcfg.data_off;
 			} else /* EXT_KSYM */ {
 				if (ext->ksym.type_id && ext->is_set) { /* typed ksyms */
 					insn[0].src_reg = BPF_PSEUDO_BTF_ID;