Message ID | 20231206141030.1478753-7-aspsk@isovalent.com (mailing list archive)
---|---
State | Changes Requested
Delegated to: | BPF
Series | BPF Static Keys
On Wed, Dec 6, 2023 at 6:13 AM Anton Protopopov <aspsk@isovalent.com> wrote:
>
> +static __always_inline int __bpf_static_branch_jump(void *static_key)
> +{
> +	asm goto("1:\n\t"
> +		"goto %l[l_yes]\n\t"
> +		".pushsection .jump_table, \"aw\"\n\t"
> +		".balign 8\n\t"
> +		".long 1b - .\n\t"
> +		".long %l[l_yes] - .\n\t"
> +		".quad %c0 - . + 1\n\t"
> +		".popsection\n\t"
> +		:: "i" (static_key)
> +		:: l_yes);
> +	return 0;
> +l_yes:
> +	return 1;
> +}

Could you add a test to patch 7 where the same subprog is used by multiple
main progs, and another test where a prog with static_keys is statically
linked by libbpf into another prog? I suspect these cases are not handled
by libbpf in the series. The adjustment of insn offsets is tricky, and I
don't see this logic in patch 5.

The special handling of the JA insn (when it's listed in
static_branches_info[]) is fragile. check_cfg() and the verifier main loop
are two places so far, but JA is an unconditional jump. Every tool that
understands the BPF ISA would have to treat JA as "maybe it's not a jump in
this case", and that is concerning.

I certainly see the appeal of copy-pasting the kernel's static_branch
logic, but we can do better since we're not bound by the x86 ISA.

How about we introduce a new insn, JA_MAYBE, which check_cfg() and the
verifier will process as insn_idx += insn->off/imm _or_ insn_idx += 1. The
new instruction will indicate that it may jump or fall through. Another bit
could be used to indicate a "default" action (jump vs fallthrough), which
JITs will use to translate it into a nop or a jmp. Once it's part of the
instruction stream we can have a bpf-prog-callable kfunc that flips the
JA_MAYBE target (I think this feature is missing in the patch set. It's
necessary to give a bpf prog the ability to flip a static_branch; currently
you're proposing to do it from user space only), and there will be no need
to supply static_branches_info[] at prog load time. libbpf static linking
will work as-is without extra code.

JA_MAYBE will also remove the need for an extra bpf map type. The bpf prog
can _optionally_ do the '.long 1b - .' asm trick and store the address of a
JA_MAYBE insn into an arbitrary 8-byte value (in a global variable, a
section, or somewhere else). I think it's necessary to decouple patching of
JA_MAYBE from naming the location. The ".pushsection .jump_table" should be
optional. The kernel's static_key approach hard-codes them together, but
that's due to x86 limitations.

We can introduce JA_MAYBE, use it everywhere in the bpf prog, and not add
names or addresses next to the insns. Then 'bpftool prog dump' can retrieve
the insn stream and another command can patch a specific instruction
(because JA_MAYBE is easy to spot vs a regular JA). At the end it's just a
text_poke_bp() to convert the target of a JA_MAYBE. The bpf prog can be
written with

  asm goto("go_maybe +0, %l[l_yes]")

without names and maps, and the jumps will follow the indicated 'default'
branch (how about: the 1st listed is the default, the 2nd is the maybe).
The kernel's static keys cannot be flipped atomically anyway, so multiple
branches using the same static key are equivalent to an array of addresses
that are flipped one by one.

I suspect the main use case for Isovalent is to compile a bpf prog with
debug code that is not running by default and then flip various parts when
debugging is necessary. With JA_MAYBE it's still going to be
bpf_static_branch_[un]likely(), but with no extra map, and the prog will
load fine. Then you can patch all of such insns or a subset of them on
demand. (The verifier will allow patching of JA_MAYBE only between the two
targets, so no safety issues.) I think it's more flexible than the new map
type and static_branches_info[]. wdyt?
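For illustration, the helper built on such an insn might look roughly like
the sketch below. This is a sketch only: the "go_maybe" mnemonic is
hypothetical and exists only in this proposal, and the default action here
is assumed to be fallthrough until the insn is patched.

	/* Sketch: "go_maybe" is the hypothetical mnemonic proposed above,
	 * not a real BPF/LLVM instruction; default action is fallthrough,
	 * so the function returns 0 until the insn is patched to jump.
	 */
	static __always_inline int bpf_static_branch_unlikely(void)
	{
		asm goto("go_maybe +0, %l[l_yes]"
			 :::: l_yes);
		return 0;
	l_yes:
		return 1;
	}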
On Thu, Dec 07, 2023 at 07:45:32PM -0800, Alexei Starovoitov wrote:
> On Wed, Dec 6, 2023 at 6:13 AM Anton Protopopov <aspsk@isovalent.com> wrote:
> [...]
>
> Could you add a test to patch 7 where the same subprog is used by multiple
> main progs, and another test where a prog with static_keys is statically
> linked by libbpf into another prog? I suspect these cases are not handled
> by libbpf in the series. The adjustment of insn offsets is tricky, and I
> don't see this logic in patch 5.
>
> The special handling of the JA insn (when it's listed in
> static_branches_info[]) is fragile. check_cfg() and the verifier main loop
> are two places so far, but JA is an unconditional jump. Every tool that
> understands the BPF ISA would have to treat JA as "maybe it's not a jump in
> this case", and that is concerning.

Will do, thanks.

> I certainly see the appeal of copy-pasting the kernel's static_branch
> logic, but we can do better since we're not bound by the x86 ISA.
> [...]
> The kernel's static keys cannot be flipped atomically anyway, so multiple
> branches using the same static key are equivalent to an array of addresses
> that are flipped one by one.

Thanks for the detailed review.
You're right, without adding a new instruction non-kernel observers can't
distinguish between a JA and a "static branch JA". It also makes sense to
encode a direct/inverse flag (and more, see below). I would call the new
instruction something like JA_CFG, emphasizing the fact that this is a JA
which can be configured by an external force.

We can also easily add a kfunc API; however, I am for keeping the "map API"
in place (in addition to the more fine-grained API you've proposed). See my
considerations and examples below.

> I suspect the main use case for Isovalent is to compile a bpf prog with
> debug code that is not running by default and then flip various parts when
> debugging is necessary. With JA_MAYBE it's still going to be
> bpf_static_branch_[un]likely(), but with no extra map, and the prog will
> load fine. Then you can patch all of such insns or a subset of them on
> demand. (The verifier will allow patching of JA_MAYBE only between the two
> targets, so no safety issues.) I think it's more flexible than the new map
> type and static_branches_info[]. wdyt?

Here is how I think about the API. Imagine that we have this new
instruction, JA_CFG, and that a program uses it in multiple places. If we
don't mark those instructions, then after compilation an external observer
can't distinguish between them: we don't know whether an instruction is
controlled by one key or another. We may care about this, say, when a
program uses key A to enable/disable debugging and key B to enable/disable
another optional feature.

If we push offsets of jumps into a separate section via the "'.long 1b - .'
asm trick", then we will have all the same problems with relocations, which
is fragile, as you've shown. What we can do instead is to encode a "local
key id" inside the instruction. This local_id is local to the program where
it is used. (We can use 4 bits in, say, dst_reg, or 16 bits of the unused
offset/imm, as one of them will be unused. To me, 4 bits may be enough for
the initial implementation.) This way we can distinguish between different
keys in one program, and a new `bpf_static_key_set_prog(prog, key_id, value)`
kfunc can be used to toggle this key on/off for a particular program. This
way we don't care about relocations, and the API is straightforward.

However, if this is the only API we provide, then it makes the user's life
hard, as they will have to keep track of ids, programs used, and the
mapping from a "global" id to local ids for each program (when multiple
programs use the same static key, which is desirable). If we keep the
higher-level "map API", then this simplifies the user's life: at program
load a user can send a list of (local_id -> map) mappings, and then toggle
all the branches controlled by a [global] static key by either

  bpf(MAP_UPDATE_ELEM, map, value)

or

  kfunc bpf_static_key_set(map, value)

whichever is more useful. (I think keeping the bpf(2) userspace API is
worth doing, as otherwise this, again, makes life harder: users would have
to recompile/update iterator programs if new programs using a static key
are added, etc.)

Libbpf can simplify life even more by automatically allocating local ids
and passing the mappings for a program to the kernel based on
`bpf_static_branch_{unlikely,likely}(&map)`, so that users never have to
think about this if they don't want to. Again, no relocations are required
here.

So, to summarize:

* A new instruction BPF_JA_CFG[ID,FLAGS,OFFSET] where ID is local to the
  program and FLAGS is 0/1 for normal/inverse branches

* A new kfunc `bpf_static_key_set_prog(prog, key_id, value)` which toggles
  all the branches with ID=key_id in the given prog

* Extend bpf_attr with a list of (local_id -> map) mappings. This is an
  optional list if the user doesn't want one or all branches to be
  controlled by a map

* A new kfunc `bpf_static_key_set(map, value)` which toggles all the static
  branches in all programs which use `map` to control branches

* bpf(MAP_UPDATE_ELEM, map, value) via the bpf(2) syscall, which acts as
  the previous kfunc but in a single syscall, without the requirement to
  create iterators
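For concreteness, the summarized kernel-side API could be declared roughly
as below; every name and signature here is hypothetical, taken only from
the summary above:

	#include <linux/types.h>

	struct bpf_prog;
	struct bpf_map;

	/* Hypothetical kfunc: toggle all branches with ID=key_id in one prog. */
	int bpf_static_key_set_prog(struct bpf_prog *prog, __u32 key_id, __u32 value);

	/* Hypothetical kfunc: toggle all branches, in all progs, controlled by map. */
	int bpf_static_key_set(struct bpf_map *map, __u32 value);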
On Fri, Dec 8, 2023 at 8:22 AM Anton Protopopov <aspsk@isovalent.com> wrote:
>
> On Thu, Dec 07, 2023 at 07:45:32PM -0800, Alexei Starovoitov wrote:
> [...]
>
> Here is how I think about the API. Imagine that we have this new
> instruction, JA_CFG, and that a program uses it in multiple places. If we
> don't mark those instructions, then after compilation an external observer
> can't distinguish between them.
> [...]
> What we can do instead is to encode a "local key id" inside the
> instruction. This local_id is local to the program where it is used.
> [...]
> This way we don't care about relocations, and the API is straightforward.

I feel like embedding some sort of ID inside the instruction is very..
unusual, shall we say? When I was reading the commit messages and the
discussion, I couldn't shake the impression that this has to be solved with
the help of relocations.

How about something like the below (not very well thought out, sorry) for
how this can be put together from both the user-space and kernel sides:

1. We can add a special SEC(".static_keys") section, in which we define
special global variables representing static keys. These variables will be
forced to be global to ensure unique names and the like.

2. The bpf_static_branch_{likely,unlikely}() macro accepts a reference to
one such special global variable and instructs the compiler to emit a
relocation between the static key variable and the JMP_CFG instruction.
Libbpf will properly update these relocations during static linking and
subprog rearrangement, just like we do for map references today.

3. libbpf takes care of creating a special map (it could be an ARRAY with a
special flag, but a dedicated map with multiple entries might be cleaner),
in which each static key variable is represented as a separate key.

3.5. When libbpf loads a BPF program, I guess it will have to upload a
multi-mapping from static_key_id to insn_off (probably we should allow
multiple insn_offs per static_key_id), so that the kernel can identify
which instructions are to be updated.

4. Similar to SEC(".kconfig"), libbpf will actually also have a read-only
global variables map, where each variable's value is the static key integer
ID that corresponds to a static key in that special map from #3. This
allows user space to easily get this ID without hard-coding or guessing
anything; it's just:

  int static_key_id = skel->static_keys->my_static_key;

Then you can do bpf_map_update_elem(&skel->maps.static_key_map,
&static_key_id, 0/1) to flip the switch from user space.

5. From the BPF side we use global variable semantics similarly to obtain
the integer key, then we can use a dedicated special kfunc that accepts
prog, static_key_id, and the desired state to flip that jump whichever way.

So on the BPF side it will look roughly like this:

  int static_key1 SEC(".static_keys");
  ...
  if (bpf_static_key_likely(&static_key1)) {
      ...
  }
  ...
  /* to switch it on or off */
  bpf_static_key_switch(&static_key1, 1);

From the user-space side, as I basically already showed:

  bpf_map_update_elem(bpf_map__fd(skel->static_keys->static_key1),
                      &skel->static_keys->static_key1, 1);

Something along these lines.

> However, if this is the only API we provide, then it makes the user's life
> hard, as they will have to keep track of ids, programs used, and the
> mapping from a "global" id to local ids for each program (when multiple
> programs use the same static key, which is desirable).
> [...]
>
> So, to summarize:
>
> * A new instruction BPF_JA_CFG[ID,FLAGS,OFFSET] where ID is local to the
>   program and FLAGS is 0/1 for normal/inverse branches

+1 for a dedicated instruction

> * A new kfunc `bpf_static_key_set_prog(prog, key_id, value)` which toggles
>   all the branches with ID=key_id in the given prog
>
> * Extend bpf_attr with a list of (local_id -> map) mappings. This is an
>   optional list if the user doesn't want one or all branches to be
>   controlled by a map
>
> * A new kfunc `bpf_static_key_set(map, value)` which toggles all the static
>   branches in all programs which use `map` to control branches
>
> * bpf(MAP_UPDATE_ELEM, map, value) via the bpf(2) syscall, which acts as
>   the previous kfunc but in a single syscall, without the requirement to
>   create iterators

Let's see if my above proposal fits this as a way to simplify allocating
static key IDs and keeping them in sync between the BPF and user-space
sides.
On Fri, 2023-12-08 at 14:04 -0800, Andrii Nakryiko wrote:
[...]
> > So, to summarize:
> >
> > * A new instruction BPF_JA_CFG[ID,FLAGS,OFFSET] where ID is local to the
> >   program and FLAGS is 0/1 for normal/inverse branches
>
> +1 for a dedicated instruction

fwiw, if relocations are used instead of IDs the new instruction does not
have to be a control-flow one. It might be a mov that sets the target
register to a value that the verifier treats as unknown. At runtime this
mov could be patched to assign different values. Granted, it would be three
instructions:

  mov rax, 0;
  cmp rax, 0;
  je ...

instead of one, but I don't believe there would be a noticeable performance
difference. On the plus side: even simpler verification, likely/unlikely
for free, and no need to track whether a branch is inverted.
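In BPF inline asm the same pattern might be sketched as below; the runtime
patching of the mov's immediate is hypothetical, and nothing in the
verifier supports it today:

	/* Sketch: the kernel would patch the immediate of this mov at
	 * runtime and the verifier would treat the result as unknown;
	 * the compare and conditional jump come from the ordinary C "if".
	 */
	static __always_inline int bpf_static_key_value(void)
	{
		int v;

		asm volatile("%0 = 0" : "=r"(v)); /* imm patched at runtime */
		return v;
	}

	/* usage: if (bpf_static_key_value()) { ... debug code ... } */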
On Fri, Dec 8, 2023 at 2:04 PM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> I feel like embedding some sort of ID inside the instruction is very..
> unusual, shall we say?

yeah. no magic numbers inside insns pls.

I don't like the JA_CFG name, since I read CFG as control flow graph, while
you probably meant CFG as configurable. How about BPF_JA_OR_NOP? Then in
combination with the BPF_JMP or BPF_JMP32 modifier the insn->off|imm will
be used. The 1st bit in src_reg can indicate the default action: nop or
jmp. In asm it may look like asm("goto_or_nop +5").

> 2. The bpf_static_branch_{likely,unlikely}() macro accepts a reference to
> one such special global variable and instructs the compiler to emit a
> relocation between the static key variable and the JMP_CFG instruction.
>
> Libbpf will properly update these relocations during static linking and
> subprog rearrangement, just like we do for map references today.

Right. libbpf has RELO_SUBPROG_ADDR. This new relo will be pretty much
that. And we have proper C syntax for taking an address: &&label. The
bpf_static_branch macro can use it. We've wanted to add it for a long time
to support proper switch() statements and jump tables.

I don't like IDs and a new map type for this. The macro can have
'branch_name' as one of its arguments, and it will populate addresses of
insns into a "name.static_branch" section. From libbpf's pov it will be yet
another global section which is represented as a traditional bpf array of
one element. No extra handling on the libbpf side.

The question is how to represent the "address" of the insn. I think 4-byte
prog_id + 4-byte insn_idx will do.

Then a bpf prog can pass such an "address" into
bpf_static_branch_enable/disable kfuncs. User space can iterate over the
8-byte "addresses" in that one-element array map and call
BPF_STATIC_BRANCH_ENABLE/DISABLE syscall cmds. We can have a helper on the
libbpf side for that.

I see no need to introduce a new map type just to reuse the
map_update_elem cmd.
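As a sketch, that 8-byte "address" could be packed as below; the exact
layout is an assumption for illustration:

	#include <linux/types.h>

	/* Pack a 4-byte prog_id and a 4-byte insn_idx into one 8-byte
	 * "address" (hypothetical layout: prog_id in the high half).
	 */
	static inline __u64 branch_addr(__u32 prog_id, __u32 insn_idx)
	{
		return ((__u64)prog_id << 32) | insn_idx;
	}

	static inline __u32 branch_prog_id(__u64 addr)  { return addr >> 32; }
	static inline __u32 branch_insn_idx(__u64 addr) { return (__u32)addr; }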
On Fri, Dec 8, 2023 at 3:07 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> fwiw, if relocations are used instead of IDs the new instruction does not
> have to be a control-flow one. It might be a mov that sets the target
> register to a value that the verifier treats as unknown. At runtime this
> mov could be patched to assign different values. Granted, it would be three
> instructions:
>
>   mov rax, 0;
>   cmp rax, 0;
>   je ...
>
> instead of one, but I don't believe there would be a noticeable performance
> difference. On the plus side: even simpler verification, likely/unlikely
> for free, and no need to track whether a branch is inverted.

'je' is a conditional jmp. The cpu will mispredict it sooner or later, and
depending on the density of jmps around it the misprediction can be severe.
It has to be an unconditional jmp to be fast.
On 12/8/23 8:05 PM, Alexei Starovoitov wrote:
> On Fri, Dec 8, 2023 at 2:04 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
>>
>> I feel like embedding some sort of ID inside the instruction is very..
>> unusual, shall we say?
> yeah. no magic numbers inside insns pls.
>
> I don't like the JA_CFG name, since I read CFG as control flow graph,
> while you probably meant CFG as configurable.
> How about BPF_JA_OR_NOP? Then in combination with the BPF_JMP or BPF_JMP32
> modifier the insn->off|imm will be used.
> The 1st bit in src_reg can indicate the default action: nop or jmp.
> In asm it may look like asm("goto_or_nop +5")

How does the C source code look in order to generate the BPF_JA_OR_NOP
insn? Any source examples?

> [...]
> I see no need to introduce a new map type just to reuse the
> map_update_elem cmd.
On Fri, Dec 8, 2023 at 8:15 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>
> On 12/8/23 8:05 PM, Alexei Starovoitov wrote:
> [...]
>
> How does the C source code look in order to generate the BPF_JA_OR_NOP
> insn? Any source examples?

It will be in inline asm only. The address of that insn will be taken
either via && or via asm (".long %l[label]"). From llvm's pov both should
go through the same relo creation logic. I hope :)
On 12/8/23 8:25 PM, Alexei Starovoitov wrote:
> On Fri, Dec 8, 2023 at 8:15 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>> [...]
>> How does the C source code look in order to generate the BPF_JA_OR_NOP
>> insn? Any source examples?
> It will be in inline asm only. The address of that insn will be taken
> either via && or via asm (".long %l[label]"). From llvm's pov both should
> go through the same relo creation logic. I hope :)

A hack in llvm below with an example; could you check whether the C syntax
and the object dump result are what you want to see?

diff --git a/llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp b/llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp
index 90697c6645be..38b1cbc31f9a 100644
--- a/llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp
+++ b/llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp
@@ -231,6 +231,7 @@ public:
         .Case("call", true)
         .Case("goto", true)
         .Case("gotol", true)
+        .Case("goto_or_nop", true)
         .Case("*", true)
         .Case("exit", true)
         .Case("lock", true)
@@ -259,6 +260,7 @@ public:
         .Case("bswap64", true)
         .Case("goto", true)
         .Case("gotol", true)
+        .Case("goto_or_nop", true)
         .Case("ll", true)
         .Case("skb", true)
         .Case("s", true)
diff --git a/llvm/lib/Target/BPF/BPFInstrInfo.td b/llvm/lib/Target/BPF/BPFInstrInfo.td
index 5972c9d49c51..a953d10429bf 100644
--- a/llvm/lib/Target/BPF/BPFInstrInfo.td
+++ b/llvm/lib/Target/BPF/BPFInstrInfo.td
@@ -592,6 +592,19 @@ class BRANCH<BPFJumpOp Opc, string OpcodeStr, list<dag> Pattern>
   let BPFClass = BPF_JMP;
 }

+class BRANCH_OR_NOP<BPFJumpOp Opc, string OpcodeStr, list<dag> Pattern>
+    : TYPE_ALU_JMP<Opc.Value, BPF_K.Value,
+                   (outs),
+                   (ins brtarget:$BrDst),
+                   !strconcat(OpcodeStr, " $BrDst"),
+                   Pattern> {
+  bits<16> BrDst;
+
+  let Inst{47-32} = BrDst;
+  let Inst{31-0} = 1;
+  let BPFClass = BPF_JMP;
+}
+
 class BRANCH_LONG<BPFJumpOp Opc, string OpcodeStr, list<dag> Pattern>
     : TYPE_ALU_JMP<Opc.Value, BPF_K.Value,
                    (outs),
@@ -632,6 +645,7 @@ class CALLX<string OpcodeStr>
 let isBranch = 1, isTerminator = 1, hasDelaySlot=0, isBarrier = 1 in {
   def JMP : BRANCH<BPF_JA, "goto", [(br bb:$BrDst)]>;
   def JMPL : BRANCH_LONG<BPF_JA, "gotol", []>;
+  def JMP_OR_NOP : BRANCH_OR_NOP<BPF_JA, "goto_or_nop", []>;
 }

And an example:

$ cat t.c
int bar(void);
int foo()
{
	int a, b;
	asm volatile goto ("r0 = 0; \
		goto_or_nop %l[label]; \
		r2 = 2; \
		r3 = 3; \
		"::::label);
	a = bar();
label:
	b = 20 * a;
	return b;
}
$ clang --target=bpf -O2 -S t.c
$ cat t.s
	.text
	.file	"t.c"
	.globl	foo                             # -- Begin function foo
	.p2align	3
	.type	foo,@function
foo:                                    # @foo
# %bb.0:                                # %entry
	r0 = 0
	#APP
	r0 = 0
	goto_or_nop LBB0_2
	r2 = 2
	r3 = 3
	#NO_APP
# %bb.1:                                # %asm.fallthrough
	call bar
	r0 *= 20
LBB0_2:                                 # Block address taken
                                        # %label
                                        # Label of block must be emitted
	exit
.Lfunc_end0:
	.size	foo, .Lfunc_end0-foo
                                        # -- End function
	.addrsig
$ clang --target=bpf -O2 -c t.c
$ llvm-objdump -dr t.o

t.o:	file format elf64-bpf

Disassembly of section .text:

0000000000000000 <foo>:
       0:	b7 00 00 00 00 00 00 00	r0 = 0x0
       1:	b7 00 00 00 00 00 00 00	r0 = 0x0
       2:	05 00 04 00 01 00 00 00	goto_or_nop +0x4 <LBB0_2>
       3:	b7 02 00 00 02 00 00 00	r2 = 0x2
       4:	b7 03 00 00 03 00 00 00	r3 = 0x3
       5:	85 10 00 00 ff ff ff ff	call -0x1
		0000000000000028:  R_BPF_64_32	bar
       6:	27 00 00 00 14 00 00 00	r0 *= 0x14

0000000000000038 <LBB0_2>:
       7:	95 00 00 00 00 00 00 00	exit
On Fri, Dec 8, 2023 at 9:05 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>
> A hack in llvm below with an example; could you check whether the C syntax
> and the object dump result are what you want to see?

Thank you for the ultra quick llvm diff!

> [...]
>  let isBranch = 1, isTerminator = 1, hasDelaySlot=0, isBarrier = 1 in {
>    def JMP : BRANCH<BPF_JA, "goto", [(br bb:$BrDst)]>;
>    def JMPL : BRANCH_LONG<BPF_JA, "gotol", []>;
> +  def JMP_OR_NOP : BRANCH_OR_NOP<BPF_JA, "goto_or_nop", []>;

I was thinking of burning the new 0xE opcode for it, but you're right. It's
a flavor of the existing JA insn and it's indeed better to just use the
src_reg=1 bit to indicate so. We probably need to use the 2nd bit of
src_reg to indicate its default state (jmp or fallthrough).

> 	asm volatile goto ("r0 = 0; \
> 		goto_or_nop %l[label]; \
> 		r2 = 2; \
> 		r3 = 3; \

Not sure how to represent the default state in assembly though.
"goto_or_nop" defaults to goto, "nop_or_goto" defaults to nop?

Do we need "gotol" for imm32 or will it be automatic?
On 12/9/23 9:18 AM, Alexei Starovoitov wrote:
> [...]
> I was thinking of burning the new 0xE opcode for it, but you're right. It's
> a flavor of the existing JA insn and it's indeed better to just use the
> src_reg=1 bit to indicate so.

Right, using src_reg to indicate a new flavor of the JA insn sounds like a
good idea. My previously-used 'imm' field is a pure hack.

> We probably need to use the 2nd bit of src_reg to indicate its default
> state (jmp or fallthrough).

Good point.

> Not sure how to represent the default state in assembly though.
> "goto_or_nop" defaults to goto, "nop_or_goto" defaults to nop?
>
> Do we need "gotol" for imm32 or will it be automatic?

It won't be automatic.

At the end of this email, I will show the new change that adds the
gotol_or_nop and nop_or_gotol insns, and an example showing them in asm.
But there is an issue here: in my example, the compiler (more specifically
the InstCombine pass) moved some code from after the 'label' to before the
'label'. Not exactly sure how to prevent this. Maybe the current 'asm goto'
already has a way to handle this. Will investigate later.

=========================

$ cat t.c
int bar(void);
int foo1()
{
	int a, b;
	asm volatile goto ("r0 = 0; \
		gotol_or_nop %l[label]; \
		r2 = 2; \
		r3 = 3; \
		"::::label);
	a = bar();
label:
	b = 20 * a;
	return b;
}
int foo2()
{
	int a, b;
	asm volatile goto ("r0 = 0; \
		nop_or_gotol %l[label]; \
		r2 = 2; \
		r3 = 3; \
		"::::label);
	a = bar();
label:
	b = 20 * a;
	return b;
}
$ clang --target=bpf -O2 -g -c t.c
$ llvm-objdump -S t.o

t.o:	file format elf64-bpf

Disassembly of section .text:

0000000000000000 <foo1>:
; {
       0:	b7 00 00 00 00 00 00 00	r0 = 0x0
;	asm volatile goto ("r0 = 0; \
       1:	b7 00 00 00 00 00 00 00	r0 = 0x0
       2:	06 10 00 00 04 00 00 00	gotol_or_nop +0x4 <LBB0_2>
       3:	b7 02 00 00 02 00 00 00	r2 = 0x2
       4:	b7 03 00 00 03 00 00 00	r3 = 0x3
;	a = bar();
       5:	85 10 00 00 ff ff ff ff	call -0x1
;	b = 20 * a;
       6:	27 00 00 00 14 00 00 00	r0 *= 0x14

0000000000000038 <LBB0_2>:
;	return b;
       7:	95 00 00 00 00 00 00 00	exit

0000000000000040 <foo2>:
; {
       8:	b7 00 00 00 00 00 00 00	r0 = 0x0
;	asm volatile goto ("r0 = 0; \
       9:	b7 00 00 00 00 00 00 00	r0 = 0x0
      10:	06 20 00 00 04 00 00 00	nop_or_gotol +0x4 <LBB1_2>
      11:	b7 02 00 00 02 00 00 00	r2 = 0x2
      12:	b7 03 00 00 03 00 00 00	r3 = 0x3
;	a = bar();
      13:	85 10 00 00 ff ff ff ff	call -0x1
;	b = 20 * a;
      14:	27 00 00 00 14 00 00 00	r0 *= 0x14

0000000000000078 <LBB1_2>:
;	return b;
      15:	95 00 00 00 00 00 00 00	exit
How about a slightly different modification of Anton's idea. Suppose that,
as before, there is a special map type:

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__type(key, __u32);
	__type(value, __u32);
	__uint(map_flags, BPF_F_STATIC_KEY);
	__uint(max_entries, 1);
} skey1 SEC(".maps");

which is used as below:

__attribute__((naked))
int foo(void) {
	asm volatile (
		"r0 = %[skey1] ll;"
		"if r0 != r0 goto 1f;"
		"r1 = r10;"
		"r1 += -8;"
		"r2 = 1;"
		"call %[bpf_trace_printk];"
	"1:"
		"exit;"
		:: __imm_addr(skey1),
		   __imm(bpf_trace_printk)
		: __clobber_all
	);
}

Disassembly of section .text:

0000000000000000 <foo>:
       0:	r0 = 0x0 ll
		0000000000000000:  R_BPF_64_64	skey1   ;; <---- Map relocation as usual
       2:	if r0 == r0 goto +0x4 <foo+0x38>        ;; <---- Note condition
       3:	r1 = r10
       4:	r1 += -0x8
       5:	r2 = 0x1
       6:	call 0x6
       7:	exit

And suppose that the verifier is modified in the following ways:
- treat the instructions "if rX == rX" / "if rX != rX" (when rX points to a
  static key map) in a special way:
  - when the program is verified, the jump is considered non-deterministic;
  - when the program is jitted, the jump is compiled as a nop for "!=" and
    as an unconditional jump for "==";
- build a table of static keys based on the specific map referenced in the
  condition; e.g. for the example above it can be inferred that insn 2 is
  associated with map skey1 because "r0" points to "skey1";
- jit "rX = <static key> ll;" as a nop.

On the plus side:
- any kind of jump table is omitted from the system call;
- no new instruction is needed;
- almost no modifications to libbpf are necessary (only a helper macro to
  convince clang to keep "if rX == rX").

What do you think?

Thanks,
Eduard
On Sun, Dec 10, 2023 at 2:30 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> How about a slightly different modification of Anton's idea. Suppose that,
> as before, there is a special map type:
>
> struct {
> 	__uint(type, BPF_MAP_TYPE_ARRAY);
> 	__type(key, __u32);
> 	__type(value, __u32);
> 	__uint(map_flags, BPF_F_STATIC_KEY);
> 	__uint(max_entries, 1);
> } skey1 SEC(".maps");

Instead of a special map that the kernel has to know about, the same intent
can be expressed with:

  int skey1;

  r0 = %[skey1] ll;

and then the kernel needs no extra map type, while user space can collect
all static_branches that use &skey1 by iterating the insn stream and
comparing addresses.

> [...]
> On the plus side:
> - any kind of jump table is omitted from the system call;
> - no new instruction is needed;
> - almost no modifications to libbpf are necessary (only a helper macro to
>   convince clang to keep "if rX == rX").

Reusing an existing insn means that we're giving it a new meaning, and that
always comes with the danger of breaking existing progs. In this case "if
rX == rX" isn't very meaningful and the new semantics shouldn't break
anything, but it's a danger zone.

If we treat

  if r0 == r0

as JA, then we have to treat

  if r1 == r1

as JA as well, and it becomes ambiguous when prog_info needs to return the
insns back to user space. If we go with the rX == rX approach we should
probably limit it to one specific register. r0, r10, r11 can be considered,
and they have their own pros and cons.

Additionally,

  r0 = %[skey1] ll

in front of the JE/JNE is a waste. If we JIT it to a useless native insn we
will be burning cpu for no reason, so we should probably optimize it out.
If we do so, then this inline insn becomes a nop and it's effectively a
relocation: the insn stream carries this rX = 64bit_const insn only to
indicate the scope of the next insn. It's pretty much like Anton's idea of
using extra bits in JA to encode an integer key_id; with ld_imm64 we would
encode a 64-bit key_id. Another insn with more bits to burn that has no
effect on execution.

It doesn't look clean to encode so much extra metadata into instructions
that the JITs and the interpreter have to ignore. If we go this route:

  r11 = 64bit_const
  if r11 == r11 goto

is the lesser evil. Still, it's not as clean as JA with extra bits in
src_reg. We already optimize JA +0 into a nop (see opt_remove_nops()), so a
flavor of the JA insn looks like the most natural fit for a selectable
JA +xx or JA +0. And the special map really doesn't fit.

Whatever we do, let's keep the text_poke-able insn logic separate from the
bookkeeping of the addresses of those insns. I think a specially prefixed
section that is understood by libbpf (like what I proposed with
"name.static_branch") will do fine. If that's not good enough we can add a
"set" map type that is a generic set of values. It can be a set of 8-byte
addresses that keeps the locations of static_branches, but let's keep it
generic. I think it's fine to add:

  __uint(type, BPF_MAP_TYPE_SET)

and let libbpf populate it with addresses of insns, or addresses of
variables, or other values when it prepares a program for loading. But
map_update_elem should never be doing text_poke on insns. We added the
prog_array map type in the past, but that was done during the early days.
If we were designing bpf today we would have gone a different route.
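A libbpf-side declaration of such a set could look like the sketch below;
BPF_MAP_TYPE_SET is hypothetical and does not exist today:

	/* Hypothetical generic set map; libbpf would populate it with
	 * 8-byte insn "addresses" (or other values) at load time.
	 */
	struct {
		__uint(type, BPF_MAP_TYPE_SET);  /* made-up map type */
		__type(value, __u64);
		__uint(max_entries, 64);
	} static_branch_set SEC(".maps");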
On Sat, Dec 09, 2023 at 10:32:42PM -0800, Yonghong Song wrote:
> [...]
> It won't be automatic.
>
> At the end of this email, I will show the new change that adds the
> gotol_or_nop and nop_or_gotol insns, and an example showing them in asm.

Thanks a lot Yonghong! May I ask you to send a full patch for LLVM (with
gotol) so that I can test it?

Overall, I think that JA + flags in SRC_REG is indeed better than a new
instruction, as no new opcode is used. It looks to me like two bits aren't
enough and a third is required, as the second bit seems to be overloaded:

* bit 1 indicates that this is a "JA_MAYBE"
* bit 2 indicates a jump or nop (i.e., the current state)

However, we also need another bit which indicates what to do with the
instruction when we issue [an abstract] command

  flip_branch_on_or_off(branch, 0/1)

Without this information (and in the absence of external meta-data on how
to patch the branch) we can't determine what a given (BPF, not jitted)
program currently does. For example, if we issue

  flip_branch_on_or_off(branch, 0)

then we can't reflect this in the xlated program by setting the second bit
to jmp/off. Again, the JITted program is fine, but it will be
desynchronized from xlated in terms of logic (some instructions will be
mapped as NOP -> x86_JUMP, others as NOP -> x86_NOP). In my original patch
this triplet was kept as (offset to indicate a "special jump", JA+0/JA+OFF,
Normal/Inverse).

> But there is an issue here: in my example, the compiler (more specifically
> the InstCombine pass) moved some code from after the 'label' to before the
> 'label'. Not exactly sure how to prevent this. Maybe the current 'asm goto'
> already has a way to handle this. Will investigate later.
> [...]
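Written down as defines, the three src_reg bits described above might look
like this; the bit positions are illustrative only:

	/* Hypothetical src_reg bit layout for the patchable JA flavor. */
	#define BPF_JA_MAYBE    (1 << 0)  /* this JA can be patched           */
	#define BPF_JA_IS_JMP   (1 << 1)  /* current state: jmp (1) / nop (0) */
	#define BPF_JA_INVERSE  (1 << 2)  /* inverse branch w.r.t. the key    */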
On Sun, Dec 10, 2023 at 7:33 PM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Sun, Dec 10, 2023 at 2:30 AM Eduard Zingerman <eddyz87@gmail.com> wrote: > > > > How about a slightly different modification of the Anton's idea. > > Suppose that, as before, there is a special map type: > > > > struct { > > __uint(type, BPF_MAP_TYPE_ARRAY); > > __type(key, __u32); > > __type(value, __u32); > > __uint(map_flags, BPF_F_STATIC_KEY); > > __uint(max_entries, 1); > > } skey1 SEC(".maps") > > Instead of special map that the kernel has to know about > the same intent can be expressed with: > int skey1; > r0 = %[skey1] ll; > and then the kernel needs no extra map type while the user space > can collect all static_branches that use &skey1 by > iterating insn stream and comparing addresses. > > > Which is used as below: > > > > __attribute__((naked)) > > int foo(void) { > > asm volatile ( > > "r0 = %[skey1] ll;" > > "if r0 != r0 goto 1f;" > > "r1 = r10;" > > "r1 += -8;" > > "r2 = 1;" > > "call %[bpf_trace_printk];" > > "1:" > > "exit;" > > :: __imm_addr(skey1), > > __imm(bpf_trace_printk) > > : __clobber_all > > ); > > } > > > > Disassembly of section .text: > > > > 0000000000000000 <foo>: > > 0: r0 = 0x0 ll > > 0000000000000000: R_BPF_64_64 skey1 ;; <---- Map relocation as usual > > 2: if r0 == r0 goto +0x4 <foo+0x38> ;; <---- Note condition > > 3: r1 = r10 > > 4: r1 += -0x8 > > 5: r2 = 0x1 > > 6: call 0x6 > > 7: exit > > > > And suppose that verifier is modified in the following ways: > > - treat instructions "if rX == rX" / "if rX != rX" (when rX points to > > static key map) in a special way: > > - when program is verified, the jump is considered non deterministic; > > - when program is jitted, the jump is compiled as nop for "!=" and as > > unconditional jump for "=="; > > - build a table of static keys based on a specific map referenced in > > condition, e.g. for the example above it can be inferred that insn 2 > > associates with map skey1 because "r0" points to "skey1"; > > - jit "rX = <static key> ll;" as nop; > > > > On the plus side: > > - any kinds of jump tables are omitted from system call; > > - no new instruction is needed; > > - almost no modifications to libbpf are necessary (only a helper macro > > to convince clang to keep "if rX == rX"); > > Reusing existing insn means that we're giving it new meaning > and that always comes with danger of breaking existing progs. > In this case if rX == rX isn't very meaningful and new semantics > shouldn't break anything, but it's a danger zone. > > If we treat: > if r0 == r0 > as JA > then we have to treat > if r1 == r1 > as JA as well and it becomes ambiguous when prog_info needs > to return the insns back to user space. > > If we go with rX == rX approach we should probably limit it > to one specific register. r0, r10, r11 can be considered > and they have their own pros and cons. > > Additional: > r0 = %[skey1] ll > in front of JE/JNE is a waste. If we JIT it to useless native insn > we will be burning cpu for no reason. So we should probably > optimize it out. If we do so, then this inline insn becomes a nop and > it's effectively a relocation. The insn stream will carry this > rX = 64bit_const insn to indicate the scope of the next insn. > It's pretty much like Anton's idea of using extra bits in JA > to encode an integer key_id. > With ld_imm64 we will encode 64-bit key_id. > Another insn with more bits to burn that has no effect on execution. 
> > It doesn't look clean to encode so much extra metadata into instructions > that JITs and the interpreter have to ignore. > If we go this route: > r11 = 64bit_const > if r11 == r11 goto > is a lesser evil. > Still, it's not as clean as JA with extra bits in src_reg. > We already optimize JA +0 into a nop. See opt_remove_nops(). > So a flavor of JA insn looks the most natural fit for a selectable > JA +xx or JA +0. Another point for a special jump instruction is that libbpf can be taught to patch it into a normal JA or NOP on old kernels, depending on the likely/unlikely bit. You won't be able to flip it, of course, but at least you don't have to compile two separate versions of the BPF object file. Instead of "jump maybe" it will transparently become "jump/don't jump for sure". This should definitely help with backwards compatibility. > > And the special map really doesn't fit. > Whatever we do, let's keep text_poke-able insn logic separate > from bookkeeping of addresses of those insns. > I think a special prefixed section that is understood by libbpf > (like what I proposed with "name.static_branch") will do fine. > If it's not good enough we can add a "set" map type > that will be a generic set of values. > It can be a set of 8-byte addresses to keep locations of static_branches, > but let's keep it generic. > I think it's fine to add: > __uint(type, BPF_MAP_TYPE_SET) > and let libbpf populate it with addresses of insns, > or address of variables, or other values > when it prepares a program for loading. > But map_update_elem should never be doing text_poke on insns. > We added prog_array map type in the past, but that was done > during the early days. If we were designing bpf today we would have > gone a different route.
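To make that sanitization idea concrete, here is a minimal sketch of what libbpf could do, assuming a hypothetical encoding where src_reg bit 1 marks a patchable JA and bit 2 carries the default action (the bit names and values are made up; the gotol/BPF_JMP32 variant, which keeps its offset in imm, is omitted):

#include <linux/bpf.h>

#define JA_IS_PATCHABLE 0x1     /* hypothetical src_reg bit: "maybe" jump */
#define JA_DEFAULT_JUMP 0x2     /* hypothetical src_reg bit: default is jump */

/* Rewrite a patchable JA into an insn old kernels understand. */
static void sanitize_static_branch(struct bpf_insn *insn)
{
        if (insn->code != (BPF_JMP | BPF_JA) ||
            !(insn->src_reg & JA_IS_PATCHABLE))
                return;

        if (!(insn->src_reg & JA_DEFAULT_JUMP))
                insn->off = 0;  /* JA +0 is a nop, cf. opt_remove_nops() */
        insn->src_reg = 0;      /* now a plain JA any verifier accepts */
}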
On Mon, Dec 11, 2023 at 9:34 AM Anton Protopopov <aspsk@isovalent.com> wrote: > > > This looks for me that two bits aren't enough, and the third is > required, as the second bit seems to be overloaded: > > * bit 1 indicates that this is a "JA_MAYBE" > * bit 2 indicates a jump or nop (i.e., the current state) > > However, we also need another bit which indicates what to do with the > instruction when we issue [an abstract] command > > flip_branch_on_or_off(branch, 0/1) > > Without this information (and in the absense of external meta-data on > how to patch the branch) we can't determine what a given (BPF, not > jitted) program currently does. For example, if we issue > > flip_branch_on_or_off(branch, 0) > > then we can't reflect this in the xlated program by setting the second > bit to jmp/off. Again, JITted program is fine, but it will be > desynchronized from xlated in term of logic (some instructions will be > mapped as NOP -> x86_JUMP, others as NOP -> x86_NOP). Not following the need for the 3rd bit. The 2nd bit is not only the initial state, but the current state too. when user space does static_branch_enable it will set the 2nd bit to 1 (if it wasn't set) and will text_poke_bp the code. xlated will be always in sync with JITed. No ambiguity. An annoying part is that bpf insn stream is read only, so we cannot really write that 2nd bit. We can virtually write it. Seeing nop or jmp in JITed code would mean the bit is 0 or 1. So xlated dump will report it. Separately the kernel doesn't have static_branch_switch/flip command. It's only enable or disable. I think it's better to keep it the same way.
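Sketched as code, the enable path described here could look roughly like the following; bpf_arch_text_poke() and BPF_MOD_JUMP are the existing kernel primitives, while the JA_STATE_JUMP bit and the jit_addr_of_insn() helper are hypothetical:

/* Enable one static branch: record the (possibly virtual) state bit and
 * patch the jitted code from nop to jump. */
static int bpf_static_branch_enable(struct bpf_prog *prog, u32 insn_idx)
{
        struct bpf_insn *insn = &prog->insnsi[insn_idx];
        void *ip, *target;

        if (insn->src_reg & JA_STATE_JUMP)      /* 2nd bit: current state */
                return 0;                       /* already enabled */

        insn->src_reg |= JA_STATE_JUMP;         /* "virtual" if insnsi is ro */

        ip = jit_addr_of_insn(prog, insn_idx);                  /* hypothetical */
        target = jit_addr_of_insn(prog, insn_idx + insn->off + 1);

        /* old_addr == NULL: the site is currently a nop; make it a jump */
        return bpf_arch_text_poke(ip, BPF_MOD_JUMP, NULL, target);
}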
On 12/11/23 9:31 AM, Anton Protopopov wrote: > On Sat, Dec 09, 2023 at 10:32:42PM -0800, Yonghong Song wrote: >> On 12/9/23 9:18 AM, Alexei Starovoitov wrote: >>> On Fri, Dec 8, 2023 at 9:05 PM Yonghong Song <yonghong.song@linux.dev> wrote: >>>> On 12/8/23 8:25 PM, Alexei Starovoitov wrote: >>>>> On Fri, Dec 8, 2023 at 8:15 PM Yonghong Song <yonghong.song@linux.dev> wrote: >>>>>> On 12/8/23 8:05 PM, Alexei Starovoitov wrote: >>>>>>> On Fri, Dec 8, 2023 at 2:04 PM Andrii Nakryiko >>>>>>> <andrii.nakryiko@gmail.com> wrote: >>>>>>>> I feel like embedding some sort of ID inside the instruction is very.. >>>>>>>> unusual, shall we say? >>>>>>> yeah. no magic numbers inside insns pls. >>>>>>> >>>>>>> I don't like JA_CFG name, since I read CFG as control flow graph, >>>>>>> while you probably meant CFG as configurable. >>>>>>> How about BPF_JA_OR_NOP ? >>>>>>> Then in combination with BPF_JMP or BPF_JMP32 modifier >>>>>>> the insn->off|imm will be used. >>>>>>> 1st bit in src_reg can indicate the default action: nop or jmp. >>>>>>> In asm it may look like asm("goto_or_nop +5") >>>>>> How does the C source code looks like in order to generate >>>>>> BPF_JA_OR_NOP insn? Any source examples? >>>>> It will be in inline asm only. The address of that insn will >>>>> be taken either via && or via asm (".long %l[label]"). >>>>> From llvm pov both should go through the same relo creation logic. I hope :) >>>> A hack in llvm below with an example, could you check whether the C >>>> syntax and object dump result >>>> is what you want to see? >>> Thank you for the ultra quick llvm diff! >>> >>>> diff --git a/llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp >>>> b/llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp >>>> index 90697c6645be..38b1cbc31f9a 100644 >>>> --- a/llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp >>>> +++ b/llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp >>>> @@ -231,6 +231,7 @@ public: >>>> .Case("call", true) >>>> .Case("goto", true) >>>> .Case("gotol", true) >>>> + .Case("goto_or_nop", true) >>>> .Case("*", true) >>>> .Case("exit", true) >>>> .Case("lock", true) >>>> @@ -259,6 +260,7 @@ public: >>>> .Case("bswap64", true) >>>> .Case("goto", true) >>>> .Case("gotol", true) >>>> + .Case("goto_or_nop", true) >>>> .Case("ll", true) >>>> .Case("skb", true) >>>> .Case("s", true) >>>> diff --git a/llvm/lib/Target/BPF/BPFInstrInfo.td >>>> b/llvm/lib/Target/BPF/BPFInstrInfo.td >>>> index 5972c9d49c51..a953d10429bf 100644 >>>> --- a/llvm/lib/Target/BPF/BPFInstrInfo.td >>>> +++ b/llvm/lib/Target/BPF/BPFInstrInfo.td >>>> @@ -592,6 +592,19 @@ class BRANCH<BPFJumpOp Opc, string OpcodeStr, >>>> list<dag> Pattern> >>>> let BPFClass = BPF_JMP; >>>> } >>>> >>>> +class BRANCH_OR_NOP<BPFJumpOp Opc, string OpcodeStr, list<dag> Pattern> >>>> + : TYPE_ALU_JMP<Opc.Value, BPF_K.Value, >>>> + (outs), >>>> + (ins brtarget:$BrDst), >>>> + !strconcat(OpcodeStr, " $BrDst"), >>>> + Pattern> { >>>> + bits<16> BrDst; >>>> + >>>> + let Inst{47-32} = BrDst; >>>> + let Inst{31-0} = 1; >>>> + let BPFClass = BPF_JMP; >>>> +} >>>> + >>>> class BRANCH_LONG<BPFJumpOp Opc, string OpcodeStr, list<dag> Pattern> >>>> : TYPE_ALU_JMP<Opc.Value, BPF_K.Value, >>>> (outs), >>>> @@ -632,6 +645,7 @@ class CALLX<string OpcodeStr> >>>> let isBranch = 1, isTerminator = 1, hasDelaySlot=0, isBarrier = 1 in { >>>> def JMP : BRANCH<BPF_JA, "goto", [(br bb:$BrDst)]>; >>>> def JMPL : BRANCH_LONG<BPF_JA, "gotol", []>; >>>> + def JMP_OR_NOP : BRANCH_OR_NOP<BPF_JA, "goto_or_nop", []>; >>> I was thinking of burning the new 0xE opcode for it, >>> but you're 
right. It's a flavor of existing JA insn and it's indeed >>> better to just use src_reg=1 bit to indicate so. >> Right, using src_reg to indicate a new flavor of JA insn sounds >> a good idea. My previously-used 'imm' field is a pure hack. >> >>> We probably need to use the 2nd bit of src_reg to indicate its default state >>> (jmp or fallthrough). >> Good point. >> >>>> asm volatile goto ("r0 = 0; \ >>>> goto_or_nop %l[label]; \ >>>> r2 = 2; \ >>>> r3 = 3; \ >>> Not sure how to represent the default state in assembly though. >>> "goto_or_nop" defaults to goto >>> "nop_or_goto" default to nop >>> ? >>> >>> Do we need "gotol" for imm32 or will it be automatic? >> It won't be automatic. >> >> At the end of this email, I will show the new change >> to have gotol_or_nop and nop_or_gotol insn and an example > Thanks a lot Yonghong! May I ask you to send a full patch for LLVM > (with gotol) so that I can test it? Okay, I will send a RFC patch to llvm-project so you can do 'git fetch' to get the patch into your local llvm-project repo and build a compiler to test. > > Overall, I think that JA + flags in SRC_REG is indeed better than a > new instruction, as a new code is not used. > > This looks for me that two bits aren't enough, and the third is > required, as the second bit seems to be overloaded: > > * bit 1 indicates that this is a "JA_MAYBE" > * bit 2 indicates a jump or nop (i.e., the current state) > > However, we also need another bit which indicates what to do with the > instruction when we issue [an abstract] command > > flip_branch_on_or_off(branch, 0/1) > > Without this information (and in the absense of external meta-data on > how to patch the branch) we can't determine what a given (BPF, not > jitted) program currently does. For example, if we issue > > flip_branch_on_or_off(branch, 0) > > then we can't reflect this in the xlated program by setting the second > bit to jmp/off. Again, JITted program is fine, but it will be > desynchronized from xlated in term of logic (some instructions will be > mapped as NOP -> x86_JUMP, others as NOP -> x86_NOP). > > In my original patch we kept this triplet as > > (offset to indicate a "special jump", JA+0/JA+OFF, Normal/Inverse) [...]
On 12/11/23 1:51 PM, Yonghong Song wrote: > > On 12/11/23 9:31 AM, Anton Protopopov wrote: >> On Sat, Dec 09, 2023 at 10:32:42PM -0800, Yonghong Song wrote: >>> On 12/9/23 9:18 AM, Alexei Starovoitov wrote: >>>> On Fri, Dec 8, 2023 at 9:05 PM Yonghong Song >>>> <yonghong.song@linux.dev> wrote: >>>>> On 12/8/23 8:25 PM, Alexei Starovoitov wrote: >>>>>> On Fri, Dec 8, 2023 at 8:15 PM Yonghong Song >>>>>> <yonghong.song@linux.dev> wrote: >>>>>>> On 12/8/23 8:05 PM, Alexei Starovoitov wrote: >>>>>>>> On Fri, Dec 8, 2023 at 2:04 PM Andrii Nakryiko >>>>>>>> <andrii.nakryiko@gmail.com> wrote: >>>>>>>>> I feel like embedding some sort of ID inside the instruction >>>>>>>>> is very.. >>>>>>>>> unusual, shall we say? >>>>>>>> yeah. no magic numbers inside insns pls. >>>>>>>> >>>>>>>> I don't like JA_CFG name, since I read CFG as control flow graph, >>>>>>>> while you probably meant CFG as configurable. >>>>>>>> How about BPF_JA_OR_NOP ? >>>>>>>> Then in combination with BPF_JMP or BPF_JMP32 modifier >>>>>>>> the insn->off|imm will be used. >>>>>>>> 1st bit in src_reg can indicate the default action: nop or jmp. >>>>>>>> In asm it may look like asm("goto_or_nop +5") >>>>>>> How does the C source code looks like in order to generate >>>>>>> BPF_JA_OR_NOP insn? Any source examples? >>>>>> It will be in inline asm only. The address of that insn will >>>>>> be taken either via && or via asm (".long %l[label]"). >>>>>> From llvm pov both should go through the same relo creation >>>>>> logic. I hope :) >>>>> A hack in llvm below with an example, could you check whether the C >>>>> syntax and object dump result >>>>> is what you want to see? [...] >>>> Thank you for the ultra quick llvm diff! >>>> I was thinking of burning the new 0xE opcode for it, >>>> but you're right. It's a flavor of existing JA insn and it's indeed >>>> better to just use src_reg=1 bit to indicate so. >>> Right, using src_reg to indicate a new flavor of JA insn sounds >>> a good idea. My previously-used 'imm' field is a pure hack. >>> >>>> We probably need to use the 2nd bit of src_reg to indicate its >>>> default state >>>> (jmp or fallthrough). >>> Good point. >>> >>>>> asm volatile goto ("r0 = 0; \ >>>>> goto_or_nop %l[label]; \ >>>>> r2 = 2; \ >>>>> r3 = 3; \ >>>> Not sure how to represent the default state in assembly though. >>>> "goto_or_nop" defaults to goto >>>> "nop_or_goto" default to nop >>>> ? >>>> >>>> Do we need "gotol" for imm32 or will it be automatic? >>> It won't be automatic. >>> >>> At the end of this email, I will show the new change >>> to have gotol_or_nop and nop_or_gotol insn and an example >> Thanks a lot Yonghong! May I ask you to send a full patch for LLVM >> (with gotol) so that I can test it? > > Okay, I will send a RFC patch to llvm-project so you can do 'git fetch' > to get the patch into your local llvm-project repo and build a compiler > to test. Okay, the following is the llvm-project diff: https://github.com/llvm/llvm-project/pull/75110 I added a label in inline asm like below: asm volatile goto ("r0 = 0; \ static_key_loc_1: \ gotol_or_nop %l[label]; \ r2 = 2; \ r3 = 3; \ ":: : "r0", "r2", "r3" :label); to identify the location of a gotol_or_nop/nop_or_gotol insn. The label 'static_key_loc_1' is in the ELF symbol table and can be easily searched and validated (the location is a gotol_or_nop/nop_or_gotol insn). > >> >> Overall, I think that JA + flags in SRC_REG is indeed better than a >> new instruction, as a new code is not used.
>> >> This looks for me that two bits aren't enough, and the third is >> required, as the second bit seems to be overloaded: >> >> * bit 1 indicates that this is a "JA_MAYBE" >> * bit 2 indicates a jump or nop (i.e., the current state) >> >> However, we also need another bit which indicates what to do with the >> instruction when we issue [an abstract] command >> >> flip_branch_on_or_off(branch, 0/1) >> >> Without this information (and in the absense of external meta-data on >> how to patch the branch) we can't determine what a given (BPF, not >> jitted) program currently does. For example, if we issue >> >> flip_branch_on_or_off(branch, 0) >> >> then we can't reflect this in the xlated program by setting the second >> bit to jmp/off. Again, JITted program is fine, but it will be >> desynchronized from xlated in term of logic (some instructions will be >> mapped as NOP -> x86_JUMP, others as NOP -> x86_NOP). >> >> In my original patch we kept this triplet as >> >> (offset to indicate a "special jump", JA+0/JA+OFF, Normal/Inverse) > [...] >
On Mon, Dec 11, 2023 at 11:08:59AM -0800, Alexei Starovoitov wrote: > On Mon, Dec 11, 2023 at 9:34 AM Anton Protopopov <aspsk@isovalent.com> wrote: > > > > > > This looks for me that two bits aren't enough, and the third is > > required, as the second bit seems to be overloaded: > > > > * bit 1 indicates that this is a "JA_MAYBE" > > * bit 2 indicates a jump or nop (i.e., the current state) > > > > However, we also need another bit which indicates what to do with the > > instruction when we issue [an abstract] command > > > > flip_branch_on_or_off(branch, 0/1) > > > > Without this information (and in the absense of external meta-data on > > how to patch the branch) we can't determine what a given (BPF, not > > jitted) program currently does. For example, if we issue > > > > flip_branch_on_or_off(branch, 0) > > > > then we can't reflect this in the xlated program by setting the second > > bit to jmp/off. Again, JITted program is fine, but it will be > > desynchronized from xlated in term of logic (some instructions will be > > mapped as NOP -> x86_JUMP, others as NOP -> x86_NOP). > > Not following the need for the 3rd bit. > The 2nd bit is not only the initial state, but the current state too. > > when user space does static_branch_enable it will set the 2nd bit to 1 > (if it wasn't set) and will text_poke_bp the code. > xlated will be always in sync with JITed. > No ambiguity. Ok, from BPF arch perspective this can work with two bits (not for practical purposes though, IMO, see my next e-mail). On the lowest level we have this magic jump instruction JA[SRC_REG=1] +OFF # jits to a NOP JA[SRC_REG=3] +OFF # jits to a JUMP Then we have a primitive kfunc static_branch_set(prog, branch, bool on) Which sets the second bit and pokes jitted code. (Maybe it doesn't set the actual bit in xlated due to read-only memory, as you've mentioned below.) You're right that this is unambiguous. > An annoying part is that bpf insn stream is read only, so we cannot > really write that 2nd bit. We can virtually write it. > Seeing nop or jmp in JITed code would mean the bit is 0 or 1. > So xlated dump will report it. If we can poke jitted x86 code, then we might poke prog->insnsi in the same way as well? Besides, on architectures where static keys may be of interest we don't use interpreter, so maybe this is ok to poke insnsi (and make it rw) after all? > Separately the kernel doesn't have static_branch_switch/flip command. > It's only enable or disable. I think it's better to keep it the same way.
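For reference, a sketch of the JIT side of these two encodings; emit_jump_rel32() and emit_nops() stand in for the real emitters, and the point is that both forms occupy the same five bytes on x86 (e9 rel32 vs a 5-byte nop), so one can later be poked into the other in place:

static void emit_static_branch(u8 **pprog, const struct bpf_insn *insn,
                               s32 jmp_offset)
{
        if (insn->src_reg & 0x2)        /* JA[SRC_REG=3]: currently a jump */
                emit_jump_rel32(pprog, jmp_offset);
        else                            /* JA[SRC_REG=1]: currently a nop */
                emit_nops(pprog, 5);
}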
On Sun, Dec 10, 2023 at 07:33:31PM -0800, Alexei Starovoitov wrote: > On Sun, Dec 10, 2023 at 2:30 AM Eduard Zingerman <eddyz87@gmail.com> wrote: > > > > How about a slightly different modification of the Anton's idea. > > Suppose that, as before, there is a special map type: > > > > struct { > > __uint(type, BPF_MAP_TYPE_ARRAY); > > __type(key, __u32); > > __type(value, __u32); > > __uint(map_flags, BPF_F_STATIC_KEY); > > __uint(max_entries, 1); > > } skey1 SEC(".maps") > > Instead of special map that the kernel has to know about > the same intent can be expressed with: > int skey1; > r0 = %[skey1] ll; > and then the kernel needs no extra map type while the user space > can collect all static_branches that use &skey1 by > iterating insn stream and comparing addresses. > > > Which is used as below: > > > > __attribute__((naked)) > > int foo(void) { > > asm volatile ( > > "r0 = %[skey1] ll;" > > "if r0 != r0 goto 1f;" > > "r1 = r10;" > > "r1 += -8;" > > "r2 = 1;" > > "call %[bpf_trace_printk];" > > "1:" > > "exit;" > > :: __imm_addr(skey1), > > __imm(bpf_trace_printk) > > : __clobber_all > > ); > > } > > > > Disassembly of section .text: > > > > 0000000000000000 <foo>: > > 0: r0 = 0x0 ll > > 0000000000000000: R_BPF_64_64 skey1 ;; <---- Map relocation as usual > > 2: if r0 == r0 goto +0x4 <foo+0x38> ;; <---- Note condition > > 3: r1 = r10 > > 4: r1 += -0x8 > > 5: r2 = 0x1 > > 6: call 0x6 > > 7: exit > > > > And suppose that verifier is modified in the following ways: > > - treat instructions "if rX == rX" / "if rX != rX" (when rX points to > > static key map) in a special way: > > - when program is verified, the jump is considered non deterministic; > > - when program is jitted, the jump is compiled as nop for "!=" and as > > unconditional jump for "=="; > > - build a table of static keys based on a specific map referenced in > > condition, e.g. for the example above it can be inferred that insn 2 > > associates with map skey1 because "r0" points to "skey1"; > > - jit "rX = <static key> ll;" as nop; > > > > On the plus side: > > - any kinds of jump tables are omitted from system call; > > - no new instruction is needed; > > - almost no modifications to libbpf are necessary (only a helper macro > > to convince clang to keep "if rX == rX"); > > Reusing existing insn means that we're giving it new meaning > and that always comes with danger of breaking existing progs. > In this case if rX == rX isn't very meaningful and new semantics > shouldn't break anything, but it's a danger zone. > > If we treat: > if r0 == r0 > as JA > then we have to treat > if r1 == r1 > as JA as well and it becomes ambiguous when prog_info needs > to return the insns back to user space. > > If we go with rX == rX approach we should probably limit it > to one specific register. r0, r10, r11 can be considered > and they have their own pros and cons. > > Additional: > r0 = %[skey1] ll > in front of JE/JNE is a waste. If we JIT it to useless native insn > we will be burning cpu for no reason. So we should probably > optimize it out. If we do so, then this inline insn becomes a nop and > it's effectively a relocation. The insn stream will carry this > rX = 64bit_const insn to indicate the scope of the next insn. > It's pretty much like Anton's idea of using extra bits in JA > to encode an integer key_id. > With ld_imm64 we will encode 64-bit key_id. > Another insn with more bits to burn that has no effect on execution. 
> > It doesn't look clean to encode so much extra metadata into instructions > that JITs and the interpreter have to ignore. > If we go this route: > r11 = 64bit_const > if r11 == r11 goto > is a lesser evil. > Still, it's not as clean as JA with extra bits in src_reg. > We already optimize JA +0 into a nop. See opt_remove_nops(). > So a flavor of JA insn looks the most natural fit for a selectable > JA +xx or JA +0. This seems to have the benefit that there is no backward compatibility issue (if we use r1, because r0/r11 will be rejected by old verifiers). We can have r1 = 64bit_const if r1 == r1 goto and r1 = 64bit_const if r1 != r1 goto and translate it on prog load to the new instruction as JUMP_OR_NOP and NOP_OR_JUMP, correspondingly. On older kernels it will have the default (key is off) behaviour. > And the special map really doesn't fit. > Whatever we do, let's keep text_poke-able insn logic separate > from bookkeeping of addresses of those insns. > I think a special prefixed section that is understood by libbpf > (like what I proposed with "name.static_branch") will do fine. > If it's not good enough we can add a "set" map type > that will be a generic set of values. > It can be a set of 8-byte addresses to keep locations of static_branches, > but let's keep it generic. > I think it's fine to add: > __uint(type, BPF_MAP_TYPE_SET) > and let libbpf populate it with addresses of insns, > or address of variables, or other values > when it prepares a program for loading. What is the higher-level API in this case? The static_branch_set(branch, bool on) is not enough because we want to distinguish between "normal" and "inverse" branches (for unlikely/likely cases). We can implement this using something like this: static_key_set(key, bool new_value) { /* true if we change key value */ bool key_changed = key->old_value ^ new_value; for_each_prog(prog, key) for_each_branch(branch, prog, key) static_branch_flip(prog, branch, key_changed) } where static_branch_flip flips the second bit of SRC_REG. We need to keep track of prog->branches and key->progs. How is this different from what my patch implements? If this is implemented in userspace, then how do we prevent synchronous updates of the key (and a relocation variant doesn't seem to work from userspace)? Or is this a new kfunc? If yes, then how is it executed? > But map_update_elem should never be doing text_poke on insns. > We added prog_array map type in the past, but that was done > during the early days. If we were designing bpf today we would have > gone a different route. What is the interface to toggle a key? If this is a map or an index of a global variable and we can't do this via a syscall, then this means that we need to write and compile an iterator to go through all maps, select a proper map, then code should be generated to go through all programs of interest and to update all static branches there (and we will have to build this iterator every time, and keep track of all mappings from userspace), while an alternative is to just issue one syscall. If this is just a `kfunc set_static_key(key, on/off)` then this is simpler, but again, we will have to load an iterator and then iterate through all maps to find the key vs. one syscall.
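A sketch of that load-time translation (the src_reg bit values are placeholders, and the pairing checks a real implementation would need are trimmed):

#include <linux/bpf.h>

static void translate_static_branches(struct bpf_insn *insns, int cnt)
{
        int i;

        for (i = 0; i + 2 < cnt; i++) {
                struct bpf_insn *ld = &insns[i], *jmp = &insns[i + 2];

                /* ld_imm64 into r1 occupies two insn slots */
                if (ld->code != (BPF_LD | BPF_IMM | BPF_DW) || ld->dst_reg != 1)
                        continue;
                /* only "if r1 ==/!= r1" qualifies */
                if (BPF_CLASS(jmp->code) != BPF_JMP ||
                    BPF_SRC(jmp->code) != BPF_X ||
                    jmp->dst_reg != 1 || jmp->src_reg != 1)
                        continue;

                if (BPF_OP(jmp->code) == BPF_JEQ)       /* ==: default jump */
                        *jmp = (struct bpf_insn) {
                                .code = BPF_JMP | BPF_JA,
                                .src_reg = 0x1 | 0x2,   /* JUMP_OR_NOP */
                                .off = jmp->off,
                        };
                else if (BPF_OP(jmp->code) == BPF_JNE)  /* !=: default nop */
                        *jmp = (struct bpf_insn) {
                                .code = BPF_JMP | BPF_JA,
                                .src_reg = 0x1,         /* NOP_OR_JUMP */
                                .off = jmp->off,
                        };
                /* what to do with the ld_imm64 itself (nop it out or keep
                 * it as the key reference) is part of the open question */
        }
}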
On Tue, Dec 12, 2023 at 2:28 AM Anton Protopopov <aspsk@isovalent.com> wrote: > > > This seems to have a benefit that there is no back compatibility issue > (if we use r1, because r0/r11 will be rejected by old verifiers). We > can have > > r1 = 64bit_const > if r1 == r1 goto > > and > > r1 = 64bit_const > if r1 != r1 goto > > and translate it on prog load to new instruction as JUMP_OF_NOP and > NOP_OR_JUMP, correspondingly. On older kernels it will have the > default (key is off) behaviour. As Andrii pointed out any new insn either JA with extra bits or special meaning if rX == rX can be sanitized by libbpf into plain JA. There will be no backward compat issues. > Ok, from BPF arch perspective this can work with two bits (not for > practical purposes though, IMO, see my next e-mail). I read this email and I still don't understand why you need a 3rd bit. > > > And the special map really doesn't fit. > > Whatever we do, let's keep text_poke-able insn logic separate > > from bookkeeping of addresses of those insns. > > I think a special prefixed section that is understood by libbpf > > (like what I proposed with "name.static_branch") will do fine. > > If it's not good enough we can add a "set" map type > > that will be a generic set of values. > > It can be a set of 8-byte addresses to keep locations of static_branches, > > but let's keep it generic. > > I think it's fine to add: > > __uint(type, BPF_MAP_TYPE_SET) > > and let libbpf populate it with addresses of insns, > > or address of variables, or other values > > when it prepares a program for loading. > > What is the higher-level API in this case? The static_branch_set(branch, > bool on) is not enough because we want to distinguish between "normal" > and "inverse" branches (for unlikely/likely cases). What is "likely/unlikely cases" ? likely() is a hint to the compiler to order basic blocks in a certain way. There is no likely/unlikely bit in the binary code after compilation on x86 or other architectures. There used to be a special bit on sparc64 that would mean a default jmp|fallthrough action for a conditional jmp. But that was before sparc became out of order and gained proper branch predictor in HW. > We can implement > this using something like this: > > static_key_set(key, bool new_value) > { > /* true if we change key value */ > bool key_changed = key->old_value ^ new_value; > > for_each_prog(prog, key) > for_each_branch(branch, prog, key) > static_branch_flip(prog, branch, key_changed) > } > > where static_branch_flip flips the second bit of SRC_REG. I don't understand why you keep bringing up 'flip' use case. The kernel doesn't have such an operation on static branches. Which makes me believe that it wasn't necessary. Why do we need one for the bpf static branch? > We need to > keep track of prog->branches and key->progs. How is this different > from what my patch implements? What I'm proposing is to have a generic map __uint(type, BPF_MAP_TYPE_SET) and by naming convention libbpf will populate it with addresses of JA_OR_NOP from all progs. In asm it could be: asm volatile ("r0 = %[set_A] ll; goto_or_nop ..."); (and libbpf will remove ld_imm64 from the prog before loading.) or via asm volatile ("goto_or_nop ...; .pushsection set_A_name+suffix; .long"); (and libbpf will copy from the special section into a set and remove special section). It will be a libbpf convention and the kernel doesn't need to know about a special static branch map type or array of addresses in prog_load cmd. 
Only JA insn is relevant to the verifier and JITs. Ideally we don't need to introduce SET map type and libbpf wouldn't need to populate it. If we can make it work with an array of values that .pushsection + .long automatically populates and libbpf treats it as a normal global data array that would be ideal. Insn addresses from all progs will be in that array after loading. Sort of like ".kconfig" section that libbpf populates, but it's a normal array underneath. > If this is implemented in userspace, then how we prevent synchronous > updates of the key (and a relocation variant doesn't seem to work from > userspace)? Or is this a new kfunc? If yes, then how is it > executed, then user space can have small helper in libbpf that iterates over SET (or array) and calls sys_bpf(cmd=STATIC_BRANCH_ENABLE, one_value_from_set) Similar in the kernel. When bpf progs want to enable a key it does bpf_for_each(set) { // open coded iterator bpf_static_branch_enable(addr); // kfunc call }
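The user-space half could be as small as the following sketch; the iteration uses the normal libbpf map calls, while bpf_static_branch_enable() stands in for a wrapper around the proposed STATIC_BRANCH_ENABLE command, and the set layout (insn address as value) is a guess:

#include <bpf/bpf.h>

static int static_key_enable(int set_fd)
{
        __u64 key, next, addr;
        __u64 *prev = NULL;

        while (!bpf_map_get_next_key(set_fd, prev, &next)) {
                if (bpf_map_lookup_elem(set_fd, &next, &addr))
                        return -1;
                if (bpf_static_branch_enable(addr))     /* hypothetical */
                        return -1;
                key = next;
                prev = &key;
        }
        return 0;
}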
On 12/13/23 6:15 PM, Alexei Starovoitov wrote: > On Tue, Dec 12, 2023 at 2:28 AM Anton Protopopov <aspsk@isovalent.com> wrote: >> >> This seems to have a benefit that there is no back compatibility issue >> (if we use r1, because r0/r11 will be rejected by old verifiers). We >> can have >> >> r1 = 64bit_const >> if r1 == r1 goto >> >> and >> >> r1 = 64bit_const >> if r1 != r1 goto >> >> and translate it on prog load to new instruction as JUMP_OF_NOP and >> NOP_OR_JUMP, correspondingly. On older kernels it will have the >> default (key is off) behaviour. > As Andrii pointed out any new insn either JA with extra bits > or special meaning if rX == rX can be sanitized by libbpf > into plain JA. > There will be no backward compat issues. > >> Ok, from BPF arch perspective this can work with two bits (not for >> practical purposes though, IMO, see my next e-mail). > I read this email and I still don't understand why you need a 3rd bit. > >>> And the special map really doesn't fit. >>> Whatever we do, let's keep text_poke-able insn logic separate >>> from bookkeeping of addresses of those insns. >>> I think a special prefixed section that is understood by libbpf >>> (like what I proposed with "name.static_branch") will do fine. >>> If it's not good enough we can add a "set" map type >>> that will be a generic set of values. >>> It can be a set of 8-byte addresses to keep locations of static_branches, >>> but let's keep it generic. >>> I think it's fine to add: >>> __uint(type, BPF_MAP_TYPE_SET) >>> and let libbpf populate it with addresses of insns, >>> or address of variables, or other values >>> when it prepares a program for loading. >> What is the higher-level API in this case? The static_branch_set(branch, >> bool on) is not enough because we want to distinguish between "normal" >> and "inverse" branches (for unlikely/likely cases). > What is "likely/unlikely cases" ? > likely() is a hint to the compiler to order basic blocks in > a certain way. There is no likely/unlikely bit in the binary code > after compilation on x86 or other architectures. > > There used to be a special bit on sparc64 that would mean > a default jmp|fallthrough action for a conditional jmp. > But that was before sparc became out of order and gained proper > branch predictor in HW. > >> We can implement >> this using something like this: >> >> static_key_set(key, bool new_value) >> { >> /* true if we change key value */ >> bool key_changed = key->old_value ^ new_value; >> >> for_each_prog(prog, key) >> for_each_branch(branch, prog, key) >> static_branch_flip(prog, branch, key_changed) >> } >> >> where static_branch_flip flips the second bit of SRC_REG. > I don't understand why you keep bringing up 'flip' use case. > The kernel doesn't have such an operation on static branches. > Which makes me believe that it wasn't necessary. > Why do we need one for the bpf static branch? > >> We need to >> keep track of prog->branches and key->progs. How is this different >> from what my patch implements? > What I'm proposing is to have a generic map __uint(type, BPF_MAP_TYPE_SET) > and by naming convention libbpf will populate it with addresses > of JA_OR_NOP from all progs. > In asm it could be: > asm volatile ("r0 = %[set_A] ll; goto_or_nop ..."); > (and libbpf will remove ld_imm64 from the prog before loading.) > > or via > asm volatile ("goto_or_nop ...; .pushsection set_A_name+suffix; .long"); > (and libbpf will copy from the special section into a set and remove > special section). This is one more alternative. 
asm volatile goto ("r0 = 0; \ static_key_loc_1: \ gotol_or_nop %l[label]; \ r2 = 2; \ r3 = 3; \ ":: : "r0", "r2", "r3" :label); User code can use a libbpf API static_key_enable("static_key_loc_1") to enable the above gotol_or_nop. static_key_disable("static_key_loc_1") or static_key_enabled("static_key_loc_1") can be similarly defined. Inside libbpf, it can look at the ELF file, find the label "static_key_loc_1", find the label's corresponding insn index, and verify that the insn with label "static_key_loc_1" is a gotol_or_nop or nop_or_gotol insn. Eventually libbpf can call a bpf syscall, e.g. sys_bpf(cmd=STATIC_KEY_ENABLE, prog_fd, insn_offset), to actually change the "current" state to gotol or nop depending on what ENABLE means. For enabling/disabling static keys in the bpf program itself, similarly, the bpf program can have bpf_static_key_enable("static_key_loc_1"); libbpf needs to change it to bpf_static_key_enable(insn_offset) and the kernel verifier should process it properly. Slightly different from what Alexei proposed, but another approach for consideration and discussion. > > It will be a libbpf convention and the kernel doesn't need > to know about a special static branch map type or array of addresses > in prog_load cmd. > Only JA insn is relevant to the verifier and JITs. > > Ideally we don't need to introduce SET map type and > libbpf wouldn't need to populate it. > If we can make it work with an array of values that .pushsection + .long > automatically populates and libbpf treats it as a normal global data array > that would be ideal. > Insn addresses from all progs will be in that array after loading. > Sort of like ".kconfig" section that libbpf populates, > but it's a normal array underneath. > >> If this is implemented in userspace, then how we prevent synchronous >> updates of the key (and a relocation variant doesn't seem to work from >> userspace)? Or is this a new kfunc? If yes, then how is it >> executed, > then user space can have small helper in libbpf that iterates > over SET (or array) and > calls sys_bpf(cmd=STATIC_BRANCH_ENABLE, one_value_from_set) > > Similar in the kernel. When bpf progs want to enable a key it does > bpf_for_each(set) { // open coded iterator > bpf_static_branch_enable(addr); // kfunc call > }
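The label lookup itself is plain libelf work; a sketch of that first step (error handling trimmed; the division by 8 assumes the symbol points into a program section where each slot is one 8-byte BPF insn):

#include <libelf.h>
#include <gelf.h>
#include <string.h>

/* Return the insn index of `label` within its section, or -1. */
static long find_label_insn_idx(Elf *elf, const char *label)
{
        Elf_Scn *scn = NULL;
        GElf_Shdr sh;
        size_t i;

        while ((scn = elf_nextscn(elf, scn)) != NULL) {
                if (!gelf_getshdr(scn, &sh) || sh.sh_type != SHT_SYMTAB)
                        continue;
                Elf_Data *data = elf_getdata(scn, NULL);

                for (i = 0; i < sh.sh_size / sh.sh_entsize; i++) {
                        GElf_Sym sym;
                        const char *name;

                        if (!gelf_getsym(data, i, &sym))
                                continue;
                        name = elf_strptr(elf, sh.sh_link, sym.st_name);
                        if (name && !strcmp(name, label))
                                return sym.st_value / 8;
                }
        }
        return -1;
}

libbpf would still need to translate this ELF-level index into the final insn offset after relocation and subprog linking before issuing the syscall, which is the offset-adjustment problem raised elsewhere in the thread.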
On Wed, Dec 13, 2023 at 06:15:20PM -0800, Alexei Starovoitov wrote: > On Tue, Dec 12, 2023 at 2:28 AM Anton Protopopov <aspsk@isovalent.com> wrote: > > > > > > This seems to have a benefit that there is no back compatibility issue > > (if we use r1, because r0/r11 will be rejected by old verifiers). We > > can have > > > > r1 = 64bit_const > > if r1 == r1 goto > > > > and > > > > r1 = 64bit_const > > if r1 != r1 goto > > > > and translate it on prog load to new instruction as JUMP_OF_NOP and > > NOP_OR_JUMP, correspondingly. On older kernels it will have the > > default (key is off) behaviour. > > As Andrii pointed out any new insn either JA with extra bits > or special meaning if rX == rX can be sanitized by libbpf > into plain JA. > There will be no backward compat issues. > > > Ok, from BPF arch perspective this can work with two bits (not for > > practical purposes though, IMO, see my next e-mail). > > I read this email and I still don't understand why you need a 3rd bit. > > > > > > And the special map really doesn't fit. > > > Whatever we do, let's keep text_poke-able insn logic separate > > > from bookkeeping of addresses of those insns. > > > I think a special prefixed section that is understood by libbpf > > > (like what I proposed with "name.static_branch") will do fine. > > > If it's not good enough we can add a "set" map type > > > that will be a generic set of values. > > > It can be a set of 8-byte addresses to keep locations of static_branches, > > > but let's keep it generic. > > > I think it's fine to add: > > > __uint(type, BPF_MAP_TYPE_SET) > > > and let libbpf populate it with addresses of insns, > > > or address of variables, or other values > > > when it prepares a program for loading. > > > > What is the higher-level API in this case? The static_branch_set(branch, > > bool on) is not enough because we want to distinguish between "normal" > > and "inverse" branches (for unlikely/likely cases). > > What is "likely/unlikely cases" ? > likely() is a hint to the compiler to order basic blocks in > a certain way. There is no likely/unlikely bit in the binary code > after compilation on x86 or other architectures. > > There used to be a special bit on sparc64 that would mean > a default jmp|fallthrough action for a conditional jmp. > But that was before sparc became out of order and gained proper > branch predictor in HW. 
Consider this code: int foo(void) { if (static_branch_likely(&key)) return 33; return 55; } int hoo(void) { if (static_branch_unlikely(&key)) return 33; return 55; } When the key is disabled the corresponding code is: 0000000000000010 <foo>: 19: eb 0b jmp 26 <foo+0x16> <-------- likely(static branch), key off 1b: b8 21 00 00 00 mov $0x21,%eax 20: 5d pop %rbp 21: e9 00 00 00 00 jmp 26 <foo+0x16> 26: b8 37 00 00 00 mov $0x37,%eax 2b: 5d pop %rbp 2c: e9 00 00 00 00 jmp 31 <foo+0x21> 31: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1) 38: 00 00 00 00 3c: 0f 1f 40 00 nopl 0x0(%rax) 0000000000000050 <hoo>: 59: 66 90 xchg %ax,%ax <-------------- unlikely(static branch), key off 5b: b8 37 00 00 00 mov $0x37,%eax 60: 5d pop %rbp 61: e9 00 00 00 00 jmp 66 <hoo+0x16> 66: b8 21 00 00 00 mov $0x21,%eax 6b: 5d pop %rbp 6c: e9 00 00 00 00 jmp 71 <__UNIQUE_ID_vermagic248+0x5> When the key is enabled, the code is: 0000000000000010 <foo>: 19: 66 90 xchg %ax,%ax <--------------- likely(static branch), key on 1b: b8 21 00 00 00 mov $0x21,%eax 20: 5d pop %rbp 21: e9 00 00 00 00 jmp 26 <foo+0x16> 26: b8 37 00 00 00 mov $0x37,%eax 2b: 5d pop %rbp 2c: e9 00 00 00 00 jmp 31 <foo+0x21> 31: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1) 38: 00 00 00 00 3c: 0f 1f 40 00 nopl 0x0(%rax) 0000000000000050 <hoo>: 59: eb 0b jmp 66 <hoo+0x16> <--------- unlikely(static branch), key on 5b: b8 37 00 00 00 mov $0x37,%eax 60: 5d pop %rbp 61: e9 00 00 00 00 jmp 66 <hoo+0x16> 66: b8 21 00 00 00 mov $0x21,%eax 6b: 5d pop %rbp 6c: e9 00 00 00 00 jmp 71 <__UNIQUE_ID_vermagic248+0x5> So, for the likely case we set the branch to JUMP/NOP when the key is OFF/ON. And for the unlikely case we set the branch to NOP/JUMP when the key is OFF/ON. The kernel keeps this information, and when static_key_enable(key) is executed, it does [simplified for reading] for (entry = key->start; entry < stop; entry++) arch_jump_label_transform(entry, jump_label_type(entry)); this jump_label_type() contains this bit of information: are we writing NOP or JUMP for an enabled key. Same for static_key_disable(key). Now, for BPF we do static_branch_enable(branch). To generate proper JITed code, we have enough information (NOP=0, JUMP=2): static_branch_enable(JA[SRC=1|NOP]) jits to ARCH_JUMP static_branch_enable(JA[SRC=1|JUMP]) jits to ARCH_NOP static_branch_disable(JA[SRC=1|NOP]) jits to ARCH_NOP static_branch_disable(JA[SRC=1|JUMP]) jits to ARCH_JUMP But how do we represent this in xlated code to the user? Do we patch the xlated code? If we do, then static_branch_enable changes JA[SRC=1|NOP] to JA[SRC=1|JUMP] and an ARCH_JUMP is generated; static_branch_disable then sees JA[SRC=1|JUMP], changes it to JA[SRC=1|NOP], but an ARCH_JUMP is generated. And what about two static_branch_enable calls on the same branch? By flipping I meant that we always do JA[SRC=1|NOP] jits to ARCH_NOP JA[SRC=1|JUMP] jits to ARCH_JUMP and the primitive operation is static_branch_flip, which changes JA[SRC=1|NOP] to JA[SRC=1|JUMP] and vice versa. Then for static_key_enable(key) we flip all the branches if the key was disabled and do nothing otherwise. Same for static_key_disable. What you've proposed before is to keep this "third bit" in xlated+jitted form. Basically, we check the state of the branch "on/off" like this: say, we see that the xlated/jitted state of a branch is JA[SRC=1|NOP] and ARCH_JUMP, then we can say that this branch is on. How do we report it to the user in PROG_INFO? Do we set the instruction to JA[SRC=1|JUMP] in the output, specifying that its current state is to jump? This would work, I think.
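Note that the four enable/disable cases above collapse to an XOR of the key state with the branch polarity, which is exactly how the kernel's jump_label_type() computes it; a sketch with made-up names:

enum arch_insn { ARCH_NOP, ARCH_JUMP };

/* "inverse" is the persistent flavor bit: set for branches whose key-off
 * state is a jump (the likely() pattern above). */
static enum arch_insn static_branch_target(bool key_on, bool inverse)
{
        return (key_on ^ inverse) ? ARCH_JUMP : ARCH_NOP;
}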
> > We can implement > > this using something like this: > > > > static_key_set(key, bool new_value) > > { > > /* true if we change key value */ > > bool key_changed = key->old_value ^ new_value; > > > > for_each_prog(prog, key) > > for_each_branch(branch, prog, key) > > static_branch_flip(prog, branch, key_changed) > > } > > > > where static_branch_flip flips the second bit of SRC_REG. > > I don't understand why you keep bringing up 'flip' use case. > The kernel doesn't have such an operation on static branches. > Which makes me believe that it wasn't necessary. > Why do we need one for the bpf static branch? See above. > > We need to > > keep track of prog->branches and key->progs. How is this different > > from what my patch implements? > > What I'm proposing is to have a generic map __uint(type, BPF_MAP_TYPE_SET) > and by naming convention libbpf will populate it with addresses > of JA_OR_NOP from all progs. > In asm it could be: > asm volatile ("r0 = %[set_A] ll; goto_or_nop ..."); > (and libbpf will remove ld_imm64 from the prog before loading.) > > or via > asm volatile ("goto_or_nop ...; .pushsection set_A_name+suffix; .long"); > (and libbpf will copy from the special section into a set and remove > special section). > > It will be a libbpf convention and the kernel doesn't need > to know about a special static branch map type or array of addresses > in prog_load cmd. > Only JA insn is relevant to the verifier and JITs. > > Ideally we don't need to introduce SET map type and > libbpf wouldn't need to populate it. > If we can make it work with an array of values that .pushsection + .long > automatically populates and libbpf treats it as a normal global data array > that would be ideal. > Insn addresses from all progs will be in that array after loading. > Sort of like ".kconfig" section that libbpf populates, > but it's a normal array underneath. This doesn't work without the kernel side, as we change instruction offsets inside the verifier when, e.g., converting helper calls to individual instructions, etc. Otherwise, this is more-or-less what my patch does: get a list of offsets of branches inside the program, associate them with a map (or an ID of any kind), and then push it to the kernel. And if you need to keep this list in the kernel, then it looks like we can keep it and access it per key, not in a loop... An alternative is to encode the key ID in the instruction in some form so that it is visible in xlated code after the program is loaded. But in this case this makes users responsible for writing iterators and bookkeeping all relationships vs. just loading a program with a map fd, which is a well-established API in BPF. > > If this is implemented in userspace, then how we prevent synchronous > > updates of the key (and a relocation variant doesn't seem to work from > > userspace)? Or is this a new kfunc? If yes, then how is it > > executed, > > then user space can have small helper in libbpf that iterates > over SET (or array) and > calls sys_bpf(cmd=STATIC_BRANCH_ENABLE, one_value_from_set) > > Similar in the kernel. When bpf progs want to enable a key it does > bpf_for_each(set) { // open coded iterator > bpf_static_branch_enable(addr); // kfunc call > } Is it possible to use a map like it is used in my patch, but do sys_bpf(cmd=STATIC_KEY_ENABLE, attr.map_fd) so that it is specifically separated from map_update_elem? (If it was not a map, but just some "bpf object"?)
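In code, that last question amounts to something like the sketch below; the command and its number are made up, and the attr plumbing just mirrors the existing map commands:

#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

#define BPF_STATIC_KEY_ENABLE 40        /* hypothetical command number */

/* Enable all branches associated with the static key map in one call,
 * without going through map_update_elem. */
static int static_key_enable_by_fd(int map_fd)
{
        union bpf_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.map_fd = map_fd;

        return syscall(__NR_bpf, BPF_STATIC_KEY_ENABLE, &attr, sizeof(attr));
}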
On Wed, 2023-12-13 at 19:04 -0800, Yonghong Song wrote: > Slightly different from what Alexei proposed, but another approach > for consideration and discussion. It looks like Alexei / Yonghong focus on individual addresses, while as far as I understand Anton's original proposal is more about program-wide switches. E.g. there is some "DEBUG" flag and from the application's point of view it does not matter how many static branches are enabled or disabled by this flag. In a sense, there is one-to-one vs one-to-many control granularity. Here is an additional variation for the one-to-many approach, basically a hybrid of Anton's original idea and Andrii's suggestion to use relocations. Suppose there is a special type of maps :) (it does not really matter if it is a map or not, what matters is that this object has an FD and libbpf knows how to relocate it when preparing the program for load): struct { __uint(type, BPF_MAP_TYPE_ARRAY); __type(key, __u32); __type(value, __u32); __uint(map_flags, BPF_F_STATIC_KEY); __uint(max_entries, 1); } skey1 SEC(".maps") And a new instruction that accepts an fd/pointer corresponding to this map as a parameter, plus a label to which to jump if the static key is enabled. E.g., as below: __attribute__((naked)) int foo(void) { asm volatile ( "if static %[skey1] goto 1f;" // hypothetical syntax "r1 = r10;" "r1 += -8;" "r2 = 1;" "call %[bpf_trace_printk];" "1:" "exit;" :: __imm_addr(skey1), __imm(bpf_trace_printk) : __clobber_all ); } This "if static" is relocated by libbpf in order to get the correct map fd when the program is loaded. In effect, each static key is a single-entry map with a one-to-many relationship to the instructions / programs it controls. It can be enabled or disabled using either map update functions or new system call commands. What I think is a plus with such an approach: - The application is free to decide how it wants to organize these FDs: - use one fd per condition; - have an fd per program; - have an fd per set of programs. - Embedding the fd in the instruction removes the need to communicate mapping information back and forth, or to track special section kinds.
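With this scheme the user-space side would need nothing new; toggling could go through the ordinary map interface, assuming the kernel translates the update into patching (the part of the design debated above):

#include <bpf/bpf.h>
#include <bpf/libbpf.h>

/* Toggle the hypothetical one-entry static key map on or off. */
static int skey_set(struct bpf_object *obj, const char *name, __u32 on)
{
        int fd = bpf_object__find_map_fd_by_name(obj, name);
        __u32 zero = 0;

        if (fd < 0)
                return fd;
        return bpf_map_update_elem(fd, &zero, &on, BPF_ANY);
}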
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c index 9dc9625651dc..f67d6a4dac05 100644 --- a/tools/lib/bpf/bpf.c +++ b/tools/lib/bpf/bpf.c @@ -232,7 +232,7 @@ int bpf_prog_load(enum bpf_prog_type prog_type, const struct bpf_insn *insns, size_t insn_cnt, struct bpf_prog_load_opts *opts) { - const size_t attr_sz = offsetofend(union bpf_attr, log_true_size); + const size_t attr_sz = offsetofend(union bpf_attr, static_branches_info_size); void *finfo = NULL, *linfo = NULL; const char *func_info, *line_info; __u32 log_size, log_level, attach_prog_fd, attach_btf_obj_fd; @@ -262,6 +262,9 @@ int bpf_prog_load(enum bpf_prog_type prog_type, attr.prog_ifindex = OPTS_GET(opts, prog_ifindex, 0); attr.kern_version = OPTS_GET(opts, kern_version, 0); + attr.static_branches_info = ptr_to_u64(OPTS_GET(opts, static_branches_info, NULL)); + attr.static_branches_info_size = OPTS_GET(opts, static_branches_info_size, 0); + if (prog_name && kernel_supports(NULL, FEAT_PROG_NAME)) libbpf_strlcpy(attr.prog_name, prog_name, sizeof(attr.prog_name)); attr.license = ptr_to_u64(license); diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h index d0f53772bdc0..ec6d4b955fb8 100644 --- a/tools/lib/bpf/bpf.h +++ b/tools/lib/bpf/bpf.h @@ -102,9 +102,11 @@ struct bpf_prog_load_opts { * If kernel doesn't support this feature, log_size is left unchanged. */ __u32 log_true_size; + struct bpf_static_branch_info *static_branches_info; + __u32 static_branches_info_size; size_t :0; }; -#define bpf_prog_load_opts__last_field log_true_size +#define bpf_prog_load_opts__last_field static_branches_info_size LIBBPF_API int bpf_prog_load(enum bpf_prog_type prog_type, const char *prog_name, const char *license, diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h index 77ceea575dc7..e3bfa0697304 100644 --- a/tools/lib/bpf/bpf_helpers.h +++ b/tools/lib/bpf/bpf_helpers.h @@ -400,4 +400,68 @@ extern void bpf_iter_num_destroy(struct bpf_iter_num *it) __weak __ksym; ) #endif /* bpf_repeat */ +#define DEFINE_STATIC_KEY(NAME) \ + struct { \ + __uint(type, BPF_MAP_TYPE_ARRAY); \ + __type(key, __u32); \ + __type(value, __u32); \ + __uint(map_flags, BPF_F_STATIC_KEY); \ + __uint(max_entries, 1); \ + } NAME SEC(".maps") + +#ifndef likely +#define likely(x) (__builtin_expect(!!(x), 1)) +#endif + +#ifndef unlikely +#define unlikely(x) (__builtin_expect(!!(x), 0)) +#endif + +static __always_inline int __bpf_static_branch_nop(void *static_key) +{ + asm goto("1:\n\t" + "goto +0\n\t" + ".pushsection .jump_table, \"aw\"\n\t" + ".balign 8\n\t" + ".long 1b - .\n\t" + ".long %l[l_yes] - .\n\t" + ".quad %c0 - .\n\t" + ".popsection\n\t" + :: "i" (static_key) + :: l_yes); + return 0; +l_yes: + return 1; +} + +static __always_inline int __bpf_static_branch_jump(void *static_key) +{ + asm goto("1:\n\t" + "goto %l[l_yes]\n\t" + ".pushsection .jump_table, \"aw\"\n\t" + ".balign 8\n\t" + ".long 1b - .\n\t" + ".long %l[l_yes] - .\n\t" + ".quad %c0 - . + 1\n\t" + ".popsection\n\t" + :: "i" (static_key) + :: l_yes); + return 0; +l_yes: + return 1; +} + +/* + * The bpf_static_branch_{unlikely,likely} macros provide a way to utilize BPF + * Static Keys in BPF programs in exactly the same manner this is done in the + * Linux Kernel. The "unlikely" macro compiles in the code where the else-branch + * (key is off) is prioritized, the "likely" macro prioritises the if-branch. 
+ */ + +#define bpf_static_branch_unlikely(static_key) \ + unlikely(__bpf_static_branch_nop(static_key)) + +#define bpf_static_branch_likely(static_key) \ + likely(!__bpf_static_branch_jump(static_key)) + #endif diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index e067be95da3c..92620717abda 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -391,6 +391,13 @@ struct bpf_sec_def { libbpf_prog_attach_fn_t prog_attach_fn; }; +struct static_branch_info { + struct bpf_map *map; + __u32 insn_offset; + __u32 jump_target; + __u32 flags; +}; + /* * bpf_prog should be a better name but it has been used in * linux/filter.h. @@ -463,6 +470,9 @@ struct bpf_program { __u32 line_info_rec_size; __u32 line_info_cnt; __u32 prog_flags; + + struct static_branch_info *static_branches_info; + __u32 static_branches_info_size; }; struct bpf_struct_ops { @@ -493,6 +503,7 @@ struct bpf_struct_ops { #define KSYMS_SEC ".ksyms" #define STRUCT_OPS_SEC ".struct_ops" #define STRUCT_OPS_LINK_SEC ".struct_ops.link" +#define STATIC_JUMPS_SEC ".jump_table" enum libbpf_map_type { LIBBPF_MAP_UNSPEC, @@ -624,6 +635,7 @@ struct elf_state { Elf_Data *symbols; Elf_Data *st_ops_data; Elf_Data *st_ops_link_data; + Elf_Data *static_branches_data; size_t shstrndx; /* section index for section name strings */ size_t strtabidx; struct elf_sec_desc *secs; @@ -634,6 +646,7 @@ struct elf_state { int symbols_shndx; int st_ops_shndx; int st_ops_link_shndx; + int static_branches_shndx; }; struct usdt_manager; @@ -715,6 +728,7 @@ void bpf_program__unload(struct bpf_program *prog) zfree(&prog->func_info); zfree(&prog->line_info); + zfree(&prog->static_branches_info); } static void bpf_program__exit(struct bpf_program *prog) @@ -3605,6 +3619,9 @@ static int bpf_object__elf_collect(struct bpf_object *obj) } else if (strcmp(name, STRUCT_OPS_LINK_SEC) == 0) { obj->efile.st_ops_link_data = data; obj->efile.st_ops_link_shndx = idx; + } else if (strcmp(name, STATIC_JUMPS_SEC) == 0) { + obj->efile.static_branches_data = data; + obj->efile.static_branches_shndx = idx; } else { pr_info("elf: skipping unrecognized data section(%d) %s\n", idx, name); @@ -3620,7 +3637,8 @@ static int bpf_object__elf_collect(struct bpf_object *obj) if (!section_have_execinstr(obj, targ_sec_idx) && strcmp(name, ".rel" STRUCT_OPS_SEC) && strcmp(name, ".rel" STRUCT_OPS_LINK_SEC) && - strcmp(name, ".rel" MAPS_ELF_SEC)) { + strcmp(name, ".rel" MAPS_ELF_SEC) && + strcmp(name, ".rel" STATIC_JUMPS_SEC)) { pr_info("elf: skipping relo section(%d) %s for section(%d) %s\n", idx, name, targ_sec_idx, elf_sec_name(obj, elf_sec_by_idx(obj, targ_sec_idx)) ?: "<?>"); @@ -4422,6 +4440,189 @@ bpf_object__collect_prog_relos(struct bpf_object *obj, Elf64_Shdr *shdr, Elf_Dat return 0; } +struct jump_table_entry { + __u32 insn_offset; + __u32 jump_target; + union { + __u64 map_ptr; /* map_ptr is always zero, as it is relocated */ + __u64 flags; /* so we can reuse it to store flags */ + }; +}; + +static struct bpf_program *shndx_to_prog(struct bpf_object *obj, + size_t sec_idx, + struct jump_table_entry *entry) +{ + __u32 insn_offset = entry->insn_offset / 8; + __u32 jump_target = entry->jump_target / 8; + struct bpf_program *prog; + size_t i; + + for (i = 0; i < obj->nr_programs; i++) { + prog = &obj->programs[i]; + if (prog->sec_idx != sec_idx) + continue; + + if (insn_offset < prog->sec_insn_off || + insn_offset >= prog->sec_insn_off + prog->sec_insn_cnt) + continue; + + if (jump_target < prog->sec_insn_off || + jump_target >= prog->sec_insn_off + prog->sec_insn_cnt) 
{
+			pr_warn("static branch: offset %u is in boundaries, target %u is not\n",
+				insn_offset, jump_target);
+			return NULL;
+		}
+
+		return prog;
+	}
+
+	return NULL;
+}
+
+static struct bpf_program *find_prog_for_jump_entry(struct bpf_object *obj,
+						    int nrels,
+						    Elf_Data *relo_data,
+						    __u32 entry_offset,
+						    struct jump_table_entry *entry)
+{
+	struct bpf_program *prog;
+	Elf64_Rel *rel;
+	Elf64_Sym *sym;
+	int i;
+
+	for (i = 0; i < nrels; i++) {
+		rel = elf_rel_by_idx(relo_data, i);
+		if (!rel) {
+			pr_warn("static branch: relo #%d: failed to get ELF relo\n", i);
+			return ERR_PTR(-LIBBPF_ERRNO__FORMAT);
+		}
+
+		if ((__u32)rel->r_offset != entry_offset)
+			continue;
+
+		sym = elf_sym_by_idx(obj, ELF64_R_SYM(rel->r_info));
+		if (!sym) {
+			pr_warn("static branch: .maps relo #%d: symbol %zx not found\n",
+				i, (size_t)ELF64_R_SYM(rel->r_info));
+			return ERR_PTR(-LIBBPF_ERRNO__FORMAT);
+		}
+
+		prog = shndx_to_prog(obj, sym->st_shndx, entry);
+		if (!prog) {
+			pr_warn("static branch: .maps relo #%d: program %zx not found\n",
+				i, (size_t)sym->st_shndx);
+			return ERR_PTR(-LIBBPF_ERRNO__FORMAT);
+		}
+		return prog;
+	}
+	return ERR_PTR(-LIBBPF_ERRNO__FORMAT);
+}
+
+static struct bpf_map *find_map_for_jump_entry(struct bpf_object *obj,
+					       int nrels,
+					       Elf_Data *relo_data,
+					       __u32 entry_offset)
+{
+	struct bpf_map *map;
+	const char *name;
+	Elf64_Rel *rel;
+	Elf64_Sym *sym;
+	int i;
+
+	for (i = 0; i < nrels; i++) {
+		rel = elf_rel_by_idx(relo_data, i);
+		if (!rel) {
+			pr_warn("static branch: relo #%d: failed to get ELF relo\n", i);
+			return NULL;
+		}
+
+		if ((__u32)rel->r_offset != entry_offset)
+			continue;
+
+		sym = elf_sym_by_idx(obj, ELF64_R_SYM(rel->r_info));
+		if (!sym) {
+			pr_warn(".maps relo #%d: symbol %zx not found\n",
+				i, (size_t)ELF64_R_SYM(rel->r_info));
+			return NULL;
+		}
+
+		name = elf_sym_str(obj, sym->st_name) ?: "<?>";
+		if (!name || !strcmp(name, "")) {
+			pr_warn(".maps relo #%d: symbol name is zero or empty\n", i);
+			return NULL;
+		}
+
+		map = bpf_object__find_map_by_name(obj, name);
+		if (!map)
+			return NULL;
+		return map;
+	}
+	return NULL;
+}
+
+static int add_static_branch(struct bpf_program *prog,
+			     struct jump_table_entry *entry,
+			     struct bpf_map *map)
+{
+	__u32 size_old = prog->static_branches_info_size;
+	__u32 size_new = size_old + sizeof(struct static_branch_info);
+	struct static_branch_info *info;
+	void *x;
+
+	x = realloc(prog->static_branches_info, size_new);
+	if (!x)
+		return -ENOMEM;
+
+	info = x + size_old;
+	info->insn_offset = entry->insn_offset - prog->sec_insn_off * 8;
+	info->jump_target = entry->jump_target - prog->sec_insn_off * 8;
+	info->flags = (__u32) entry->flags;
+	info->map = map;
+
+	prog->static_branches_info = x;
+	prog->static_branches_info_size = size_new;
+
+	return 0;
+}
+
+static int
+bpf_object__collect_static_branches_relos(struct bpf_object *obj,
+					  Elf64_Shdr *shdr,
+					  Elf_Data *relo_data)
+{
+	Elf_Data *branches_data = obj->efile.static_branches_data;
+	int nrels = shdr->sh_size / shdr->sh_entsize;
+	struct jump_table_entry *entries;
+	size_t i;
+	int err;
+
+	if (!branches_data)
+		return 0;
+
+	entries = (void *)branches_data->d_buf;
+	for (i = 0; i < branches_data->d_size / sizeof(struct jump_table_entry); i++) {
+		__u32 entry_offset = i * sizeof(struct jump_table_entry);
+		struct bpf_program *prog;
+		struct bpf_map *map;
+
+		prog = find_prog_for_jump_entry(obj, nrels, relo_data, entry_offset, &entries[i]);
+		if (IS_ERR(prog))
+			return PTR_ERR(prog);
+
+		map = find_map_for_jump_entry(obj, nrels, relo_data,
+					      entry_offset + offsetof(struct jump_table_entry, map_ptr));
+		if (!map)
+			return -EINVAL;
+
+		err = add_static_branch(prog, &entries[i], map);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
 static int map_fill_btf_type_info(struct bpf_object *obj, struct bpf_map *map)
 {
 	int id;
@@ -6298,10 +6499,44 @@ static struct reloc_desc *find_prog_insn_relo(const struct bpf_program *prog, si
 		       sizeof(*prog->reloc_desc), cmp_relo_by_insn_idx);
 }
 
+static int append_subprog_static_branches(struct bpf_program *main_prog,
+					  struct bpf_program *subprog)
+{
+	size_t subprog_size = subprog->static_branches_info_size;
+	size_t main_size = main_prog->static_branches_info_size;
+	size_t entry_size = sizeof(struct static_branch_info);
+	void *old_info = main_prog->static_branches_info;
+	int n_entries = subprog_size / entry_size;
+	struct static_branch_info *branch;
+	void *new_info;
+	int i;
+
+	if (!subprog_size)
+		return 0;
+
+	new_info = realloc(old_info, subprog_size + main_size);
+	if (!new_info)
+		return -ENOMEM;
+
+	memcpy(new_info + main_size, subprog->static_branches_info, subprog_size);
+
+	for (i = 0; i < n_entries; i++) {
+		branch = new_info + main_size + i * entry_size;
+		branch->insn_offset += subprog->sub_insn_off * 8;
+		branch->jump_target += subprog->sub_insn_off * 8;
+	}
+
+	main_prog->static_branches_info = new_info;
+	main_prog->static_branches_info_size += subprog_size;
+
+	return 0;
+}
+
 static int append_subprog_relos(struct bpf_program *main_prog, struct bpf_program *subprog)
 {
 	int new_cnt = main_prog->nr_reloc + subprog->nr_reloc;
 	struct reloc_desc *relos;
+	int err;
 	int i;
 
 	if (main_prog == subprog)
@@ -6324,6 +6559,11 @@ static int append_subprog_relos(struct bpf_program *main_prog, struct bpf_progra
 	 */
 	main_prog->reloc_desc = relos;
 	main_prog->nr_reloc = new_cnt;
+
+	err = append_subprog_static_branches(main_prog, subprog);
+	if (err)
+		return err;
+
 	return 0;
 }
 
@@ -6879,6 +7119,8 @@ static int bpf_object__collect_relos(struct bpf_object *obj)
 			err = bpf_object__collect_st_ops_relos(obj, shdr, data);
 		else if (idx == obj->efile.btf_maps_shndx)
 			err = bpf_object__collect_map_relos(obj, shdr, data);
+		else if (idx == obj->efile.static_branches_shndx)
+			err = bpf_object__collect_static_branches_relos(obj, shdr, data);
 		else
 			err = bpf_object__collect_prog_relos(obj, shdr, data);
 		if (err)
@@ -7002,6 +7244,30 @@ static int libbpf_prepare_prog_load(struct bpf_program *prog,
 
 static void fixup_verifier_log(struct bpf_program *prog, char *buf, size_t buf_sz);
 
+static struct bpf_static_branch_info *
+convert_branch_info(struct static_branch_info *info, size_t size)
+{
+	size_t n = size/sizeof(struct static_branch_info);
+	struct bpf_static_branch_info *bpf_info;
+	size_t i;
+
+	if (!info)
+		return NULL;
+
+	bpf_info = calloc(n, sizeof(struct bpf_static_branch_info));
+	if (!bpf_info)
+		return NULL;
+
+	for (i = 0; i < n; i++) {
+		bpf_info[i].insn_offset = info[i].insn_offset;
+		bpf_info[i].jump_target = info[i].jump_target;
+		bpf_info[i].flags = info[i].flags;
+		bpf_info[i].map_fd = info[i].map->fd;
+	}
+
+	return bpf_info;
+}
+
 static int bpf_object_load_prog(struct bpf_object *obj, struct bpf_program *prog,
 				struct bpf_insn *insns, int insns_cnt,
 				const char *license, __u32 kern_version, int *prog_fd)
@@ -7106,6 +7372,11 @@ static int bpf_object_load_prog(struct bpf_object *obj, struct bpf_program *prog
 	load_attr.log_size = log_buf_size;
 	load_attr.log_level = log_level;
 
+	load_attr.static_branches_info = convert_branch_info(prog->static_branches_info,
+							     prog->static_branches_info_size);
+	load_attr.static_branches_info_size = prog->static_branches_info_size /
+			sizeof(struct static_branch_info) * sizeof(struct bpf_static_branch_info);
+
 	ret = bpf_prog_load(prog->type, prog_name, license, insns, insns_cnt, &load_attr);
 	if (ret >= 0) {
 		if (log_level && own_log_buf) {
diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h
index f0f08635adb0..62020e7a58b0 100644
--- a/tools/lib/bpf/libbpf_internal.h
+++ b/tools/lib/bpf/libbpf_internal.h
@@ -40,6 +40,9 @@
 #ifndef R_BPF_64_ABS32
 #define R_BPF_64_ABS32 3
 #endif
+#ifndef R_BPF_64_NODYLD32
+#define R_BPF_64_NODYLD32 4
+#endif
 #ifndef R_BPF_64_32
 #define R_BPF_64_32 10
 #endif
diff --git a/tools/lib/bpf/linker.c b/tools/lib/bpf/linker.c
index 5ced96d99f8c..47b343e2813e 100644
--- a/tools/lib/bpf/linker.c
+++ b/tools/lib/bpf/linker.c
@@ -22,6 +22,7 @@
 #include "strset.h"
 
 #define BTF_EXTERN_SEC ".extern"
+#define STATIC_JUMPS_REL_SEC ".rel.jump_table"
 
 struct src_sec {
 	const char *sec_name;
@@ -888,8 +889,9 @@ static int linker_sanity_check_elf_relos(struct src_obj *obj, struct src_sec *se
 		size_t sym_type = ELF64_R_TYPE(relo->r_info);
 
 		if (sym_type != R_BPF_64_64 && sym_type != R_BPF_64_32 &&
-		    sym_type != R_BPF_64_ABS64 && sym_type != R_BPF_64_ABS32) {
-			pr_warn("ELF relo #%d in section #%zu has unexpected type %zu in %s\n",
+		    sym_type != R_BPF_64_ABS64 && sym_type != R_BPF_64_ABS32 &&
+		    sym_type != R_BPF_64_NODYLD32 && strcmp(sec->sec_name, STATIC_JUMPS_REL_SEC)) {
+			pr_warn("ELF relo #%d in section #%zu unexpected type %zu in %s\n",
 				i, sec->sec_idx, sym_type, obj->filename);
 			return -EINVAL;
 		}
@@ -2087,7 +2089,7 @@ static int linker_append_elf_relos(struct bpf_linker *linker, struct src_obj *ob
 			insn->imm += sec->dst_off / sizeof(struct bpf_insn);
 		else
 			insn->imm += sec->dst_off;
-	} else {
+	} else if (strcmp(src_sec->sec_name, STATIC_JUMPS_REL_SEC)) {
 		pr_warn("relocation against STT_SECTION in non-exec section is not supported!\n");
 		return -EINVAL;
 	}
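[Editorial note] To make the load-time plumbing above concrete, a standalone loader could pass one static branch descriptor through the extended load options roughly as follows. This is a sketch, not part of the patch: the bpf_static_branch_info fields mirror convert_branch_info() above, while the instruction offsets, the key_map_fd argument and the XDP program type are made-up placeholders.

```c
#include <bpf/bpf.h>

/* Sketch only: assumes the bpf_prog_load_opts extensions added by this
 * series. Offsets are in bytes (instruction index * 8), matching the
 * scaling done in add_static_branch() above.
 */
static int load_with_static_branch(const struct bpf_insn *insns,
				   size_t insn_cnt, int key_map_fd)
{
	struct bpf_static_branch_info binfo = {
		.insn_offset = 2 * 8,	/* hypothetical: the JA is insn #2 */
		.jump_target = 5 * 8,	/* hypothetical: its target is insn #5 */
		.flags = 0,		/* branch flags, as in the .jump_table entry */
		.map_fd = key_map_fd,	/* FD of the static key map */
	};
	LIBBPF_OPTS(bpf_prog_load_opts, opts,
		.static_branches_info = &binfo,
		.static_branches_info_size = sizeof(binfo),
	);

	return bpf_prog_load(BPF_PROG_TYPE_XDP, "prog", "GPL",
			     insns, insn_cnt, &opts);
}
```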
Introduce the DEFINE_STATIC_KEY() and bpf_static_branch_{unlikely,likely}
macros to mimic the Linux kernel Static Keys API in BPF. An example of
usage would be as follows:

    DEFINE_STATIC_KEY(key);

    void prog(void)
    {
            if (bpf_static_branch_unlikely(&key))
                    /* rarely used code */
            else
                    /* default hot path code */
    }

or, using the likely variant:

    void prog2(void)
    {
            if (bpf_static_branch_likely(&key))
                    /* default hot path code */
            else
                    /* rarely used code */
    }

The "unlikely" variant of the macro compiles the code so that the
else-branch (key is off) is the fall-through path; the "likely" variant
prioritises the if-branch instead.

Both macros push an entry into a new ".jump_table" section, which
contains the following information:

              32 bits                 32 bits          64 bits
    offset of jump instruction | offset of jump target | flags

The corresponding ".rel.jump_table" relocation table entry contains the
base section name and the static key (map) name.

The bulk of this patch deals with parsing, relocating, and sending this
information to the kernel via the static_branches_info and
static_branches_info_size attributes of the BPF_PROG_LOAD syscall.

The same key may be used multiple times in one program and can be used
by multiple BPF programs. BPF doesn't provide any guarantees on the
order in which static branches controlled by one key are patched.

Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
---
 tools/lib/bpf/bpf.c             |   5 +-
 tools/lib/bpf/bpf.h             |   4 +-
 tools/lib/bpf/bpf_helpers.h     |  64 ++++++++
 tools/lib/bpf/libbpf.c          | 273 +++++++++++++++++++++++++++++++-
 tools/lib/bpf/libbpf_internal.h |   3 +
 tools/lib/bpf/linker.c          |   8 +-
 6 files changed, 351 insertions(+), 6 deletions(-)
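[Editorial note] The ".jump_table" entry layout described in the commit message corresponds to a struct along the following lines. This is an inferred sketch, not a quote from the series: the field names come from the libbpf code in this patch (the insn_offset, jump_target and flags accesses, plus the offsetof(struct jump_table_entry, map_ptr) relocation lookup), and the union overlay of map_ptr and flags in the 64-bit field is an assumption.

```c
#include <linux/types.h>

/* Inferred sketch of one ".jump_table" entry: 32-bit byte offset of the
 * jump instruction, 32-bit byte offset of its target, and a 64-bit field
 * that carries the static key map reference plus branch flags (the union
 * is an assumption based on how libbpf reads the entry).
 */
struct jump_table_entry {
	__u32 insn_offset;	/* byte offset of the jump instruction */
	__u32 jump_target;	/* byte offset of the jump target */
	union {
		__u64 map_ptr;	/* relocated to refer to the static key map */
		__u64 flags;	/* branch flags read by libbpf */
	};
};
```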