Message ID | 20231206141030.1478753-7-aspsk@isovalent.com (mailing list archive)
---|---
State | Changes Requested
Delegated to: | BPF
Series | BPF Static Keys
On Wed, Dec 6, 2023 at 6:13 AM Anton Protopopov <aspsk@isovalent.com> wrote:
>
> +static __always_inline int __bpf_static_branch_jump(void *static_key)
> +{
> +	asm goto("1:\n\t"
> +		"goto %l[l_yes]\n\t"
> +		".pushsection .jump_table, \"aw\"\n\t"
> +		".balign 8\n\t"
> +		".long 1b - .\n\t"
> +		".long %l[l_yes] - .\n\t"
> +		".quad %c0 - . + 1\n\t"
> +		".popsection\n\t"
> +		:: "i" (static_key)
> +		:: l_yes);
> +	return 0;
> +l_yes:
> +	return 1;
> +}

Could you add a test to patch 7 where the same subprog is used by multiple
main progs, and another test where a prog with static_keys is statically
linked by libbpf into another prog? I suspect these cases are not handled
by libbpf in the series. The adjustment of insn offsets is tricky, and I
don't see this logic in patch 5.

The special handling of the JA insn (when it's listed in
static_branches_info[]) is fragile. check_cfg() and the verifier main loop
are two places so far, but JA is an unconditional jump. Every tool that
understands the BPF ISA would have to treat JA as "maybe it's not a jump in
this case", and that is concerning.

I certainly see the appeal of copy-pasting the kernel's static_branch
logic, but we can do better since we're not bound by the x86 ISA.

How about we introduce a new insn, JA_MAYBE, which check_cfg() and the
verifier will process as insn_idx += insn->off/imm _or_ insn_idx += 1. The
new instruction will indicate that it may jump or fall through. Another bit
could be used to indicate a "default" action (jump vs fallthrough), which
JITs will use to translate it into a nop or a jmp. Once it's part of the
instruction stream we can have a bpf-prog-callable kfunc that flips the
JA_MAYBE target (I think this feature is missing in the patch set. It's
necessary to give a bpf prog the ability to flip a static_branch; currently
you're proposing to do it from user space only), and there will be no need
to supply static_branches_info[] at prog load time. libbpf static linking
will work as-is without extra code.

JA_MAYBE will also remove the need for an extra bpf map type. The bpf prog
can _optionally_ do the '.long 1b - .' asm trick and store the address of a
JA_MAYBE insn into an arbitrary 8-byte value (in a global variable, a
section, or somewhere else). I think it's necessary to decouple patching of
JA_MAYBE from naming the location. The ".pushsection .jump_table" should be
optional. The kernel's static_key approach hard-codes them together, but
that's due to x86 limitations.

We can introduce JA_MAYBE, use it everywhere in the bpf prog, and not add
names or addresses next to the insns. Then 'bpftool prog dump' can retrieve
the insn stream and another command can patch a specific instruction
(because JA_MAYBE is easy to spot vs a regular JA). At the end it's just a
text_poke_bp() to convert the target of a JA_MAYBE. The bpf prog can be
written with

  asm goto("go_maybe +0, %l[l_yes]")

without names and maps, and the jumps will follow the indicated 'default'
branch (how about: the 1st listed is the default, the 2nd is the maybe).
The kernel's static keys cannot be flipped atomically anyway, so multiple
branches using the same static key are equivalent to an array of addresses
that are flipped one by one.

I suspect the main use case for Isovalent is to compile a bpf prog with
debug code that is not running by default and then flip various parts when
debugging is necessary. With JA_MAYBE it's still going to be
bpf_static_branch_[un]likely(), but with no extra map, and the prog will
load fine. Then you can patch all of such insns or a subset of them on
demand. (The verifier will allow patching of JA_MAYBE only between the two
targets, so no safety issues.) I think it's more flexible than the new map
type and static_branches_info[]. wdyt?
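For illustration, the helper built on such an insn might look roughly like
the sketch below. This is a sketch only: the "go_maybe" mnemonic is
hypothetical and exists only in this proposal, and the default action here
is assumed to be fallthrough until the insn is patched.

	/* Sketch: "go_maybe" is the hypothetical mnemonic proposed above,
	 * not a real BPF/LLVM instruction; default action is fallthrough,
	 * so the function returns 0 until the insn is patched to jump.
	 */
	static __always_inline int bpf_static_branch_unlikely(void)
	{
		asm goto("go_maybe +0, %l[l_yes]"
			 :::: l_yes);
		return 0;
	l_yes:
		return 1;
	}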
On Thu, Dec 07, 2023 at 07:45:32PM -0800, Alexei Starovoitov wrote:
> On Wed, Dec 6, 2023 at 6:13 AM Anton Protopopov <aspsk@isovalent.com> wrote:
> [...]
>
> Could you add a test to patch 7 where the same subprog is used by multiple
> main progs, and another test where a prog with static_keys is statically
> linked by libbpf into another prog? I suspect these cases are not handled
> by libbpf in the series. The adjustment of insn offsets is tricky, and I
> don't see this logic in patch 5.
>
> The special handling of the JA insn (when it's listed in
> static_branches_info[]) is fragile. check_cfg() and the verifier main loop
> are two places so far, but JA is an unconditional jump. Every tool that
> understands the BPF ISA would have to treat JA as "maybe it's not a jump in
> this case", and that is concerning.

Will do, thanks.

> I certainly see the appeal of copy-pasting the kernel's static_branch
> logic, but we can do better since we're not bound by the x86 ISA.
> [...]
> The kernel's static keys cannot be flipped atomically anyway, so multiple
> branches using the same static key are equivalent to an array of addresses
> that are flipped one by one.

Thanks for the detailed review.
You're right, without adding a new instruction non-kernel observers can't
distinguish between a JA and a "static branch JA". It also makes sense to
encode a direct/inverse flag (and more, see below). I would call the new
instruction something like JA_CFG, emphasizing the fact that this is a JA
which can be configured by an external force.

We can also easily add a kfunc API; however, I am for keeping the "map API"
in place (in addition to the more fine-grained API you've proposed). See my
considerations and examples below.

> I suspect the main use case for Isovalent is to compile a bpf prog with
> debug code that is not running by default and then flip various parts when
> debugging is necessary. With JA_MAYBE it's still going to be
> bpf_static_branch_[un]likely(), but with no extra map, and the prog will
> load fine. Then you can patch all of such insns or a subset of them on
> demand. (The verifier will allow patching of JA_MAYBE only between the two
> targets, so no safety issues.) I think it's more flexible than the new map
> type and static_branches_info[]. wdyt?

Here is how I think about the API. Imagine that we have this new
instruction, JA_CFG, and that a program uses it in multiple places. If we
don't mark those instructions, then after compilation an external observer
can't distinguish between them: we don't know whether an instruction is
controlled by one key or another. We may care about this, say, when a
program uses key A to enable/disable debugging and key B to enable/disable
another optional feature.

If we push offsets of jumps into a separate section via the "'.long 1b - .'
asm trick", then we will have all the same problems with relocations, which
is fragile, as you've shown. What we can do instead is to encode a "local
key id" inside the instruction. This local_id is local to the program where
it is used. (We can use 4 bits in, say, dst_reg, or 16 bits of the unused
offset/imm, as one of them will be unused. To me, 4 bits may be enough for
the initial implementation.) This way we can distinguish between different
keys in one program, and a new `bpf_static_key_set_prog(prog, key_id, value)`
kfunc can be used to toggle this key on/off for a particular program. This
way we don't care about relocations, and the API is straightforward.

However, if this is the only API we provide, then it makes the user's life
hard, as they will have to keep track of ids, programs used, and the
mapping from a "global" id to local ids for each program (when multiple
programs use the same static key, which is desirable). If we keep the
higher-level "map API", then this simplifies the user's life: at program
load a user can send a list of (local_id -> map) mappings, and then toggle
all the branches controlled by a [global] static key by either

  bpf(MAP_UPDATE_ELEM, map, value)

or

  kfunc bpf_static_key_set(map, value)

whichever is more useful. (I think keeping the bpf(2) userspace API is
worth doing, as otherwise this, again, makes life harder: users would have
to recompile/update iterator programs if new programs using a static key
are added, etc.)

Libbpf can simplify life even more by automatically allocating local ids
and passing the mappings for a program to the kernel based on
`bpf_static_branch_{unlikely,likely}(&map)`, so that users never have to
think about this if they don't want to. Again, no relocations are required
here.

So, to summarize:

* A new instruction BPF_JA_CFG[ID,FLAGS,OFFSET] where ID is local to the
  program and FLAGS is 0/1 for normal/inverse branches

* A new kfunc `bpf_static_key_set_prog(prog, key_id, value)` which toggles
  all the branches with ID=key_id in the given prog

* Extend bpf_attr with a list of (local_id -> map) mappings. This is an
  optional list if the user doesn't want one or all branches to be
  controlled by a map

* A new kfunc `bpf_static_key_set(map, value)` which toggles all the static
  branches in all programs which use `map` to control branches

* bpf(MAP_UPDATE_ELEM, map, value) via the bpf(2) syscall, which acts as
  the previous kfunc but in a single syscall, without the requirement to
  create iterators
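For concreteness, the summarized kernel-side API could be declared roughly
as below; every name and signature here is hypothetical, taken only from
the summary above:

	#include <linux/types.h>

	struct bpf_prog;
	struct bpf_map;

	/* Hypothetical kfunc: toggle all branches with ID=key_id in one prog. */
	int bpf_static_key_set_prog(struct bpf_prog *prog, __u32 key_id, __u32 value);

	/* Hypothetical kfunc: toggle all branches, in all progs, controlled by map. */
	int bpf_static_key_set(struct bpf_map *map, __u32 value);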
On Fri, Dec 8, 2023 at 8:22 AM Anton Protopopov <aspsk@isovalent.com> wrote:
>
> On Thu, Dec 07, 2023 at 07:45:32PM -0800, Alexei Starovoitov wrote:
> [...]
>
> Here is how I think about the API. Imagine that we have this new
> instruction, JA_CFG, and that a program uses it in multiple places. If we
> don't mark those instructions, then after compilation an external observer
> can't distinguish between them.
> [...]
> What we can do instead is to encode a "local key id" inside the
> instruction. This local_id is local to the program where it is used.
> [...]
> This way we don't care about relocations, and the API is straightforward.

I feel like embedding some sort of ID inside the instruction is very..
unusual, shall we say? When I was reading the commit messages and the
discussion, I couldn't shake the impression that this has to be solved with
the help of relocations.

How about something like the below (not very well thought out, sorry) for
how this can be put together from both the user-space and kernel sides:

1. We can add a special SEC(".static_keys") section, in which we define
special global variables representing static keys. These variables will be
forced to be global to ensure unique names and the like.

2. The bpf_static_branch_{likely,unlikely}() macro accepts a reference to
one such special global variable and instructs the compiler to emit a
relocation between the static key variable and the JMP_CFG instruction.
Libbpf will properly update these relocations during static linking and
subprog rearrangement, just like we do for map references today.

3. libbpf takes care of creating a special map (it could be an ARRAY with a
special flag, but a dedicated map with multiple entries might be cleaner),
in which each static key variable is represented as a separate key.

3.5. When libbpf loads a BPF program, I guess it will have to upload a
multi-mapping from static_key_id to insn_off (probably we should allow
multiple insn_offs per static_key_id), so that the kernel can identify
which instructions are to be updated.

4. Similar to SEC(".kconfig"), libbpf will actually also have a read-only
global variables map, where each variable's value is the static key integer
ID that corresponds to a static key in that special map from #3. This
allows user space to easily get this ID without hard-coding or guessing
anything; it's just:

  int static_key_id = skel->static_keys->my_static_key;

Then you can do bpf_map_update_elem(&skel->maps.static_key_map,
&static_key_id, 0/1) to flip the switch from user space.

5. From the BPF side we use global variable semantics similarly to obtain
the integer key, then we can use a dedicated special kfunc that accepts
prog, static_key_id, and the desired state to flip that jump whichever way.

So on the BPF side it will look roughly like this:

  int static_key1 SEC(".static_keys");
  ...
  if (bpf_static_key_likely(&static_key1)) {
      ...
  }
  ...
  /* to switch it on or off */
  bpf_static_key_switch(&static_key1, 1);

From the user-space side, as I basically already showed:

  bpf_map_update_elem(bpf_map__fd(skel->static_keys->static_key1),
                      &skel->static_keys->static_key1, 1);

Something along these lines.

> However, if this is the only API we provide, then it makes the user's life
> hard, as they will have to keep track of ids, programs used, and the
> mapping from a "global" id to local ids for each program (when multiple
> programs use the same static key, which is desirable).
> [...]
>
> So, to summarize:
>
> * A new instruction BPF_JA_CFG[ID,FLAGS,OFFSET] where ID is local to the
>   program and FLAGS is 0/1 for normal/inverse branches

+1 for a dedicated instruction

> * A new kfunc `bpf_static_key_set_prog(prog, key_id, value)` which toggles
>   all the branches with ID=key_id in the given prog
>
> * Extend bpf_attr with a list of (local_id -> map) mappings. This is an
>   optional list if the user doesn't want one or all branches to be
>   controlled by a map
>
> * A new kfunc `bpf_static_key_set(map, value)` which toggles all the static
>   branches in all programs which use `map` to control branches
>
> * bpf(MAP_UPDATE_ELEM, map, value) via the bpf(2) syscall, which acts as
>   the previous kfunc but in a single syscall, without the requirement to
>   create iterators

Let's see if my above proposal fits this as a way to simplify allocating
static key IDs and keeping them in sync between the BPF and user-space
sides.
On Fri, 2023-12-08 at 14:04 -0800, Andrii Nakryiko wrote:
[...]
> > So, to summarize:
> >
> > * A new instruction BPF_JA_CFG[ID,FLAGS,OFFSET] where ID is local to the
> >   program and FLAGS is 0/1 for normal/inverse branches
>
> +1 for a dedicated instruction

fwiw, if relocations are used instead of IDs the new instruction does not
have to be a control-flow one. It might be a mov that sets the target
register to a value that the verifier treats as unknown. At runtime this
mov could be patched to assign different values. Granted, it would be three
instructions:

  mov rax, 0;
  cmp rax, 0;
  je ...

instead of one, but I don't believe there would be a noticeable performance
difference. On the plus side: even simpler verification, likely/unlikely
for free, and no need to track whether a branch is inverted.
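In BPF inline asm the same pattern might be sketched as below; the runtime
patching of the mov's immediate is hypothetical, and nothing in the
verifier supports it today:

	/* Sketch: the kernel would patch the immediate of this mov at
	 * runtime and the verifier would treat the result as unknown;
	 * the compare and conditional jump come from the ordinary C "if".
	 */
	static __always_inline int bpf_static_key_value(void)
	{
		int v;

		asm volatile("%0 = 0" : "=r"(v)); /* imm patched at runtime */
		return v;
	}

	/* usage: if (bpf_static_key_value()) { ... debug code ... } */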
On Fri, Dec 8, 2023 at 2:04 PM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> I feel like embedding some sort of ID inside the instruction is very..
> unusual, shall we say?

yeah. no magic numbers inside insns pls.

I don't like the JA_CFG name, since I read CFG as control flow graph, while
you probably meant CFG as configurable. How about BPF_JA_OR_NOP? Then in
combination with the BPF_JMP or BPF_JMP32 modifier the insn->off|imm will
be used. The 1st bit in src_reg can indicate the default action: nop or
jmp. In asm it may look like asm("goto_or_nop +5").

> 2. The bpf_static_branch_{likely,unlikely}() macro accepts a reference to
> one such special global variable and instructs the compiler to emit a
> relocation between the static key variable and the JMP_CFG instruction.
>
> Libbpf will properly update these relocations during static linking and
> subprog rearrangement, just like we do for map references today.

Right. libbpf has RELO_SUBPROG_ADDR. This new relo will be pretty much
that. And we have proper C syntax for taking an address: &&label. The
bpf_static_branch macro can use it. We've wanted to add it for a long time
to support proper switch() statements and jump tables.

I don't like IDs and a new map type for this. The macro can have
'branch_name' as one of its arguments, and it will populate addresses of
insns into a "name.static_branch" section. From libbpf's pov it will be yet
another global section which is represented as a traditional bpf array of
one element. No extra handling on the libbpf side.

The question is how to represent the "address" of the insn. I think 4-byte
prog_id + 4-byte insn_idx will do.

Then a bpf prog can pass such an "address" into
bpf_static_branch_enable/disable kfuncs. User space can iterate over the
8-byte "addresses" in that one-element array map and call
BPF_STATIC_BRANCH_ENABLE/DISABLE syscall cmds. We can have a helper on the
libbpf side for that.

I see no need to introduce a new map type just to reuse the
map_update_elem cmd.
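As a sketch, that 8-byte "address" could be packed as below; the exact
layout is an assumption for illustration:

	#include <linux/types.h>

	/* Pack a 4-byte prog_id and a 4-byte insn_idx into one 8-byte
	 * "address" (hypothetical layout: prog_id in the high half).
	 */
	static inline __u64 branch_addr(__u32 prog_id, __u32 insn_idx)
	{
		return ((__u64)prog_id << 32) | insn_idx;
	}

	static inline __u32 branch_prog_id(__u64 addr)  { return addr >> 32; }
	static inline __u32 branch_insn_idx(__u64 addr) { return (__u32)addr; }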
On Fri, Dec 8, 2023 at 3:07 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> fwiw, if relocations are used instead of IDs the new instruction does not
> have to be a control-flow one. It might be a mov that sets the target
> register to a value that the verifier treats as unknown. At runtime this
> mov could be patched to assign different values. Granted, it would be three
> instructions:
>
>   mov rax, 0;
>   cmp rax, 0;
>   je ...
>
> instead of one, but I don't believe there would be a noticeable performance
> difference. On the plus side: even simpler verification, likely/unlikely
> for free, and no need to track whether a branch is inverted.

'je' is a conditional jmp. The cpu will mispredict it sooner or later, and
depending on the density of jmps around it the misprediction can be severe.
It has to be an unconditional jmp to be fast.
On 12/8/23 8:05 PM, Alexei Starovoitov wrote:
> On Fri, Dec 8, 2023 at 2:04 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
>>
>> I feel like embedding some sort of ID inside the instruction is very..
>> unusual, shall we say?
> yeah. no magic numbers inside insns pls.
>
> I don't like the JA_CFG name, since I read CFG as control flow graph,
> while you probably meant CFG as configurable.
> How about BPF_JA_OR_NOP? Then in combination with the BPF_JMP or BPF_JMP32
> modifier the insn->off|imm will be used.
> The 1st bit in src_reg can indicate the default action: nop or jmp.
> In asm it may look like asm("goto_or_nop +5")

How does the C source code look in order to generate the BPF_JA_OR_NOP
insn? Any source examples?

> [...]
> I see no need to introduce a new map type just to reuse the
> map_update_elem cmd.
On Fri, Dec 8, 2023 at 8:15 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>
> On 12/8/23 8:05 PM, Alexei Starovoitov wrote:
> [...]
>
> How does the C source code look in order to generate the BPF_JA_OR_NOP
> insn? Any source examples?

It will be in inline asm only. The address of that insn will be taken
either via && or via asm (".long %l[label]"). From llvm's pov both should
go through the same relo creation logic. I hope :)
On 12/8/23 8:25 PM, Alexei Starovoitov wrote:
> On Fri, Dec 8, 2023 at 8:15 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>> [...]
>> How does the C source code look in order to generate the BPF_JA_OR_NOP
>> insn? Any source examples?
> It will be in inline asm only. The address of that insn will be taken
> either via && or via asm (".long %l[label]"). From llvm's pov both should
> go through the same relo creation logic. I hope :)

A hack in llvm below with an example; could you check whether the C syntax
and the object dump result are what you want to see?

diff --git a/llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp b/llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp
index 90697c6645be..38b1cbc31f9a 100644
--- a/llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp
+++ b/llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp
@@ -231,6 +231,7 @@ public:
         .Case("call", true)
         .Case("goto", true)
         .Case("gotol", true)
+        .Case("goto_or_nop", true)
         .Case("*", true)
         .Case("exit", true)
         .Case("lock", true)
@@ -259,6 +260,7 @@ public:
         .Case("bswap64", true)
         .Case("goto", true)
         .Case("gotol", true)
+        .Case("goto_or_nop", true)
         .Case("ll", true)
         .Case("skb", true)
         .Case("s", true)
diff --git a/llvm/lib/Target/BPF/BPFInstrInfo.td b/llvm/lib/Target/BPF/BPFInstrInfo.td
index 5972c9d49c51..a953d10429bf 100644
--- a/llvm/lib/Target/BPF/BPFInstrInfo.td
+++ b/llvm/lib/Target/BPF/BPFInstrInfo.td
@@ -592,6 +592,19 @@ class BRANCH<BPFJumpOp Opc, string OpcodeStr, list<dag> Pattern>
   let BPFClass = BPF_JMP;
 }

+class BRANCH_OR_NOP<BPFJumpOp Opc, string OpcodeStr, list<dag> Pattern>
+    : TYPE_ALU_JMP<Opc.Value, BPF_K.Value,
+                   (outs),
+                   (ins brtarget:$BrDst),
+                   !strconcat(OpcodeStr, " $BrDst"),
+                   Pattern> {
+  bits<16> BrDst;
+
+  let Inst{47-32} = BrDst;
+  let Inst{31-0} = 1;
+  let BPFClass = BPF_JMP;
+}
+
 class BRANCH_LONG<BPFJumpOp Opc, string OpcodeStr, list<dag> Pattern>
     : TYPE_ALU_JMP<Opc.Value, BPF_K.Value,
                    (outs),
@@ -632,6 +645,7 @@ class CALLX<string OpcodeStr>
 let isBranch = 1, isTerminator = 1, hasDelaySlot=0, isBarrier = 1 in {
   def JMP : BRANCH<BPF_JA, "goto", [(br bb:$BrDst)]>;
   def JMPL : BRANCH_LONG<BPF_JA, "gotol", []>;
+  def JMP_OR_NOP : BRANCH_OR_NOP<BPF_JA, "goto_or_nop", []>;
 }

And an example:

$ cat t.c
int bar(void);
int foo()
{
	int a, b;
	asm volatile goto ("r0 = 0; \
		goto_or_nop %l[label]; \
		r2 = 2; \
		r3 = 3; \
		"::::label);
	a = bar();
label:
	b = 20 * a;
	return b;
}
$ clang --target=bpf -O2 -S t.c
$ cat t.s
	.text
	.file	"t.c"
	.globl	foo                             # -- Begin function foo
	.p2align	3
	.type	foo,@function
foo:                                    # @foo
# %bb.0:                                # %entry
	r0 = 0
	#APP
	r0 = 0
	goto_or_nop LBB0_2
	r2 = 2
	r3 = 3
	#NO_APP
# %bb.1:                                # %asm.fallthrough
	call bar
	r0 *= 20
LBB0_2:                                 # Block address taken
                                        # %label
                                        # Label of block must be emitted
	exit
.Lfunc_end0:
	.size	foo, .Lfunc_end0-foo
                                        # -- End function
	.addrsig
$ clang --target=bpf -O2 -c t.c
$ llvm-objdump -dr t.o

t.o:	file format elf64-bpf

Disassembly of section .text:

0000000000000000 <foo>:
       0:	b7 00 00 00 00 00 00 00	r0 = 0x0
       1:	b7 00 00 00 00 00 00 00	r0 = 0x0
       2:	05 00 04 00 01 00 00 00	goto_or_nop +0x4 <LBB0_2>
       3:	b7 02 00 00 02 00 00 00	r2 = 0x2
       4:	b7 03 00 00 03 00 00 00	r3 = 0x3
       5:	85 10 00 00 ff ff ff ff	call -0x1
		0000000000000028:  R_BPF_64_32	bar
       6:	27 00 00 00 14 00 00 00	r0 *= 0x14

0000000000000038 <LBB0_2>:
       7:	95 00 00 00 00 00 00 00	exit
On Fri, Dec 8, 2023 at 9:05 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>
> A hack in llvm below with an example; could you check whether the C syntax
> and the object dump result are what you want to see?

Thank you for the ultra quick llvm diff!

> [...]
>  let isBranch = 1, isTerminator = 1, hasDelaySlot=0, isBarrier = 1 in {
>    def JMP : BRANCH<BPF_JA, "goto", [(br bb:$BrDst)]>;
>    def JMPL : BRANCH_LONG<BPF_JA, "gotol", []>;
> +  def JMP_OR_NOP : BRANCH_OR_NOP<BPF_JA, "goto_or_nop", []>;

I was thinking of burning the new 0xE opcode for it, but you're right. It's
a flavor of the existing JA insn and it's indeed better to just use the
src_reg=1 bit to indicate so. We probably need to use the 2nd bit of
src_reg to indicate its default state (jmp or fallthrough).

> 	asm volatile goto ("r0 = 0; \
> 		goto_or_nop %l[label]; \
> 		r2 = 2; \
> 		r3 = 3; \

Not sure how to represent the default state in assembly though.
"goto_or_nop" defaults to goto, "nop_or_goto" defaults to nop?

Do we need "gotol" for imm32 or will it be automatic?
On 12/9/23 9:18 AM, Alexei Starovoitov wrote:
> [...]
> I was thinking of burning the new 0xE opcode for it, but you're right. It's
> a flavor of the existing JA insn and it's indeed better to just use the
> src_reg=1 bit to indicate so.

Right, using src_reg to indicate a new flavor of the JA insn sounds like a
good idea. My previously-used 'imm' field is a pure hack.

> We probably need to use the 2nd bit of src_reg to indicate its default
> state (jmp or fallthrough).

Good point.

> Not sure how to represent the default state in assembly though.
> "goto_or_nop" defaults to goto, "nop_or_goto" defaults to nop?
>
> Do we need "gotol" for imm32 or will it be automatic?

It won't be automatic.

At the end of this email, I will show the new change that adds the
gotol_or_nop and nop_or_gotol insns, and an example showing them in asm.
But there is an issue here: in my example, the compiler (more specifically
the InstCombine pass) moved some code from after the 'label' to before the
'label'. Not exactly sure how to prevent this. Maybe the current 'asm goto'
already has a way to handle this. Will investigate later.

=========================

$ cat t.c
int bar(void);
int foo1()
{
	int a, b;
	asm volatile goto ("r0 = 0; \
		gotol_or_nop %l[label]; \
		r2 = 2; \
		r3 = 3; \
		"::::label);
	a = bar();
label:
	b = 20 * a;
	return b;
}
int foo2()
{
	int a, b;
	asm volatile goto ("r0 = 0; \
		nop_or_gotol %l[label]; \
		r2 = 2; \
		r3 = 3; \
		"::::label);
	a = bar();
label:
	b = 20 * a;
	return b;
}
$ clang --target=bpf -O2 -g -c t.c
$ llvm-objdump -S t.o

t.o:	file format elf64-bpf

Disassembly of section .text:

0000000000000000 <foo1>:
; {
       0:	b7 00 00 00 00 00 00 00	r0 = 0x0
;	asm volatile goto ("r0 = 0; \
       1:	b7 00 00 00 00 00 00 00	r0 = 0x0
       2:	06 10 00 00 04 00 00 00	gotol_or_nop +0x4 <LBB0_2>
       3:	b7 02 00 00 02 00 00 00	r2 = 0x2
       4:	b7 03 00 00 03 00 00 00	r3 = 0x3
;	a = bar();
       5:	85 10 00 00 ff ff ff ff	call -0x1
;	b = 20 * a;
       6:	27 00 00 00 14 00 00 00	r0 *= 0x14

0000000000000038 <LBB0_2>:
;	return b;
       7:	95 00 00 00 00 00 00 00	exit

0000000000000040 <foo2>:
; {
       8:	b7 00 00 00 00 00 00 00	r0 = 0x0
;	asm volatile goto ("r0 = 0; \
       9:	b7 00 00 00 00 00 00 00	r0 = 0x0
      10:	06 20 00 00 04 00 00 00	nop_or_gotol +0x4 <LBB1_2>
      11:	b7 02 00 00 02 00 00 00	r2 = 0x2
      12:	b7 03 00 00 03 00 00 00	r3 = 0x3
;	a = bar();
      13:	85 10 00 00 ff ff ff ff	call -0x1
;	b = 20 * a;
      14:	27 00 00 00 14 00 00 00	r0 *= 0x14

0000000000000078 <LBB1_2>:
;	return b;
      15:	95 00 00 00 00 00 00 00	exit
How about a slightly different modification of Anton's idea. Suppose that,
as before, there is a special map type:

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__type(key, __u32);
	__type(value, __u32);
	__uint(map_flags, BPF_F_STATIC_KEY);
	__uint(max_entries, 1);
} skey1 SEC(".maps");

which is used as below:

__attribute__((naked))
int foo(void) {
	asm volatile (
		"r0 = %[skey1] ll;"
		"if r0 != r0 goto 1f;"
		"r1 = r10;"
		"r1 += -8;"
		"r2 = 1;"
		"call %[bpf_trace_printk];"
	"1:"
		"exit;"
		:: __imm_addr(skey1),
		   __imm(bpf_trace_printk)
		: __clobber_all
	);
}

Disassembly of section .text:

0000000000000000 <foo>:
       0:	r0 = 0x0 ll
		0000000000000000:  R_BPF_64_64	skey1   ;; <---- Map relocation as usual
       2:	if r0 == r0 goto +0x4 <foo+0x38>        ;; <---- Note condition
       3:	r1 = r10
       4:	r1 += -0x8
       5:	r2 = 0x1
       6:	call 0x6
       7:	exit

And suppose that the verifier is modified in the following ways:
- treat the instructions "if rX == rX" / "if rX != rX" (when rX points to a
  static key map) in a special way:
  - when the program is verified, the jump is considered non-deterministic;
  - when the program is jitted, the jump is compiled as a nop for "!=" and
    as an unconditional jump for "==";
- build a table of static keys based on the specific map referenced in the
  condition; e.g. for the example above it can be inferred that insn 2 is
  associated with map skey1 because "r0" points to "skey1";
- jit "rX = <static key> ll;" as a nop.

On the plus side:
- any kind of jump table is omitted from the system call;
- no new instruction is needed;
- almost no modifications to libbpf are necessary (only a helper macro to
  convince clang to keep "if rX == rX").

What do you think?

Thanks,
Eduard
On Sun, Dec 10, 2023 at 2:30 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> How about a slightly different modification of Anton's idea. Suppose that,
> as before, there is a special map type:
>
> struct {
> 	__uint(type, BPF_MAP_TYPE_ARRAY);
> 	__type(key, __u32);
> 	__type(value, __u32);
> 	__uint(map_flags, BPF_F_STATIC_KEY);
> 	__uint(max_entries, 1);
> } skey1 SEC(".maps");

Instead of a special map that the kernel has to know about, the same intent
can be expressed with:

  int skey1;

  r0 = %[skey1] ll;

and then the kernel needs no extra map type, while user space can collect
all static_branches that use &skey1 by iterating the insn stream and
comparing addresses.

> [...]
> On the plus side:
> - any kind of jump table is omitted from the system call;
> - no new instruction is needed;
> - almost no modifications to libbpf are necessary (only a helper macro to
>   convince clang to keep "if rX == rX").

Reusing an existing insn means that we're giving it a new meaning, and that
always comes with the danger of breaking existing progs. In this case "if
rX == rX" isn't very meaningful and the new semantics shouldn't break
anything, but it's a danger zone.

If we treat

  if r0 == r0

as JA, then we have to treat

  if r1 == r1

as JA as well, and it becomes ambiguous when prog_info needs to return the
insns back to user space. If we go with the rX == rX approach we should
probably limit it to one specific register. r0, r10, r11 can be considered,
and they have their own pros and cons.

Additionally,

  r0 = %[skey1] ll

in front of the JE/JNE is a waste. If we JIT it to a useless native insn we
will be burning cpu for no reason, so we should probably optimize it out.
If we do so, then this inline insn becomes a nop and it's effectively a
relocation: the insn stream carries this rX = 64bit_const insn only to
indicate the scope of the next insn. It's pretty much like Anton's idea of
using extra bits in JA to encode an integer key_id; with ld_imm64 we would
encode a 64-bit key_id. Another insn with more bits to burn that has no
effect on execution.

It doesn't look clean to encode so much extra metadata into instructions
that the JITs and the interpreter have to ignore. If we go this route:

  r11 = 64bit_const
  if r11 == r11 goto

is the lesser evil. Still, it's not as clean as JA with extra bits in
src_reg. We already optimize JA +0 into a nop (see opt_remove_nops()), so a
flavor of the JA insn looks like the most natural fit for a selectable
JA +xx or JA +0. And the special map really doesn't fit.

Whatever we do, let's keep the text_poke-able insn logic separate from the
bookkeeping of the addresses of those insns. I think a specially prefixed
section that is understood by libbpf (like what I proposed with
"name.static_branch") will do fine. If that's not good enough we can add a
"set" map type that is a generic set of values. It can be a set of 8-byte
addresses that keeps the locations of static_branches, but let's keep it
generic. I think it's fine to add:

  __uint(type, BPF_MAP_TYPE_SET)

and let libbpf populate it with addresses of insns, or addresses of
variables, or other values when it prepares a program for loading. But
map_update_elem should never be doing text_poke on insns. We added the
prog_array map type in the past, but that was done during the early days.
If we were designing bpf today we would have gone a different route.
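A libbpf-side declaration of such a set could look like the sketch below;
BPF_MAP_TYPE_SET is hypothetical and does not exist today:

	/* Hypothetical generic set map; libbpf would populate it with
	 * 8-byte insn "addresses" (or other values) at load time.
	 */
	struct {
		__uint(type, BPF_MAP_TYPE_SET);  /* made-up map type */
		__type(value, __u64);
		__uint(max_entries, 64);
	} static_branch_set SEC(".maps");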
On Sat, Dec 09, 2023 at 10:32:42PM -0800, Yonghong Song wrote:
> [...]
> It won't be automatic.
>
> At the end of this email, I will show the new change that adds the
> gotol_or_nop and nop_or_gotol insns, and an example showing them in asm.

Thanks a lot Yonghong! May I ask you to send a full patch for LLVM (with
gotol) so that I can test it?

Overall, I think that JA + flags in SRC_REG is indeed better than a new
instruction, as no new opcode is used. It looks to me like two bits aren't
enough and a third is required, as the second bit seems to be overloaded:

* bit 1 indicates that this is a "JA_MAYBE"
* bit 2 indicates a jump or nop (i.e., the current state)

However, we also need another bit which indicates what to do with the
instruction when we issue [an abstract] command

  flip_branch_on_or_off(branch, 0/1)

Without this information (and in the absence of external meta-data on how
to patch the branch) we can't determine what a given (BPF, not jitted)
program currently does. For example, if we issue

  flip_branch_on_or_off(branch, 0)

then we can't reflect this in the xlated program by setting the second bit
to jmp/off. Again, the JITted program is fine, but it will be
desynchronized from xlated in terms of logic (some instructions will be
mapped as NOP -> x86_JUMP, others as NOP -> x86_NOP). In my original patch
this triplet was kept as (offset to indicate a "special jump", JA+0/JA+OFF,
Normal/Inverse).

> But there is an issue here: in my example, the compiler (more specifically
> the InstCombine pass) moved some code from after the 'label' to before the
> 'label'. Not exactly sure how to prevent this. Maybe the current 'asm goto'
> already has a way to handle this. Will investigate later.
> [...]
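Written down as defines, the three src_reg bits described above might look
like this; the bit positions are illustrative only:

	/* Hypothetical src_reg bit layout for the patchable JA flavor. */
	#define BPF_JA_MAYBE    (1 << 0)  /* this JA can be patched           */
	#define BPF_JA_IS_JMP   (1 << 1)  /* current state: jmp (1) / nop (0) */
	#define BPF_JA_INVERSE  (1 << 2)  /* inverse branch w.r.t. the key    */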
On Sun, Dec 10, 2023 at 7:33 PM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Sun, Dec 10, 2023 at 2:30 AM Eduard Zingerman <eddyz87@gmail.com> wrote: > > > > How about a slightly different modification of the Anton's idea. > > Suppose that, as before, there is a special map type: > > > > struct { > > __uint(type, BPF_MAP_TYPE_ARRAY); > > __type(key, __u32); > > __type(value, __u32); > > __uint(map_flags, BPF_F_STATIC_KEY); > > __uint(max_entries, 1); > > } skey1 SEC(".maps") > > Instead of special map that the kernel has to know about > the same intent can be expressed with: > int skey1; > r0 = %[skey1] ll; > and then the kernel needs no extra map type while the user space > can collect all static_branches that use &skey1 by > iterating insn stream and comparing addresses. > > > Which is used as below: > > > > __attribute__((naked)) > > int foo(void) { > > asm volatile ( > > "r0 = %[skey1] ll;" > > "if r0 != r0 goto 1f;" > > "r1 = r10;" > > "r1 += -8;" > > "r2 = 1;" > > "call %[bpf_trace_printk];" > > "1:" > > "exit;" > > :: __imm_addr(skey1), > > __imm(bpf_trace_printk) > > : __clobber_all > > ); > > } > > > > Disassembly of section .text: > > > > 0000000000000000 <foo>: > > 0: r0 = 0x0 ll > > 0000000000000000: R_BPF_64_64 skey1 ;; <---- Map relocation as usual > > 2: if r0 == r0 goto +0x4 <foo+0x38> ;; <---- Note condition > > 3: r1 = r10 > > 4: r1 += -0x8 > > 5: r2 = 0x1 > > 6: call 0x6 > > 7: exit > > > > And suppose that verifier is modified in the following ways: > > - treat instructions "if rX == rX" / "if rX != rX" (when rX points to > > static key map) in a special way: > > - when program is verified, the jump is considered non deterministic; > > - when program is jitted, the jump is compiled as nop for "!=" and as > > unconditional jump for "=="; > > - build a table of static keys based on a specific map referenced in > > condition, e.g. for the example above it can be inferred that insn 2 > > associates with map skey1 because "r0" points to "skey1"; > > - jit "rX = <static key> ll;" as nop; > > > > On the plus side: > > - any kinds of jump tables are omitted from system call; > > - no new instruction is needed; > > - almost no modifications to libbpf are necessary (only a helper macro > > to convince clang to keep "if rX == rX"); > > Reusing existing insn means that we're giving it new meaning > and that always comes with danger of breaking existing progs. > In this case if rX == rX isn't very meaningful and new semantics > shouldn't break anything, but it's a danger zone. > > If we treat: > if r0 == r0 > as JA > then we have to treat > if r1 == r1 > as JA as well and it becomes ambiguous when prog_info needs > to return the insns back to user space. > > If we go with rX == rX approach we should probably limit it > to one specific register. r0, r10, r11 can be considered > and they have their own pros and cons. > > Additional: > r0 = %[skey1] ll > in front of JE/JNE is a waste. If we JIT it to useless native insn > we will be burning cpu for no reason. So we should probably > optimize it out. If we do so, then this inline insn becomes a nop and > it's effectively a relocation. The insn stream will carry this > rX = 64bit_const insn to indicate the scope of the next insn. > It's pretty much like Anton's idea of using extra bits in JA > to encode an integer key_id. > With ld_imm64 we will encode 64-bit key_id. > Another insn with more bits to burn that has no effect on execution. 
> > It doesn't look clean to encode so much extra metadata into instructions > that JITs and the interpreter have to ignore. > If we go this route: > r11 = 64bit_const > if r11 == r11 goto > is a lesser evil. > Still, it's not as clean as JA with extra bits in src_reg. > We already optimize JA +0 into a nop. See opt_remove_nops(). > So a flavor of JA insn looks the most natural fit for a selectable > JA +xx or JA +0. Another point for a special jump instruction is that libbpf can be taught to patch it into a normal JA or NOP on old kernels, depending on the likely/unlikely bit. You won't be able to flip it, of course, but at least you don't have to compile two separate versions of the BPF object file. Instead of "jump maybe" it will transparently become "jump/don't jump for sure". This should definitely help with backwards compatibility. > > And the special map really doesn't fit. > Whatever we do, let's keep text_poke-able insn logic separate > from bookkeeping of addresses of those insns. > I think a special prefixed section that is understood by libbpf > (like what I proposed with "name.static_branch") will do fine. > If it's not good enough we can add a "set" map type > that will be a generic set of values. > It can be a set of 8-byte addresses to keep locations of static_branches, > but let's keep it generic. > I think it's fine to add: > __uint(type, BPF_MAP_TYPE_SET) > and let libbpf populate it with addresses of insns, > or address of variables, or other values > when it prepares a program for loading. > But map_update_elem should never be doing text_poke on insns. > We added prog_array map type in the past, but that was done > during the early days. If we were designing bpf today we would have > gone a different route.
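To make that sanitization idea concrete, here is a minimal sketch of what libbpf could do, assuming a hypothetical encoding where src_reg bit 1 marks a patchable JA and bit 2 carries the default action (the bit names and values are made up; the gotol/BPF_JMP32 variant, which keeps its offset in imm, is omitted):

#include <linux/bpf.h>

#define JA_IS_PATCHABLE 0x1     /* hypothetical src_reg bit: "maybe" jump */
#define JA_DEFAULT_JUMP 0x2     /* hypothetical src_reg bit: default is jump */

/* Rewrite a patchable JA into an insn old kernels understand. */
static void sanitize_static_branch(struct bpf_insn *insn)
{
        if (insn->code != (BPF_JMP | BPF_JA) ||
            !(insn->src_reg & JA_IS_PATCHABLE))
                return;

        if (!(insn->src_reg & JA_DEFAULT_JUMP))
                insn->off = 0;  /* JA +0 is a nop, cf. opt_remove_nops() */
        insn->src_reg = 0;      /* now a plain JA any verifier accepts */
}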
On Mon, Dec 11, 2023 at 9:34 AM Anton Protopopov <aspsk@isovalent.com> wrote: > > > This looks for me that two bits aren't enough, and the third is > required, as the second bit seems to be overloaded: > > * bit 1 indicates that this is a "JA_MAYBE" > * bit 2 indicates a jump or nop (i.e., the current state) > > However, we also need another bit which indicates what to do with the > instruction when we issue [an abstract] command > > flip_branch_on_or_off(branch, 0/1) > > Without this information (and in the absense of external meta-data on > how to patch the branch) we can't determine what a given (BPF, not > jitted) program currently does. For example, if we issue > > flip_branch_on_or_off(branch, 0) > > then we can't reflect this in the xlated program by setting the second > bit to jmp/off. Again, JITted program is fine, but it will be > desynchronized from xlated in term of logic (some instructions will be > mapped as NOP -> x86_JUMP, others as NOP -> x86_NOP). Not following the need for the 3rd bit. The 2nd bit is not only the initial state, but the current state too. when user space does static_branch_enable it will set the 2nd bit to 1 (if it wasn't set) and will text_poke_bp the code. xlated will be always in sync with JITed. No ambiguity. An annoying part is that bpf insn stream is read only, so we cannot really write that 2nd bit. We can virtually write it. Seeing nop or jmp in JITed code would mean the bit is 0 or 1. So xlated dump will report it. Separately the kernel doesn't have static_branch_switch/flip command. It's only enable or disable. I think it's better to keep it the same way.
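Sketched as code, the enable path described here could look roughly like the following; bpf_arch_text_poke() and BPF_MOD_JUMP are the existing kernel primitives, while the JA_STATE_JUMP bit and the jit_addr_of_insn() helper are hypothetical:

/* Enable one static branch: record the (possibly virtual) state bit and
 * patch the jitted code from nop to jump. */
static int bpf_static_branch_enable(struct bpf_prog *prog, u32 insn_idx)
{
        struct bpf_insn *insn = &prog->insnsi[insn_idx];
        void *ip, *target;

        if (insn->src_reg & JA_STATE_JUMP)      /* 2nd bit: current state */
                return 0;                       /* already enabled */

        insn->src_reg |= JA_STATE_JUMP;         /* "virtual" if insnsi is ro */

        ip = jit_addr_of_insn(prog, insn_idx);                  /* hypothetical */
        target = jit_addr_of_insn(prog, insn_idx + insn->off + 1);

        /* old_addr == NULL: the site is currently a nop; make it a jump */
        return bpf_arch_text_poke(ip, BPF_MOD_JUMP, NULL, target);
}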
On 12/11/23 9:31 AM, Anton Protopopov wrote: > On Sat, Dec 09, 2023 at 10:32:42PM -0800, Yonghong Song wrote: >> On 12/9/23 9:18 AM, Alexei Starovoitov wrote: >>> On Fri, Dec 8, 2023 at 9:05 PM Yonghong Song <yonghong.song@linux.dev> wrote: >>>> On 12/8/23 8:25 PM, Alexei Starovoitov wrote: >>>>> On Fri, Dec 8, 2023 at 8:15 PM Yonghong Song <yonghong.song@linux.dev> wrote: >>>>>> On 12/8/23 8:05 PM, Alexei Starovoitov wrote: >>>>>>> On Fri, Dec 8, 2023 at 2:04 PM Andrii Nakryiko >>>>>>> <andrii.nakryiko@gmail.com> wrote: >>>>>>>> I feel like embedding some sort of ID inside the instruction is very.. >>>>>>>> unusual, shall we say? >>>>>>> yeah. no magic numbers inside insns pls. >>>>>>> >>>>>>> I don't like JA_CFG name, since I read CFG as control flow graph, >>>>>>> while you probably meant CFG as configurable. >>>>>>> How about BPF_JA_OR_NOP ? >>>>>>> Then in combination with BPF_JMP or BPF_JMP32 modifier >>>>>>> the insn->off|imm will be used. >>>>>>> 1st bit in src_reg can indicate the default action: nop or jmp. >>>>>>> In asm it may look like asm("goto_or_nop +5") >>>>>> How does the C source code looks like in order to generate >>>>>> BPF_JA_OR_NOP insn? Any source examples? >>>>> It will be in inline asm only. The address of that insn will >>>>> be taken either via && or via asm (".long %l[label]"). >>>>> From llvm pov both should go through the same relo creation logic. I hope :) >>>> A hack in llvm below with an example, could you check whether the C >>>> syntax and object dump result >>>> is what you want to see? >>> Thank you for the ultra quick llvm diff! >>> >>>> diff --git a/llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp >>>> b/llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp >>>> index 90697c6645be..38b1cbc31f9a 100644 >>>> --- a/llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp >>>> +++ b/llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp >>>> @@ -231,6 +231,7 @@ public: >>>> .Case("call", true) >>>> .Case("goto", true) >>>> .Case("gotol", true) >>>> + .Case("goto_or_nop", true) >>>> .Case("*", true) >>>> .Case("exit", true) >>>> .Case("lock", true) >>>> @@ -259,6 +260,7 @@ public: >>>> .Case("bswap64", true) >>>> .Case("goto", true) >>>> .Case("gotol", true) >>>> + .Case("goto_or_nop", true) >>>> .Case("ll", true) >>>> .Case("skb", true) >>>> .Case("s", true) >>>> diff --git a/llvm/lib/Target/BPF/BPFInstrInfo.td >>>> b/llvm/lib/Target/BPF/BPFInstrInfo.td >>>> index 5972c9d49c51..a953d10429bf 100644 >>>> --- a/llvm/lib/Target/BPF/BPFInstrInfo.td >>>> +++ b/llvm/lib/Target/BPF/BPFInstrInfo.td >>>> @@ -592,6 +592,19 @@ class BRANCH<BPFJumpOp Opc, string OpcodeStr, >>>> list<dag> Pattern> >>>> let BPFClass = BPF_JMP; >>>> } >>>> >>>> +class BRANCH_OR_NOP<BPFJumpOp Opc, string OpcodeStr, list<dag> Pattern> >>>> + : TYPE_ALU_JMP<Opc.Value, BPF_K.Value, >>>> + (outs), >>>> + (ins brtarget:$BrDst), >>>> + !strconcat(OpcodeStr, " $BrDst"), >>>> + Pattern> { >>>> + bits<16> BrDst; >>>> + >>>> + let Inst{47-32} = BrDst; >>>> + let Inst{31-0} = 1; >>>> + let BPFClass = BPF_JMP; >>>> +} >>>> + >>>> class BRANCH_LONG<BPFJumpOp Opc, string OpcodeStr, list<dag> Pattern> >>>> : TYPE_ALU_JMP<Opc.Value, BPF_K.Value, >>>> (outs), >>>> @@ -632,6 +645,7 @@ class CALLX<string OpcodeStr> >>>> let isBranch = 1, isTerminator = 1, hasDelaySlot=0, isBarrier = 1 in { >>>> def JMP : BRANCH<BPF_JA, "goto", [(br bb:$BrDst)]>; >>>> def JMPL : BRANCH_LONG<BPF_JA, "gotol", []>; >>>> + def JMP_OR_NOP : BRANCH_OR_NOP<BPF_JA, "goto_or_nop", []>; >>> I was thinking of burning the new 0xE opcode for it, >>> but you're 
right. It's a flavor of existing JA insn and it's indeed >>> better to just use src_reg=1 bit to indicate so. >> Right, using src_reg to indicate a new flavor of JA insn sounds >> a good idea. My previously-used 'imm' field is a pure hack. >> >>> We probably need to use the 2nd bit of src_reg to indicate its default state >>> (jmp or fallthrough). >> Good point. >> >>>> asm volatile goto ("r0 = 0; \ >>>> goto_or_nop %l[label]; \ >>>> r2 = 2; \ >>>> r3 = 3; \ >>> Not sure how to represent the default state in assembly though. >>> "goto_or_nop" defaults to goto >>> "nop_or_goto" default to nop >>> ? >>> >>> Do we need "gotol" for imm32 or will it be automatic? >> It won't be automatic. >> >> At the end of this email, I will show the new change >> to have gotol_or_nop and nop_or_gotol insn and an example > Thanks a lot Yonghong! May I ask you to send a full patch for LLVM > (with gotol) so that I can test it? Okay, I will send a RFC patch to llvm-project so you can do 'git fetch' to get the patch into your local llvm-project repo and build a compiler to test. > > Overall, I think that JA + flags in SRC_REG is indeed better than a > new instruction, as a new code is not used. > > This looks for me that two bits aren't enough, and the third is > required, as the second bit seems to be overloaded: > > * bit 1 indicates that this is a "JA_MAYBE" > * bit 2 indicates a jump or nop (i.e., the current state) > > However, we also need another bit which indicates what to do with the > instruction when we issue [an abstract] command > > flip_branch_on_or_off(branch, 0/1) > > Without this information (and in the absense of external meta-data on > how to patch the branch) we can't determine what a given (BPF, not > jitted) program currently does. For example, if we issue > > flip_branch_on_or_off(branch, 0) > > then we can't reflect this in the xlated program by setting the second > bit to jmp/off. Again, JITted program is fine, but it will be > desynchronized from xlated in term of logic (some instructions will be > mapped as NOP -> x86_JUMP, others as NOP -> x86_NOP). > > In my original patch we kept this triplet as > > (offset to indicate a "special jump", JA+0/JA+OFF, Normal/Inverse) [...]
On 12/11/23 1:51 PM, Yonghong Song wrote: > > On 12/11/23 9:31 AM, Anton Protopopov wrote: >> On Sat, Dec 09, 2023 at 10:32:42PM -0800, Yonghong Song wrote: >>> On 12/9/23 9:18 AM, Alexei Starovoitov wrote: >>>> On Fri, Dec 8, 2023 at 9:05 PM Yonghong Song >>>> <yonghong.song@linux.dev> wrote: >>>>> On 12/8/23 8:25 PM, Alexei Starovoitov wrote: >>>>>> On Fri, Dec 8, 2023 at 8:15 PM Yonghong Song >>>>>> <yonghong.song@linux.dev> wrote: >>>>>>> On 12/8/23 8:05 PM, Alexei Starovoitov wrote: >>>>>>>> On Fri, Dec 8, 2023 at 2:04 PM Andrii Nakryiko >>>>>>>> <andrii.nakryiko@gmail.com> wrote: >>>>>>>>> I feel like embedding some sort of ID inside the instruction >>>>>>>>> is very.. >>>>>>>>> unusual, shall we say? >>>>>>>> yeah. no magic numbers inside insns pls. >>>>>>>> >>>>>>>> I don't like JA_CFG name, since I read CFG as control flow graph, >>>>>>>> while you probably meant CFG as configurable. >>>>>>>> How about BPF_JA_OR_NOP ? >>>>>>>> Then in combination with BPF_JMP or BPF_JMP32 modifier >>>>>>>> the insn->off|imm will be used. >>>>>>>> 1st bit in src_reg can indicate the default action: nop or jmp. >>>>>>>> In asm it may look like asm("goto_or_nop +5") >>>>>>> How does the C source code looks like in order to generate >>>>>>> BPF_JA_OR_NOP insn? Any source examples? >>>>>> It will be in inline asm only. The address of that insn will >>>>>> be taken either via && or via asm (".long %l[label]"). >>>>>> From llvm pov both should go through the same relo creation >>>>>> logic. I hope :) >>>>> A hack in llvm below with an example, could you check whether the C >>>>> syntax and object dump result >>>>> is what you want to see? [...] >>>> Thank you for the ultra quick llvm diff! >>>> I was thinking of burning the new 0xE opcode for it, >>>> but you're right. It's a flavor of existing JA insn and it's indeed >>>> better to just use src_reg=1 bit to indicate so. >>> Right, using src_reg to indicate a new flavor of JA insn sounds >>> a good idea. My previously-used 'imm' field is a pure hack. >>> >>>> We probably need to use the 2nd bit of src_reg to indicate its >>>> default state >>>> (jmp or fallthrough). >>> Good point. >>> >>>>> asm volatile goto ("r0 = 0; \ >>>>> goto_or_nop %l[label]; \ >>>>> r2 = 2; \ >>>>> r3 = 3; \ >>>> Not sure how to represent the default state in assembly though. >>>> "goto_or_nop" defaults to goto >>>> "nop_or_goto" default to nop >>>> ? >>>> >>>> Do we need "gotol" for imm32 or will it be automatic? >>> It won't be automatic. >>> >>> At the end of this email, I will show the new change >>> to have gotol_or_nop and nop_or_gotol insn and an example >> Thanks a lot Yonghong! May I ask you to send a full patch for LLVM >> (with gotol) so that I can test it? > > Okay, I will send a RFC patch to llvm-project so you can do 'git fetch' > to get the patch into your local llvm-project repo and build a compiler > to test. Okay, the following is the llvm-project diff: https://github.com/llvm/llvm-project/pull/75110 I added a label in inline asm like below: asm volatile goto ("r0 = 0; \ static_key_loc_1: \ gotol_or_nop %l[label]; \ r2 = 2; \ r3 = 3; \ ":: : "r0", "r2", "r3" :label); to identify the location of a gotol_or_nop/nop_or_gotol insn. The label 'static_key_loc_1' is in the ELF symbol table and can be easily searched and validated (the location is a gotol_or_nop/nop_or_gotol insn). > >> >> Overall, I think that JA + flags in SRC_REG is indeed better than a >> new instruction, as a new code is not used.
>> >> This looks for me that two bits aren't enough, and the third is >> required, as the second bit seems to be overloaded: >> >> * bit 1 indicates that this is a "JA_MAYBE" >> * bit 2 indicates a jump or nop (i.e., the current state) >> >> However, we also need another bit which indicates what to do with the >> instruction when we issue [an abstract] command >> >> flip_branch_on_or_off(branch, 0/1) >> >> Without this information (and in the absense of external meta-data on >> how to patch the branch) we can't determine what a given (BPF, not >> jitted) program currently does. For example, if we issue >> >> flip_branch_on_or_off(branch, 0) >> >> then we can't reflect this in the xlated program by setting the second >> bit to jmp/off. Again, JITted program is fine, but it will be >> desynchronized from xlated in term of logic (some instructions will be >> mapped as NOP -> x86_JUMP, others as NOP -> x86_NOP). >> >> In my original patch we kept this triplet as >> >> (offset to indicate a "special jump", JA+0/JA+OFF, Normal/Inverse) > [...] >
On Mon, Dec 11, 2023 at 11:08:59AM -0800, Alexei Starovoitov wrote: > On Mon, Dec 11, 2023 at 9:34 AM Anton Protopopov <aspsk@isovalent.com> wrote: > > > > > > This looks for me that two bits aren't enough, and the third is > > required, as the second bit seems to be overloaded: > > > > * bit 1 indicates that this is a "JA_MAYBE" > > * bit 2 indicates a jump or nop (i.e., the current state) > > > > However, we also need another bit which indicates what to do with the > > instruction when we issue [an abstract] command > > > > flip_branch_on_or_off(branch, 0/1) > > > > Without this information (and in the absense of external meta-data on > > how to patch the branch) we can't determine what a given (BPF, not > > jitted) program currently does. For example, if we issue > > > > flip_branch_on_or_off(branch, 0) > > > > then we can't reflect this in the xlated program by setting the second > > bit to jmp/off. Again, JITted program is fine, but it will be > > desynchronized from xlated in term of logic (some instructions will be > > mapped as NOP -> x86_JUMP, others as NOP -> x86_NOP). > > Not following the need for the 3rd bit. > The 2nd bit is not only the initial state, but the current state too. > > when user space does static_branch_enable it will set the 2nd bit to 1 > (if it wasn't set) and will text_poke_bp the code. > xlated will be always in sync with JITed. > No ambiguity. Ok, from BPF arch perspective this can work with two bits (not for practical purposes though, IMO, see my next e-mail). On the lowest level we have this magic jump instruction JA[SRC_REG=1] +OFF # jits to a NOP JA[SRC_REG=3] +OFF # jits to a JUMP Then we have a primitive kfunc static_branch_set(prog, branch, bool on) Which sets the second bit and pokes jitted code. (Maybe it doesn't set the actual bit in xlated due to read-only memory, as you've mentioned below.) You're right that this is unambiguous. > An annoying part is that bpf insn stream is read only, so we cannot > really write that 2nd bit. We can virtually write it. > Seeing nop or jmp in JITed code would mean the bit is 0 or 1. > So xlated dump will report it. If we can poke jitted x86 code, then we might poke prog->insnsi in the same way as well? Besides, on architectures where static keys may be of interest we don't use interpreter, so maybe this is ok to poke insnsi (and make it rw) after all? > Separately the kernel doesn't have static_branch_switch/flip command. > It's only enable or disable. I think it's better to keep it the same way.
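For reference, a sketch of the JIT side of these two encodings; emit_jump_rel32() and emit_nops() stand in for the real emitters, and the point is that both forms occupy the same five bytes on x86 (e9 rel32 vs a 5-byte nop), so one can later be poked into the other in place:

static void emit_static_branch(u8 **pprog, const struct bpf_insn *insn,
                               s32 jmp_offset)
{
        if (insn->src_reg & 0x2)        /* JA[SRC_REG=3]: currently a jump */
                emit_jump_rel32(pprog, jmp_offset);
        else                            /* JA[SRC_REG=1]: currently a nop */
                emit_nops(pprog, 5);
}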
On Sun, Dec 10, 2023 at 07:33:31PM -0800, Alexei Starovoitov wrote: > On Sun, Dec 10, 2023 at 2:30 AM Eduard Zingerman <eddyz87@gmail.com> wrote: > > > > How about a slightly different modification of the Anton's idea. > > Suppose that, as before, there is a special map type: > > > > struct { > > __uint(type, BPF_MAP_TYPE_ARRAY); > > __type(key, __u32); > > __type(value, __u32); > > __uint(map_flags, BPF_F_STATIC_KEY); > > __uint(max_entries, 1); > > } skey1 SEC(".maps") > > Instead of special map that the kernel has to know about > the same intent can be expressed with: > int skey1; > r0 = %[skey1] ll; > and then the kernel needs no extra map type while the user space > can collect all static_branches that use &skey1 by > iterating insn stream and comparing addresses. > > > Which is used as below: > > > > __attribute__((naked)) > > int foo(void) { > > asm volatile ( > > "r0 = %[skey1] ll;" > > "if r0 != r0 goto 1f;" > > "r1 = r10;" > > "r1 += -8;" > > "r2 = 1;" > > "call %[bpf_trace_printk];" > > "1:" > > "exit;" > > :: __imm_addr(skey1), > > __imm(bpf_trace_printk) > > : __clobber_all > > ); > > } > > > > Disassembly of section .text: > > > > 0000000000000000 <foo>: > > 0: r0 = 0x0 ll > > 0000000000000000: R_BPF_64_64 skey1 ;; <---- Map relocation as usual > > 2: if r0 == r0 goto +0x4 <foo+0x38> ;; <---- Note condition > > 3: r1 = r10 > > 4: r1 += -0x8 > > 5: r2 = 0x1 > > 6: call 0x6 > > 7: exit > > > > And suppose that verifier is modified in the following ways: > > - treat instructions "if rX == rX" / "if rX != rX" (when rX points to > > static key map) in a special way: > > - when program is verified, the jump is considered non deterministic; > > - when program is jitted, the jump is compiled as nop for "!=" and as > > unconditional jump for "=="; > > - build a table of static keys based on a specific map referenced in > > condition, e.g. for the example above it can be inferred that insn 2 > > associates with map skey1 because "r0" points to "skey1"; > > - jit "rX = <static key> ll;" as nop; > > > > On the plus side: > > - any kinds of jump tables are omitted from system call; > > - no new instruction is needed; > > - almost no modifications to libbpf are necessary (only a helper macro > > to convince clang to keep "if rX == rX"); > > Reusing existing insn means that we're giving it new meaning > and that always comes with danger of breaking existing progs. > In this case if rX == rX isn't very meaningful and new semantics > shouldn't break anything, but it's a danger zone. > > If we treat: > if r0 == r0 > as JA > then we have to treat > if r1 == r1 > as JA as well and it becomes ambiguous when prog_info needs > to return the insns back to user space. > > If we go with rX == rX approach we should probably limit it > to one specific register. r0, r10, r11 can be considered > and they have their own pros and cons. > > Additional: > r0 = %[skey1] ll > in front of JE/JNE is a waste. If we JIT it to useless native insn > we will be burning cpu for no reason. So we should probably > optimize it out. If we do so, then this inline insn becomes a nop and > it's effectively a relocation. The insn stream will carry this > rX = 64bit_const insn to indicate the scope of the next insn. > It's pretty much like Anton's idea of using extra bits in JA > to encode an integer key_id. > With ld_imm64 we will encode 64-bit key_id. > Another insn with more bits to burn that has no effect on execution. 
> > It doesn't look clean to encode so much extra metadata into instructions > that JITs and the interpreter have to ignore. > If we go this route: > r11 = 64bit_const > if r11 == r11 goto > is a lesser evil. > Still, it's not as clean as JA with extra bits in src_reg. > We already optimize JA +0 into a nop. See opt_remove_nops(). > So a flavor of JA insn looks the most natural fit for a selectable > JA +xx or JA +0. This seems to have the benefit that there is no backward compatibility issue (if we use r1, because r0/r11 will be rejected by old verifiers). We can have r1 = 64bit_const if r1 == r1 goto and r1 = 64bit_const if r1 != r1 goto and translate it on prog load to the new instruction as JUMP_OR_NOP and NOP_OR_JUMP, correspondingly. On older kernels it will have the default (key is off) behaviour. > And the special map really doesn't fit. > Whatever we do, let's keep text_poke-able insn logic separate > from bookkeeping of addresses of those insns. > I think a special prefixed section that is understood by libbpf > (like what I proposed with "name.static_branch") will do fine. > If it's not good enough we can add a "set" map type > that will be a generic set of values. > It can be a set of 8-byte addresses to keep locations of static_branches, > but let's keep it generic. > I think it's fine to add: > __uint(type, BPF_MAP_TYPE_SET) > and let libbpf populate it with addresses of insns, > or address of variables, or other values > when it prepares a program for loading. What is the higher-level API in this case? The static_branch_set(branch, bool on) is not enough because we want to distinguish between "normal" and "inverse" branches (for unlikely/likely cases). We can implement this using something like this: static_key_set(key, bool new_value) { /* true if we change key value */ bool key_changed = key->old_value ^ new_value; for_each_prog(prog, key) for_each_branch(branch, prog, key) static_branch_flip(prog, branch, key_changed) } where static_branch_flip flips the second bit of SRC_REG. We need to keep track of prog->branches and key->progs. How is this different from what my patch implements? If this is implemented in userspace, then how do we prevent synchronous updates of the key (and a relocation variant doesn't seem to work from userspace)? Or is this a new kfunc? If yes, then how is it executed? > But map_update_elem should never be doing text_poke on insns. > We added prog_array map type in the past, but that was done > during the early days. If we were designing bpf today we would have > gone a different route. What is the interface to toggle a key? If this is a map or an index of a global variable and we can't do this via a syscall, then this means that we need to write and compile an iterator to go through all maps, select a proper map, then code should be generated to go through all programs of interest and to update all static branches there (and we will have to build this iterator every time, and keep track of all mappings from userspace), while an alternative is to just issue one syscall. If this is just a `kfunc set_static_key(key, on/off)` then this is simpler, but again, we will have to load an iterator and then iterate through all maps to find the key vs. one syscall.
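A sketch of that load-time translation (the src_reg bit values are placeholders, and the pairing checks a real implementation would need are trimmed):

#include <linux/bpf.h>

static void translate_static_branches(struct bpf_insn *insns, int cnt)
{
        int i;

        for (i = 0; i + 2 < cnt; i++) {
                struct bpf_insn *ld = &insns[i], *jmp = &insns[i + 2];

                /* ld_imm64 into r1 occupies two insn slots */
                if (ld->code != (BPF_LD | BPF_IMM | BPF_DW) || ld->dst_reg != 1)
                        continue;
                /* only "if r1 ==/!= r1" qualifies */
                if (BPF_CLASS(jmp->code) != BPF_JMP ||
                    BPF_SRC(jmp->code) != BPF_X ||
                    jmp->dst_reg != 1 || jmp->src_reg != 1)
                        continue;

                if (BPF_OP(jmp->code) == BPF_JEQ)       /* ==: default jump */
                        *jmp = (struct bpf_insn) {
                                .code = BPF_JMP | BPF_JA,
                                .src_reg = 0x1 | 0x2,   /* JUMP_OR_NOP */
                                .off = jmp->off,
                        };
                else if (BPF_OP(jmp->code) == BPF_JNE)  /* !=: default nop */
                        *jmp = (struct bpf_insn) {
                                .code = BPF_JMP | BPF_JA,
                                .src_reg = 0x1,         /* NOP_OR_JUMP */
                                .off = jmp->off,
                        };
                /* what to do with the ld_imm64 itself (nop it out or keep
                 * it as the key reference) is part of the open question */
        }
}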
On Tue, Dec 12, 2023 at 2:28 AM Anton Protopopov <aspsk@isovalent.com> wrote: > > > This seems to have a benefit that there is no back compatibility issue > (if we use r1, because r0/r11 will be rejected by old verifiers). We > can have > > r1 = 64bit_const > if r1 == r1 goto > > and > > r1 = 64bit_const > if r1 != r1 goto > > and translate it on prog load to new instruction as JUMP_OF_NOP and > NOP_OR_JUMP, correspondingly. On older kernels it will have the > default (key is off) behaviour. As Andrii pointed out any new insn either JA with extra bits or special meaning if rX == rX can be sanitized by libbpf into plain JA. There will be no backward compat issues. > Ok, from BPF arch perspective this can work with two bits (not for > practical purposes though, IMO, see my next e-mail). I read this email and I still don't understand why you need a 3rd bit. > > > And the special map really doesn't fit. > > Whatever we do, let's keep text_poke-able insn logic separate > > from bookkeeping of addresses of those insns. > > I think a special prefixed section that is understood by libbpf > > (like what I proposed with "name.static_branch") will do fine. > > If it's not good enough we can add a "set" map type > > that will be a generic set of values. > > It can be a set of 8-byte addresses to keep locations of static_branches, > > but let's keep it generic. > > I think it's fine to add: > > __uint(type, BPF_MAP_TYPE_SET) > > and let libbpf populate it with addresses of insns, > > or address of variables, or other values > > when it prepares a program for loading. > > What is the higher-level API in this case? The static_branch_set(branch, > bool on) is not enough because we want to distinguish between "normal" > and "inverse" branches (for unlikely/likely cases). What is "likely/unlikely cases" ? likely() is a hint to the compiler to order basic blocks in a certain way. There is no likely/unlikely bit in the binary code after compilation on x86 or other architectures. There used to be a special bit on sparc64 that would mean a default jmp|fallthrough action for a conditional jmp. But that was before sparc became out of order and gained proper branch predictor in HW. > We can implement > this using something like this: > > static_key_set(key, bool new_value) > { > /* true if we change key value */ > bool key_changed = key->old_value ^ new_value; > > for_each_prog(prog, key) > for_each_branch(branch, prog, key) > static_branch_flip(prog, branch, key_changed) > } > > where static_branch_flip flips the second bit of SRC_REG. I don't understand why you keep bringing up 'flip' use case. The kernel doesn't have such an operation on static branches. Which makes me believe that it wasn't necessary. Why do we need one for the bpf static branch? > We need to > keep track of prog->branches and key->progs. How is this different > from what my patch implements? What I'm proposing is to have a generic map __uint(type, BPF_MAP_TYPE_SET) and by naming convention libbpf will populate it with addresses of JA_OR_NOP from all progs. In asm it could be: asm volatile ("r0 = %[set_A] ll; goto_or_nop ..."); (and libbpf will remove ld_imm64 from the prog before loading.) or via asm volatile ("goto_or_nop ...; .pushsection set_A_name+suffix; .long"); (and libbpf will copy from the special section into a set and remove special section). It will be a libbpf convention and the kernel doesn't need to know about a special static branch map type or array of addresses in prog_load cmd. 
Only JA insn is relevant to the verifier and JITs. Ideally we don't need to introduce SET map type and libbpf wouldn't need to populate it. If we can make it work with an array of values that .pushsection + .long automatically populates and libbpf treats it as a normal global data array that would be ideal. Insn addresses from all progs will be in that array after loading. Sort of like ".kconfig" section that libbpf populates, but it's a normal array underneath. > If this is implemented in userspace, then how we prevent synchronous > updates of the key (and a relocation variant doesn't seem to work from > userspace)? Or is this a new kfunc? If yes, then how is it > executed, then user space can have small helper in libbpf that iterates over SET (or array) and calls sys_bpf(cmd=STATIC_BRANCH_ENABLE, one_value_from_set) Similar in the kernel. When bpf progs want to enable a key it does bpf_for_each(set) { // open coded iterator bpf_static_branch_enable(addr); // kfunc call }
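The user-space half could be as small as the following sketch; the iteration uses the normal libbpf map calls, while bpf_static_branch_enable() stands in for a wrapper around the proposed STATIC_BRANCH_ENABLE command, and the set layout (insn address as value) is a guess:

#include <bpf/bpf.h>

static int static_key_enable(int set_fd)
{
        __u64 key, next, addr;
        __u64 *prev = NULL;

        while (!bpf_map_get_next_key(set_fd, prev, &next)) {
                if (bpf_map_lookup_elem(set_fd, &next, &addr))
                        return -1;
                if (bpf_static_branch_enable(addr))     /* hypothetical */
                        return -1;
                key = next;
                prev = &key;
        }
        return 0;
}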
On 12/13/23 6:15 PM, Alexei Starovoitov wrote: > On Tue, Dec 12, 2023 at 2:28 AM Anton Protopopov <aspsk@isovalent.com> wrote: >> >> This seems to have a benefit that there is no back compatibility issue >> (if we use r1, because r0/r11 will be rejected by old verifiers). We >> can have >> >> r1 = 64bit_const >> if r1 == r1 goto >> >> and >> >> r1 = 64bit_const >> if r1 != r1 goto >> >> and translate it on prog load to new instruction as JUMP_OF_NOP and >> NOP_OR_JUMP, correspondingly. On older kernels it will have the >> default (key is off) behaviour. > As Andrii pointed out any new insn either JA with extra bits > or special meaning if rX == rX can be sanitized by libbpf > into plain JA. > There will be no backward compat issues. > >> Ok, from BPF arch perspective this can work with two bits (not for >> practical purposes though, IMO, see my next e-mail). > I read this email and I still don't understand why you need a 3rd bit. > >>> And the special map really doesn't fit. >>> Whatever we do, let's keep text_poke-able insn logic separate >>> from bookkeeping of addresses of those insns. >>> I think a special prefixed section that is understood by libbpf >>> (like what I proposed with "name.static_branch") will do fine. >>> If it's not good enough we can add a "set" map type >>> that will be a generic set of values. >>> It can be a set of 8-byte addresses to keep locations of static_branches, >>> but let's keep it generic. >>> I think it's fine to add: >>> __uint(type, BPF_MAP_TYPE_SET) >>> and let libbpf populate it with addresses of insns, >>> or address of variables, or other values >>> when it prepares a program for loading. >> What is the higher-level API in this case? The static_branch_set(branch, >> bool on) is not enough because we want to distinguish between "normal" >> and "inverse" branches (for unlikely/likely cases). > What is "likely/unlikely cases" ? > likely() is a hint to the compiler to order basic blocks in > a certain way. There is no likely/unlikely bit in the binary code > after compilation on x86 or other architectures. > > There used to be a special bit on sparc64 that would mean > a default jmp|fallthrough action for a conditional jmp. > But that was before sparc became out of order and gained proper > branch predictor in HW. > >> We can implement >> this using something like this: >> >> static_key_set(key, bool new_value) >> { >> /* true if we change key value */ >> bool key_changed = key->old_value ^ new_value; >> >> for_each_prog(prog, key) >> for_each_branch(branch, prog, key) >> static_branch_flip(prog, branch, key_changed) >> } >> >> where static_branch_flip flips the second bit of SRC_REG. > I don't understand why you keep bringing up 'flip' use case. > The kernel doesn't have such an operation on static branches. > Which makes me believe that it wasn't necessary. > Why do we need one for the bpf static branch? > >> We need to >> keep track of prog->branches and key->progs. How is this different >> from what my patch implements? > What I'm proposing is to have a generic map __uint(type, BPF_MAP_TYPE_SET) > and by naming convention libbpf will populate it with addresses > of JA_OR_NOP from all progs. > In asm it could be: > asm volatile ("r0 = %[set_A] ll; goto_or_nop ..."); > (and libbpf will remove ld_imm64 from the prog before loading.) > > or via > asm volatile ("goto_or_nop ...; .pushsection set_A_name+suffix; .long"); > (and libbpf will copy from the special section into a set and remove > special section). This is one more alternative. 
asm volatile goto ("r0 = 0; \ static_key_loc_1: \ gotol_or_nop %l[label]; \ r2 = 2; \ r3 = 3; \ ":: : "r0", "r2", "r3" :label); User code can use a libbpf API static_key_enable("static_key_loc_1") to enable the above gotol_or_nop. static_key_disable("static_key_loc_1") or static_key_enabled("static_key_loc_1") can be similarly defined. Inside libbpf, it can look at the ELF file, find the label "static_key_loc_1", find the label's corresponding insn index, and verify that the insn with label "static_key_loc_1" is a gotol_or_nop or nop_or_gotol insn. Eventually libbpf can call a bpf syscall, e.g. sys_bpf(cmd=STATIC_KEY_ENABLE, prog_fd, insn_offset), to actually change the "current" state to gotol or nop depending on what ENABLE means. For enabling/disabling static keys in the bpf program itself, similarly, the bpf program can have bpf_static_key_enable("static_key_loc_1"); libbpf needs to change it to bpf_static_key_enable(insn_offset) and the kernel verifier should process it properly. Slightly different from what Alexei proposed, but another approach for consideration and discussion. > > It will be a libbpf convention and the kernel doesn't need > to know about a special static branch map type or array of addresses > in prog_load cmd. > Only JA insn is relevant to the verifier and JITs. > > Ideally we don't need to introduce SET map type and > libbpf wouldn't need to populate it. > If we can make it work with an array of values that .pushsection + .long > automatically populates and libbpf treats it as a normal global data array > that would be ideal. > Insn addresses from all progs will be in that array after loading. > Sort of like ".kconfig" section that libbpf populates, > but it's a normal array underneath. > >> If this is implemented in userspace, then how we prevent synchronous >> updates of the key (and a relocation variant doesn't seem to work from >> userspace)? Or is this a new kfunc? If yes, then how is it >> executed, > then user space can have small helper in libbpf that iterates > over SET (or array) and > calls sys_bpf(cmd=STATIC_BRANCH_ENABLE, one_value_from_set) > > Similar in the kernel. When bpf progs want to enable a key it does > bpf_for_each(set) { // open coded iterator > bpf_static_branch_enable(addr); // kfunc call > }
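The label lookup itself is plain libelf work; a sketch of that first step (error handling trimmed; the division by 8 assumes the symbol points into a program section where each slot is one 8-byte BPF insn):

#include <libelf.h>
#include <gelf.h>
#include <string.h>

/* Return the insn index of `label` within its section, or -1. */
static long find_label_insn_idx(Elf *elf, const char *label)
{
        Elf_Scn *scn = NULL;
        GElf_Shdr sh;
        size_t i;

        while ((scn = elf_nextscn(elf, scn)) != NULL) {
                if (!gelf_getshdr(scn, &sh) || sh.sh_type != SHT_SYMTAB)
                        continue;
                Elf_Data *data = elf_getdata(scn, NULL);

                for (i = 0; i < sh.sh_size / sh.sh_entsize; i++) {
                        GElf_Sym sym;
                        const char *name;

                        if (!gelf_getsym(data, i, &sym))
                                continue;
                        name = elf_strptr(elf, sh.sh_link, sym.st_name);
                        if (name && !strcmp(name, label))
                                return sym.st_value / 8;
                }
        }
        return -1;
}

libbpf would still need to translate this ELF-level index into the final insn offset after relocation and subprog linking before issuing the syscall, which is the offset-adjustment problem raised elsewhere in the thread.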
On Wed, Dec 13, 2023 at 06:15:20PM -0800, Alexei Starovoitov wrote: > On Tue, Dec 12, 2023 at 2:28 AM Anton Protopopov <aspsk@isovalent.com> wrote: > > > > > > This seems to have a benefit that there is no back compatibility issue > > (if we use r1, because r0/r11 will be rejected by old verifiers). We > > can have > > > > r1 = 64bit_const > > if r1 == r1 goto > > > > and > > > > r1 = 64bit_const > > if r1 != r1 goto > > > > and translate it on prog load to new instruction as JUMP_OF_NOP and > > NOP_OR_JUMP, correspondingly. On older kernels it will have the > > default (key is off) behaviour. > > As Andrii pointed out any new insn either JA with extra bits > or special meaning if rX == rX can be sanitized by libbpf > into plain JA. > There will be no backward compat issues. > > > Ok, from BPF arch perspective this can work with two bits (not for > > practical purposes though, IMO, see my next e-mail). > > I read this email and I still don't understand why you need a 3rd bit. > > > > > > And the special map really doesn't fit. > > > Whatever we do, let's keep text_poke-able insn logic separate > > > from bookkeeping of addresses of those insns. > > > I think a special prefixed section that is understood by libbpf > > > (like what I proposed with "name.static_branch") will do fine. > > > If it's not good enough we can add a "set" map type > > > that will be a generic set of values. > > > It can be a set of 8-byte addresses to keep locations of static_branches, > > > but let's keep it generic. > > > I think it's fine to add: > > > __uint(type, BPF_MAP_TYPE_SET) > > > and let libbpf populate it with addresses of insns, > > > or address of variables, or other values > > > when it prepares a program for loading. > > > > What is the higher-level API in this case? The static_branch_set(branch, > > bool on) is not enough because we want to distinguish between "normal" > > and "inverse" branches (for unlikely/likely cases). > > What is "likely/unlikely cases" ? > likely() is a hint to the compiler to order basic blocks in > a certain way. There is no likely/unlikely bit in the binary code > after compilation on x86 or other architectures. > > There used to be a special bit on sparc64 that would mean > a default jmp|fallthrough action for a conditional jmp. > But that was before sparc became out of order and gained proper > branch predictor in HW. 
Consider this code: int foo(void) { if (static_branch_likely(&key)) return 33; return 55; } int hoo(void) { if (static_branch_unlikely(&key)) return 33; return 55; } When the key is disabled the corresponding code is: 0000000000000010 <foo>: 19: eb 0b jmp 26 <foo+0x16> <-------- likely(static branch), key off 1b: b8 21 00 00 00 mov $0x21,%eax 20: 5d pop %rbp 21: e9 00 00 00 00 jmp 26 <foo+0x16> 26: b8 37 00 00 00 mov $0x37,%eax 2b: 5d pop %rbp 2c: e9 00 00 00 00 jmp 31 <foo+0x21> 31: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1) 38: 00 00 00 00 3c: 0f 1f 40 00 nopl 0x0(%rax) 0000000000000050 <hoo>: 59: 66 90 xchg %ax,%ax <-------------- unlikely(static branch), key off 5b: b8 37 00 00 00 mov $0x37,%eax 60: 5d pop %rbp 61: e9 00 00 00 00 jmp 66 <hoo+0x16> 66: b8 21 00 00 00 mov $0x21,%eax 6b: 5d pop %rbp 6c: e9 00 00 00 00 jmp 71 <__UNIQUE_ID_vermagic248+0x5> When the key is enabled, the code is: 0000000000000010 <foo>: 19: 66 90 xchg %ax,%ax <--------------- likely(static branch), key on 1b: b8 21 00 00 00 mov $0x21,%eax 20: 5d pop %rbp 21: e9 00 00 00 00 jmp 26 <foo+0x16> 26: b8 37 00 00 00 mov $0x37,%eax 2b: 5d pop %rbp 2c: e9 00 00 00 00 jmp 31 <foo+0x21> 31: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1) 38: 00 00 00 00 3c: 0f 1f 40 00 nopl 0x0(%rax) 0000000000000050 <hoo>: 59: eb 0b jmp 66 <hoo+0x16> <--------- unlikely(static branch), key on 5b: b8 37 00 00 00 mov $0x37,%eax 60: 5d pop %rbp 61: e9 00 00 00 00 jmp 66 <hoo+0x16> 66: b8 21 00 00 00 mov $0x21,%eax 6b: 5d pop %rbp 6c: e9 00 00 00 00 jmp 71 <__UNIQUE_ID_vermagic248+0x5> So, for the likely case we set the branch to JUMP/NOP when the key is OFF/ON. And for the unlikely case we set the branch to NOP/JUMP when the key is OFF/ON. The kernel keeps this information, and when static_key_enable(key) is executed, it does [simplified for reading] for (entry = key->start; entry < stop; entry++) arch_jump_label_transform(entry, jump_label_type(entry)); this jump_label_type() contains this bit of information: are we writing NOP or JUMP for an enabled key. Same for static_key_disable(key). Now, for BPF we do static_branch_enable(branch). To generate proper JITed code, we have enough information (NOP=0, JUMP=2): static_branch_enable(JA[SRC=1|NOP]) jits to ARCH_JUMP static_branch_enable(JA[SRC=1|JUMP]) jits to ARCH_NOP static_branch_disable(JA[SRC=1|NOP]) jits to ARCH_NOP static_branch_disable(JA[SRC=1|JUMP]) jits to ARCH_JUMP But how do we represent this in xlated code to the user? Do we patch the xlated code? If we do, then static_branch_enable changes JA[SRC=1|NOP] to JA[SRC=1|JUMP] and an ARCH_JUMP is generated; static_branch_disable then sees JA[SRC=1|JUMP], changes it to JA[SRC=1|NOP], but an ARCH_JUMP is generated. And what about two static_branch_enable calls on the same branch? By flipping I meant that we always do JA[SRC=1|NOP] jits to ARCH_NOP JA[SRC=1|JUMP] jits to ARCH_JUMP and the primitive operation is static_branch_flip, which changes JA[SRC=1|NOP] to JA[SRC=1|JUMP] and vice versa. Then for static_key_enable(key) we flip all the branches if the key was disabled and do nothing otherwise. Same for static_key_disable. What you've proposed before is to keep this "third bit" in xlated+jitted form. Basically, we check the state of the branch "on/off" like this: say, we see that the xlated/jitted state of a branch is JA[SRC=1|NOP] and ARCH_JUMP, then we can say that this branch is on. How do we report it to the user in PROG_INFO? Do we set the instruction to JA[SRC=1|JUMP] in the output, specifying that its current state is to jump? This would work, I think.
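Note that the four enable/disable cases above collapse to an XOR of the key state with the branch polarity, which is exactly how the kernel's jump_label_type() computes it; a sketch with made-up names:

enum arch_insn { ARCH_NOP, ARCH_JUMP };

/* "inverse" is the persistent flavor bit: set for branches whose key-off
 * state is a jump (the likely() pattern above). */
static enum arch_insn static_branch_target(bool key_on, bool inverse)
{
        return (key_on ^ inverse) ? ARCH_JUMP : ARCH_NOP;
}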
> > We can implement > > this using something like this: > > > > static_key_set(key, bool new_value) > > { > > /* true if we change key value */ > > bool key_changed = key->old_value ^ new_value; > > > > for_each_prog(prog, key) > > for_each_branch(branch, prog, key) > > static_branch_flip(prog, branch, key_changed) > > } > > > > where static_branch_flip flips the second bit of SRC_REG. > > I don't understand why you keep bringing up 'flip' use case. > The kernel doesn't have such an operation on static branches. > Which makes me believe that it wasn't necessary. > Why do we need one for the bpf static branch? See above. > > We need to > > keep track of prog->branches and key->progs. How is this different > > from what my patch implements? > > What I'm proposing is to have a generic map __uint(type, BPF_MAP_TYPE_SET) > and by naming convention libbpf will populate it with addresses > of JA_OR_NOP from all progs. > In asm it could be: > asm volatile ("r0 = %[set_A] ll; goto_or_nop ..."); > (and libbpf will remove ld_imm64 from the prog before loading.) > > or via > asm volatile ("goto_or_nop ...; .pushsection set_A_name+suffix; .long"); > (and libbpf will copy from the special section into a set and remove > special section). > > It will be a libbpf convention and the kernel doesn't need > to know about a special static branch map type or array of addresses > in prog_load cmd. > Only JA insn is relevant to the verifier and JITs. > > Ideally we don't need to introduce SET map type and > libbpf wouldn't need to populate it. > If we can make it work with an array of values that .pushsection + .long > automatically populates and libbpf treats it as a normal global data array > that would be ideal. > Insn addresses from all progs will be in that array after loading. > Sort of like ".kconfig" section that libbpf populates, > but it's a normal array underneath. This doesn't work without the kernel side, as we change instruction offsets inside the verifier when, e.g., converting helper calls to individual instructions, etc. Otherwise, this is more-or-less what my patch does: get a list of offsets of branches inside the program, associate them with a map (or an ID of any kind), and then push it to the kernel. And if you need to keep this list in the kernel, then it looks like we can keep it and access it per key, not in a loop... An alternative is to encode the key ID in the instruction in some form so that it is visible in xlated code after the program is loaded. But in this case this makes users responsible for writing iterators and bookkeeping all relationships vs. just loading a program with a map fd, which is a well-established API in BPF. > > If this is implemented in userspace, then how we prevent synchronous > > updates of the key (and a relocation variant doesn't seem to work from > > userspace)? Or is this a new kfunc? If yes, then how is it > > executed, > > then user space can have small helper in libbpf that iterates > over SET (or array) and > calls sys_bpf(cmd=STATIC_BRANCH_ENABLE, one_value_from_set) > > Similar in the kernel. When bpf progs want to enable a key it does > bpf_for_each(set) { // open coded iterator > bpf_static_branch_enable(addr); // kfunc call > } Is it possible to use a map like it is used in my patch, but do sys_bpf(cmd=STATIC_KEY_ENABLE, attr.map_fd) so that it is specifically separated from map_update_elem? (If it was not a map, but just some "bpf object"?)
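In code, that last question amounts to something like the sketch below; the command and its number are made up, and the attr plumbing just mirrors the existing map commands:

#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

#define BPF_STATIC_KEY_ENABLE 40        /* hypothetical command number */

/* Enable all branches associated with the static key map in one call,
 * without going through map_update_elem. */
static int static_key_enable_by_fd(int map_fd)
{
        union bpf_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.map_fd = map_fd;

        return syscall(__NR_bpf, BPF_STATIC_KEY_ENABLE, &attr, sizeof(attr));
}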
On Wed, 2023-12-13 at 19:04 -0800, Yonghong Song wrote: > Slightly different from what Alexei proposed, but another approach > for consideration and discussion. It looks like Alexei / Yonghong focus on individual addresses, while as far as I understand Anton's original proposal is more about program-wide switches. E.g. there is some "DEBUG" flag and from the application's point of view it does not matter how many static branches are enabled or disabled by this flag. In a sense, there is one-to-one vs one-to-many control granularity. Here is an additional variation for the one-to-many approach, basically a hybrid of Anton's original idea and Andrii's suggestion to use relocations. Suppose there is a special type of maps :) (it does not really matter if it is a map or not, what matters is that this object has an FD and libbpf knows how to relocate it when preparing the program for load): struct { __uint(type, BPF_MAP_TYPE_ARRAY); __type(key, __u32); __type(value, __u32); __uint(map_flags, BPF_F_STATIC_KEY); __uint(max_entries, 1); } skey1 SEC(".maps") And a new instruction that accepts an fd/pointer corresponding to this map as a parameter, plus a label to which to jump if the static key is enabled. E.g., as below: __attribute__((naked)) int foo(void) { asm volatile ( "if static %[skey1] goto 1f;" // hypothetical syntax "r1 = r10;" "r1 += -8;" "r2 = 1;" "call %[bpf_trace_printk];" "1:" "exit;" :: __imm_addr(skey1), __imm(bpf_trace_printk) : __clobber_all ); } This "if static" is relocated by libbpf in order to get the correct map fd when the program is loaded. In effect, each static key is a single-entry map with a one-to-many relationship to the instructions / programs it controls. It can be enabled or disabled using either map update functions or new system call commands. What I think is a plus with such an approach: - The application is free to decide how it wants to organize these FDs: - use one fd per condition; - have an fd per program; - have an fd per set of programs. - Embedding the fd in the instruction removes the need to communicate mapping information back and forth, or to track special section kinds.
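With this scheme the user-space side would need nothing new; toggling could go through the ordinary map interface, assuming the kernel translates the update into patching (the part of the design debated above):

#include <bpf/bpf.h>
#include <bpf/libbpf.h>

/* Toggle the hypothetical one-entry static key map on or off. */
static int skey_set(struct bpf_object *obj, const char *name, __u32 on)
{
        int fd = bpf_object__find_map_fd_by_name(obj, name);
        __u32 zero = 0;

        if (fd < 0)
                return fd;
        return bpf_map_update_elem(fd, &zero, &on, BPF_ANY);
}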
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c index 9dc9625651dc..f67d6a4dac05 100644 --- a/tools/lib/bpf/bpf.c +++ b/tools/lib/bpf/bpf.c @@ -232,7 +232,7 @@ int bpf_prog_load(enum bpf_prog_type prog_type, const struct bpf_insn *insns, size_t insn_cnt, struct bpf_prog_load_opts *opts) { - const size_t attr_sz = offsetofend(union bpf_attr, log_true_size); + const size_t attr_sz = offsetofend(union bpf_attr, static_branches_info_size); void *finfo = NULL, *linfo = NULL; const char *func_info, *line_info; __u32 log_size, log_level, attach_prog_fd, attach_btf_obj_fd; @@ -262,6 +262,9 @@ int bpf_prog_load(enum bpf_prog_type prog_type, attr.prog_ifindex = OPTS_GET(opts, prog_ifindex, 0); attr.kern_version = OPTS_GET(opts, kern_version, 0); + attr.static_branches_info = ptr_to_u64(OPTS_GET(opts, static_branches_info, NULL)); + attr.static_branches_info_size = OPTS_GET(opts, static_branches_info_size, 0); + if (prog_name && kernel_supports(NULL, FEAT_PROG_NAME)) libbpf_strlcpy(attr.prog_name, prog_name, sizeof(attr.prog_name)); attr.license = ptr_to_u64(license); diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h index d0f53772bdc0..ec6d4b955fb8 100644 --- a/tools/lib/bpf/bpf.h +++ b/tools/lib/bpf/bpf.h @@ -102,9 +102,11 @@ struct bpf_prog_load_opts { * If kernel doesn't support this feature, log_size is left unchanged. */ __u32 log_true_size; + struct bpf_static_branch_info *static_branches_info; + __u32 static_branches_info_size; size_t :0; }; -#define bpf_prog_load_opts__last_field log_true_size +#define bpf_prog_load_opts__last_field static_branches_info_size LIBBPF_API int bpf_prog_load(enum bpf_prog_type prog_type, const char *prog_name, const char *license, diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h index 77ceea575dc7..e3bfa0697304 100644 --- a/tools/lib/bpf/bpf_helpers.h +++ b/tools/lib/bpf/bpf_helpers.h @@ -400,4 +400,68 @@ extern void bpf_iter_num_destroy(struct bpf_iter_num *it) __weak __ksym; ) #endif /* bpf_repeat */ +#define DEFINE_STATIC_KEY(NAME) \ + struct { \ + __uint(type, BPF_MAP_TYPE_ARRAY); \ + __type(key, __u32); \ + __type(value, __u32); \ + __uint(map_flags, BPF_F_STATIC_KEY); \ + __uint(max_entries, 1); \ + } NAME SEC(".maps") + +#ifndef likely +#define likely(x) (__builtin_expect(!!(x), 1)) +#endif + +#ifndef unlikely +#define unlikely(x) (__builtin_expect(!!(x), 0)) +#endif + +static __always_inline int __bpf_static_branch_nop(void *static_key) +{ + asm goto("1:\n\t" + "goto +0\n\t" + ".pushsection .jump_table, \"aw\"\n\t" + ".balign 8\n\t" + ".long 1b - .\n\t" + ".long %l[l_yes] - .\n\t" + ".quad %c0 - .\n\t" + ".popsection\n\t" + :: "i" (static_key) + :: l_yes); + return 0; +l_yes: + return 1; +} + +static __always_inline int __bpf_static_branch_jump(void *static_key) +{ + asm goto("1:\n\t" + "goto %l[l_yes]\n\t" + ".pushsection .jump_table, \"aw\"\n\t" + ".balign 8\n\t" + ".long 1b - .\n\t" + ".long %l[l_yes] - .\n\t" + ".quad %c0 - . + 1\n\t" + ".popsection\n\t" + :: "i" (static_key) + :: l_yes); + return 0; +l_yes: + return 1; +} + +/* + * The bpf_static_branch_{unlikely,likely} macros provide a way to utilize BPF + * Static Keys in BPF programs in exactly the same manner this is done in the + * Linux Kernel. The "unlikely" macro compiles in the code where the else-branch + * (key is off) is prioritized, the "likely" macro prioritises the if-branch. 
+ */ + +#define bpf_static_branch_unlikely(static_key) \ + unlikely(__bpf_static_branch_nop(static_key)) + +#define bpf_static_branch_likely(static_key) \ + likely(!__bpf_static_branch_jump(static_key)) + #endif diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index e067be95da3c..92620717abda 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -391,6 +391,13 @@ struct bpf_sec_def { libbpf_prog_attach_fn_t prog_attach_fn; }; +struct static_branch_info { + struct bpf_map *map; + __u32 insn_offset; + __u32 jump_target; + __u32 flags; +}; + /* * bpf_prog should be a better name but it has been used in * linux/filter.h. @@ -463,6 +470,9 @@ struct bpf_program { __u32 line_info_rec_size; __u32 line_info_cnt; __u32 prog_flags; + + struct static_branch_info *static_branches_info; + __u32 static_branches_info_size; }; struct bpf_struct_ops { @@ -493,6 +503,7 @@ struct bpf_struct_ops { #define KSYMS_SEC ".ksyms" #define STRUCT_OPS_SEC ".struct_ops" #define STRUCT_OPS_LINK_SEC ".struct_ops.link" +#define STATIC_JUMPS_SEC ".jump_table" enum libbpf_map_type { LIBBPF_MAP_UNSPEC, @@ -624,6 +635,7 @@ struct elf_state { Elf_Data *symbols; Elf_Data *st_ops_data; Elf_Data *st_ops_link_data; + Elf_Data *static_branches_data; size_t shstrndx; /* section index for section name strings */ size_t strtabidx; struct elf_sec_desc *secs; @@ -634,6 +646,7 @@ struct elf_state { int symbols_shndx; int st_ops_shndx; int st_ops_link_shndx; + int static_branches_shndx; }; struct usdt_manager; @@ -715,6 +728,7 @@ void bpf_program__unload(struct bpf_program *prog) zfree(&prog->func_info); zfree(&prog->line_info); + zfree(&prog->static_branches_info); } static void bpf_program__exit(struct bpf_program *prog) @@ -3605,6 +3619,9 @@ static int bpf_object__elf_collect(struct bpf_object *obj) } else if (strcmp(name, STRUCT_OPS_LINK_SEC) == 0) { obj->efile.st_ops_link_data = data; obj->efile.st_ops_link_shndx = idx; + } else if (strcmp(name, STATIC_JUMPS_SEC) == 0) { + obj->efile.static_branches_data = data; + obj->efile.static_branches_shndx = idx; } else { pr_info("elf: skipping unrecognized data section(%d) %s\n", idx, name); @@ -3620,7 +3637,8 @@ static int bpf_object__elf_collect(struct bpf_object *obj) if (!section_have_execinstr(obj, targ_sec_idx) && strcmp(name, ".rel" STRUCT_OPS_SEC) && strcmp(name, ".rel" STRUCT_OPS_LINK_SEC) && - strcmp(name, ".rel" MAPS_ELF_SEC)) { + strcmp(name, ".rel" MAPS_ELF_SEC) && + strcmp(name, ".rel" STATIC_JUMPS_SEC)) { pr_info("elf: skipping relo section(%d) %s for section(%d) %s\n", idx, name, targ_sec_idx, elf_sec_name(obj, elf_sec_by_idx(obj, targ_sec_idx)) ?: "<?>"); @@ -4422,6 +4440,189 @@ bpf_object__collect_prog_relos(struct bpf_object *obj, Elf64_Shdr *shdr, Elf_Dat return 0; } +struct jump_table_entry { + __u32 insn_offset; + __u32 jump_target; + union { + __u64 map_ptr; /* map_ptr is always zero, as it is relocated */ + __u64 flags; /* so we can reuse it to store flags */ + }; +}; + +static struct bpf_program *shndx_to_prog(struct bpf_object *obj, + size_t sec_idx, + struct jump_table_entry *entry) +{ + __u32 insn_offset = entry->insn_offset / 8; + __u32 jump_target = entry->jump_target / 8; + struct bpf_program *prog; + size_t i; + + for (i = 0; i < obj->nr_programs; i++) { + prog = &obj->programs[i]; + if (prog->sec_idx != sec_idx) + continue; + + if (insn_offset < prog->sec_insn_off || + insn_offset >= prog->sec_insn_off + prog->sec_insn_cnt) + continue; + + if (jump_target < prog->sec_insn_off || + jump_target >= prog->sec_insn_off + prog->sec_insn_cnt) 
{
+			pr_warn("static branch: offset %u is in boundaries, target %u is not\n",
+				insn_offset, jump_target);
+			return NULL;
+		}
+
+		return prog;
+	}
+
+	return NULL;
+}
+
+static struct bpf_program *find_prog_for_jump_entry(struct bpf_object *obj,
+						    int nrels,
+						    Elf_Data *relo_data,
+						    __u32 entry_offset,
+						    struct jump_table_entry *entry)
+{
+	struct bpf_program *prog;
+	Elf64_Rel *rel;
+	Elf64_Sym *sym;
+	int i;
+
+	for (i = 0; i < nrels; i++) {
+		rel = elf_rel_by_idx(relo_data, i);
+		if (!rel) {
+			pr_warn("static branch: relo #%d: failed to get ELF relo\n", i);
+			return ERR_PTR(-LIBBPF_ERRNO__FORMAT);
+		}
+
+		if ((__u32)rel->r_offset != entry_offset)
+			continue;
+
+		sym = elf_sym_by_idx(obj, ELF64_R_SYM(rel->r_info));
+		if (!sym) {
+			pr_warn("static branch: .maps relo #%d: symbol %zx not found\n",
+				i, (size_t)ELF64_R_SYM(rel->r_info));
+			return ERR_PTR(-LIBBPF_ERRNO__FORMAT);
+		}
+
+		prog = shndx_to_prog(obj, sym->st_shndx, entry);
+		if (!prog) {
+			pr_warn("static branch: .maps relo #%d: program %zx not found\n",
+				i, (size_t)sym->st_shndx);
+			return ERR_PTR(-LIBBPF_ERRNO__FORMAT);
+		}
+		return prog;
+	}
+	return ERR_PTR(-LIBBPF_ERRNO__FORMAT);
+}
+
+static struct bpf_map *find_map_for_jump_entry(struct bpf_object *obj,
+					       int nrels,
+					       Elf_Data *relo_data,
+					       __u32 entry_offset)
+{
+	struct bpf_map *map;
+	const char *name;
+	Elf64_Rel *rel;
+	Elf64_Sym *sym;
+	int i;
+
+	for (i = 0; i < nrels; i++) {
+		rel = elf_rel_by_idx(relo_data, i);
+		if (!rel) {
+			pr_warn("static branch: relo #%d: failed to get ELF relo\n", i);
+			return NULL;
+		}
+
+		if ((__u32)rel->r_offset != entry_offset)
+			continue;
+
+		sym = elf_sym_by_idx(obj, ELF64_R_SYM(rel->r_info));
+		if (!sym) {
+			pr_warn(".maps relo #%d: symbol %zx not found\n",
+				i, (size_t)ELF64_R_SYM(rel->r_info));
+			return NULL;
+		}
+
+		name = elf_sym_str(obj, sym->st_name) ?: "<?>";
+		if (!name || !strcmp(name, "")) {
+			pr_warn(".maps relo #%d: symbol name is zero or empty\n", i);
+			return NULL;
+		}
+
+		map = bpf_object__find_map_by_name(obj, name);
+		if (!map)
+			return NULL;
+		return map;
+	}
+	return NULL;
+}
+
+static int add_static_branch(struct bpf_program *prog,
+			     struct jump_table_entry *entry,
+			     struct bpf_map *map)
+{
+	__u32 size_old = prog->static_branches_info_size;
+	__u32 size_new = size_old + sizeof(struct static_branch_info);
+	struct static_branch_info *info;
+	void *x;
+
+	x = realloc(prog->static_branches_info, size_new);
+	if (!x)
+		return -ENOMEM;
+
+	info = x + size_old;
+	info->insn_offset = entry->insn_offset - prog->sec_insn_off * 8;
+	info->jump_target = entry->jump_target - prog->sec_insn_off * 8;
+	info->flags = (__u32) entry->flags;
+	info->map = map;
+
+	prog->static_branches_info = x;
+	prog->static_branches_info_size = size_new;
+
+	return 0;
+}
+
+static int
+bpf_object__collect_static_branches_relos(struct bpf_object *obj,
+					  Elf64_Shdr *shdr,
+					  Elf_Data *relo_data)
+{
+	Elf_Data *branches_data = obj->efile.static_branches_data;
+	int nrels = shdr->sh_size / shdr->sh_entsize;
+	struct jump_table_entry *entries;
+	size_t i;
+	int err;
+
+	if (!branches_data)
+		return 0;
+
+	entries = (void *)branches_data->d_buf;
+	for (i = 0; i < branches_data->d_size / sizeof(struct jump_table_entry); i++) {
+		__u32 entry_offset = i * sizeof(struct jump_table_entry);
+		struct bpf_program *prog;
+		struct bpf_map *map;
+
+		prog = find_prog_for_jump_entry(obj, nrels, relo_data, entry_offset, &entries[i]);
+		if (IS_ERR(prog))
+			return PTR_ERR(prog);
+
+		map = find_map_for_jump_entry(obj, nrels, relo_data,
+					      entry_offset + offsetof(struct jump_table_entry, map_ptr));
+		if (!map)
+			return -EINVAL;
+
+		err = add_static_branch(prog, &entries[i], map);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
 static int map_fill_btf_type_info(struct bpf_object *obj, struct bpf_map *map)
 {
 	int id;
@@ -6298,10 +6499,44 @@ static struct reloc_desc *find_prog_insn_relo(const struct bpf_program *prog, si
 		       sizeof(*prog->reloc_desc), cmp_relo_by_insn_idx);
 }
 
+static int append_subprog_static_branches(struct bpf_program *main_prog,
+					  struct bpf_program *subprog)
+{
+	size_t subprog_size = subprog->static_branches_info_size;
+	size_t main_size = main_prog->static_branches_info_size;
+	size_t entry_size = sizeof(struct static_branch_info);
+	void *old_info = main_prog->static_branches_info;
+	int n_entries = subprog_size / entry_size;
+	struct static_branch_info *branch;
+	void *new_info;
+	int i;
+
+	if (!subprog_size)
+		return 0;
+
+	new_info = realloc(old_info, subprog_size + main_size);
+	if (!new_info)
+		return -ENOMEM;
+
+	memcpy(new_info + main_size, subprog->static_branches_info, subprog_size);
+
+	for (i = 0; i < n_entries; i++) {
+		branch = new_info + main_size + i * entry_size;
+		branch->insn_offset += subprog->sub_insn_off * 8;
+		branch->jump_target += subprog->sub_insn_off * 8;
+	}
+
+	main_prog->static_branches_info = new_info;
+	main_prog->static_branches_info_size += subprog_size;
+
+	return 0;
+}
+
 static int append_subprog_relos(struct bpf_program *main_prog, struct bpf_program *subprog)
 {
 	int new_cnt = main_prog->nr_reloc + subprog->nr_reloc;
 	struct reloc_desc *relos;
+	int err;
 	int i;
 
 	if (main_prog == subprog)
@@ -6324,6 +6559,11 @@ static int append_subprog_relos(struct bpf_program *main_prog, struct bpf_progra
 	 */
 	main_prog->reloc_desc = relos;
 	main_prog->nr_reloc = new_cnt;
+
+	err = append_subprog_static_branches(main_prog, subprog);
+	if (err)
+		return err;
+
 	return 0;
 }
 
@@ -6879,6 +7119,8 @@ static int bpf_object__collect_relos(struct bpf_object *obj)
 			err = bpf_object__collect_st_ops_relos(obj, shdr, data);
 		else if (idx == obj->efile.btf_maps_shndx)
 			err = bpf_object__collect_map_relos(obj, shdr, data);
+		else if (idx == obj->efile.static_branches_shndx)
+			err = bpf_object__collect_static_branches_relos(obj, shdr, data);
 		else
 			err = bpf_object__collect_prog_relos(obj, shdr, data);
 		if (err)
@@ -7002,6 +7244,30 @@ static int libbpf_prepare_prog_load(struct bpf_program *prog,
 
 static void fixup_verifier_log(struct bpf_program *prog, char *buf, size_t buf_sz);
 
+static struct bpf_static_branch_info *
+convert_branch_info(struct static_branch_info *info, size_t size)
+{
+	size_t n = size/sizeof(struct static_branch_info);
+	struct bpf_static_branch_info *bpf_info;
+	size_t i;
+
+	if (!info)
+		return NULL;
+
+	bpf_info = calloc(n, sizeof(struct bpf_static_branch_info));
+	if (!bpf_info)
+		return NULL;
+
+	for (i = 0; i < n; i++) {
+		bpf_info[i].insn_offset = info[i].insn_offset;
+		bpf_info[i].jump_target = info[i].jump_target;
+		bpf_info[i].flags = info[i].flags;
+		bpf_info[i].map_fd = info[i].map->fd;
+	}
+
+	return bpf_info;
+}
+
 static int bpf_object_load_prog(struct bpf_object *obj, struct bpf_program *prog,
 				struct bpf_insn *insns, int insns_cnt,
 				const char *license, __u32 kern_version, int *prog_fd)
@@ -7106,6 +7372,11 @@ static int bpf_object_load_prog(struct bpf_object *obj, struct bpf_program *prog
 	load_attr.log_size = log_buf_size;
 	load_attr.log_level = log_level;
 
+	load_attr.static_branches_info = convert_branch_info(prog->static_branches_info,
+							     prog->static_branches_info_size);
+	load_attr.static_branches_info_size = prog->static_branches_info_size /
+			sizeof(struct static_branch_info) * sizeof(struct bpf_static_branch_info);
+
 	ret = bpf_prog_load(prog->type, prog_name, license, insns, insns_cnt, &load_attr);
 	if (ret >= 0) {
 		if (log_level && own_log_buf) {
diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h
index f0f08635adb0..62020e7a58b0 100644
--- a/tools/lib/bpf/libbpf_internal.h
+++ b/tools/lib/bpf/libbpf_internal.h
@@ -40,6 +40,9 @@
 #ifndef R_BPF_64_ABS32
 #define R_BPF_64_ABS32 3
 #endif
+#ifndef R_BPF_64_NODYLD32
+#define R_BPF_64_NODYLD32 4
+#endif
 #ifndef R_BPF_64_32
 #define R_BPF_64_32 10
 #endif
diff --git a/tools/lib/bpf/linker.c b/tools/lib/bpf/linker.c
index 5ced96d99f8c..47b343e2813e 100644
--- a/tools/lib/bpf/linker.c
+++ b/tools/lib/bpf/linker.c
@@ -22,6 +22,7 @@
 #include "strset.h"
 
 #define BTF_EXTERN_SEC ".extern"
+#define STATIC_JUMPS_REL_SEC ".rel.jump_table"
 
 struct src_sec {
 	const char *sec_name;
@@ -888,8 +889,9 @@ static int linker_sanity_check_elf_relos(struct src_obj *obj, struct src_sec *se
 		size_t sym_type = ELF64_R_TYPE(relo->r_info);
 
 		if (sym_type != R_BPF_64_64 && sym_type != R_BPF_64_32 &&
-		    sym_type != R_BPF_64_ABS64 && sym_type != R_BPF_64_ABS32) {
-			pr_warn("ELF relo #%d in section #%zu has unexpected type %zu in %s\n",
+		    sym_type != R_BPF_64_ABS64 && sym_type != R_BPF_64_ABS32 &&
+		    sym_type != R_BPF_64_NODYLD32 && strcmp(sec->sec_name, STATIC_JUMPS_REL_SEC)) {
+			pr_warn("ELF relo #%d in section #%zu unexpected type %zu in %s\n",
 				i, sec->sec_idx, sym_type, obj->filename);
 			return -EINVAL;
 		}
@@ -2087,7 +2089,7 @@ static int linker_append_elf_relos(struct bpf_linker *linker, struct src_obj *ob
 			insn->imm += sec->dst_off / sizeof(struct bpf_insn);
 		else
 			insn->imm += sec->dst_off;
-	} else {
+	} else if (strcmp(src_sec->sec_name, STATIC_JUMPS_REL_SEC)) {
 		pr_warn("relocation against STT_SECTION in non-exec section is not supported!\n");
 		return -EINVAL;
 	}
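[Editorial note] To make the load-time plumbing above concrete, a standalone loader could pass one static branch descriptor through the extended load options roughly as follows. This is a sketch, not part of the patch: the bpf_static_branch_info fields mirror convert_branch_info() above, while the instruction offsets, the key_map_fd argument and the XDP program type are made-up placeholders.

```c
#include <bpf/bpf.h>

/* Sketch only: assumes the bpf_prog_load_opts extensions added by this
 * series. Offsets are in bytes (instruction index * 8), matching the
 * scaling done in add_static_branch() above.
 */
static int load_with_static_branch(const struct bpf_insn *insns,
				   size_t insn_cnt, int key_map_fd)
{
	struct bpf_static_branch_info binfo = {
		.insn_offset = 2 * 8,	/* hypothetical: the JA is insn #2 */
		.jump_target = 5 * 8,	/* hypothetical: its target is insn #5 */
		.flags = 0,		/* branch flags, as in the .jump_table entry */
		.map_fd = key_map_fd,	/* FD of the static key map */
	};
	LIBBPF_OPTS(bpf_prog_load_opts, opts,
		.static_branches_info = &binfo,
		.static_branches_info_size = sizeof(binfo),
	);

	return bpf_prog_load(BPF_PROG_TYPE_XDP, "prog", "GPL",
			     insns, insn_cnt, &opts);
}
```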
Introduce the DEFINE_STATIC_KEY() and bpf_static_branch_{unlikely,likely}
macros to mimic the Linux kernel Static Keys API in BPF. An example of
usage would be as follows:

    DEFINE_STATIC_KEY(key);

    void prog(void)
    {
            if (bpf_static_branch_unlikely(&key))
                    /* rarely used code */
            else
                    /* default hot path code */
    }

or, using the likely variant:

    void prog2(void)
    {
            if (bpf_static_branch_likely(&key))
                    /* default hot path code */
            else
                    /* rarely used code */
    }

The "unlikely" variant of the macro compiles the code so that the
else-branch (key is off) is the fall-through path; the "likely" variant
prioritises the if-branch instead.

Both macros push an entry into a new ".jump_table" section, which
contains the following information:

              32 bits                 32 bits          64 bits
    offset of jump instruction | offset of jump target | flags

The corresponding ".rel.jump_table" relocation table entry contains the
base section name and the static key (map) name.

The bulk of this patch deals with parsing, relocating, and sending this
information to the kernel via the static_branches_info and
static_branches_info_size attributes of the BPF_PROG_LOAD syscall.

The same key may be used multiple times in one program and can be used
by multiple BPF programs. BPF doesn't provide any guarantees on the
order in which static branches controlled by one key are patched.

Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
---
 tools/lib/bpf/bpf.c             |   5 +-
 tools/lib/bpf/bpf.h             |   4 +-
 tools/lib/bpf/bpf_helpers.h     |  64 ++++++++
 tools/lib/bpf/libbpf.c          | 273 +++++++++++++++++++++++++++++++-
 tools/lib/bpf/libbpf_internal.h |   3 +
 tools/lib/bpf/linker.c          |   8 +-
 6 files changed, 351 insertions(+), 6 deletions(-)
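[Editorial note] The ".jump_table" entry layout described in the commit message corresponds to a struct along the following lines. This is an inferred sketch, not a quote from the series: the field names come from the libbpf code in this patch (the insn_offset, jump_target and flags accesses, plus the offsetof(struct jump_table_entry, map_ptr) relocation lookup), and the union overlay of map_ptr and flags in the 64-bit field is an assumption.

```c
#include <linux/types.h>

/* Inferred sketch of one ".jump_table" entry: 32-bit byte offset of the
 * jump instruction, 32-bit byte offset of its target, and a 64-bit field
 * that carries the static key map reference plus branch flags (the union
 * is an assumption based on how libbpf reads the entry).
 */
struct jump_table_entry {
	__u32 insn_offset;	/* byte offset of the jump instruction */
	__u32 jump_target;	/* byte offset of the jump target */
	union {
		__u64 map_ptr;	/* relocated to refer to the static key map */
		__u64 flags;	/* branch flags read by libbpf */
	};
};
```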