diff mbox series

[bpf-next,v3,1/1] docs/bpf: Add description of register liveness tracking algorithm

Message ID 20230131181118.733845-2-eddyz87@gmail.com (mailing list archive)
State Superseded
Delegated to: BPF
Series docs/bpf: Add description of register liveness tracking algorithm

Checks

Context Check Description
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next-VM_Test-2 success Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-3 success Logs for build for aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-4 success Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-9 success Logs for test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-10 success Logs for test_maps on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-12 success Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-13 success Logs for test_maps on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-14 success Logs for test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-17 success Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-18 success Logs for test_progs on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-19 success Logs for test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-20 success Logs for test_progs_no_alu32 on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-22 success Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-23 fail Logs for test_progs_no_alu32 on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-24 success Logs for test_progs_no_alu32_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-25 success Logs for test_progs_no_alu32_parallel on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-27 success Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-28 success Logs for test_progs_no_alu32_parallel on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-29 success Logs for test_progs_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-30 success Logs for test_progs_parallel on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-32 success Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-33 success Logs for test_progs_parallel on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-34 success Logs for test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-35 success Logs for test_verifier on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-36 success Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-37 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-38 success Logs for test_verifier on x86_64 with llvm-17
netdev/tree_selection success Clearly marked for bpf-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Series has a cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers warning 8 maintainers not CCed: john.fastabend@gmail.com sdf@google.com corbet@lwn.net linux-doc@vger.kernel.org jolsa@kernel.org song@kernel.org haoluo@google.com kpsingh@kernel.org
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 286 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-VM_Test-15 fail Logs for test_progs on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-21 fail Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-31 success Logs for test_progs_parallel on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-16 fail Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-26 fail Logs for test_progs_no_alu32_parallel on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-11 success Logs for test_maps on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-5 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-6 success Logs for build for x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-7 success Logs for llvm-toolchain
bpf/vmtest-bpf-next-VM_Test-8 success Logs for set-matrix

Commit Message

Eduard Zingerman Jan. 31, 2023, 6:11 p.m. UTC
This is a followup for [1], adds an overview for the register liveness
tracking, covers the following points:
- why register liveness tracking is useful;
- how register parentage chains are constructed;
- how liveness marks are applied using the parentage chains.

[1] https://lore.kernel.org/bpf/CAADnVQKs2i1iuZ5SUGuJtxWVfGYR9kDgYKhq3rNV+kBLQCu7rA@mail.gmail.com/

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
---
 Documentation/bpf/verifier.rst | 280 +++++++++++++++++++++++++++++++++
 1 file changed, 280 insertions(+)

Comments

Edward Cree Feb. 1, 2023, 11 a.m. UTC | #1
On 31/01/2023 18:11, Eduard Zingerman wrote:
> This is a followup for [1], adds an overview for the register liveness
> tracking, covers the following points:
> - why register liveness tracking is useful;
> - how register parentage chains are constructed;
> - how liveness marks are applied using the parentage chains.
> 
> [1] https://lore.kernel.org/bpf/CAADnVQKs2i1iuZ5SUGuJtxWVfGYR9kDgYKhq3rNV+kBLQCu7rA@mail.gmail.com/
> 
> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
> ---
>  Documentation/bpf/verifier.rst | 280 +++++++++++++++++++++++++++++++++
>  1 file changed, 280 insertions(+)
...
> +  Current    +-------------------------------+
> +  state      | r0 | r1-r5 | r6-r9 | fp-8 ... |
> +             +-------------------------------+
> +                             \
> +                               r6 read mark is propagated via
> +                               these links all the way up to
> +                               checkpoint #1.
Perhaps explicitly mention here that the reason it doesn't
 propagate to checkpoint #0 (despite the arrow) is that there's
 a write mark on c1[r6].

Also worth mentioning somewhere that write marks are really a
 property of the arrow, not the state — a write mark in c#1 tells
 us that the straight-line code from c#0 to c#1 contains a write
 (thus 'breaking' that arrow for read mark propagation); it lives
 in c#1's data structures because it's c#1 that needs to 'know'
 about it, whereas c#0 (and its parents) need to 'know' about any
 *reads* in the straight-line code from c#0 to c#1 (but these are
 of no interest to c#1).
I sometimes summarise this with the phrase "read up, write down",
 though idk how useful that is to anyone outside of my head ;)
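The "read up, write down" rule can be made concrete with a toy Python model (all names here are illustrative, not the verifier's actual structures): a write mark lives in the child checkpoint but describes the arrow *into* it, and it is exactly that arrow that read propagation refuses to cross.

```python
# Toy model of "read up, write down" for a single register: the write mark
# stored in a checkpoint describes the arrow from its parent, and a read
# below propagates up until it hits such an arrow.
class Checkpoint:
    def __init__(self, parent=None, wrote=False):
        self.parent = parent
        self.wrote = wrote   # write on the arrow from parent to self
        self.read = False    # some code below this checkpoint reads the reg

def mark_read(cp):
    # "Read up": walk toward the root, stopping at the first arrow that
    # carries a write -- that write screens off earlier checkpoints.
    while cp.parent is not None:
        if cp.wrote:         # the arrow into cp 'broke' propagation
            break
        cp.parent.read = True
        cp = cp.parent

c0 = Checkpoint()
c1 = Checkpoint(parent=c0, wrote=True)   # e.g. 'r6 = 42' between c0 and c1
cur = Checkpoint(parent=c1)

mark_read(cur)               # r6 is read in the straight-line code after c1
assert c1.read is True       # read mark reaches checkpoint #1
assert c0.read is False      # write on the c0 -> c1 arrow screens c0 off
```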

> +Liveness marks tracking
> +=======================
> +
> +For each processed instruction, the verifier propagates information about reads
> +up the parentage chain and saves information about writes in the current state.
> +The information about reads is propagated by function ``mark_reg_read()`` which
> +could be summarized as follows::
> +
> +  mark_reg_read(struct bpf_reg_state *state):
> +      parent = state->parent
> +      while parent:
> +          if parent->live & REG_LIVE_WRITTEN:
> +              break
This isn't correct; we look at `state->live` here, because if in
 the straight-line code since the last checkpoint (parent)
 there's a write to this register, then reads should not
 propagate to `parent`.
Then there's the complication of the `writes` variable in
 mark_reg_read(); that's explained by the comment on
 propagate_liveness(), which AFAICT you don't cover in your
 doc section about that.  (And note that `writes` is only ever
 false for the first iteration through the mark_reg_read() loop).
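The propagate_liveness() behaviour referred to here can be sketched in a few lines of Python (hypothetical names, the `writes` first-iteration subtlety is deliberately ignored): on a cache hit, the cached state's read marks are replayed onto the current state's parentage chain, mirroring the S -> A -> B / C -> D example from the patch.

```python
# Illustrative flag values, stand-ins for enum bpf_reg_liveness.
REG_LIVE_READ64  = 0x2
REG_LIVE_WRITTEN = 0x4

class RegState:
    def __init__(self, parent=None, live=0):
        self.parent = parent
        self.live = live

def mark_reg_read(state):
    # Corrected loop: the write check is on state.live, not parent.live.
    parent = state.parent
    while parent:
        if state.live & REG_LIVE_WRITTEN:
            break
        if parent.live & REG_LIVE_READ64:
            break
        parent.live |= REG_LIVE_READ64
        state = parent
        parent = state.parent

def propagate_liveness(old, cur):
    # On a cache hit (cur found equivalent to old), act as if cur had been
    # verified to exit: replay old's read marks onto cur's parentage chain.
    if old.live & REG_LIVE_READ64:
        cur.live |= REG_LIVE_READ64
        mark_reg_read(cur)

# r1 in the S -> A -> B / C -> D example: verifying B -> exit reads r1...
s_r1 = RegState()
a_r1 = RegState(parent=s_r1)
b_r1 = RegState(parent=a_r1, live=REG_LIVE_READ64)
mark_reg_read(b_r1)                  # ...marking A and S along the way.

# Later, D (child of C) is found equivalent to B.
c_r1 = RegState(parent=s_r1)
d_r1 = RegState(parent=c_r1)
propagate_liveness(b_r1, d_r1)
assert c_r1.live & REG_LIVE_READ64   # C now knows r1 is read downstream
```

Without that last propagation step, C would carry no read mark for r1 and could later be mistaken for equivalent to a state with a different r1.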

> +          if parent->live & REG_LIVE_READ64:
> +              break
> +          parent->live |= REG_LIVE_READ64
> +          state = parent
> +          parent = state->parent
> +
> +Note: details about REG_LIVE_READ32 are omitted.
> +
> +Also note: the read marks are applied to the **parent** state while write marks
> +are applied to the **current** state.
May be worth stating that the principle of the algorithm is that
 read marks propagate back along the chain until they hit a write
 mark, which 'screens off' earlier states from the read.
Your doc implies this but afaict never states it explicitly, and
 I think it makes the algo easier to understand for someone who
 doesn't already know what it's all about.

Apart from that, this is great.  I particularly like your diagram
 of the parentage chains.

-ed
Eduard Zingerman Feb. 1, 2023, 3:14 p.m. UTC | #2
On Wed, 2023-02-01 at 11:00 +0000, Edward Cree wrote:
> On 31/01/2023 18:11, Eduard Zingerman wrote:
> > This is a followup for [1], adds an overview for the register liveness
> > tracking, covers the following points:
> > - why register liveness tracking is useful;
> > - how register parentage chains are constructed;
> > - how liveness marks are applied using the parentage chains.
> > 
> > [1] https://lore.kernel.org/bpf/CAADnVQKs2i1iuZ5SUGuJtxWVfGYR9kDgYKhq3rNV+kBLQCu7rA@mail.gmail.com/
> > 
> > Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
> > ---
> >  Documentation/bpf/verifier.rst | 280 +++++++++++++++++++++++++++++++++
> >  1 file changed, 280 insertions(+)
> ...
> > +  Current    +-------------------------------+
> > +  state      | r0 | r1-r5 | r6-r9 | fp-8 ... |
> > +             +-------------------------------+
> > +                             \
> > +                               r6 read mark is propagated via
> > +                               these links all the way up to
> > +                               checkpoint #1.

Hi Edward, could you please review the updates below?

> Perhaps explicitly mention here that the reason it doesn't
>  propagate to checkpoint #0 (despite the arrow) is that there's
>  a write mark on c1[r6].

I can update this remark as follows:

---- 8< ---------------------------

  Current    +-------------------------------+
  state      | r0 | r1-r5 | r6-r9 | fp-8 ... |
             +-------------------------------+
                             \
                               r6 read mark is propagated via these links
                               all the way up to checkpoint #1.
                               The checkpoint #1 contains a write mark for r6
                               because of instruction (1), thus read propagation
                               does not reach checkpoint #0 (see section below).

--------------------------- >8 ----

> 
> Also worth mentioning somewhere that write marks are really a
>  property of the arrow, not the state — a write mark in c#1 tells
>  us that the straight-line code from c#0 to c#1 contains a write
>  (thus 'breaking' that arrow for read mark propagation); it lives
>  in c#1's data structures because it's c#1 that needs to 'know'
>  about it, whereas c#0 (and its parents) need to 'know' about any
>  *reads* in the straight-line code from c#0 to c#1 (but these are
>  of no interest to c#1).
> I sometimes summarise this with the phrase "read up, write down",
>  though idk how useful that is to anyone outside of my head ;)

TBH, I'm a bit hesitant to put such note on the diagram because
liveness tracking algorithm is not yet discussed. I've updated the
next section a bit to reflect this, please see below.

> 
> > +Liveness marks tracking
> > +=======================
> > +
> > +For each processed instruction, the verifier propagates information about reads
> > +up the parentage chain and saves information about writes in the current state.
> > +The information about reads is propagated by function ``mark_reg_read()`` which
> > +could be summarized as follows::
> > +
> > +  mark_reg_read(struct bpf_reg_state *state):
> > +      parent = state->parent
> > +      while parent:
> > +          if parent->live & REG_LIVE_WRITTEN:
> > +              break
> This isn't correct; we look at `state->live` here, because if in
>  the straight-line code since the last checkpoint (parent)
>  there's a write to this register, then reads should not
>  propagate to `parent`.

You are correct, thank you for catching it (:big facepalm image:).

> Then there's the complication of the `writes` variable in
>  mark_reg_read(); that's explained by the comment on
>  propagate_liveness(), which AFAICT you don't cover in your
>  doc section about that.  (And note that `writes` is only ever
>  false for the first iteration through the mark_reg_read() loop).

I intentionally avoided description of this mechanics to keep some
balance between clarity and level of details. Added a note that there
is some additional logic.

> 
> > +          if parent->live & REG_LIVE_READ64:
> > +              break
> > +          parent->live |= REG_LIVE_READ64
> > +          state = parent
> > +          parent = state->parent
> > +
> > +Note: details about REG_LIVE_READ32 are omitted.
> > +
> > +Also note: the read marks are applied to the **parent** state while write marks
> > +are applied to the **current** state.
> May be worth stating that the principle of the algorithm is that
>  read marks propagate back along the chain until they hit a write
>  mark, which 'screens off' earlier states from the read.
> Your doc implies this but afaict never states it explicitly, and
>  I think it makes the algo easier to understand for someone who
>  doesn't already know what it's all about.

All in all here is updated start of the section:

---- 8< ---------------------------

The principle of the algorithm is that read marks propagate back along the state
parentage chain until they hit a write mark, which 'screens off' earlier states
from the read. The information about reads is propagated by function
``mark_reg_read()`` which could be summarized as follows::

  mark_reg_read(struct bpf_reg_state *state, ...):
      parent = state->parent
      while parent:
          if state->live & REG_LIVE_WRITTEN:
              break
          if parent->live & REG_LIVE_READ64:
              break
          parent->live |= REG_LIVE_READ64
          state = parent
          parent = state->parent

Notes:
* The read marks are applied to the **parent** state while write marks are
  applied to the **current** state. The write mark on a register or stack slot
  means that it is updated by some instruction verified within current state.
* Details about REG_LIVE_READ32 are omitted.
* Function ``propagate_liveness()`` (see section :ref:`Read marks propagation
  for cache hits`) might override the first parent link, please refer to the
  comments in the source code for further details.

--------------------------- >8 ----
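Since the snippet above is already close to executable, here is a hedged, runnable Python transcription (the ``RegState`` class and flag values are illustrative stand-ins for ``struct bpf_reg_state``, not the verifier's actual code), exercised on the r6 scenario from the parentage-chain diagram:

```python
# Illustrative stand-ins for the kernel's enum bpf_reg_liveness flags.
REG_LIVE_NONE    = 0
REG_LIVE_READ64  = 0x2
REG_LIVE_WRITTEN = 0x4

class RegState:
    def __init__(self, parent=None, live=REG_LIVE_NONE):
        self.parent = parent
        self.live = live

def mark_reg_read(state):
    # Direct transcription of the corrected pseudocode: the check is on
    # state.live, not parent.live -- a write in the straight-line code
    # since the last checkpoint screens the parents off from this read.
    parent = state.parent
    while parent:
        if state.live & REG_LIVE_WRITTEN:
            break
        if parent.live & REG_LIVE_READ64:
            break
        parent.live |= REG_LIVE_READ64
        state = parent
        parent = state.parent

# r6 from the diagram: written between checkpoints #0 and #1 by 'r6 = 42',
# then read in the current state.
c0_r6  = RegState()
c1_r6  = RegState(parent=c0_r6, live=REG_LIVE_WRITTEN)
cur_r6 = RegState(parent=c1_r6)

mark_reg_read(cur_r6)
assert c1_r6.live & REG_LIVE_READ64        # read mark reaches checkpoint #1
assert not (c0_r6.live & REG_LIVE_READ64)  # screened off by c1's write mark
```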

> 
> Apart from that, this is great.  I particularly like your diagram
>  of the parentage chains.

Thanks a lot for commenting!
wdyt about my updates?
Edward Cree Feb. 1, 2023, 4:13 p.m. UTC | #3
On 01/02/2023 15:14, Eduard Zingerman wrote:
> I can update this remark as follows:
> 
> ---- 8< ---------------------------
> 
>   Current    +-------------------------------+
>   state      | r0 | r1-r5 | r6-r9 | fp-8 ... |
>              +-------------------------------+
>                              \
>                                r6 read mark is propagated via these links
>                                all the way up to checkpoint #1.
>                                The checkpoint #1 contains a write mark for r6
>                                because of instruction (1), thus read propagation
>                                does not reach checkpoint #0 (see section below).
Yep, that's good.

> TBH, I'm a bit hesitant to put such note on the diagram because
> liveness tracking algorithm is not yet discussed. I've updated the
> next section a bit to reflect this, please see below.
Yeah I didn't mean put that bit on the diagram.  Just 'somewhere'.

> I intentionally avoided description of this mechanics to keep some
> balance between clarity and level of details. Added a note that there
> is some additional logic.
Makes sense.

> All in all here is updated start of the section:
> 
> ---- 8< ---------------------------
> 
> The principle of the algorithm is that read marks propagate back along the state
> parentage chain until they hit a write mark, which 'screens off' earlier states
> from the read. The information about reads is propagated by function
> ``mark_reg_read()`` which could be summarized as follows::
Hmm, I think you want to still also have the bit about "For each
 processed instruction..."; otherwise the reader seeing "The
 principle of the algorithm" will wonder "*what* algorithm?"
Don't have an immediate suggestion of how to wordsmith the two
 together though, sorry.

> Notes:
> * The read marks are applied to the **parent** state while write marks are
>   applied to the **current** state. The write mark on a register or stack slot
>   means that it is updated by some instruction verified within current state.
"Within current state" is blurry and doesn't emphasise the key
 point imho.  How about:
The write mark on a register or stack slot means that it is
 updated by some instruction in the straight-line code (/basic
 block?) leading from the parent state to the current state.

> * Details about REG_LIVE_READ32 are omitted.
> * Function ``propagate_liveness()`` (see section :ref:`Read marks propagation
>   for cache hits`) might override the first parent link, please refer to the
>   comments in the source code for further details.
"comments on that function's source code" perhaps, so they know
 where to find it.  If they have to look all the way through
 verifier.c's 15,000 lines for a relevant comment it could take
 them quite a while ;)

> Thanks a lot for commenting!
> wdyt about my updates?
I think we're getting pretty close and I look forward to giving
 you a Reviewed-by on v4 :)
(But make sure to Cc me as I'm not subscribed to bpf@vger.)

-ed
Eduard Zingerman Feb. 1, 2023, 6:28 p.m. UTC | #4
On Wed, 2023-02-01 at 16:13 +0000, Edward Cree wrote:
> On 01/02/2023 15:14, Eduard Zingerman wrote:
> > I can update this remark as follows:
> > 
> > ---- 8< ---------------------------
> > 
> >   Current    +-------------------------------+
> >   state      | r0 | r1-r5 | r6-r9 | fp-8 ... |
> >              +-------------------------------+
> >                              \
> >                                r6 read mark is propagated via these links
> >                                all the way up to checkpoint #1.
> >                                The checkpoint #1 contains a write mark for r6
> >                                because of instruction (1), thus read propagation
> >                                does not reach checkpoint #0 (see section below).
> Yep, that's good.
> 
> > TBH, I'm a bit hesitant to put such note on the diagram because
> > liveness tracking algorithm is not yet discussed. I've updated the
> > next section a bit to reflect this, please see below.
> Yeah I didn't mean put that bit on the diagram.  Just 'somewhere'.
> 
> > I intentionally avoided description of this mechanics to keep some
> > balance between clarity and level of details. Added a note that there
> > is some additional logic.
> Makes sense.
> 
> > All in all here is updated start of the section:
> > 
> > ---- 8< ---------------------------
> > 
> > The principle of the algorithm is that read marks propagate back along the state
> > parentage chain until they hit a write mark, which 'screens off' earlier states
> > from the read. The information about reads is propagated by function
> > ``mark_reg_read()`` which could be summarized as follows::
> Hmm, I think you want to still also have the bit about "For each
>  processed instruction..."; otherwise the reader seeing "The
>  principle of the algorithm" will wonder "*what* algorithm?"
> Don't have an immediate suggestion of how to wordsmith the two
>  together though, sorry.

How about the following?

---- 8< ---------------------------

For each processed instruction, the verifier tracks read and written
registers and stack slots. The main idea of the algorithm is that read
marks propagate back along the state parentage chain until they hit a
write mark, which 'screens off' earlier states from the read. The
information about reads is propagated by function ``mark_reg_read()``
which could be summarized as follows::

---- >8 ---------------------------

> 
> > Notes:
> > * The read marks are applied to the **parent** state while write marks are
> >   applied to the **current** state. The write mark on a register or stack slot
> >   means that it is updated by some instruction verified within current state.
> "Within current state" is blurry and doesn't emphasise the key
>  point imho.  How about:
> The write mark on a register or stack slot means that it is
>  updated by some instruction in the straight-line code (/basic
>  block?) leading from the parent state to the current state.

Sounds good, thank you.

> 
> > * Details about REG_LIVE_READ32 are omitted.
> > * Function ``propagate_liveness()`` (see section :ref:`Read marks propagation
> >   for cache hits`) might override the first parent link, please refer to the
> >   comments in the source code for further details.
> "comments on that function's source code" perhaps, so they know
>  where to find it.  If they have to look all the way through
>  verifier.c's 15,000 lines for a relevant comment it could take
>  them quite a while ;)

Sounds good.
> 
> > Thanks a lot for commenting!
> > wdyt about my updates?
> I think we're getting pretty close and I look forward to giving
>  you a Reviewed-by on v4 :)
> (But make sure to Cc me as I'm not subscribed to bpf@vger.)

Sure, will do.
Edward Cree Feb. 2, 2023, 9:25 a.m. UTC | #5
On 01/02/2023 18:28, Eduard Zingerman wrote:
> How about the following?
> 
> ---- 8< ---------------------------
> 
> For each processed instruction, the verifier tracks read and written
> registers and stack slots. The main idea of the algorithm is that read
> marks propagate back along the state parentage chain until they hit a
> write mark, which 'screens off' earlier states from the read. The
> information about reads is propagated by function ``mark_reg_read()``
> which could be summarized as follows::
> 
> ---- >8 ---------------------------

SGTM.

Patch

diff --git a/Documentation/bpf/verifier.rst b/Documentation/bpf/verifier.rst
index 3afa548ec28c..34093d60fe87 100644
--- a/Documentation/bpf/verifier.rst
+++ b/Documentation/bpf/verifier.rst
@@ -316,6 +316,286 @@  Pruning considers not only the registers but also the stack (and any spilled
 registers it may hold).  They must all be safe for the branch to be pruned.
 This is implemented in states_equal().
 
+Some technical details about the state pruning implementation can be found below.
+
+Register liveness tracking
+==========================
+
+In order to make state pruning effective, liveness state is tracked for each
+register and stack slot. The basic idea is to track which registers and stack
+slots are actually used during subsequent execution of the program, until
+program exit is reached. Registers and stack slots that were never used could be
+removed from the cached state, thus making more states equivalent to a cached
+state. This could be illustrated by the following program::
+
+  0: call bpf_get_prandom_u32()
+  1: r1 = 0
+  2: if r0 == 0 goto +1
+  3: r0 = 1
+  --- checkpoint ---
+  4: r0 = r1
+  5: exit
+
+Suppose that a state cache entry is created at instruction #4 (such entries are
+also called "checkpoints" in the text below). The verifier could reach the
+instruction with one of two possible register states:
+
+* r0 = 1, r1 = 0
+* r0 = 0, r1 = 0
+
+However, only the value of register ``r1`` is important to successfully finish
+verification. The goal of the liveness tracking algorithm is to spot this fact
+and figure out that both states are actually equivalent.
+
+Data structures
+===============
+
+Liveness is tracked using the following data structures::
+
+  enum bpf_reg_liveness {
+	REG_LIVE_NONE = 0,
+	REG_LIVE_READ32 = 0x1,
+	REG_LIVE_READ64 = 0x2,
+	REG_LIVE_READ = REG_LIVE_READ32 | REG_LIVE_READ64,
+	REG_LIVE_WRITTEN = 0x4,
+	REG_LIVE_DONE = 0x8,
+  };
+
+  struct bpf_reg_state {
+ 	...
+	struct bpf_reg_state *parent;
+ 	...
+	enum bpf_reg_liveness live;
+ 	...
+  };
+
+  struct bpf_stack_state {
+	struct bpf_reg_state spilled_ptr;
+	...
+  };
+
+  struct bpf_func_state {
+	struct bpf_reg_state regs[MAX_BPF_REG];
+        ...
+	struct bpf_stack_state *stack;
+  }
+
+  struct bpf_verifier_state {
+	struct bpf_func_state *frame[MAX_CALL_FRAMES];
+	struct bpf_verifier_state *parent;
+        ...
+  }
+
+* ``REG_LIVE_NONE`` is an initial value assigned to ``->live`` fields upon new
+  verifier state creation;
+
+* ``REG_LIVE_WRITTEN`` means that the value of the register (or stack slot) is
+  defined by some instruction "executed" within verifier state;
+
+* ``REG_LIVE_READ{32,64}`` means that the value of the register (or stack slot)
+  is read by some instruction "executed" within verifier state;
+
+* ``REG_LIVE_DONE`` is a marker used by ``clean_verifier_state()`` to avoid
+  processing same verifier state multiple times and for some sanity checks;
+
+* ``->live`` field values are formed by combining ``enum bpf_reg_liveness``
+  values using bitwise or.
+
+Register parentage chains
+=========================
+
+In order to propagate information between parent and child states, a *register
+parentage chain* is established. Each register or stack slot is linked to a
+corresponding register or stack slot in its parent state via a ``->parent``
+pointer. This link is established upon state creation in ``is_state_visited()``
+and might be modified by ``set_callee_state()`` called from
+``__check_func_call()``.
+
+The rules for correspondence between registers / stack slots are as follows:
+
+* For the current stack frame, registers and stack slots of the new state are
+  linked to the registers and stack slots of the parent state with the same
+  indices.
+
+* For the outer stack frames, only callee saved registers (r6-r9) and stack
+  slots are linked to the registers and stack slots of the parent state with the
+  same indices.
+
+* When a function call is processed, a new ``struct bpf_func_state`` instance
+  is allocated; it encapsulates a new set of registers and stack slots. For this
+  new frame, parent links for r6-r9 and stack slots are set to nil, parent links
+  for r1-r5 are set to match caller r1-r5 parent links.
+
+This could be illustrated by the following diagram (arrows stand for
+``->parent`` pointers)::
+
+      ...                    ; Frame #0, some instructions
+  --- checkpoint #0 ---
+  1 : r6 = 42                ; Frame #0
+  --- checkpoint #1 ---
+  2 : call foo()             ; Frame #0
+      ...                    ; Frame #1, instructions from foo()
+  --- checkpoint #2 ---
+      ...                    ; Frame #1, instructions from foo()
+  --- checkpoint #3 ---
+      exit                   ; Frame #1, return from foo()
+  3 : r1 = r6                ; Frame #0  <- current state
+
+             +-------------------------------+-------------------------------+
+             |           Frame #0            |           Frame #1            |
+  Checkpoint +-------------------------------+-------------------------------+
+  #0         | r0 | r1-r5 | r6-r9 | fp-8 ... |
+             +-------------------------------+
+                ^    ^       ^       ^
+                |    |       |       |
+  Checkpoint +-------------------------------+
+  #1         | r0 | r1-r5 | r6-r9 | fp-8 ... |
+             +-------------------------------+
+                     ^       ^       ^
+                     |_______|_______|_______________
+                             |       |               |
+               nil  nil      |       |               |      nil     nil
+                |    |       |       |               |       |       |
+  Checkpoint +-------------------------------+-------------------------------+
+  #2         | r0 | r1-r5 | r6-r9 | fp-8 ... | r0 | r1-r5 | r6-r9 | fp-8 ... |
+             +-------------------------------+-------------------------------+
+                             ^       ^               ^       ^       ^
+               nil  nil      |       |               |       |       |
+                |    |       |       |               |       |       |
+  Checkpoint +-------------------------------+-------------------------------+
+  #3         | r0 | r1-r5 | r6-r9 | fp-8 ... | r0 | r1-r5 | r6-r9 | fp-8 ... |
+             +-------------------------------+-------------------------------+
+                             ^       ^
+               nil  nil      |       |
+                |    |       |       |
+  Current    +-------------------------------+
+  state      | r0 | r1-r5 | r6-r9 | fp-8 ... |
+             +-------------------------------+
+                             \
+                               r6 read mark is propagated via
+                               these links all the way up to
+                               checkpoint #1.
+
+Liveness marks tracking
+=======================
+
+For each processed instruction, the verifier propagates information about reads
+up the parentage chain and saves information about writes in the current state.
+The information about reads is propagated by function ``mark_reg_read()`` which
+could be summarized as follows::
+
+  mark_reg_read(struct bpf_reg_state *state):
+      parent = state->parent
+      while parent:
+          if parent->live & REG_LIVE_WRITTEN:
+              break
+          if parent->live & REG_LIVE_READ64:
+              break
+          parent->live |= REG_LIVE_READ64
+          state = parent
+          parent = state->parent
+
+Note: details about REG_LIVE_READ32 are omitted.
+
+Also note: the read marks are applied to the **parent** state while write marks
+are applied to the **current** state.
+
+Because stack writes could have different sizes, ``REG_LIVE_WRITTEN`` marks are
+applied conservatively: stack slots are marked as written only if write size
+corresponds to the size of the register, e.g. see function ``save_register_state()``.
+
+Consider the following example::
+
+  0: (*u64)(r10 - 8) = 0   ; define 8 bytes of fp-8
+  --- checkpoint #0 ---
+  1: (*u32)(r10 - 8) = 1   ; redefine lower 4 bytes
+  2: r1 = (*u32)(r10 - 8)  ; read lower 4 bytes defined at (1)
+  3: r2 = (*u32)(r10 - 4)  ; read upper 4 bytes defined at (0)
+
+As stated above, the write at (1) does not count as ``REG_LIVE_WRITTEN``. Should
+it be otherwise, the algorithm above wouldn't be able to propagate the read mark
+from (3) to checkpoint #0.
+
+Once the ``BPF_EXIT`` instruction is reached, ``update_branch_counts()`` is
+called to update the ``->branches`` counter for each verifier state in a chain
+of parent verifier states. When the ``->branches`` counter reaches zero the
+verifier state becomes a valid entry in a set of cached verifier states.
+
+Each entry of the verifier states cache is post-processed by a function
+``clean_live_states()``. This function marks all registers and stack slots
+without ``REG_LIVE_READ{32,64}`` marks as ``NOT_INIT`` or ``STACK_INVALID``.
+Registers/stack slots marked in this way are ignored in function ``stacksafe()``
+called from ``states_equal()`` when a state cache entry is considered for
+equivalence with a current state.
+
+Now it is possible to explain how the example from the beginning of the section
+works::
+
+  0: call bpf_get_prandom_u32()
+  1: r1 = 0
+  2: if r0 == 0 goto +1
+  3: r0 = 1
+  --- checkpoint[0] ---
+  4: r0 = r1
+  5: exit
+
+* At instruction #2 a branching point is reached and state ``{ r0 == 0, r1 == 0, pc == 4 }``
+  is pushed to states processing queue (pc stands for program counter).
+
+* At instruction #4:
+
+  * ``checkpoint[0]`` states cache entry is created: ``{ r0 == 1, r1 == 0, pc == 4 }``;
+  * ``checkpoint[0].r0`` is marked as written;
+  * ``checkpoint[0].r1`` is marked as read;
+
+* At instruction #5 exit is reached and ``checkpoint[0]`` can now be processed
+  by ``clean_live_states()``. After this processing ``checkpoint[0].r1`` has a
+  read mark and all other registers and stack slots are marked as ``NOT_INIT``
+  or ``STACK_INVALID``.
+
+* The state ``{ r0 == 0, r1 == 0, pc == 4 }`` is popped from the states queue
+  and is compared against the cached state ``{ r1 == 0, pc == 4 }``; the states
+  are considered equivalent.
+
+Read marks propagation for cache hits
+=====================================
+
+Another point is the handling of read marks when a previously verified state is
+found in the states cache. Upon a cache hit, the verifier must behave as if the
+current state was verified all the way to the program exit. This means that all
+read marks present on registers and stack slots of the cached state must be
+propagated over the parentage chain of the current state. The example below
+shows why this is important. Function ``propagate_liveness()`` handles this case.
+
+Consider the following state parentage chain (S is a starting state, A-E are
+derived states, -> arrows show which state is derived from which)::
+
+                   r1 read
+            <-------------                A[r1] == 0
+                                          C[r1] == 0
+      S ---> A ---> B ---> exit           E[r1] == 1
+      |
+      ` ---> C ---> D
+      |
+      ` ---> E      ^
+                    |___   suppose all these
+             ^           states are at insn #Y
+             |
+      suppose all these
+    states are at insn #X
+
+* Chain of states ``S -> A -> B -> exit`` is verified first.
+
+* While ``B -> exit`` is verified, register ``r1`` is read and this read mark is
+  propagated up to state ``A``.
+
+* When chain of states ``C -> D`` is verified the state ``D`` turns out to be
+  equivalent to state ``B``.
+
+* The read mark for ``r1`` has to be propagated to state ``C``, otherwise state
+  ``C`` might get mistakenly marked as equivalent to state ``E`` even though
+  values for register ``r1`` differ between ``C`` and ``E``.
+
 Understanding eBPF verifier messages
 ====================================