diff mbox series

[bpf-next,v2] bpf, docs: Improve English readability

Message ID 20230706160537.1309-1-dthaler1968@googlemail.com (mailing list archive)
State Changes Requested
Delegated to: BPF
Headers show
Series [bpf-next,v2] bpf, docs: Improve English readability | expand

Checks

Context Check Description
bpf/vmtest-bpf-next-VM_Test-3 success Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-7 success Logs for test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-9 success Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-10 success Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-13 success Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-14 success Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-15 success Logs for test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-17 success Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-18 success Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-19 success Logs for test_progs_no_alu32_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-20 success Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-21 success Logs for test_progs_no_alu32_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-22 success Logs for test_progs_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-23 success Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-24 success Logs for test_progs_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-25 success Logs for test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-27 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-28 success Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-29 success Logs for veristat
bpf/vmtest-bpf-next-VM_Test-11 success Logs for test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-16 success Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-26 success Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-12 success Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next-VM_Test-8 success Logs for test_maps on s390x with gcc
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for bpf-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 8 this patch: 8
netdev/cc_maintainers warning 14 maintainers not CCed: void@manifault.com daniel@iogearbox.net yhs@fb.com kpsingh@kernel.org martin.lau@linux.dev john.fastabend@gmail.com song@kernel.org sdf@google.com andrii@kernel.org corbet@lwn.net jolsa@kernel.org haoluo@google.com ast@kernel.org linux-doc@vger.kernel.org
netdev/build_clang fail Errors and warnings before: 18 this patch: 18
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 8 this patch: 8
netdev/checkpatch warning WARNING: Missing commit description - Add an appropriate one
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-2 success Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-4 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-5 success Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-6 success Logs for set-matrix

Commit Message

Dave Thaler July 6, 2023, 4:05 p.m. UTC
From: Dave Thaler <dthaler@microsoft.com>

Signed-off-by: Dave Thaler <dthaler@microsoft.com>
--
V1 -> V2: addressed comments from Alexei
---
 Documentation/bpf/instruction-set.rst | 59 ++++++++++++++++++++-------
 Documentation/bpf/linux-notes.rst     |  5 +++
 2 files changed, 50 insertions(+), 14 deletions(-)

Comments

Alexei Starovoitov July 6, 2023, 8:41 p.m. UTC | #1
On Thu, Jul 06, 2023 at 04:05:37PM +0000, Dave Thaler wrote:
> From: Dave Thaler <dthaler@microsoft.com>
> 
> Signed-off-by: Dave Thaler <dthaler@microsoft.com>
> --
> V1 -> V2: addressed comments from Alexei
> ---
>  Documentation/bpf/instruction-set.rst | 59 ++++++++++++++++++++-------
>  Documentation/bpf/linux-notes.rst     |  5 +++
>  2 files changed, 50 insertions(+), 14 deletions(-)
> 
> diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst
> index 751e657973f..740989f4c1e 100644
> --- a/Documentation/bpf/instruction-set.rst
> +++ b/Documentation/bpf/instruction-set.rst
> @@ -7,6 +7,9 @@ eBPF Instruction Set Specification, v1.0
>  
>  This document specifies version 1.0 of the eBPF instruction set.
>  
> +The eBPF instruction set consists of eleven 64 bit registers, a program counter,
> +and an implementation-specific amount (e.g., 512 bytes) of stack space.
> +
>  Documentation conventions
>  =========================
>  
> @@ -27,12 +30,24 @@ The eBPF calling convention is defined as:
>  * R6 - R9: callee saved registers that function calls will preserve
>  * R10: read-only frame pointer to access stack
>  
> -R0 - R5 are scratch registers and eBPF programs needs to spill/fill them if
> -necessary across calls.
> +Registers R0 - R5 are caller-saved registers, meaning the BPF program needs to either
> +spill them to the BPF stack or move them to callee saved registers if these
> +arguments are to be reused across multiple function calls. Spilling means
> +that the value in the register is moved to the BPF stack. The reverse operation
> +of moving the variable from the BPF stack to the register is called filling.
> +The reason for spilling/filling is due to the limited number of registers.

imo this extended explanation goes too far.
It's also not entirely correct. We could have an ISA with limited number of registers
where every register is callee saved. A bit absurd, but possible.
Or went with SPARC style register windows.

> +
> +Upon entering execution of an eBPF program, registers R1 - R5 initially can contain
> +the input arguments for the program (similar to the argc/argv pair for a typical C program).

argc/argv is only for main(). We don't have main() concept in BPF ISA.
argc/argv is also not a property of ISA.

> +The actual number of registers used, and their meaning, is defined by the program type;
> +for example, a networking program might have an argument that includes network packet data
> +and/or metadata.

that makes things even more confusing.

tbh none of the above changes make the doc easier to read.

>  Instruction encoding
>  ====================
>  
> +An eBPF program is a sequence of instructions.

Kinda true, but it opens the door for plenty of bike shedding.
Is it contiguous sequence? what about subprograms?
Is BPF program a one function or multiple functions?
etc.
Just not worth it.
This is ISA doc.

> +
>  eBPF has two instruction encodings:
>  
>  * the basic instruction encoding, which uses 64 bits to encode an instruction
> @@ -74,7 +89,7 @@ For example::
>    07     1       0        00 00  11 22 33 44  r1 += 0x11223344 // big
>  
>  Note that most instructions do not use all of the fields.
> -Unused fields shall be cleared to zero.
> +Unused fields must be set to zero.

How is this better?

>  
>  As discussed below in `64-bit immediate instructions`_, a 64-bit immediate
>  instruction uses a 64-bit immediate value that is constructed as follows.
> @@ -103,7 +118,9 @@ instruction are reserved and shall be cleared to zero.
>  Instruction classes
>  -------------------
>  
> -The three LSB bits of the 'opcode' field store the instruction class:
> +The encoding of the 'opcode' field varies and can be determined from
> +the three least significant bits (LSB) of the 'opcode' field which holds
> +the "instruction class", as follows:

same question. Don't see an improvement in wording.

>  
>  =========  =====  ===============================  ===================================
>  class      value  description                      reference
> @@ -149,7 +166,8 @@ code            source  instruction class
>  Arithmetic instructions
>  -----------------------
>  
> -``BPF_ALU`` uses 32-bit wide operands while ``BPF_ALU64`` uses 64-bit wide operands for
> +Instruction class ``BPF_ALU`` uses 32-bit wide operands (zeroing the upper 32 bits
> +of the destination register) while ``BPF_ALU64`` uses 64-bit wide operands for

The other part of the doc mentions zeroing. No need to repeat.

>  otherwise identical operations.
>  The 'code' field encodes the operation as below, where 'src' and 'dst' refer
>  to the values of the source and destination registers, respectively.
> @@ -216,8 +234,9 @@ The byte swap instructions use an instruction class of ``BPF_ALU`` and a 4-bit
>  The byte swap instructions operate on the destination register
>  only and do not use a separate source register or immediate value.
>  
> -The 1-bit source operand field in the opcode is used to select what byte
> -order the operation convert from or to:
> +Byte swap instructions use the 1-bit 'source' field in the 'opcode' field
> +as follows.  Instead of indicating the source operator, it is instead
> +used to select what byte order the operation converts from or to:

+1 to this part.

>  
>  =========  =====  =================================================
>  source     value  description
> @@ -235,16 +254,21 @@ Examples:
>  
>    dst = htole16(dst)
>  
> +where 'htole16()' indicates converting a 16-bit value from host byte order to little-endian byte order.
> +
>  ``BPF_ALU | BPF_TO_BE | BPF_END`` with imm = 64 means::
>  
>    dst = htobe64(dst)
>  
> +where 'htobe64()' indicates converting a 64-bit value from host byte order to big-endian byte order.
> +
>  Jump instructions
>  -----------------
>  
> -``BPF_JMP32`` uses 32-bit wide operands while ``BPF_JMP`` uses 64-bit wide operands for
> +Instruction class ``BPF_JMP32`` uses 32-bit wide operands while ``BPF_JMP`` uses 64-bit wide operands for
>  otherwise identical operations.
> -The 'code' field encodes the operation as below:
> +
> +The 4-bit 'code' field encodes the operation as below, where PC is the program counter:
>  
>  ========  =====  ===  ===========================================  =========================================
>  code      value  src  description                                  notes
> @@ -311,7 +335,8 @@ For load and store instructions (``BPF_LD``, ``BPF_LDX``, ``BPF_ST``, and ``BPF_
>  mode          size    instruction class
>  ============  ======  =================
>  
> -The mode modifier is one of:
> +mode
> +  one of:
>  
>    =============  =====  ====================================  =============
>    mode modifier  value  description                           reference
> @@ -323,7 +348,8 @@ The mode modifier is one of:
>    BPF_ATOMIC     0xc0   atomic operations                     `Atomic operations`_
>    =============  =====  ====================================  =============
>  
> -The size modifier is one of:
> +size
> +  one of:
>  
>    =============  =====  =====================
>    size modifier  value  description
> @@ -334,6 +360,9 @@ The size modifier is one of:
>    BPF_DW         0x18   double word (8 bytes)
>    =============  =====  =====================
>  
> +instruction class
> +  the instruction class (see `Instruction classes`_)
> +
>  Regular load and store operations
>  ---------------------------------
>  
> @@ -352,7 +381,7 @@ instructions that transfer data between a register and memory.
>  
>    dst = *(size *) (src + offset)
>  
> -Where size is one of: ``BPF_B``, ``BPF_H``, ``BPF_W``, or ``BPF_DW``.
> +where size is one of: ``BPF_B``, ``BPF_H``, ``BPF_W``, or ``BPF_DW``.
>  
>  Atomic operations
>  -----------------
> @@ -366,7 +395,9 @@ that use the ``BPF_ATOMIC`` mode modifier as follows:
>  
>  * ``BPF_ATOMIC | BPF_W | BPF_STX`` for 32-bit operations
>  * ``BPF_ATOMIC | BPF_DW | BPF_STX`` for 64-bit operations
> -* 8-bit and 16-bit wide atomic operations are not supported.
> +
> +Note that 8-bit (``BPF_B``) and 16-bit (``BPF_H``) wide atomic operations are not currently supported,
> +nor is ``BPF_ATOMIC | <size> | BPF_ST``.
>  
>  The 'imm' field is used to encode the actual atomic operation.
>  Simple atomic operation use a subset of the values defined to encode
> @@ -390,7 +421,7 @@ BPF_XOR   0xa0   atomic xor
>  
>    *(u64 *)(dst + offset) += src
>  
> -In addition to the simple atomic operations, there also is a modifier and
> +In addition to the simple atomic operations above, there also is a modifier and
>  two complex atomic operations:
>  
>  ===========  ================  ===========================
> diff --git a/Documentation/bpf/linux-notes.rst b/Documentation/bpf/linux-notes.rst
> index 508d009d3be..724579fd62d 100644
> --- a/Documentation/bpf/linux-notes.rst
> +++ b/Documentation/bpf/linux-notes.rst
> @@ -7,6 +7,11 @@ Linux implementation notes
>  
>  This document provides more details specific to the Linux kernel implementation of the eBPF instruction set.
>  
> +Stack space
> +======================
> +
> +Linux currently supports 512 bytes of stack space.

I wouldn't open this door.
The verifier enforces 512 stack space for bpf prog plus all of its subprogs that it calls directly.
There is no good way to describe it concisely. And such description doesn't belong in ISA doc.
Dave Thaler July 6, 2023, 10:03 p.m. UTC | #2
> -----Original Message-----
> From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Sent: Thursday, July 6, 2023 1:42 PM
> To: Dave Thaler <dthaler1968@googlemail.com>
> Cc: bpf@vger.kernel.org; bpf@ietf.org; Dave Thaler
> <dthaler@microsoft.com>
> Subject: Re: [PATCH bpf-next v2] bpf, docs: Improve English readability
> 
> On Thu, Jul 06, 2023 at 04:05:37PM +0000, Dave Thaler wrote:
> > From: Dave Thaler <dthaler@microsoft.com>
> >
> > Signed-off-by: Dave Thaler <dthaler@microsoft.com>
> > --
> > V1 -> V2: addressed comments from Alexei
> > ---
> >  Documentation/bpf/instruction-set.rst | 59 ++++++++++++++++++++-------
> >  Documentation/bpf/linux-notes.rst     |  5 +++
> >  2 files changed, 50 insertions(+), 14 deletions(-)
> >
> > diff --git a/Documentation/bpf/instruction-set.rst
> > b/Documentation/bpf/instruction-set.rst
> > index 751e657973f..740989f4c1e 100644
> > --- a/Documentation/bpf/instruction-set.rst
> > +++ b/Documentation/bpf/instruction-set.rst
> > @@ -7,6 +7,9 @@ eBPF Instruction Set Specification, v1.0
> >
> >  This document specifies version 1.0 of the eBPF instruction set.
> >
> > +The eBPF instruction set consists of eleven 64 bit registers, a
> > +program counter, and an implementation-specific amount (e.g., 512 bytes)
> of stack space.
> > +
> >  Documentation conventions
> >  =========================
> >
> > @@ -27,12 +30,24 @@ The eBPF calling convention is defined as:
> >  * R6 - R9: callee saved registers that function calls will preserve
> >  * R10: read-only frame pointer to access stack
> >
> > -R0 - R5 are scratch registers and eBPF programs needs to spill/fill
> > them if -necessary across calls.
> > +Registers R0 - R5 are caller-saved registers, meaning the BPF program
> > +needs to either spill them to the BPF stack or move them to callee
> > +saved registers if these arguments are to be reused across multiple
> > +function calls. Spilling means that the value in the register is
> > +moved to the BPF stack. The reverse operation of moving the variable
> from the BPF stack to the register is called filling.
> > +The reason for spilling/filling is due to the limited number of registers.
> 
> imo this extended explanation goes too far.
> It's also not entirely correct. We could have an ISA with limited number of
> registers where every register is callee saved. A bit absurd, but possible.
> Or went with SPARC style register windows.

At https://lore.kernel.org/bpf/20220930221624.mqjrzmdxc6etkadm@macbook-pro-4.dhcp.thefacebook.com/ you said about the above
"I like above clarification though."

I think it's important for interoperability to define which registers are caller-saved
and which are not, so a compiler (or even verifier) can be used for multiple runtimes.

> > +
> > +Upon entering execution of an eBPF program, registers R1 - R5
> > +initially can contain the input arguments for the program (similar to the
> argc/argv pair for a typical C program).
> 
> argc/argv is only for main(). We don't have main() concept in BPF ISA.
> argc/argv is also not a property of ISA.

That's why it's "similar to".  I think the analogy helps understanding for new readers.
 
> > +The actual number of registers used, and their meaning, is defined by
> > +the program type; for example, a networking program might have an
> > +argument that includes network packet data and/or metadata.
> 
> that makes things even more confusing.
> 
> tbh none of the above changes make the doc easier to read.

The program type defines the number and meaning of any arguments passed
to the program.  In the ISA that means the number of registered used to
pass inputs, and their contents.

> >  Instruction encoding
> >  ====================
> >
> > +An eBPF program is a sequence of instructions.
> 
> Kinda true, but it opens the door for plenty of bike shedding.
> Is it contiguous sequence? what about subprograms?
> Is BPF program a one function or multiple functions?

The term "subprogram" is not currently part of the 
instruction-set.rst doc.   "Program-local functions"
are, and the text says they're part of the same BPF program.
Hence the doc already says a BPF program can have multiple
functions.

> etc.
> Just not worth it.
> This is ISA doc.
> 
> > +
> >  eBPF has two instruction encodings:
> >
> >  * the basic instruction encoding, which uses 64 bits to encode an
> > instruction @@ -74,7 +89,7 @@ For example::
> >    07     1       0        00 00  11 22 33 44  r1 += 0x11223344 // big
> >
> >  Note that most instructions do not use all of the fields.
> > -Unused fields shall be cleared to zero.
> > +Unused fields must be set to zero.
> 
> How is this better?

It uses the language common in RFCs.

> >  As discussed below in `64-bit immediate instructions`_, a 64-bit
> > immediate  instruction uses a 64-bit immediate value that is constructed as
> follows.
> > @@ -103,7 +118,9 @@ instruction are reserved and shall be cleared to
> zero.
> >  Instruction classes
> >  -------------------
> >
> > -The three LSB bits of the 'opcode' field store the instruction class:
> > +The encoding of the 'opcode' field varies and can be determined from
> > +the three least significant bits (LSB) of the 'opcode' field which
> > +holds the "instruction class", as follows:
> 
> same question. Don't see an improvement in wording.

1. The acronym LSB was not defined and does not have an asterisk by it in
the https://www.rfc-editor.org/materials/abbrev.expansion.txt list.

2. "LSB bits" is redundant.

3. Putting "instruction class" in quotes is common when defining by use
the first time.

> >
> >  =========  =====  ===============================
> ===================================
> >  class      value  description                      reference
> > @@ -149,7 +166,8 @@ code            source  instruction class
> >  Arithmetic instructions
> >  -----------------------
> >
> > -``BPF_ALU`` uses 32-bit wide operands while ``BPF_ALU64`` uses 64-bit
> > wide operands for
> > +Instruction class ``BPF_ALU`` uses 32-bit wide operands (zeroing the
> > +upper 32 bits of the destination register) while ``BPF_ALU64`` uses
> > +64-bit wide operands for
> 
> The other part of the doc mentions zeroing. No need to repeat.
> 
> >  otherwise identical operations.
> >  The 'code' field encodes the operation as below, where 'src' and
> > 'dst' refer  to the values of the source and destination registers,
> respectively.
> > @@ -216,8 +234,9 @@ The byte swap instructions use an instruction
> > class of ``BPF_ALU`` and a 4-bit  The byte swap instructions operate
> > on the destination register  only and do not use a separate source register
> or immediate value.
> >
> > -The 1-bit source operand field in the opcode is used to select what
> > byte -order the operation convert from or to:
> > +Byte swap instructions use the 1-bit 'source' field in the 'opcode'
> > +field as follows.  Instead of indicating the source operator, it is
> > +instead used to select what byte order the operation converts from or to:
> 
> +1 to this part.
> 
> >
> >  =========  =====
> =================================================
> >  source     value  description
> > @@ -235,16 +254,21 @@ Examples:
> >
> >    dst = htole16(dst)
> >
> > +where 'htole16()' indicates converting a 16-bit value from host byte order
> to little-endian byte order.
> > +
> >  ``BPF_ALU | BPF_TO_BE | BPF_END`` with imm = 64 means::
> >
> >    dst = htobe64(dst)
> >
> > +where 'htobe64()' indicates converting a 64-bit value from host byte order
> to big-endian byte order.
> > +
> >  Jump instructions
> >  -----------------
> >
> > -``BPF_JMP32`` uses 32-bit wide operands while ``BPF_JMP`` uses 64-bit
> > wide operands for
> > +Instruction class ``BPF_JMP32`` uses 32-bit wide operands while
> > +``BPF_JMP`` uses 64-bit wide operands for
> >  otherwise identical operations.
> > -The 'code' field encodes the operation as below:
> > +
> > +The 4-bit 'code' field encodes the operation as below, where PC is the
> program counter:
> >
> >  ========  =====  ===  ===========================================
> =========================================
> >  code      value  src  description                                  notes
> > @@ -311,7 +335,8 @@ For load and store instructions (``BPF_LD``,
> ``BPF_LDX``, ``BPF_ST``, and ``BPF_
> >  mode          size    instruction class
> >  ============  ======  =================
> >
> > -The mode modifier is one of:
> > +mode
> > +  one of:
> >
> >    =============  =====  ====================================
> =============
> >    mode modifier  value  description                           reference
> > @@ -323,7 +348,8 @@ The mode modifier is one of:
> >    BPF_ATOMIC     0xc0   atomic operations                     `Atomic
> operations`_
> >    =============  =====  ====================================
> > =============
> >
> > -The size modifier is one of:
> > +size
> > +  one of:
> >
> >    =============  =====  =====================
> >    size modifier  value  description
> > @@ -334,6 +360,9 @@ The size modifier is one of:
> >    BPF_DW         0x18   double word (8 bytes)
> >    =============  =====  =====================
> >
> > +instruction class
> > +  the instruction class (see `Instruction classes`_)
> > +
> >  Regular load and store operations
> >  ---------------------------------
> >
> > @@ -352,7 +381,7 @@ instructions that transfer data between a register
> and memory.
> >
> >    dst = *(size *) (src + offset)
> >
> > -Where size is one of: ``BPF_B``, ``BPF_H``, ``BPF_W``, or ``BPF_DW``.
> > +where size is one of: ``BPF_B``, ``BPF_H``, ``BPF_W``, or ``BPF_DW``.
> >
> >  Atomic operations
> >  -----------------
> > @@ -366,7 +395,9 @@ that use the ``BPF_ATOMIC`` mode modifier as
> follows:
> >
> >  * ``BPF_ATOMIC | BPF_W | BPF_STX`` for 32-bit operations
> >  * ``BPF_ATOMIC | BPF_DW | BPF_STX`` for 64-bit operations
> > -* 8-bit and 16-bit wide atomic operations are not supported.
> > +
> > +Note that 8-bit (``BPF_B``) and 16-bit (``BPF_H``) wide atomic
> > +operations are not currently supported, nor is ``BPF_ATOMIC | <size> |
> BPF_ST``.
> >
> >  The 'imm' field is used to encode the actual atomic operation.
> >  Simple atomic operation use a subset of the values defined to encode
> > @@ -390,7 +421,7 @@ BPF_XOR   0xa0   atomic xor
> >
> >    *(u64 *)(dst + offset) += src
> >
> > -In addition to the simple atomic operations, there also is a modifier
> > and
> > +In addition to the simple atomic operations above, there also is a
> > +modifier and
> >  two complex atomic operations:
> >
> >  ===========  ================  =========================== diff --git
> > a/Documentation/bpf/linux-notes.rst
> > b/Documentation/bpf/linux-notes.rst
> > index 508d009d3be..724579fd62d 100644
> > --- a/Documentation/bpf/linux-notes.rst
> > +++ b/Documentation/bpf/linux-notes.rst
> > @@ -7,6 +7,11 @@ Linux implementation notes
> >
> >  This document provides more details specific to the Linux kernel
> implementation of the eBPF instruction set.
> >
> > +Stack space
> > +======================
> > +
> > +Linux currently supports 512 bytes of stack space.
> 
> I wouldn't open this door.
> The verifier enforces 512 stack space for bpf prog plus all of its subprogs that
> it calls directly.
> There is no good way to describe it concisely. And such description doesn't
> belong in ISA doc.

linux-notes.rst is not in the ISA doc.   The ISA doc says the value is implementation
defined.  linux-notes.rst says what Linux does for things the ISA doc leaves up
to the implementation.

Dave
Alexei Starovoitov July 6, 2023, 11:14 p.m. UTC | #3
On Thu, Jul 6, 2023 at 3:04 PM Dave Thaler <dthaler@microsoft.com> wrote:
>
> > -----Original Message-----
> > From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> > Sent: Thursday, July 6, 2023 1:42 PM
> > To: Dave Thaler <dthaler1968@googlemail.com>
> > Cc: bpf@vger.kernel.org; bpf@ietf.org; Dave Thaler
> > <dthaler@microsoft.com>
> > Subject: Re: [PATCH bpf-next v2] bpf, docs: Improve English readability
> >
> > On Thu, Jul 06, 2023 at 04:05:37PM +0000, Dave Thaler wrote:
> > > From: Dave Thaler <dthaler@microsoft.com>
> > >
> > > Signed-off-by: Dave Thaler <dthaler@microsoft.com>
> > > --
> > > V1 -> V2: addressed comments from Alexei
> > > ---
> > >  Documentation/bpf/instruction-set.rst | 59 ++++++++++++++++++++-------
> > >  Documentation/bpf/linux-notes.rst     |  5 +++
> > >  2 files changed, 50 insertions(+), 14 deletions(-)
> > >
> > > diff --git a/Documentation/bpf/instruction-set.rst
> > > b/Documentation/bpf/instruction-set.rst
> > > index 751e657973f..740989f4c1e 100644
> > > --- a/Documentation/bpf/instruction-set.rst
> > > +++ b/Documentation/bpf/instruction-set.rst
> > > @@ -7,6 +7,9 @@ eBPF Instruction Set Specification, v1.0
> > >
> > >  This document specifies version 1.0 of the eBPF instruction set.
> > >
> > > +The eBPF instruction set consists of eleven 64 bit registers, a
> > > +program counter, and an implementation-specific amount (e.g., 512 bytes)
> > of stack space.
> > > +
> > >  Documentation conventions
> > >  =========================
> > >
> > > @@ -27,12 +30,24 @@ The eBPF calling convention is defined as:
> > >  * R6 - R9: callee saved registers that function calls will preserve
> > >  * R10: read-only frame pointer to access stack
> > >
> > > -R0 - R5 are scratch registers and eBPF programs needs to spill/fill
> > > them if -necessary across calls.
> > > +Registers R0 - R5 are caller-saved registers, meaning the BPF program
> > > +needs to either spill them to the BPF stack or move them to callee
> > > +saved registers if these arguments are to be reused across multiple
> > > +function calls. Spilling means that the value in the register is
> > > +moved to the BPF stack. The reverse operation of moving the variable
> > from the BPF stack to the register is called filling.
> > > +The reason for spilling/filling is due to the limited number of registers.
> >
> > imo this extended explanation goes too far.
> > It's also not entirely correct. We could have an ISA with limited number of
> > registers where every register is callee saved. A bit absurd, but possible.
> > Or went with SPARC style register windows.
>
> At https://lore.kernel.org/bpf/20220930221624.mqjrzmdxc6etkadm@macbook-pro-4.dhcp.thefacebook.com/ you said about the above
> "I like above clarification though."

That was on "30 Sep 2022".
I like to change my mind often enough to confuse everyone :)

> I think it's important for interoperability to define which registers are caller-saved
> and which are not, so a compiler (or even verifier) can be used for multiple runtimes.

but it really doesn't belong in the ISA doc.
We've discussed it at length.
We need the psABI doc.
ISA doc is a description of instructions and _not_ how they shape
into functions and programs.
More below.

>
> > > +
> > > +Upon entering execution of an eBPF program, registers R1 - R5
> > > +initially can contain the input arguments for the program (similar to the
> > argc/argv pair for a typical C program).
> >
> > argc/argv is only for main(). We don't have main() concept in BPF ISA.
> > argc/argv is also not a property of ISA.
>
> That's why it's "similar to".  I think the analogy helps understanding for new readers.

argc is a K&R C thing. Not even a calling convention.
That sentence is a combination of analogies from different areas.
argc could be interpreted that the first argument is a count
and a second argument is an array.
See the confusion it might cause?

>
> > > +The actual number of registers used, and their meaning, is defined by
> > > +the program type; for example, a networking program might have an
> > > +argument that includes network packet data and/or metadata.
> >
> > that makes things even more confusing.
> >
> > tbh none of the above changes make the doc easier to read.
>
> The program type defines the number and meaning of any arguments passed
> to the program.  In the ISA that means the number of registered used to
> pass inputs, and their contents.

Not at all. ISA is an instruction set only and this doc is for ISA.
What different program types should accept belongs in a different doc.
And it's not a psABI doc.
More below.

>
> > >  Instruction encoding
> > >  ====================
> > >
> > > +An eBPF program is a sequence of instructions.
> >
> > Kinda true, but it opens the door for plenty of bike shedding.
> > Is it contiguous sequence? what about subprograms?
> > Is BPF program a one function or multiple functions?
>
> The term "subprogram" is not currently part of the
> instruction-set.rst doc.   "Program-local functions"
> are, and the text says they're part of the same BPF program.
> Hence the doc already says a BPF program can have multiple
> functions.

Yes. It's a mess, but we must not make it worse.
instruction-set.rst is for instructions.
Definition of BPF program belongs to a different doc.

> > etc.
> > Just not worth it.
> > This is ISA doc.
> >
> > > +
> > >  eBPF has two instruction encodings:
> > >
> > >  * the basic instruction encoding, which uses 64 bits to encode an
> > > instruction @@ -74,7 +89,7 @@ For example::
> > >    07     1       0        00 00  11 22 33 44  r1 += 0x11223344 // big
> > >
> > >  Note that most instructions do not use all of the fields.
> > > -Unused fields shall be cleared to zero.
> > > +Unused fields must be set to zero.
> >
> > How is this better?
>
> It uses the language common in RFCs.
>
> > >  As discussed below in `64-bit immediate instructions`_, a 64-bit
> > > immediate  instruction uses a 64-bit immediate value that is constructed as
> > follows.
> > > @@ -103,7 +118,9 @@ instruction are reserved and shall be cleared to
> > zero.
> > >  Instruction classes
> > >  -------------------
> > >
> > > -The three LSB bits of the 'opcode' field store the instruction class:
> > > +The encoding of the 'opcode' field varies and can be determined from
> > > +the three least significant bits (LSB) of the 'opcode' field which
> > > +holds the "instruction class", as follows:
> >
> > same question. Don't see an improvement in wording.
>
> 1. The acronym LSB was not defined and does not have an asterisk by it in
> the https://www.rfc-editor.org/materials/abbrev.expansion.txt list.
>
> 2. "LSB bits" is redundant.
>
> 3. Putting "instruction class" in quotes is common when defining by use
> the first time.

ok. fair enough.

>
> linux-notes.rst is not in the ISA doc.   The ISA doc says the value is implementation
> defined.  linux-notes.rst says what Linux does for things the ISA doc leaves up
> to the implementation.

I see no value in "Linux currently supports 512 bytes of stack space" sentence.
It's too ambiguous to be useful.

instruction-set.rst is a description of ISA. We should remove things
from it that don't belong instead of doubling down on the current mess.

we need the psABI doc that will not be standardized as Jose recommended.
That doc will describe recommended calling convention, argument
promotion, stack usage, relocation, function/subprogram definition, etc
psABI probably should include BTF.ext description, but I'm open to alternatives.

BPF program types supported by the kernel is a 3rd document.
There we can explain that XDP prog takes a single ctx argument and it's
a pointer to ...
Such info is linux specific. Based on ISA and psABI one can come up
with different program/map types and semantics of their arguments
while being fully compliant with ISA and psABI docs.
Such doc can start its journey in linux-notes.rst and then split, if necessary.
diff mbox series

Patch

diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst
index 751e657973f..740989f4c1e 100644
--- a/Documentation/bpf/instruction-set.rst
+++ b/Documentation/bpf/instruction-set.rst
@@ -7,6 +7,9 @@  eBPF Instruction Set Specification, v1.0
 
 This document specifies version 1.0 of the eBPF instruction set.
 
+The eBPF instruction set consists of eleven 64 bit registers, a program counter,
+and an implementation-specific amount (e.g., 512 bytes) of stack space.
+
 Documentation conventions
 =========================
 
@@ -27,12 +30,24 @@  The eBPF calling convention is defined as:
 * R6 - R9: callee saved registers that function calls will preserve
 * R10: read-only frame pointer to access stack
 
-R0 - R5 are scratch registers and eBPF programs needs to spill/fill them if
-necessary across calls.
+Registers R0 - R5 are caller-saved registers, meaning the BPF program needs to either
+spill them to the BPF stack or move them to callee saved registers if these
+arguments are to be reused across multiple function calls. Spilling means
+that the value in the register is moved to the BPF stack. The reverse operation
+of moving the variable from the BPF stack to the register is called filling.
+The reason for spilling/filling is due to the limited number of registers.
+
+Upon entering execution of an eBPF program, registers R1 - R5 initially can contain
+the input arguments for the program (similar to the argc/argv pair for a typical C program).
+The actual number of registers used, and their meaning, is defined by the program type;
+for example, a networking program might have an argument that includes network packet data
+and/or metadata.
 
 Instruction encoding
 ====================
 
+An eBPF program is a sequence of instructions.
+
 eBPF has two instruction encodings:
 
 * the basic instruction encoding, which uses 64 bits to encode an instruction
@@ -74,7 +89,7 @@  For example::
   07     1       0        00 00  11 22 33 44  r1 += 0x11223344 // big
 
 Note that most instructions do not use all of the fields.
-Unused fields shall be cleared to zero.
+Unused fields must be set to zero.
 
 As discussed below in `64-bit immediate instructions`_, a 64-bit immediate
 instruction uses a 64-bit immediate value that is constructed as follows.
@@ -103,7 +118,9 @@  instruction are reserved and shall be cleared to zero.
 Instruction classes
 -------------------
 
-The three LSB bits of the 'opcode' field store the instruction class:
+The encoding of the 'opcode' field varies and can be determined from
+the three least significant bits (LSB) of the 'opcode' field which holds
+the "instruction class", as follows:
 
 =========  =====  ===============================  ===================================
 class      value  description                      reference
@@ -149,7 +166,8 @@  code            source  instruction class
 Arithmetic instructions
 -----------------------
 
-``BPF_ALU`` uses 32-bit wide operands while ``BPF_ALU64`` uses 64-bit wide operands for
+Instruction class ``BPF_ALU`` uses 32-bit wide operands (zeroing the upper 32 bits
+of the destination register) while ``BPF_ALU64`` uses 64-bit wide operands for
 otherwise identical operations.
 The 'code' field encodes the operation as below, where 'src' and 'dst' refer
 to the values of the source and destination registers, respectively.
@@ -216,8 +234,9 @@  The byte swap instructions use an instruction class of ``BPF_ALU`` and a 4-bit
 The byte swap instructions operate on the destination register
 only and do not use a separate source register or immediate value.
 
-The 1-bit source operand field in the opcode is used to select what byte
-order the operation convert from or to:
+Byte swap instructions use the 1-bit 'source' field in the 'opcode' field
+as follows.  Instead of indicating the source operator, it is instead
+used to select what byte order the operation converts from or to:
 
 =========  =====  =================================================
 source     value  description
@@ -235,16 +254,21 @@  Examples:
 
   dst = htole16(dst)
 
+where 'htole16()' indicates converting a 16-bit value from host byte order to little-endian byte order.
+
 ``BPF_ALU | BPF_TO_BE | BPF_END`` with imm = 64 means::
 
   dst = htobe64(dst)
 
+where 'htobe64()' indicates converting a 64-bit value from host byte order to big-endian byte order.
+
 Jump instructions
 -----------------
 
-``BPF_JMP32`` uses 32-bit wide operands while ``BPF_JMP`` uses 64-bit wide operands for
+Instruction class ``BPF_JMP32`` uses 32-bit wide operands while ``BPF_JMP`` uses 64-bit wide operands for
 otherwise identical operations.
-The 'code' field encodes the operation as below:
+
+The 4-bit 'code' field encodes the operation as below, where PC is the program counter:
 
 ========  =====  ===  ===========================================  =========================================
 code      value  src  description                                  notes
@@ -311,7 +335,8 @@  For load and store instructions (``BPF_LD``, ``BPF_LDX``, ``BPF_ST``, and ``BPF_
 mode          size    instruction class
 ============  ======  =================
 
-The mode modifier is one of:
+mode
+  one of:
 
   =============  =====  ====================================  =============
   mode modifier  value  description                           reference
@@ -323,7 +348,8 @@  The mode modifier is one of:
   BPF_ATOMIC     0xc0   atomic operations                     `Atomic operations`_
   =============  =====  ====================================  =============
 
-The size modifier is one of:
+size
+  one of:
 
   =============  =====  =====================
   size modifier  value  description
@@ -334,6 +360,9 @@  The size modifier is one of:
   BPF_DW         0x18   double word (8 bytes)
   =============  =====  =====================
 
+instruction class
+  the instruction class (see `Instruction classes`_)
+
 Regular load and store operations
 ---------------------------------
 
@@ -352,7 +381,7 @@  instructions that transfer data between a register and memory.
 
   dst = *(size *) (src + offset)
 
-Where size is one of: ``BPF_B``, ``BPF_H``, ``BPF_W``, or ``BPF_DW``.
+where size is one of: ``BPF_B``, ``BPF_H``, ``BPF_W``, or ``BPF_DW``.
 
 Atomic operations
 -----------------
@@ -366,7 +395,9 @@  that use the ``BPF_ATOMIC`` mode modifier as follows:
 
 * ``BPF_ATOMIC | BPF_W | BPF_STX`` for 32-bit operations
 * ``BPF_ATOMIC | BPF_DW | BPF_STX`` for 64-bit operations
-* 8-bit and 16-bit wide atomic operations are not supported.
+
+Note that 8-bit (``BPF_B``) and 16-bit (``BPF_H``) wide atomic operations are not currently supported,
+nor is ``BPF_ATOMIC | <size> | BPF_ST``.
 
 The 'imm' field is used to encode the actual atomic operation.
 Simple atomic operation use a subset of the values defined to encode
@@ -390,7 +421,7 @@  BPF_XOR   0xa0   atomic xor
 
   *(u64 *)(dst + offset) += src
 
-In addition to the simple atomic operations, there also is a modifier and
+In addition to the simple atomic operations above, there also is a modifier and
 two complex atomic operations:
 
 ===========  ================  ===========================
diff --git a/Documentation/bpf/linux-notes.rst b/Documentation/bpf/linux-notes.rst
index 508d009d3be..724579fd62d 100644
--- a/Documentation/bpf/linux-notes.rst
+++ b/Documentation/bpf/linux-notes.rst
@@ -7,6 +7,11 @@  Linux implementation notes
 
 This document provides more details specific to the Linux kernel implementation of the eBPF instruction set.
 
+Stack space
+======================
+
+Linux currently supports 512 bytes of stack space.
+
 Byte swap instructions
 ======================