[bpf-next,v2] bpf, docs: Improve English readability

Message ID	20230706160537.1309-1-dthaler1968@googlemail.com (mailing list archive)
State	Changes Requested
Delegated to:	BPF
Headers	show Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 443E2A941 for <bpf@vger.kernel.org>; Thu, 6 Jul 2023 16:05:46 +0000 (UTC) From: Dave Thaler <dthaler1968@googlemail.com> To: bpf@vger.kernel.org Cc: bpf@ietf.org, Dave Thaler <dthaler@microsoft.com> Subject: [PATCH bpf-next v2] bpf, docs: Improve English readability Date: Thu, 6 Jul 2023 16:05:37 +0000 Message-Id: <20230706160537.1309-1-dthaler1968@googlemail.com> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	[bpf-next,v2] bpf, docs: Improve English readability \| expand [bpf-next,v2] bpf, docs: Improve English readability

Message ID

20230706160537.1309-1-dthaler1968@googlemail.com (mailing list archive)

State

Changes Requested

Delegated to:

BPF

Headers

From: Dave Thaler <dthaler1968@googlemail.com>
To: bpf@vger.kernel.org
Cc: bpf@ietf.org,
	Dave Thaler <dthaler@microsoft.com>
Subject: [PATCH bpf-next v2] bpf, docs: Improve English readability
Date: Thu,  6 Jul 2023 16:05:37 +0000
Message-Id: <20230706160537.1309-1-dthaler1968@googlemail.com>
Precedence: bulk
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Series

[bpf-next,v2] bpf, docs: Improve English readability | expand

Context	Check	Description
bpf/vmtest-bpf-next-VM_Test-3	success	Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-7	success	Logs for test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-9	success	Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-10	success	Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-13	success	Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-14	success	Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-15	success	Logs for test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-17	success	Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-18	success	Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-19	success	Logs for test_progs_no_alu32_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-20	success	Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-21	success	Logs for test_progs_no_alu32_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-22	success	Logs for test_progs_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-23	success	Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-24	success	Logs for test_progs_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-25	success	Logs for test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-27	success	Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-28	success	Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-29	success	Logs for veristat
bpf/vmtest-bpf-next-VM_Test-11	success	Logs for test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-16	success	Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-26	success	Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-12	success	Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-next-PR	success	PR summary
bpf/vmtest-bpf-next-VM_Test-8	success	Logs for test_maps on s390x with gcc
netdev/series_format	success	Single patches do not need cover letters
netdev/tree_selection	success	Clearly marked for bpf-next
netdev/fixes_present	success	Fixes tag not required for -next series
netdev/header_inline	success	No static functions without inline keyword in header files
netdev/build_32bit	success	Errors and warnings before: 8 this patch: 8
netdev/cc_maintainers	warning	14 maintainers not CCed: void@manifault.com daniel@iogearbox.net yhs@fb.com kpsingh@kernel.org martin.lau@linux.dev john.fastabend@gmail.com song@kernel.org sdf@google.com andrii@kernel.org corbet@lwn.net jolsa@kernel.org haoluo@google.com ast@kernel.org linux-doc@vger.kernel.org
netdev/build_clang	fail	Errors and warnings before: 18 this patch: 18
netdev/verify_signedoff	success	Signed-off-by tag matches author and committer
netdev/deprecated_api	success	None detected
netdev/check_selftest	success	No net selftest shell script
netdev/verify_fixes	success	No Fixes tag
netdev/build_allmodconfig_warn	success	Errors and warnings before: 8 this patch: 8
netdev/checkpatch	warning	WARNING: Missing commit description - Add an appropriate one
netdev/kdoc	success	Errors and warnings before: 0 this patch: 0
netdev/source_inline	success	Was 0 now: 0
bpf/vmtest-bpf-next-VM_Test-1	success	Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-2	success	Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-4	success	Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-5	success	Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-6	success	Logs for set-matrix

Context

Check

Description

bpf/vmtest-bpf-next-VM_Test-3

success

Logs for build for s390x with gcc

bpf/vmtest-bpf-next-VM_Test-7

success

Logs for test_maps on aarch64 with gcc

bpf/vmtest-bpf-next-VM_Test-9

success

Logs for test_maps on x86_64 with gcc

bpf/vmtest-bpf-next-VM_Test-10

success

Logs for test_maps on x86_64 with llvm-16

bpf/vmtest-bpf-next-VM_Test-13

success

Logs for test_progs on x86_64 with gcc

bpf/vmtest-bpf-next-VM_Test-14

success

Logs for test_progs on x86_64 with llvm-16

bpf/vmtest-bpf-next-VM_Test-15

success

Logs for test_progs_no_alu32 on aarch64 with gcc

bpf/vmtest-bpf-next-VM_Test-17

success

Logs for test_progs_no_alu32 on x86_64 with gcc

bpf/vmtest-bpf-next-VM_Test-18

success

Logs for test_progs_no_alu32 on x86_64 with llvm-16

bpf/vmtest-bpf-next-VM_Test-19

success

Logs for test_progs_no_alu32_parallel on aarch64 with gcc

bpf/vmtest-bpf-next-VM_Test-20

success

Logs for test_progs_no_alu32_parallel on x86_64 with gcc

bpf/vmtest-bpf-next-VM_Test-21

success

Logs for test_progs_no_alu32_parallel on x86_64 with llvm-16

bpf/vmtest-bpf-next-VM_Test-22

success

Logs for test_progs_parallel on aarch64 with gcc

bpf/vmtest-bpf-next-VM_Test-23

success

Logs for test_progs_parallel on x86_64 with gcc

bpf/vmtest-bpf-next-VM_Test-24

success

Logs for test_progs_parallel on x86_64 with llvm-16

bpf/vmtest-bpf-next-VM_Test-25

success

Logs for test_verifier on aarch64 with gcc

bpf/vmtest-bpf-next-VM_Test-27

success

Logs for test_verifier on x86_64 with gcc

bpf/vmtest-bpf-next-VM_Test-28

success

Logs for test_verifier on x86_64 with llvm-16

bpf/vmtest-bpf-next-VM_Test-29

success

Logs for veristat

bpf/vmtest-bpf-next-VM_Test-11

success

Logs for test_progs on aarch64 with gcc

bpf/vmtest-bpf-next-VM_Test-16

success

Logs for test_progs_no_alu32 on s390x with gcc

bpf/vmtest-bpf-next-VM_Test-26

success

Logs for test_verifier on s390x with gcc

bpf/vmtest-bpf-next-VM_Test-12

success

Logs for test_progs on s390x with gcc

bpf/vmtest-bpf-next-PR

success

PR summary

bpf/vmtest-bpf-next-VM_Test-8

success

Logs for test_maps on s390x with gcc

netdev/series_format

success

Single patches do not need cover letters

netdev/tree_selection

success

Clearly marked for bpf-next

netdev/fixes_present

success

Fixes tag not required for -next series

netdev/header_inline

success

No static functions without inline keyword in header files

netdev/build_32bit

success

Errors and warnings before: 8 this patch: 8

netdev/cc_maintainers

warning

14 maintainers not CCed: void@manifault.com daniel@iogearbox.net yhs@fb.com kpsingh@kernel.org martin.lau@linux.dev john.fastabend@gmail.com song@kernel.org sdf@google.com andrii@kernel.org corbet@lwn.net jolsa@kernel.org haoluo@google.com ast@kernel.org linux-doc@vger.kernel.org

netdev/build_clang

fail

Errors and warnings before: 18 this patch: 18

netdev/verify_signedoff

success

Signed-off-by tag matches author and committer

netdev/deprecated_api

success

None detected

netdev/check_selftest

success

No net selftest shell script

netdev/verify_fixes

success

No Fixes tag

netdev/build_allmodconfig_warn

success

Errors and warnings before: 8 this patch: 8

netdev/checkpatch

warning

WARNING: Missing commit description - Add an appropriate one

netdev/kdoc

success

Errors and warnings before: 0 this patch: 0

netdev/source_inline

success

Was 0 now: 0

bpf/vmtest-bpf-next-VM_Test-1

success

Logs for ShellCheck

bpf/vmtest-bpf-next-VM_Test-2

success

Logs for build for aarch64 with gcc

bpf/vmtest-bpf-next-VM_Test-4

success

Logs for build for x86_64 with gcc

bpf/vmtest-bpf-next-VM_Test-5

success

Logs for build for x86_64 with llvm-16

bpf/vmtest-bpf-next-VM_Test-6

success

Logs for set-matrix

Commit Message

Dave Thaler July 6, 2023, 4:05 p.m. UTC

From: Dave Thaler <dthaler@microsoft.com>

Signed-off-by: Dave Thaler <dthaler@microsoft.com>
--
V1 -> V2: addressed comments from Alexei
---
 Documentation/bpf/instruction-set.rst | 59 ++++++++++++++++++++-------
 Documentation/bpf/linux-notes.rst     |  5 +++
 2 files changed, 50 insertions(+), 14 deletions(-)

Comments

Alexei Starovoitov July 6, 2023, 8:41 p.m. UTC | #1

On Thu, Jul 06, 2023 at 04:05:37PM +0000, Dave Thaler wrote:
> From: Dave Thaler <dthaler@microsoft.com>
> 
> Signed-off-by: Dave Thaler <dthaler@microsoft.com>
> --
> V1 -> V2: addressed comments from Alexei
> ---
>  Documentation/bpf/instruction-set.rst | 59 ++++++++++++++++++++-------
>  Documentation/bpf/linux-notes.rst     |  5 +++
>  2 files changed, 50 insertions(+), 14 deletions(-)
> 
> diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst
> index 751e657973f..740989f4c1e 100644
> --- a/Documentation/bpf/instruction-set.rst
> +++ b/Documentation/bpf/instruction-set.rst
> @@ -7,6 +7,9 @@ eBPF Instruction Set Specification, v1.0
>  
>  This document specifies version 1.0 of the eBPF instruction set.
>  
> +The eBPF instruction set consists of eleven 64 bit registers, a program counter,
> +and an implementation-specific amount (e.g., 512 bytes) of stack space.
> +
>  Documentation conventions
>  =========================
>  
> @@ -27,12 +30,24 @@ The eBPF calling convention is defined as:
>  * R6 - R9: callee saved registers that function calls will preserve
>  * R10: read-only frame pointer to access stack
>  
> -R0 - R5 are scratch registers and eBPF programs needs to spill/fill them if
> -necessary across calls.
> +Registers R0 - R5 are caller-saved registers, meaning the BPF program needs to either
> +spill them to the BPF stack or move them to callee saved registers if these
> +arguments are to be reused across multiple function calls. Spilling means
> +that the value in the register is moved to the BPF stack. The reverse operation
> +of moving the variable from the BPF stack to the register is called filling.
> +The reason for spilling/filling is due to the limited number of registers.

imo this extended explanation goes too far.
It's also not entirely correct. We could have an ISA with limited number of registers
where every register is callee saved. A bit absurd, but possible.
Or went with SPARC style register windows.

> +
> +Upon entering execution of an eBPF program, registers R1 - R5 initially can contain
> +the input arguments for the program (similar to the argc/argv pair for a typical C program).

argc/argv is only for main(). We don't have main() concept in BPF ISA.
argc/argv is also not a property of ISA.

> +The actual number of registers used, and their meaning, is defined by the program type;
> +for example, a networking program might have an argument that includes network packet data
> +and/or metadata.

that makes things even more confusing.

tbh none of the above changes make the doc easier to read.

>  Instruction encoding
>  ====================
>  
> +An eBPF program is a sequence of instructions.

Kinda true, but it opens the door for plenty of bike shedding.
Is it contiguous sequence? what about subprograms?
Is BPF program a one function or multiple functions?
etc.
Just not worth it.
This is ISA doc.

> +
>  eBPF has two instruction encodings:
>  
>  * the basic instruction encoding, which uses 64 bits to encode an instruction
> @@ -74,7 +89,7 @@ For example::
>    07     1       0        00 00  11 22 33 44  r1 += 0x11223344 // big
>  
>  Note that most instructions do not use all of the fields.
> -Unused fields shall be cleared to zero.
> +Unused fields must be set to zero.

How is this better?

>  
>  As discussed below in `64-bit immediate instructions`_, a 64-bit immediate
>  instruction uses a 64-bit immediate value that is constructed as follows.
> @@ -103,7 +118,9 @@ instruction are reserved and shall be cleared to zero.
>  Instruction classes
>  -------------------
>  
> -The three LSB bits of the 'opcode' field store the instruction class:
> +The encoding of the 'opcode' field varies and can be determined from
> +the three least significant bits (LSB) of the 'opcode' field which holds
> +the "instruction class", as follows:

same question. Don't see an improvement in wording.

>  
>  =========  =====  ===============================  ===================================
>  class      value  description                      reference
> @@ -149,7 +166,8 @@ code            source  instruction class
>  Arithmetic instructions
>  -----------------------
>  
> -``BPF_ALU`` uses 32-bit wide operands while ``BPF_ALU64`` uses 64-bit wide operands for
> +Instruction class ``BPF_ALU`` uses 32-bit wide operands (zeroing the upper 32 bits
> +of the destination register) while ``BPF_ALU64`` uses 64-bit wide operands for

The other part of the doc mentions zeroing. No need to repeat.

>  otherwise identical operations.
>  The 'code' field encodes the operation as below, where 'src' and 'dst' refer
>  to the values of the source and destination registers, respectively.
> @@ -216,8 +234,9 @@ The byte swap instructions use an instruction class of ``BPF_ALU`` and a 4-bit
>  The byte swap instructions operate on the destination register
>  only and do not use a separate source register or immediate value.
>  
> -The 1-bit source operand field in the opcode is used to select what byte
> -order the operation convert from or to:
> +Byte swap instructions use the 1-bit 'source' field in the 'opcode' field
> +as follows.  Instead of indicating the source operator, it is instead
> +used to select what byte order the operation converts from or to:

+1 to this part.

>  
>  =========  =====  =================================================
>  source     value  description
> @@ -235,16 +254,21 @@ Examples:
>  
>    dst = htole16(dst)
>  
> +where 'htole16()' indicates converting a 16-bit value from host byte order to little-endian byte order.
> +
>  ``BPF_ALU | BPF_TO_BE | BPF_END`` with imm = 64 means::
>  
>    dst = htobe64(dst)
>  
> +where 'htobe64()' indicates converting a 64-bit value from host byte order to big-endian byte order.
> +
>  Jump instructions
>  -----------------
>  
> -``BPF_JMP32`` uses 32-bit wide operands while ``BPF_JMP`` uses 64-bit wide operands for
> +Instruction class ``BPF_JMP32`` uses 32-bit wide operands while ``BPF_JMP`` uses 64-bit wide operands for
>  otherwise identical operations.
> -The 'code' field encodes the operation as below:
> +
> +The 4-bit 'code' field encodes the operation as below, where PC is the program counter:
>  
>  ========  =====  ===  ===========================================  =========================================
>  code      value  src  description                                  notes
> @@ -311,7 +335,8 @@ For load and store instructions (``BPF_LD``, ``BPF_LDX``, ``BPF_ST``, and ``BPF_
>  mode          size    instruction class
>  ============  ======  =================
>  
> -The mode modifier is one of:
> +mode
> +  one of:
>  
>    =============  =====  ====================================  =============
>    mode modifier  value  description                           reference
> @@ -323,7 +348,8 @@ The mode modifier is one of:
>    BPF_ATOMIC     0xc0   atomic operations                     `Atomic operations`_
>    =============  =====  ====================================  =============
>  
> -The size modifier is one of:
> +size
> +  one of:
>  
>    =============  =====  =====================
>    size modifier  value  description
> @@ -334,6 +360,9 @@ The size modifier is one of:
>    BPF_DW         0x18   double word (8 bytes)
>    =============  =====  =====================
>  
> +instruction class
> +  the instruction class (see `Instruction classes`_)
> +
>  Regular load and store operations
>  ---------------------------------
>  
> @@ -352,7 +381,7 @@ instructions that transfer data between a register and memory.
>  
>    dst = *(size *) (src + offset)
>  
> -Where size is one of: ``BPF_B``, ``BPF_H``, ``BPF_W``, or ``BPF_DW``.
> +where size is one of: ``BPF_B``, ``BPF_H``, ``BPF_W``, or ``BPF_DW``.
>  
>  Atomic operations
>  -----------------
> @@ -366,7 +395,9 @@ that use the ``BPF_ATOMIC`` mode modifier as follows:
>  
>  * ``BPF_ATOMIC | BPF_W | BPF_STX`` for 32-bit operations
>  * ``BPF_ATOMIC | BPF_DW | BPF_STX`` for 64-bit operations
> -* 8-bit and 16-bit wide atomic operations are not supported.
> +
> +Note that 8-bit (``BPF_B``) and 16-bit (``BPF_H``) wide atomic operations are not currently supported,
> +nor is ``BPF_ATOMIC | <size> | BPF_ST``.
>  
>  The 'imm' field is used to encode the actual atomic operation.
>  Simple atomic operation use a subset of the values defined to encode
> @@ -390,7 +421,7 @@ BPF_XOR   0xa0   atomic xor
>  
>    *(u64 *)(dst + offset) += src
>  
> -In addition to the simple atomic operations, there also is a modifier and
> +In addition to the simple atomic operations above, there also is a modifier and
>  two complex atomic operations:
>  
>  ===========  ================  ===========================
> diff --git a/Documentation/bpf/linux-notes.rst b/Documentation/bpf/linux-notes.rst
> index 508d009d3be..724579fd62d 100644
> --- a/Documentation/bpf/linux-notes.rst
> +++ b/Documentation/bpf/linux-notes.rst
> @@ -7,6 +7,11 @@ Linux implementation notes
>  
>  This document provides more details specific to the Linux kernel implementation of the eBPF instruction set.
>  
> +Stack space
> +======================
> +
> +Linux currently supports 512 bytes of stack space.

I wouldn't open this door.
The verifier enforces 512 stack space for bpf prog plus all of its subprogs that it calls directly.
There is no good way to describe it concisely. And such description doesn't belong in ISA doc.

Dave Thaler July 6, 2023, 10:03 p.m. UTC | #2

> -----Original Message-----
> From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Sent: Thursday, July 6, 2023 1:42 PM
> To: Dave Thaler <dthaler1968@googlemail.com>
> Cc: bpf@vger.kernel.org; bpf@ietf.org; Dave Thaler
> <dthaler@microsoft.com>
> Subject: Re: [PATCH bpf-next v2] bpf, docs: Improve English readability
> 
> On Thu, Jul 06, 2023 at 04:05:37PM +0000, Dave Thaler wrote:
> > From: Dave Thaler <dthaler@microsoft.com>
> >
> > Signed-off-by: Dave Thaler <dthaler@microsoft.com>
> > --
> > V1 -> V2: addressed comments from Alexei
> > ---
> >  Documentation/bpf/instruction-set.rst | 59 ++++++++++++++++++++-------
> >  Documentation/bpf/linux-notes.rst     |  5 +++
> >  2 files changed, 50 insertions(+), 14 deletions(-)
> >
> > diff --git a/Documentation/bpf/instruction-set.rst
> > b/Documentation/bpf/instruction-set.rst
> > index 751e657973f..740989f4c1e 100644
> > --- a/Documentation/bpf/instruction-set.rst
> > +++ b/Documentation/bpf/instruction-set.rst
> > @@ -7,6 +7,9 @@ eBPF Instruction Set Specification, v1.0
> >
> >  This document specifies version 1.0 of the eBPF instruction set.
> >
> > +The eBPF instruction set consists of eleven 64 bit registers, a
> > +program counter, and an implementation-specific amount (e.g., 512 bytes)
> of stack space.
> > +
> >  Documentation conventions
> >  =========================
> >
> > @@ -27,12 +30,24 @@ The eBPF calling convention is defined as:
> >  * R6 - R9: callee saved registers that function calls will preserve
> >  * R10: read-only frame pointer to access stack
> >
> > -R0 - R5 are scratch registers and eBPF programs needs to spill/fill
> > them if -necessary across calls.
> > +Registers R0 - R5 are caller-saved registers, meaning the BPF program
> > +needs to either spill them to the BPF stack or move them to callee
> > +saved registers if these arguments are to be reused across multiple
> > +function calls. Spilling means that the value in the register is
> > +moved to the BPF stack. The reverse operation of moving the variable
> from the BPF stack to the register is called filling.
> > +The reason for spilling/filling is due to the limited number of registers.
> 
> imo this extended explanation goes too far.
> It's also not entirely correct. We could have an ISA with limited number of
> registers where every register is callee saved. A bit absurd, but possible.
> Or went with SPARC style register windows.

At https://lore.kernel.org/bpf/20220930221624.mqjrzmdxc6etkadm@macbook-pro-4.dhcp.thefacebook.com/ you said about the above
"I like above clarification though."

I think it's important for interoperability to define which registers are caller-saved
and which are not, so a compiler (or even verifier) can be used for multiple runtimes.

> > +
> > +Upon entering execution of an eBPF program, registers R1 - R5
> > +initially can contain the input arguments for the program (similar to the
> argc/argv pair for a typical C program).
> 
> argc/argv is only for main(). We don't have main() concept in BPF ISA.
> argc/argv is also not a property of ISA.

That's why it's "similar to".  I think the analogy helps understanding for new readers.
 
> > +The actual number of registers used, and their meaning, is defined by
> > +the program type; for example, a networking program might have an
> > +argument that includes network packet data and/or metadata.
> 
> that makes things even more confusing.
> 
> tbh none of the above changes make the doc easier to read.

The program type defines the number and meaning of any arguments passed
to the program.  In the ISA that means the number of registered used to
pass inputs, and their contents.

> >  Instruction encoding
> >  ====================
> >
> > +An eBPF program is a sequence of instructions.
> 
> Kinda true, but it opens the door for plenty of bike shedding.
> Is it contiguous sequence? what about subprograms?
> Is BPF program a one function or multiple functions?

The term "subprogram" is not currently part of the 
instruction-set.rst doc.   "Program-local functions"
are, and the text says they're part of the same BPF program.
Hence the doc already says a BPF program can have multiple
functions.

> etc.
> Just not worth it.
> This is ISA doc.
> 
> > +
> >  eBPF has two instruction encodings:
> >
> >  * the basic instruction encoding, which uses 64 bits to encode an
> > instruction @@ -74,7 +89,7 @@ For example::
> >    07     1       0        00 00  11 22 33 44  r1 += 0x11223344 // big
> >
> >  Note that most instructions do not use all of the fields.
> > -Unused fields shall be cleared to zero.
> > +Unused fields must be set to zero.
> 
> How is this better?

It uses the language common in RFCs.

> >  As discussed below in `64-bit immediate instructions`_, a 64-bit
> > immediate  instruction uses a 64-bit immediate value that is constructed as
> follows.
> > @@ -103,7 +118,9 @@ instruction are reserved and shall be cleared to
> zero.
> >  Instruction classes
> >  -------------------
> >
> > -The three LSB bits of the 'opcode' field store the instruction class:
> > +The encoding of the 'opcode' field varies and can be determined from
> > +the three least significant bits (LSB) of the 'opcode' field which
> > +holds the "instruction class", as follows:
> 
> same question. Don't see an improvement in wording.

1. The acronym LSB was not defined and does not have an asterisk by it in
the https://www.rfc-editor.org/materials/abbrev.expansion.txt list.

2. "LSB bits" is redundant.

3. Putting "instruction class" in quotes is common when defining by use
the first time.

> >
> >  =========  =====  ===============================
> ===================================
> >  class      value  description                      reference
> > @@ -149,7 +166,8 @@ code            source  instruction class
> >  Arithmetic instructions
> >  -----------------------
> >
> > -``BPF_ALU`` uses 32-bit wide operands while ``BPF_ALU64`` uses 64-bit
> > wide operands for
> > +Instruction class ``BPF_ALU`` uses 32-bit wide operands (zeroing the
> > +upper 32 bits of the destination register) while ``BPF_ALU64`` uses
> > +64-bit wide operands for
> 
> The other part of the doc mentions zeroing. No need to repeat.
> 
> >  otherwise identical operations.
> >  The 'code' field encodes the operation as below, where 'src' and
> > 'dst' refer  to the values of the source and destination registers,
> respectively.
> > @@ -216,8 +234,9 @@ The byte swap instructions use an instruction
> > class of ``BPF_ALU`` and a 4-bit  The byte swap instructions operate
> > on the destination register  only and do not use a separate source register
> or immediate value.
> >
> > -The 1-bit source operand field in the opcode is used to select what
> > byte -order the operation convert from or to:
> > +Byte swap instructions use the 1-bit 'source' field in the 'opcode'
> > +field as follows.  Instead of indicating the source operator, it is
> > +instead used to select what byte order the operation converts from or to:
> 
> +1 to this part.
> 
> >
> >  =========  =====
> =================================================
> >  source     value  description
> > @@ -235,16 +254,21 @@ Examples:
> >
> >    dst = htole16(dst)
> >
> > +where 'htole16()' indicates converting a 16-bit value from host byte order
> to little-endian byte order.
> > +
> >  ``BPF_ALU | BPF_TO_BE | BPF_END`` with imm = 64 means::
> >
> >    dst = htobe64(dst)
> >
> > +where 'htobe64()' indicates converting a 64-bit value from host byte order
> to big-endian byte order.
> > +
> >  Jump instructions
> >  -----------------
> >
> > -``BPF_JMP32`` uses 32-bit wide operands while ``BPF_JMP`` uses 64-bit
> > wide operands for
> > +Instruction class ``BPF_JMP32`` uses 32-bit wide operands while
> > +``BPF_JMP`` uses 64-bit wide operands for
> >  otherwise identical operations.
> > -The 'code' field encodes the operation as below:
> > +
> > +The 4-bit 'code' field encodes the operation as below, where PC is the
> program counter:
> >
> >  ========  =====  ===  ===========================================
> =========================================
> >  code      value  src  description                                  notes
> > @@ -311,7 +335,8 @@ For load and store instructions (``BPF_LD``,
> ``BPF_LDX``, ``BPF_ST``, and ``BPF_
> >  mode          size    instruction class
> >  ============  ======  =================
> >
> > -The mode modifier is one of:
> > +mode
> > +  one of:
> >
> >    =============  =====  ====================================
> =============
> >    mode modifier  value  description                           reference
> > @@ -323,7 +348,8 @@ The mode modifier is one of:
> >    BPF_ATOMIC     0xc0   atomic operations                     `Atomic
> operations`_
> >    =============  =====  ====================================
> > =============
> >
> > -The size modifier is one of:
> > +size
> > +  one of:
> >
> >    =============  =====  =====================
> >    size modifier  value  description
> > @@ -334,6 +360,9 @@ The size modifier is one of:
> >    BPF_DW         0x18   double word (8 bytes)
> >    =============  =====  =====================
> >
> > +instruction class
> > +  the instruction class (see `Instruction classes`_)
> > +
> >  Regular load and store operations
> >  ---------------------------------
> >
> > @@ -352,7 +381,7 @@ instructions that transfer data between a register
> and memory.
> >
> >    dst = *(size *) (src + offset)
> >
> > -Where size is one of: ``BPF_B``, ``BPF_H``, ``BPF_W``, or ``BPF_DW``.
> > +where size is one of: ``BPF_B``, ``BPF_H``, ``BPF_W``, or ``BPF_DW``.
> >
> >  Atomic operations
> >  -----------------
> > @@ -366,7 +395,9 @@ that use the ``BPF_ATOMIC`` mode modifier as
> follows:
> >
> >  * ``BPF_ATOMIC | BPF_W | BPF_STX`` for 32-bit operations
> >  * ``BPF_ATOMIC | BPF_DW | BPF_STX`` for 64-bit operations
> > -* 8-bit and 16-bit wide atomic operations are not supported.
> > +
> > +Note that 8-bit (``BPF_B``) and 16-bit (``BPF_H``) wide atomic
> > +operations are not currently supported, nor is ``BPF_ATOMIC | <size> |
> BPF_ST``.
> >
> >  The 'imm' field is used to encode the actual atomic operation.
> >  Simple atomic operation use a subset of the values defined to encode
> > @@ -390,7 +421,7 @@ BPF_XOR   0xa0   atomic xor
> >
> >    *(u64 *)(dst + offset) += src
> >
> > -In addition to the simple atomic operations, there also is a modifier
> > and
> > +In addition to the simple atomic operations above, there also is a
> > +modifier and
> >  two complex atomic operations:
> >
> >  ===========  ================  =========================== diff --git
> > a/Documentation/bpf/linux-notes.rst
> > b/Documentation/bpf/linux-notes.rst
> > index 508d009d3be..724579fd62d 100644
> > --- a/Documentation/bpf/linux-notes.rst
> > +++ b/Documentation/bpf/linux-notes.rst
> > @@ -7,6 +7,11 @@ Linux implementation notes
> >
> >  This document provides more details specific to the Linux kernel
> implementation of the eBPF instruction set.
> >
> > +Stack space
> > +======================
> > +
> > +Linux currently supports 512 bytes of stack space.
> 
> I wouldn't open this door.
> The verifier enforces 512 stack space for bpf prog plus all of its subprogs that
> it calls directly.
> There is no good way to describe it concisely. And such description doesn't
> belong in ISA doc.

linux-notes.rst is not in the ISA doc.   The ISA doc says the value is implementation
defined.  linux-notes.rst says what Linux does for things the ISA doc leaves up
to the implementation.

Dave

Alexei Starovoitov July 6, 2023, 11:14 p.m. UTC | #3

On Thu, Jul 6, 2023 at 3:04 PM Dave Thaler <dthaler@microsoft.com> wrote:
>
> > -----Original Message-----
> > From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> > Sent: Thursday, July 6, 2023 1:42 PM
> > To: Dave Thaler <dthaler1968@googlemail.com>
> > Cc: bpf@vger.kernel.org; bpf@ietf.org; Dave Thaler
> > <dthaler@microsoft.com>
> > Subject: Re: [PATCH bpf-next v2] bpf, docs: Improve English readability
> >
> > On Thu, Jul 06, 2023 at 04:05:37PM +0000, Dave Thaler wrote:
> > > From: Dave Thaler <dthaler@microsoft.com>
> > >
> > > Signed-off-by: Dave Thaler <dthaler@microsoft.com>
> > > --
> > > V1 -> V2: addressed comments from Alexei
> > > ---
> > >  Documentation/bpf/instruction-set.rst | 59 ++++++++++++++++++++-------
> > >  Documentation/bpf/linux-notes.rst     |  5 +++
> > >  2 files changed, 50 insertions(+), 14 deletions(-)
> > >
> > > diff --git a/Documentation/bpf/instruction-set.rst
> > > b/Documentation/bpf/instruction-set.rst
> > > index 751e657973f..740989f4c1e 100644
> > > --- a/Documentation/bpf/instruction-set.rst
> > > +++ b/Documentation/bpf/instruction-set.rst
> > > @@ -7,6 +7,9 @@ eBPF Instruction Set Specification, v1.0
> > >
> > >  This document specifies version 1.0 of the eBPF instruction set.
> > >
> > > +The eBPF instruction set consists of eleven 64 bit registers, a
> > > +program counter, and an implementation-specific amount (e.g., 512 bytes)
> > of stack space.
> > > +
> > >  Documentation conventions
> > >  =========================
> > >
> > > @@ -27,12 +30,24 @@ The eBPF calling convention is defined as:
> > >  * R6 - R9: callee saved registers that function calls will preserve
> > >  * R10: read-only frame pointer to access stack
> > >
> > > -R0 - R5 are scratch registers and eBPF programs needs to spill/fill
> > > them if -necessary across calls.
> > > +Registers R0 - R5 are caller-saved registers, meaning the BPF program
> > > +needs to either spill them to the BPF stack or move them to callee
> > > +saved registers if these arguments are to be reused across multiple
> > > +function calls. Spilling means that the value in the register is
> > > +moved to the BPF stack. The reverse operation of moving the variable
> > from the BPF stack to the register is called filling.
> > > +The reason for spilling/filling is due to the limited number of registers.
> >
> > imo this extended explanation goes too far.
> > It's also not entirely correct. We could have an ISA with limited number of
> > registers where every register is callee saved. A bit absurd, but possible.
> > Or went with SPARC style register windows.
>
> At https://lore.kernel.org/bpf/20220930221624.mqjrzmdxc6etkadm@macbook-pro-4.dhcp.thefacebook.com/ you said about the above
> "I like above clarification though."

That was on "30 Sep 2022".
I like to change my mind often enough to confuse everyone :)

> I think it's important for interoperability to define which registers are caller-saved
> and which are not, so a compiler (or even verifier) can be used for multiple runtimes.

but it really doesn't belong in the ISA doc.
We've discussed it at length.
We need the psABI doc.
ISA doc is a description of instructions and _not_ how they shape
into functions and programs.
More below.

>
> > > +
> > > +Upon entering execution of an eBPF program, registers R1 - R5
> > > +initially can contain the input arguments for the program (similar to the
> > argc/argv pair for a typical C program).
> >
> > argc/argv is only for main(). We don't have main() concept in BPF ISA.
> > argc/argv is also not a property of ISA.
>
> That's why it's "similar to".  I think the analogy helps understanding for new readers.

argc is a K&R C thing. Not even a calling convention.
That sentence is a combination of analogies from different areas.
argc could be interpreted that the first argument is a count
and a second argument is an array.
See the confusion it might cause?

>
> > > +The actual number of registers used, and their meaning, is defined by
> > > +the program type; for example, a networking program might have an
> > > +argument that includes network packet data and/or metadata.
> >
> > that makes things even more confusing.
> >
> > tbh none of the above changes make the doc easier to read.
>
> The program type defines the number and meaning of any arguments passed
> to the program.  In the ISA that means the number of registered used to
> pass inputs, and their contents.

Not at all. ISA is an instruction set only and this doc is for ISA.
What different program types should accept belongs in a different doc.
And it's not a psABI doc.
More below.

>
> > >  Instruction encoding
> > >  ====================
> > >
> > > +An eBPF program is a sequence of instructions.
> >
> > Kinda true, but it opens the door for plenty of bike shedding.
> > Is it contiguous sequence? what about subprograms?
> > Is BPF program a one function or multiple functions?
>
> The term "subprogram" is not currently part of the
> instruction-set.rst doc.   "Program-local functions"
> are, and the text says they're part of the same BPF program.
> Hence the doc already says a BPF program can have multiple
> functions.

Yes. It's a mess, but we must not make it worse.
instruction-set.rst is for instructions.
Definition of BPF program belongs to a different doc.

> > etc.
> > Just not worth it.
> > This is ISA doc.
> >
> > > +
> > >  eBPF has two instruction encodings:
> > >
> > >  * the basic instruction encoding, which uses 64 bits to encode an
> > > instruction @@ -74,7 +89,7 @@ For example::
> > >    07     1       0        00 00  11 22 33 44  r1 += 0x11223344 // big
> > >
> > >  Note that most instructions do not use all of the fields.
> > > -Unused fields shall be cleared to zero.
> > > +Unused fields must be set to zero.
> >
> > How is this better?
>
> It uses the language common in RFCs.
>
> > >  As discussed below in `64-bit immediate instructions`_, a 64-bit
> > > immediate  instruction uses a 64-bit immediate value that is constructed as
> > follows.
> > > @@ -103,7 +118,9 @@ instruction are reserved and shall be cleared to
> > zero.
> > >  Instruction classes
> > >  -------------------
> > >
> > > -The three LSB bits of the 'opcode' field store the instruction class:
> > > +The encoding of the 'opcode' field varies and can be determined from
> > > +the three least significant bits (LSB) of the 'opcode' field which
> > > +holds the "instruction class", as follows:
> >
> > same question. Don't see an improvement in wording.
>
> 1. The acronym LSB was not defined and does not have an asterisk by it in
> the https://www.rfc-editor.org/materials/abbrev.expansion.txt list.
>
> 2. "LSB bits" is redundant.
>
> 3. Putting "instruction class" in quotes is common when defining by use
> the first time.

ok. fair enough.

>
> linux-notes.rst is not in the ISA doc.   The ISA doc says the value is implementation
> defined.  linux-notes.rst says what Linux does for things the ISA doc leaves up
> to the implementation.

I see no value in "Linux currently supports 512 bytes of stack space" sentence.
It's too ambiguous to be useful.

instruction-set.rst is a description of ISA. We should remove things
from it that don't belong instead of doubling down on the current mess.

we need the psABI doc that will not be standardized as Jose recommended.
That doc will describe recommended calling convention, argument
promotion, stack usage, relocation, function/subprogram definition, etc
psABI probably should include BTF.ext description, but I'm open to alternatives.

BPF program types supported by the kernel is a 3rd document.
There we can explain that XDP prog takes a single ctx argument and it's
a pointer to ...
Such info is linux specific. Based on ISA and psABI one can come up
with different program/map types and semantics of their arguments
while being fully compliant with ISA and psABI docs.
Such doc can start its journey in linux-notes.rst and then split, if necessary.

diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst
index 751e657973f..740989f4c1e 100644
--- a/Documentation/bpf/instruction-set.rst
+++ b/Documentation/bpf/instruction-set.rst
@@ -7,6 +7,9 @@  eBPF Instruction Set Specification, v1.0
 
 This document specifies version 1.0 of the eBPF instruction set.
 
+The eBPF instruction set consists of eleven 64 bit registers, a program counter,
+and an implementation-specific amount (e.g., 512 bytes) of stack space.
+
 Documentation conventions
 =========================
 
@@ -27,12 +30,24 @@  The eBPF calling convention is defined as:
 * R6 - R9: callee saved registers that function calls will preserve
 * R10: read-only frame pointer to access stack
 
-R0 - R5 are scratch registers and eBPF programs needs to spill/fill them if
-necessary across calls.
+Registers R0 - R5 are caller-saved registers, meaning the BPF program needs to either
+spill them to the BPF stack or move them to callee saved registers if these
+arguments are to be reused across multiple function calls. Spilling means
+that the value in the register is moved to the BPF stack. The reverse operation
+of moving the variable from the BPF stack to the register is called filling.
+The reason for spilling/filling is due to the limited number of registers.
+
+Upon entering execution of an eBPF program, registers R1 - R5 initially can contain
+the input arguments for the program (similar to the argc/argv pair for a typical C program).
+The actual number of registers used, and their meaning, is defined by the program type;
+for example, a networking program might have an argument that includes network packet data
+and/or metadata.
 
 Instruction encoding
 ====================
 
+An eBPF program is a sequence of instructions.
+
 eBPF has two instruction encodings:
 
 * the basic instruction encoding, which uses 64 bits to encode an instruction
@@ -74,7 +89,7 @@  For example::
   07     1       0        00 00  11 22 33 44  r1 += 0x11223344 // big
 
 Note that most instructions do not use all of the fields.
-Unused fields shall be cleared to zero.
+Unused fields must be set to zero.
 
 As discussed below in `64-bit immediate instructions`_, a 64-bit immediate
 instruction uses a 64-bit immediate value that is constructed as follows.
@@ -103,7 +118,9 @@  instruction are reserved and shall be cleared to zero.
 Instruction classes
 -------------------
 
-The three LSB bits of the 'opcode' field store the instruction class:
+The encoding of the 'opcode' field varies and can be determined from
+the three least significant bits (LSB) of the 'opcode' field which holds
+the "instruction class", as follows:
 
 =========  =====  ===============================  ===================================
 class      value  description                      reference
@@ -149,7 +166,8 @@  code            source  instruction class
 Arithmetic instructions
 -----------------------
 
-``BPF_ALU`` uses 32-bit wide operands while ``BPF_ALU64`` uses 64-bit wide operands for
+Instruction class ``BPF_ALU`` uses 32-bit wide operands (zeroing the upper 32 bits
+of the destination register) while ``BPF_ALU64`` uses 64-bit wide operands for
 otherwise identical operations.
 The 'code' field encodes the operation as below, where 'src' and 'dst' refer
 to the values of the source and destination registers, respectively.
@@ -216,8 +234,9 @@  The byte swap instructions use an instruction class of ``BPF_ALU`` and a 4-bit
 The byte swap instructions operate on the destination register
 only and do not use a separate source register or immediate value.
 
-The 1-bit source operand field in the opcode is used to select what byte
-order the operation convert from or to:
+Byte swap instructions use the 1-bit 'source' field in the 'opcode' field
+as follows.  Instead of indicating the source operator, it is instead
+used to select what byte order the operation converts from or to:
 
 =========  =====  =================================================
 source     value  description
@@ -235,16 +254,21 @@  Examples:
 
   dst = htole16(dst)
 
+where 'htole16()' indicates converting a 16-bit value from host byte order to little-endian byte order.
+
 ``BPF_ALU | BPF_TO_BE | BPF_END`` with imm = 64 means::
 
   dst = htobe64(dst)
 
+where 'htobe64()' indicates converting a 64-bit value from host byte order to big-endian byte order.
+
 Jump instructions
 -----------------
 
-``BPF_JMP32`` uses 32-bit wide operands while ``BPF_JMP`` uses 64-bit wide operands for
+Instruction class ``BPF_JMP32`` uses 32-bit wide operands while ``BPF_JMP`` uses 64-bit wide operands for
 otherwise identical operations.
-The 'code' field encodes the operation as below:
+
+The 4-bit 'code' field encodes the operation as below, where PC is the program counter:
 
 ========  =====  ===  ===========================================  =========================================
 code      value  src  description                                  notes
@@ -311,7 +335,8 @@  For load and store instructions (``BPF_LD``, ``BPF_LDX``, ``BPF_ST``, and ``BPF_
 mode          size    instruction class
 ============  ======  =================
 
-The mode modifier is one of:
+mode
+  one of:
 
   =============  =====  ====================================  =============
   mode modifier  value  description                           reference
@@ -323,7 +348,8 @@  The mode modifier is one of:
   BPF_ATOMIC     0xc0   atomic operations                     `Atomic operations`_
   =============  =====  ====================================  =============
 
-The size modifier is one of:
+size
+  one of:
 
   =============  =====  =====================
   size modifier  value  description
@@ -334,6 +360,9 @@  The size modifier is one of:
   BPF_DW         0x18   double word (8 bytes)
   =============  =====  =====================
 
+instruction class
+  the instruction class (see `Instruction classes`_)
+
 Regular load and store operations
 ---------------------------------
 
@@ -352,7 +381,7 @@  instructions that transfer data between a register and memory.
 
   dst = *(size *) (src + offset)
 
-Where size is one of: ``BPF_B``, ``BPF_H``, ``BPF_W``, or ``BPF_DW``.
+where size is one of: ``BPF_B``, ``BPF_H``, ``BPF_W``, or ``BPF_DW``.
 
 Atomic operations
 -----------------
@@ -366,7 +395,9 @@  that use the ``BPF_ATOMIC`` mode modifier as follows:
 
 * ``BPF_ATOMIC | BPF_W | BPF_STX`` for 32-bit operations
 * ``BPF_ATOMIC | BPF_DW | BPF_STX`` for 64-bit operations
-* 8-bit and 16-bit wide atomic operations are not supported.
+
+Note that 8-bit (``BPF_B``) and 16-bit (``BPF_H``) wide atomic operations are not currently supported,
+nor is ``BPF_ATOMIC | <size> | BPF_ST``.
 
 The 'imm' field is used to encode the actual atomic operation.
 Simple atomic operation use a subset of the values defined to encode
@@ -390,7 +421,7 @@  BPF_XOR   0xa0   atomic xor
 
   *(u64 *)(dst + offset) += src
 
-In addition to the simple atomic operations, there also is a modifier and
+In addition to the simple atomic operations above, there also is a modifier and
 two complex atomic operations:
 
 ===========  ================  ===========================
diff --git a/Documentation/bpf/linux-notes.rst b/Documentation/bpf/linux-notes.rst
index 508d009d3be..724579fd62d 100644
--- a/Documentation/bpf/linux-notes.rst
+++ b/Documentation/bpf/linux-notes.rst
@@ -7,6 +7,11 @@  Linux implementation notes
 
 This document provides more details specific to the Linux kernel implementation of the eBPF instruction set.
 
+Stack space
+======================
+
+Linux currently supports 512 bytes of stack space.
+
 Byte swap instructions
 ======================

[bpf-next,v2] bpf, docs: Improve English readability

Checks

Commit Message

Comments

Patch