diff mbox series

bpf, docs: Document BPF insn encoding in term of stored bytes

Message ID 87y1om25l4.fsf@oracle.com (mailing list archive)
State Superseded
Delegated to: BPF
Headers show
Series bpf, docs: Document BPF insn encoding in term of stored bytes | expand

Checks

Context Check Description
netdev/tree_selection success Not a local patch
bpf/vmtest-bpf-next-PR fail PR summary
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-7 success Logs for llvm-toolchain
bpf/vmtest-bpf-next-VM_Test-8 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-2 success Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-3 success Logs for build for aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-5 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-6 success Logs for build for x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-4 success Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-12 success Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-13 success Logs for test_maps on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-17 success Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-18 success Logs for test_progs on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-22 success Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-23 success Logs for test_progs_no_alu32 on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-27 success Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-28 success Logs for test_progs_no_alu32_parallel on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-32 success Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-33 success Logs for test_progs_parallel on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-37 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-38 success Logs for test_verifier on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-9 success Logs for test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-10 success Logs for test_maps on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-14 fail Logs for test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-15 success Logs for test_progs on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-19 success Logs for test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-20 success Logs for test_progs_no_alu32 on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-24 success Logs for test_progs_no_alu32_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-25 success Logs for test_progs_no_alu32_parallel on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-29 success Logs for test_progs_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-30 success Logs for test_progs_parallel on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-34 success Logs for test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-35 success Logs for test_verifier on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-16 success Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-21 success Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-26 success Logs for test_progs_no_alu32_parallel on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-36 success Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-31 success Logs for test_progs_parallel on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-11 success Logs for test_maps on s390x with gcc

Commit Message

Jose E. Marchesi Feb. 24, 2023, 8:04 p.m. UTC
This patch modifies instruction-set.rst so it documents the encoding
of BPF instructions in terms of how the bytes are stored (be it in an
ELF file or as bytes in a memory buffer to be loaded into the kernel
or some other BPF consumer) as opposed to how the instruction looks
like once loaded.

This is hopefully easier to understand by implementors looking to
generate and/or consume bytes conforming BPF instructions.

The patch also clarifies that the unused bytes in a pseudo-instruction
shall be cleared with zeros.

Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
---
 Documentation/bpf/instruction-set.rst | 43 +++++++++++++--------------
 1 file changed, 21 insertions(+), 22 deletions(-)

Comments

Dave Thaler Feb. 24, 2023, 8:44 p.m. UTC | #1
> -----Original Message-----
> From: Bpf <bpf-bounces@ietf.org> On Behalf Of Jose E. Marchesi
> Sent: Friday, February 24, 2023 12:04 PM
> To: bpf <bpf@vger.kernel.org>
> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>; bpf@ietf.org
> Subject: [Bpf] [PATCH] bpf, docs: Document BPF insn encoding in term of
> stored bytes
> 
> 
> This patch modifies instruction-set.rst so it documents the encoding of BPF
> instructions in terms of how the bytes are stored (be it in an ELF file or as
> bytes in a memory buffer to be loaded into the kernel or some other BPF
> consumer) as opposed to how the instruction looks like once loaded.
> 
> This is hopefully easier to understand by implementors looking to generate
> and/or consume bytes conforming BPF instructions.
> 
> The patch also clarifies that the unused bytes in a pseudo-instruction shall be
> cleared with zeros.
> 
> Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
> ---
>  Documentation/bpf/instruction-set.rst | 43 +++++++++++++--------------
>  1 file changed, 21 insertions(+), 22 deletions(-)
> 
> diff --git a/Documentation/bpf/instruction-set.rst
> b/Documentation/bpf/instruction-set.rst
> index 01802ed9b29b..9b28c0e15bb6 100644
> --- a/Documentation/bpf/instruction-set.rst
> +++ b/Documentation/bpf/instruction-set.rst
> @@ -38,15 +38,13 @@ eBPF has two instruction encodings:
>  * the wide instruction encoding, which appends a second 64-bit immediate
> (i.e.,
>    constant) value after the basic instruction for a total of 128 bits.
> 
> -The basic instruction encoding looks as follows for a little-endian processor,
> -where MSB and LSB mean the most significant bits and least significant bits,
> -respectively:
> +The fields conforming an encoded basic instruction are stored in the
> +following order:
> 
> -=============  =======  =======  =======  ============
> -32 bits (MSB)  16 bits  4 bits   4 bits   8 bits (LSB)
> -=============  =======  =======  =======  ============
> -imm            offset   src_reg  dst_reg  opcode
> -=============  =======  =======  =======  ============
> +  opcode:8 src:4 dst:4 offset:16 imm:32 // In little-endian BPF.
> +  opcode:8 dst:4 src:4 offset:16 imm:32 // In big-endian BPF.

Personally I find this notation harder to understand in general.
For example, it encodes (without explanation) the C language
assumption that "//" is a comment, ":" indicates a bit width,
and the fields are in order from most significate byte to least
significant byte.  The text before this change has no such
unexplained assumptions. 

[...]
> -Multi-byte fields ('imm' and 'offset') are similarly stored in -the byte order of
> the processor.
> +  opcode         offset imm          assembly
> +         src dst
> +  07     0   1   00 00  44 33 22 11  r1 += 0x11223344 // little
> +         dst src
> +  07     1   0   00 00  11 22 33 44  r1 += 0x11223344 // big

Similar assumption without explanation of "//" meaning comment, and
some implied tabular formatting without being an actual table?

[...]
> -=================  ==================
> -64 bits (MSB)      64 bits (LSB)
> -=================  ==================
> -basic instruction  pseudo instruction
> -=================  ==================
> +This is depicted in the following figure:
> +
> +  basic_instruction                 pseudo_instruction
> +  code:8 regs:16 offset:16 imm:32 | unused:32 imm:32

And here the use of "|" above I find confusing.

What do others think?

Dave
Jose E. Marchesi Feb. 24, 2023, 8:59 p.m. UTC | #2
>> -----Original Message-----
>> From: Bpf <bpf-bounces@ietf.org> On Behalf Of Jose E. Marchesi
>> Sent: Friday, February 24, 2023 12:04 PM
>> To: bpf <bpf@vger.kernel.org>
>> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>; bpf@ietf.org
>> Subject: [Bpf] [PATCH] bpf, docs: Document BPF insn encoding in term of
>> stored bytes
>> 
>> 
>> This patch modifies instruction-set.rst so it documents the encoding of BPF
>> instructions in terms of how the bytes are stored (be it in an ELF file or as
>> bytes in a memory buffer to be loaded into the kernel or some other BPF
>> consumer) as opposed to how the instruction looks like once loaded.
>> 
>> This is hopefully easier to understand by implementors looking to generate
>> and/or consume bytes conforming BPF instructions.
>> 
>> The patch also clarifies that the unused bytes in a pseudo-instruction shall be
>> cleared with zeros.
>> 
>> Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
>> ---
>>  Documentation/bpf/instruction-set.rst | 43 +++++++++++++--------------
>>  1 file changed, 21 insertions(+), 22 deletions(-)
>> 
>> diff --git a/Documentation/bpf/instruction-set.rst
>> b/Documentation/bpf/instruction-set.rst
>> index 01802ed9b29b..9b28c0e15bb6 100644
>> --- a/Documentation/bpf/instruction-set.rst
>> +++ b/Documentation/bpf/instruction-set.rst
>> @@ -38,15 +38,13 @@ eBPF has two instruction encodings:
>>  * the wide instruction encoding, which appends a second 64-bit immediate
>> (i.e.,
>>    constant) value after the basic instruction for a total of 128 bits.
>> 
>> -The basic instruction encoding looks as follows for a little-endian processor,
>> -where MSB and LSB mean the most significant bits and least significant bits,
>> -respectively:
>> +The fields conforming an encoded basic instruction are stored in the
>> +following order:
>> 
>> -=============  =======  =======  =======  ============
>> -32 bits (MSB)  16 bits  4 bits   4 bits   8 bits (LSB)
>> -=============  =======  =======  =======  ============
>> -imm            offset   src_reg  dst_reg  opcode
>> -=============  =======  =======  =======  ============
>> +  opcode:8 src:4 dst:4 offset:16 imm:32 // In little-endian BPF.
>> +  opcode:8 dst:4 src:4 offset:16 imm:32 // In big-endian BPF.
>
> Personally I find this notation harder to understand in general.
> For example, it encodes (without explanation) the C language
> assumption that "//" is a comment, ":" indicates a bit width,
> and the fields are in order from most significate byte to least
> significant byte.  The text before this change has no such
> unexplained assumptions. 

The fields are not ordered from "most significative byte" to "least
significative byte".  The fields are ordered as they are stored.  Thats
the whole point of the patch.

As for //, :N and | below, I think these signs are obvious enough to not
require further explanation, but I wouldn't mind to use some other
better notation, if you can suggest one. I am not a very graphical
person myself.

>
> [...]
>> -Multi-byte fields ('imm' and 'offset') are similarly stored in -the byte order of
>> the processor.
>> +  opcode         offset imm          assembly
>> +         src dst
>> +  07     0   1   00 00  44 33 22 11  r1 += 0x11223344 // little
>> +         dst src
>> +  07     1   0   00 00  11 22 33 44  r1 += 0x11223344 // big
>
> Similar assumption without explanation of "//" meaning comment, and
> some implied tabular formatting without being an actual table?

It is intended to be a diagram, not a table.  I used indentation which
AFAIK is the rst way to denote multi-line verbatim environments... is
that wrong for ascii-art diagrams?

> [...]
>> -=================  ==================
>> -64 bits (MSB)      64 bits (LSB)
>> -=================  ==================
>> -basic instruction  pseudo instruction
>> -=================  ==================
>> +This is depicted in the following figure:
>> +
>> +  basic_instruction                 pseudo_instruction
>> +  code:8 regs:16 offset:16 imm:32 | unused:32 imm:32
>
> And here the use of "|" above I find confusing.
>
> What do others think?
>
> Dave
Alexei Starovoitov Feb. 25, 2023, 12:20 a.m. UTC | #3
On Fri, Feb 24, 2023 at 12:59 PM Jose E. Marchesi
<jose.marchesi@oracle.com> wrote:
>
>
> >> -----Original Message-----
> >> From: Bpf <bpf-bounces@ietf.org> On Behalf Of Jose E. Marchesi
> >> Sent: Friday, February 24, 2023 12:04 PM
> >> To: bpf <bpf@vger.kernel.org>
> >> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>; bpf@ietf.org
> >> Subject: [Bpf] [PATCH] bpf, docs: Document BPF insn encoding in term of
> >> stored bytes
> >>
> >>
> >> This patch modifies instruction-set.rst so it documents the encoding of BPF
> >> instructions in terms of how the bytes are stored (be it in an ELF file or as
> >> bytes in a memory buffer to be loaded into the kernel or some other BPF
> >> consumer) as opposed to how the instruction looks like once loaded.
> >>
> >> This is hopefully easier to understand by implementors looking to generate
> >> and/or consume bytes conforming BPF instructions.
> >>
> >> The patch also clarifies that the unused bytes in a pseudo-instruction shall be
> >> cleared with zeros.
> >>
> >> Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
> >> ---
> >>  Documentation/bpf/instruction-set.rst | 43 +++++++++++++--------------
> >>  1 file changed, 21 insertions(+), 22 deletions(-)
> >>
> >> diff --git a/Documentation/bpf/instruction-set.rst
> >> b/Documentation/bpf/instruction-set.rst
> >> index 01802ed9b29b..9b28c0e15bb6 100644
> >> --- a/Documentation/bpf/instruction-set.rst
> >> +++ b/Documentation/bpf/instruction-set.rst
> >> @@ -38,15 +38,13 @@ eBPF has two instruction encodings:
> >>  * the wide instruction encoding, which appends a second 64-bit immediate
> >> (i.e.,
> >>    constant) value after the basic instruction for a total of 128 bits.
> >>
> >> -The basic instruction encoding looks as follows for a little-endian processor,
> >> -where MSB and LSB mean the most significant bits and least significant bits,
> >> -respectively:
> >> +The fields conforming an encoded basic instruction are stored in the
> >> +following order:
> >>
> >> -=============  =======  =======  =======  ============
> >> -32 bits (MSB)  16 bits  4 bits   4 bits   8 bits (LSB)
> >> -=============  =======  =======  =======  ============
> >> -imm            offset   src_reg  dst_reg  opcode
> >> -=============  =======  =======  =======  ============
> >> +  opcode:8 src:4 dst:4 offset:16 imm:32 // In little-endian BPF.
> >> +  opcode:8 dst:4 src:4 offset:16 imm:32 // In big-endian BPF.
> >
> > Personally I find this notation harder to understand in general.
> > For example, it encodes (without explanation) the C language
> > assumption that "//" is a comment, ":" indicates a bit width,
> > and the fields are in order from most significate byte to least
> > significant byte.  The text before this change has no such
> > unexplained assumptions.
>
> The fields are not ordered from "most significative byte" to "least
> significative byte".  The fields are ordered as they are stored.  Thats
> the whole point of the patch.
>
> As for //, :N and | below, I think these signs are obvious enough to not
> require further explanation, but I wouldn't mind to use some other
> better notation, if you can suggest one. I am not a very graphical
> person myself.

We're using plenty of C lingvo in this doc including:
dst = dst ^ imm32
and
dst = (u32) ((u32) dst + (u32) src)

So I think '//' is fine to have without extra verbosity.

imo the patch is a nice improvement in readability.
The ldimm64 part is especially nice.
kernel test robot Feb. 25, 2023, 1:07 p.m. UTC | #4
Hi Jose,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on bpf-next/master]
[cannot apply to bpf/master linus/master v6.2 next-20230225]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Jose-E-Marchesi/bpf-docs-Document-BPF-insn-encoding-in-term-of-stored-bytes/20230225-040647
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link:    https://lore.kernel.org/r/87y1om25l4.fsf%40oracle.com
patch subject: [PATCH] bpf, docs: Document BPF insn encoding in term of stored bytes
reproduce:
        # https://github.com/intel-lab-lkp/linux/commit/018d3423bad903b4544673361f6df2ea28ce047a
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Jose-E-Marchesi/bpf-docs-Document-BPF-insn-encoding-in-term-of-stored-bytes/20230225-040647
        git checkout 018d3423bad903b4544673361f6df2ea28ce047a
        make menuconfig
        # enable CONFIG_COMPILE_TEST, CONFIG_WARN_MISSING_DOCUMENTS, CONFIG_WARN_ABI_ERRORS
        make htmldocs

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202302252028.csJFgGqg-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> Documentation/bpf/instruction-set.rst:75: WARNING: Definition list ends without a blank line; unexpected unindent.

vim +75 Documentation/bpf/instruction-set.rst

    70	
    71	  opcode         offset imm          assembly
    72	         src dst
    73	  07     0   1   00 00  44 33 22 11  r1 += 0x11223344 // little
    74	         dst src
  > 75	  07     1   0   00 00  11 22 33 44  r1 += 0x11223344 // big
    76
David Vernet Feb. 27, 2023, 4:52 p.m. UTC | #5
On Fri, Feb 24, 2023 at 09:04:07PM +0100, Jose E. Marchesi wrote:
> 
> This patch modifies instruction-set.rst so it documents the encoding
> of BPF instructions in terms of how the bytes are stored (be it in an
> ELF file or as bytes in a memory buffer to be loaded into the kernel
> or some other BPF consumer) as opposed to how the instruction looks
> like once loaded.
> 
> This is hopefully easier to understand by implementors looking to
> generate and/or consume bytes conforming BPF instructions.
> 
> The patch also clarifies that the unused bytes in a pseudo-instruction
> shall be cleared with zeros.
> 
> Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>

Hi Jose,

Thanks for writing this up.

> ---
>  Documentation/bpf/instruction-set.rst | 43 +++++++++++++--------------
>  1 file changed, 21 insertions(+), 22 deletions(-)
> 
> diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst
> index 01802ed9b29b..9b28c0e15bb6 100644
> --- a/Documentation/bpf/instruction-set.rst
> +++ b/Documentation/bpf/instruction-set.rst
> @@ -38,15 +38,13 @@ eBPF has two instruction encodings:
>  * the wide instruction encoding, which appends a second 64-bit immediate (i.e.,
>    constant) value after the basic instruction for a total of 128 bits.
>  
> -The basic instruction encoding looks as follows for a little-endian processor,
> -where MSB and LSB mean the most significant bits and least significant bits,
> -respectively:
> +The fields conforming an encoded basic instruction are stored in the
> +following order:
>  
> -=============  =======  =======  =======  ============
> -32 bits (MSB)  16 bits  4 bits   4 bits   8 bits (LSB)
> -=============  =======  =======  =======  ============
> -imm            offset   src_reg  dst_reg  opcode
> -=============  =======  =======  =======  ============
> +  opcode:8 src:4 dst:4 offset:16 imm:32 // In little-endian BPF.
> +  opcode:8 dst:4 src:4 offset:16 imm:32 // In big-endian BPF.

Unfortunately this won't render correctly. It'll look something like
this:

The fields conforming an encoded basic instruction are stored in the following order:

opcode:8 src:4 dst:4 offset:16 imm:32 // In little-endian BPF. opcode:8 dst:4 src:4 offset:16 imm:32 // In big-endian BPF.

You'll have to add some extra newlines. You can test out your changes
with:

make SPHINXDIRS=bpf htmldocs

And then the output is put in Documentation/output/bpf

In general, this is sort of the problem we have with rst. We want to
strike a balance between readable in a text editor, and readable when
rendered in a web browser. I think we can strike such a balance here,
but it'll probably require a bit of rst-fu. As described below, I think
we can fix this with a literal code block by just adding a : to order:

> +The fields conforming an encoded basic instruction are stored in the
> +following order::


> +
> +Where,
>  
>  **imm**
>    signed integer immediate value
> @@ -64,16 +62,17 @@ imm            offset   src_reg  dst_reg  opcode
>  **opcode**
>    operation to perform
>  
> -and as follows for a big-endian processor:
> +Note that the contents of multi-byte fields ('imm' and 'offset') are
> +stored using big-endian byte ordering in big-endian BPF and
> +little-endian byte ordering in little-endian BPF.
>  
> -=============  =======  =======  =======  ============
> -32 bits (MSB)  16 bits  4 bits   4 bits   8 bits (LSB)
> -=============  =======  =======  =======  ============
> -imm            offset   dst_reg  src_reg  opcode
> -=============  =======  =======  =======  ============
> +For example:
>  
> -Multi-byte fields ('imm' and 'offset') are similarly stored in
> -the byte order of the processor.
> +  opcode         offset imm          assembly
> +         src dst
> +  07     0   1   00 00  44 33 22 11  r1 += 0x11223344 // little
> +         dst src
> +  07     1   0   00 00  11 22 33 44  r1 += 0x11223344 // big

This also won't render. rst will think it's a "definition list" (see
[0]), so it's interpreting the line with '// big' as a term that will be
defined on the next line.

[0]: https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#lists-and-quote-like-blocks

If you do "For example::" it should render correctly. This applies
elsewhere to the patch. Let's just make all of these literal code
blocks.

Thanks,
David

>  
>  Note that most instructions do not use all of the fields.
>  Unused fields shall be cleared to zero.
> @@ -84,18 +83,18 @@ The 64 bits following the basic instruction contain a pseudo instruction
>  using the same format but with opcode, dst_reg, src_reg, and offset all set to zero,
>  and imm containing the high 32 bits of the immediate value.
>  
> -=================  ==================
> -64 bits (MSB)      64 bits (LSB)
> -=================  ==================
> -basic instruction  pseudo instruction
> -=================  ==================
> +This is depicted in the following figure:
> +
> +  basic_instruction                 pseudo_instruction
> +  code:8 regs:16 offset:16 imm:32 | unused:32 imm:32
>  
>  Thus the 64-bit immediate value is constructed as follows:
>  
>    imm64 = (next_imm << 32) | imm
>  
>  where 'next_imm' refers to the imm value of the pseudo instruction
> -following the basic instruction.
> +following the basic instruction.  The unused bytes in the pseudo
> +instruction shall be cleared to zero.
>  
>  Instruction classes
>  -------------------
> -- 
> 2.30.2
> 
> -- 
> Bpf mailing list
> Bpf@ietf.org
> https://www.ietf.org/mailman/listinfo/bpf
Jose E. Marchesi Feb. 27, 2023, 6:05 p.m. UTC | #6
Hi David.

Thanks for the comments, and sorry for the rst formatting mistakes.  I
forgot to add the :: before the indented blocks.  I'm resending with
these.

> On Fri, Feb 24, 2023 at 09:04:07PM +0100, Jose E. Marchesi wrote:
>> 
>> This patch modifies instruction-set.rst so it documents the encoding
>> of BPF instructions in terms of how the bytes are stored (be it in an
>> ELF file or as bytes in a memory buffer to be loaded into the kernel
>> or some other BPF consumer) as opposed to how the instruction looks
>> like once loaded.
>> 
>> This is hopefully easier to understand by implementors looking to
>> generate and/or consume bytes conforming BPF instructions.
>> 
>> The patch also clarifies that the unused bytes in a pseudo-instruction
>> shall be cleared with zeros.
>> 
>> Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
>
> Hi Jose,
>
> Thanks for writing this up.
>
>> ---
>>  Documentation/bpf/instruction-set.rst | 43 +++++++++++++--------------
>>  1 file changed, 21 insertions(+), 22 deletions(-)
>> 
>> diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst
>> index 01802ed9b29b..9b28c0e15bb6 100644
>> --- a/Documentation/bpf/instruction-set.rst
>> +++ b/Documentation/bpf/instruction-set.rst
>> @@ -38,15 +38,13 @@ eBPF has two instruction encodings:
>>  * the wide instruction encoding, which appends a second 64-bit immediate (i.e.,
>>    constant) value after the basic instruction for a total of 128 bits.
>>  
>> -The basic instruction encoding looks as follows for a little-endian processor,
>> -where MSB and LSB mean the most significant bits and least significant bits,
>> -respectively:
>> +The fields conforming an encoded basic instruction are stored in the
>> +following order:
>>  
>> -=============  =======  =======  =======  ============
>> -32 bits (MSB)  16 bits  4 bits   4 bits   8 bits (LSB)
>> -=============  =======  =======  =======  ============
>> -imm            offset   src_reg  dst_reg  opcode
>> -=============  =======  =======  =======  ============
>> +  opcode:8 src:4 dst:4 offset:16 imm:32 // In little-endian BPF.
>> +  opcode:8 dst:4 src:4 offset:16 imm:32 // In big-endian BPF.
>
> Unfortunately this won't render correctly. It'll look something like
> this:
>
> The fields conforming an encoded basic instruction are stored in the following order:
>
> opcode:8 src:4 dst:4 offset:16 imm:32 // In little-endian
> BPF. opcode:8 dst:4 src:4 offset:16 imm:32 // In big-endian BPF.
>
> You'll have to add some extra newlines. You can test out your changes
> with:
>
> make SPHINXDIRS=bpf htmldocs
>
> And then the output is put in Documentation/output/bpf
>
> In general, this is sort of the problem we have with rst. We want to
> strike a balance between readable in a text editor, and readable when
> rendered in a web browser. I think we can strike such a balance here,
> but it'll probably require a bit of rst-fu. As described below, I think
> we can fix this with a literal code block by just adding a : to order:
>
>> +The fields conforming an encoded basic instruction are stored in the
>> +following order::
>
>
>> +
>> +Where,
>>  
>>  **imm**
>>    signed integer immediate value
>> @@ -64,16 +62,17 @@ imm            offset   src_reg  dst_reg  opcode
>>  **opcode**
>>    operation to perform
>>  
>> -and as follows for a big-endian processor:
>> +Note that the contents of multi-byte fields ('imm' and 'offset') are
>> +stored using big-endian byte ordering in big-endian BPF and
>> +little-endian byte ordering in little-endian BPF.
>>  
>> -=============  =======  =======  =======  ============
>> -32 bits (MSB)  16 bits  4 bits   4 bits   8 bits (LSB)
>> -=============  =======  =======  =======  ============
>> -imm            offset   dst_reg  src_reg  opcode
>> -=============  =======  =======  =======  ============
>> +For example:
>>  
>> -Multi-byte fields ('imm' and 'offset') are similarly stored in
>> -the byte order of the processor.
>> +  opcode         offset imm          assembly
>> +         src dst
>> +  07     0   1   00 00  44 33 22 11  r1 += 0x11223344 // little
>> +         dst src
>> +  07     1   0   00 00  11 22 33 44  r1 += 0x11223344 // big
>
> This also won't render. rst will think it's a "definition list" (see
> [0]), so it's interpreting the line with '// big' as a term that will be
> defined on the next line.
>
> [0]: https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#lists-and-quote-like-blocks
>
> If you do "For example::" it should render correctly. This applies
> elsewhere to the patch. Let's just make all of these literal code
> blocks.
>
> Thanks,
> David
>
>>  
>>  Note that most instructions do not use all of the fields.
>>  Unused fields shall be cleared to zero.
>> @@ -84,18 +83,18 @@ The 64 bits following the basic instruction contain a pseudo instruction
>>  using the same format but with opcode, dst_reg, src_reg, and offset all set to zero,
>>  and imm containing the high 32 bits of the immediate value.
>>  
>> -=================  ==================
>> -64 bits (MSB)      64 bits (LSB)
>> -=================  ==================
>> -basic instruction  pseudo instruction
>> -=================  ==================
>> +This is depicted in the following figure:
>> +
>> +  basic_instruction                 pseudo_instruction
>> +  code:8 regs:16 offset:16 imm:32 | unused:32 imm:32
>>  
>>  Thus the 64-bit immediate value is constructed as follows:
>>  
>>    imm64 = (next_imm << 32) | imm
>>  
>>  where 'next_imm' refers to the imm value of the pseudo instruction
>> -following the basic instruction.
>> +following the basic instruction.  The unused bytes in the pseudo
>> +instruction shall be cleared to zero.
>>  
>>  Instruction classes
>>  -------------------
>> -- 
>> 2.30.2
>> 
>> -- 
>> Bpf mailing list
>> Bpf@ietf.org
>> https://www.ietf.org/mailman/listinfo/bpf
diff mbox series

Patch

diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst
index 01802ed9b29b..9b28c0e15bb6 100644
--- a/Documentation/bpf/instruction-set.rst
+++ b/Documentation/bpf/instruction-set.rst
@@ -38,15 +38,13 @@  eBPF has two instruction encodings:
 * the wide instruction encoding, which appends a second 64-bit immediate (i.e.,
   constant) value after the basic instruction for a total of 128 bits.
 
-The basic instruction encoding looks as follows for a little-endian processor,
-where MSB and LSB mean the most significant bits and least significant bits,
-respectively:
+The fields conforming an encoded basic instruction are stored in the
+following order:
 
-=============  =======  =======  =======  ============
-32 bits (MSB)  16 bits  4 bits   4 bits   8 bits (LSB)
-=============  =======  =======  =======  ============
-imm            offset   src_reg  dst_reg  opcode
-=============  =======  =======  =======  ============
+  opcode:8 src:4 dst:4 offset:16 imm:32 // In little-endian BPF.
+  opcode:8 dst:4 src:4 offset:16 imm:32 // In big-endian BPF.
+
+Where,
 
 **imm**
   signed integer immediate value
@@ -64,16 +62,17 @@  imm            offset   src_reg  dst_reg  opcode
 **opcode**
   operation to perform
 
-and as follows for a big-endian processor:
+Note that the contents of multi-byte fields ('imm' and 'offset') are
+stored using big-endian byte ordering in big-endian BPF and
+little-endian byte ordering in little-endian BPF.
 
-=============  =======  =======  =======  ============
-32 bits (MSB)  16 bits  4 bits   4 bits   8 bits (LSB)
-=============  =======  =======  =======  ============
-imm            offset   dst_reg  src_reg  opcode
-=============  =======  =======  =======  ============
+For example:
 
-Multi-byte fields ('imm' and 'offset') are similarly stored in
-the byte order of the processor.
+  opcode         offset imm          assembly
+         src dst
+  07     0   1   00 00  44 33 22 11  r1 += 0x11223344 // little
+         dst src
+  07     1   0   00 00  11 22 33 44  r1 += 0x11223344 // big
 
 Note that most instructions do not use all of the fields.
 Unused fields shall be cleared to zero.
@@ -84,18 +83,18 @@  The 64 bits following the basic instruction contain a pseudo instruction
 using the same format but with opcode, dst_reg, src_reg, and offset all set to zero,
 and imm containing the high 32 bits of the immediate value.
 
-=================  ==================
-64 bits (MSB)      64 bits (LSB)
-=================  ==================
-basic instruction  pseudo instruction
-=================  ==================
+This is depicted in the following figure:
+
+  basic_instruction                 pseudo_instruction
+  code:8 regs:16 offset:16 imm:32 | unused:32 imm:32
 
 Thus the 64-bit immediate value is constructed as follows:
 
   imm64 = (next_imm << 32) | imm
 
 where 'next_imm' refers to the imm value of the pseudo instruction
-following the basic instruction.
+following the basic instruction.  The unused bytes in the pseudo
+instruction shall be cleared to zero.
 
 Instruction classes
 -------------------