Message ID | 20230805030921.52035-1-hawkinsw@obs.cr (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | BPF |
Headers | show |
Series | [v3,1/2] bpf, docs: Formalize type notation and function semantics in ISA standard | expand |
On Fri, Aug 04, 2023 at 11:09:18PM -0400, Will Hawkins wrote: > Give a single place where the shorthand for types are defined, the > semantics of helper functions are described, and certain terms can > be defined. > > Signed-off-by: Will Hawkins <hawkinsw@obs.cr> Hi Will, This is looking great. Left a couple more comments below, let me know what you think. > --- > .../bpf/standardization/instruction-set.rst | 79 +++++++++++++++++-- > 1 file changed, 71 insertions(+), 8 deletions(-) > > Changelog: > v1 -> v2: > - Remove change to Sphinx version > - Address David's comments > - Address Dave's comments > v2 -> v3: > - Add information about sign extending > > diff --git a/Documentation/bpf/standardization/instruction-set.rst b/Documentation/bpf/standardization/instruction-set.rst > index 655494ac7af6..fe296f35e5a7 100644 > --- a/Documentation/bpf/standardization/instruction-set.rst > +++ b/Documentation/bpf/standardization/instruction-set.rst > @@ -10,9 +10,68 @@ This document specifies version 1.0 of the eBPF instruction set. > Documentation conventions > ========================= > > -For brevity, this document uses the type notion "u64", "u32", etc. > -to mean an unsigned integer whose width is the specified number of bits, > -and "s32", etc. to mean a signed integer of the specified number of bits. > +For brevity and consistency, this document refers to families > +of types using a shorthand syntax and refers to several expository, > +mnemonic functions when describing the semantics of opcodes. The range Hmm, I wonder if it's slightly more accurate to say that those functions are describing the semantics of "instructions" rather than "opcodes", given that the value in the immediate for the byte swap instructions dictate the width. What do you think? > +of valid values for those types and the semantics of those functions > +are defined in the following subsections. > + > +Types > +----- > +This document refers to integer types with a notation of the form `SN` > +that succinctly defines whether their values are signed or unsigned Suggestion: I don't think we need the word "succinctly" here. I'm also inclined to suggest that we avoid using the word "define" here, as the notation itself isn't really defining the values of the types, but is rather just a shorthand. > +numbers and the bit widths: What do you think about this phrasing: This document refers to integer types with the notation `SN` to specify a type's signedness and bit width respectively. Feel free to disagree. My thinking here is that it might make more sense to explain the notation as an informal shorthand rather than as a formal defnition, as making it a formal definition might open its own can of worms (e.g. we would probably also have to define what signedness means, etc). > + > +=== ======= > +`S` Meaning > +=== ======= > +`u` unsigned > +`s` signed > +=== ======= > + > +===== ========= > +`N` Bit width > +===== ========= > +`8` 8 bits > +`16` 16 bits > +`32` 32 bits > +`64` 64 bits > +`128` 128 bits > +===== ========= Is it by any chance possible to put these two tables on the same row? Not sure if rst is up to that challenge, and if not feel free to ignore. > + > +For example, `u32` is a type whose valid values are all the 32-bit unsigned > +numbers and `s16` is a types whose valid values are all the 16-bit signed > +numbers. > + > +Functions > +--------- > +* `htobe16`: Takes an unsigned 16-bit number in host-endian format and > + returns the equivalent number as an unsigned 16-bit number in big-endian > + format. > +* `htobe32`: Takes an unsigned 32-bit number in host-endian format and > + returns the equivalent number as an unsigned 32-bit number in big-endian > + format. > +* `htobe64`: Takes an unsigned 64-bit number in host-endian format and > + returns the equivalent number as an unsigned 64-bit number in big-endian > + format. > +* `htole16`: Takes an unsigned 16-bit number in host-endian format and > + returns the equivalent number as an unsigned 16-bit number in little-endian > + format. > +* `htole32`: Takes an unsigned 32-bit number in host-endian format and > + returns the equivalent number as an unsigned 32-bit number in little-endian > + format. > +* `htole64`: Takes an unsigned 64-bit number in host-endian format and > + returns the equivalent number as an unsigned 64-bit number in little-endian > + format. > +* `bswap16`: Takes an unsigned 16-bit number in either big- or little-endian > + format and returns the equivalent number with the same bit width but > + opposite endianness. > +* `bswap32`: Takes an unsigned 32-bit number in either big- or little-endian > + format and returns the equivalent number with the same bit width but > + opposite endianness. > +* `bswap64`: Takes an unsigned 64-bit number in either big- or little-endian > + format and returns the equivalent number with the same bit width but > + opposite endianness. Upon further reflection, maybe this belongs in the Byte swap instructions section of the document? The types make sense to list above because they're ubiquitous throughout the doc, but these are really 100% specific to byte swap. > > Registers and calling convention > ================================ > @@ -252,19 +311,23 @@ are supported: 16, 32 and 64. > > Examples: > > -``BPF_ALU | BPF_TO_LE | BPF_END`` with imm = 16 means:: > +``BPF_ALU | BPF_TO_LE | BPF_END`` with imm = 16/32/64 means:: > > dst = htole16(dst) > + dst = htole32(dst) > + dst = htole64(dst) > > -``BPF_ALU | BPF_TO_BE | BPF_END`` with imm = 64 means:: > +``BPF_ALU | BPF_TO_BE | BPF_END`` with imm = 16/32/64 means:: > > + dst = htobe16(dst) > + dst = htobe32(dst) > dst = htobe64(dst) > > ``BPF_ALU64 | BPF_TO_LE | BPF_END`` with imm = 16/32/64 means:: > > - dst = bswap16 dst > - dst = bswap32 dst > - dst = bswap64 dst > + dst = bswap16(dst) > + dst = bswap32(dst) > + dst = bswap64(dst) > [...] - David
On Sat, Aug 5, 2023 at 10:14 AM David Vernet <void@manifault.com> wrote: > > On Fri, Aug 04, 2023 at 11:09:18PM -0400, Will Hawkins wrote: > > Give a single place where the shorthand for types are defined, the > > semantics of helper functions are described, and certain terms can > > be defined. > > > > Signed-off-by: Will Hawkins <hawkinsw@obs.cr> > > Hi Will, > > This is looking great. Left a couple more comments below, let me know > what you think. Thanks for the feedback! > > > --- > > .../bpf/standardization/instruction-set.rst | 79 +++++++++++++++++-- > > 1 file changed, 71 insertions(+), 8 deletions(-) > > > > Changelog: > > v1 -> v2: > > - Remove change to Sphinx version > > - Address David's comments > > - Address Dave's comments > > v2 -> v3: > > - Add information about sign extending > > > > diff --git a/Documentation/bpf/standardization/instruction-set.rst b/Documentation/bpf/standardization/instruction-set.rst > > index 655494ac7af6..fe296f35e5a7 100644 > > --- a/Documentation/bpf/standardization/instruction-set.rst > > +++ b/Documentation/bpf/standardization/instruction-set.rst > > @@ -10,9 +10,68 @@ This document specifies version 1.0 of the eBPF instruction set. > > Documentation conventions > > ========================= > > > > -For brevity, this document uses the type notion "u64", "u32", etc. > > -to mean an unsigned integer whose width is the specified number of bits, > > -and "s32", etc. to mean a signed integer of the specified number of bits. > > +For brevity and consistency, this document refers to families > > +of types using a shorthand syntax and refers to several expository, > > +mnemonic functions when describing the semantics of opcodes. The range > > Hmm, I wonder if it's slightly more accurate to say that those functions > are describing the semantics of "instructions" rather than "opcodes", > given that the value in the immediate for the byte swap instructions > dictate the width. What do you think? Great point! > > > +of valid values for those types and the semantics of those functions > > +are defined in the following subsections. > > + > > +Types > > +----- > > +This document refers to integer types with a notation of the form `SN` > > +that succinctly defines whether their values are signed or unsigned > > Suggestion: I don't think we need the word "succinctly" here. I'm also > inclined to suggest that we avoid using the word "define" here, as the > notation itself isn't really defining the values of the types, but is > rather just a shorthand. Yes! > > > +numbers and the bit widths: > > What do you think about this phrasing: > > This document refers to integer types with the notation `SN` to specify > a type's signedness and bit width respectively. > > Feel free to disagree. My thinking here is that it might make more sense > to explain the notation as an informal shorthand rather than as a formal > defnition, as making it a formal definition might open its own can of > worms (e.g. we would probably also have to define what signedness means, > etc). I think that you make an excellent point. We have already gone back/forth about whether there is going to be a definition for the "type" of signedness that we use in the representation (i.e., two's complement, one's complement, sign magnitude, etc) and we definitely don't want to rush in to that rabbit hole again. I will incorporate your suggestion in the next revision. > > > + > > +=== ======= > > +`S` Meaning > > +=== ======= > > +`u` unsigned > > +`s` signed > > +=== ======= > > + > > +===== ========= > > +`N` Bit width > > +===== ========= > > +`8` 8 bits > > +`16` 16 bits > > +`32` 32 bits > > +`64` 64 bits > > +`128` 128 bits > > +===== ========= > > Is it by any chance possible to put these two tables on the same row? > Not sure if rst is up to that challenge, and if not feel free to ignore. I would love to make that happen. My rst skills are developing and I will see what I can do! > > > + > > +For example, `u32` is a type whose valid values are all the 32-bit unsigned > > +numbers and `s16` is a types whose valid values are all the 16-bit signed > > +numbers. > > + > > +Functions > > +--------- > > +* `htobe16`: Takes an unsigned 16-bit number in host-endian format and > > + returns the equivalent number as an unsigned 16-bit number in big-endian > > + format. > > +* `htobe32`: Takes an unsigned 32-bit number in host-endian format and > > + returns the equivalent number as an unsigned 32-bit number in big-endian > > + format. > > +* `htobe64`: Takes an unsigned 64-bit number in host-endian format and > > + returns the equivalent number as an unsigned 64-bit number in big-endian > > + format. > > +* `htole16`: Takes an unsigned 16-bit number in host-endian format and > > + returns the equivalent number as an unsigned 16-bit number in little-endian > > + format. > > +* `htole32`: Takes an unsigned 32-bit number in host-endian format and > > + returns the equivalent number as an unsigned 32-bit number in little-endian > > + format. > > +* `htole64`: Takes an unsigned 64-bit number in host-endian format and > > + returns the equivalent number as an unsigned 64-bit number in little-endian > > + format. > > +* `bswap16`: Takes an unsigned 16-bit number in either big- or little-endian > > + format and returns the equivalent number with the same bit width but > > + opposite endianness. > > +* `bswap32`: Takes an unsigned 32-bit number in either big- or little-endian > > + format and returns the equivalent number with the same bit width but > > + opposite endianness. > > +* `bswap64`: Takes an unsigned 64-bit number in either big- or little-endian > > + format and returns the equivalent number with the same bit width but > > + opposite endianness. > > Upon further reflection, maybe this belongs in the Byte swap > instructions section of the document? The types make sense to list above > because they're ubiquitous throughout the doc, but these are really 100% > specific to byte swap. I am not opposed to that. The only reason that I put them here was to make it fully centralized. They are also going to be used in the Appendix and I was hoping not to have to reproduce them there. In v4 of the patch I will leave them here and then if we do feel like we should move them I will happily make a v5! Thank you again for the feedback! A new revision of the patch will be out shortly! Have a great rest of the weekend! Will > > > > > Registers and calling convention > > ================================ > > @@ -252,19 +311,23 @@ are supported: 16, 32 and 64. > > > > Examples: > > > > -``BPF_ALU | BPF_TO_LE | BPF_END`` with imm = 16 means:: > > +``BPF_ALU | BPF_TO_LE | BPF_END`` with imm = 16/32/64 means:: > > > > dst = htole16(dst) > > + dst = htole32(dst) > > + dst = htole64(dst) > > > > -``BPF_ALU | BPF_TO_BE | BPF_END`` with imm = 64 means:: > > +``BPF_ALU | BPF_TO_BE | BPF_END`` with imm = 16/32/64 means:: > > > > + dst = htobe16(dst) > > + dst = htobe32(dst) > > dst = htobe64(dst) > > > > ``BPF_ALU64 | BPF_TO_LE | BPF_END`` with imm = 16/32/64 means:: > > > > - dst = bswap16 dst > > - dst = bswap32 dst > > - dst = bswap64 dst > > + dst = bswap16(dst) > > + dst = bswap32(dst) > > + dst = bswap64(dst) > > > > [...] > > - David
On Sat, Aug 5, 2023 at 11:01 PM Will Hawkins <hawkinsw@obs.cr> wrote: > > On Sat, Aug 5, 2023 at 10:14 AM David Vernet <void@manifault.com> wrote: > > > > On Fri, Aug 04, 2023 at 11:09:18PM -0400, Will Hawkins wrote: > > > Give a single place where the shorthand for types are defined, the > > > semantics of helper functions are described, and certain terms can > > > be defined. > > > > > > Signed-off-by: Will Hawkins <hawkinsw@obs.cr> > > > > Hi Will, > > > > This is looking great. Left a couple more comments below, let me know > > what you think. > > Thanks for the feedback! > > > > > > --- > > > .../bpf/standardization/instruction-set.rst | 79 +++++++++++++++++-- > > > 1 file changed, 71 insertions(+), 8 deletions(-) > > > > > > Changelog: > > > v1 -> v2: > > > - Remove change to Sphinx version > > > - Address David's comments > > > - Address Dave's comments > > > v2 -> v3: > > > - Add information about sign extending > > > > > > diff --git a/Documentation/bpf/standardization/instruction-set.rst b/Documentation/bpf/standardization/instruction-set.rst > > > index 655494ac7af6..fe296f35e5a7 100644 > > > --- a/Documentation/bpf/standardization/instruction-set.rst > > > +++ b/Documentation/bpf/standardization/instruction-set.rst > > > @@ -10,9 +10,68 @@ This document specifies version 1.0 of the eBPF instruction set. > > > Documentation conventions > > > ========================= > > > > > > -For brevity, this document uses the type notion "u64", "u32", etc. > > > -to mean an unsigned integer whose width is the specified number of bits, > > > -and "s32", etc. to mean a signed integer of the specified number of bits. > > > +For brevity and consistency, this document refers to families > > > +of types using a shorthand syntax and refers to several expository, > > > +mnemonic functions when describing the semantics of opcodes. The range > > > > Hmm, I wonder if it's slightly more accurate to say that those functions > > are describing the semantics of "instructions" rather than "opcodes", > > given that the value in the immediate for the byte swap instructions > > dictate the width. What do you think? > > Great point! > > > > > > +of valid values for those types and the semantics of those functions > > > +are defined in the following subsections. > > > + > > > +Types > > > +----- > > > +This document refers to integer types with a notation of the form `SN` > > > +that succinctly defines whether their values are signed or unsigned > > > > Suggestion: I don't think we need the word "succinctly" here. I'm also > > inclined to suggest that we avoid using the word "define" here, as the > > notation itself isn't really defining the values of the types, but is > > rather just a shorthand. > > Yes! > > > > > > +numbers and the bit widths: > > > > What do you think about this phrasing: > > > > This document refers to integer types with the notation `SN` to specify > > a type's signedness and bit width respectively. > > > > Feel free to disagree. My thinking here is that it might make more sense > > to explain the notation as an informal shorthand rather than as a formal > > defnition, as making it a formal definition might open its own can of > > worms (e.g. we would probably also have to define what signedness means, > > etc). > > I think that you make an excellent point. We have already gone > back/forth about whether there is going to be a definition for the > "type" of signedness that we use in the representation (i.e., two's > complement, one's complement, sign magnitude, etc) and we definitely > don't want to rush in to that rabbit hole again. I will incorporate > your suggestion in the next revision. > > > > > > + > > > +=== ======= > > > +`S` Meaning > > > +=== ======= > > > +`u` unsigned > > > +`s` signed > > > +=== ======= > > > + > > > +===== ========= > > > +`N` Bit width > > > +===== ========= > > > +`8` 8 bits > > > +`16` 16 bits > > > +`32` 32 bits > > > +`64` 64 bits > > > +`128` 128 bits > > > +===== ========= > > > > Is it by any chance possible to put these two tables on the same row? > > Not sure if rst is up to that challenge, and if not feel free to ignore. > > I would love to make that happen. My rst skills are developing and I > will see what I can do! FYI: In the upcoming version of this patch, you will see that I did not do the side-by-side table. I want you to know that I tried very hard and ultimately got it to work. However, it looked terrible and the result was confusing (I thought). If you would strongly prefer a side-by-side (ish) table, please let me know and I will send a patch to show what it looked like. I just wanted to note that I *tried* to incorporate your suggestion, but could not ultimately bend RST to my will (yes, pun intended)! Will > > > > > > + > > > +For example, `u32` is a type whose valid values are all the 32-bit unsigned > > > +numbers and `s16` is a types whose valid values are all the 16-bit signed > > > +numbers. > > > + > > > +Functions > > > +--------- > > > +* `htobe16`: Takes an unsigned 16-bit number in host-endian format and > > > + returns the equivalent number as an unsigned 16-bit number in big-endian > > > + format. > > > +* `htobe32`: Takes an unsigned 32-bit number in host-endian format and > > > + returns the equivalent number as an unsigned 32-bit number in big-endian > > > + format. > > > +* `htobe64`: Takes an unsigned 64-bit number in host-endian format and > > > + returns the equivalent number as an unsigned 64-bit number in big-endian > > > + format. > > > +* `htole16`: Takes an unsigned 16-bit number in host-endian format and > > > + returns the equivalent number as an unsigned 16-bit number in little-endian > > > + format. > > > +* `htole32`: Takes an unsigned 32-bit number in host-endian format and > > > + returns the equivalent number as an unsigned 32-bit number in little-endian > > > + format. > > > +* `htole64`: Takes an unsigned 64-bit number in host-endian format and > > > + returns the equivalent number as an unsigned 64-bit number in little-endian > > > + format. > > > +* `bswap16`: Takes an unsigned 16-bit number in either big- or little-endian > > > + format and returns the equivalent number with the same bit width but > > > + opposite endianness. > > > +* `bswap32`: Takes an unsigned 32-bit number in either big- or little-endian > > > + format and returns the equivalent number with the same bit width but > > > + opposite endianness. > > > +* `bswap64`: Takes an unsigned 64-bit number in either big- or little-endian > > > + format and returns the equivalent number with the same bit width but > > > + opposite endianness. > > > > Upon further reflection, maybe this belongs in the Byte swap > > instructions section of the document? The types make sense to list above > > because they're ubiquitous throughout the doc, but these are really 100% > > specific to byte swap. > > I am not opposed to that. The only reason that I put them here was to > make it fully centralized. They are also going to be used in the > Appendix and I was hoping not to have to reproduce them there. In v4 > of the patch I will leave them here and then if we do feel like we > should move them I will happily make a v5! > > Thank you again for the feedback! A new revision of the patch will be > out shortly! > > Have a great rest of the weekend! > Will > > > > > > > > > Registers and calling convention > > > ================================ > > > @@ -252,19 +311,23 @@ are supported: 16, 32 and 64. > > > > > > Examples: > > > > > > -``BPF_ALU | BPF_TO_LE | BPF_END`` with imm = 16 means:: > > > +``BPF_ALU | BPF_TO_LE | BPF_END`` with imm = 16/32/64 means:: > > > > > > dst = htole16(dst) > > > + dst = htole32(dst) > > > + dst = htole64(dst) > > > > > > -``BPF_ALU | BPF_TO_BE | BPF_END`` with imm = 64 means:: > > > +``BPF_ALU | BPF_TO_BE | BPF_END`` with imm = 16/32/64 means:: > > > > > > + dst = htobe16(dst) > > > + dst = htobe32(dst) > > > dst = htobe64(dst) > > > > > > ``BPF_ALU64 | BPF_TO_LE | BPF_END`` with imm = 16/32/64 means:: > > > > > > - dst = bswap16 dst > > > - dst = bswap32 dst > > > - dst = bswap64 dst > > > + dst = bswap16(dst) > > > + dst = bswap32(dst) > > > + dst = bswap64(dst) > > > > > > > [...] > > > > - David
diff --git a/Documentation/bpf/standardization/instruction-set.rst b/Documentation/bpf/standardization/instruction-set.rst index 655494ac7af6..fe296f35e5a7 100644 --- a/Documentation/bpf/standardization/instruction-set.rst +++ b/Documentation/bpf/standardization/instruction-set.rst @@ -10,9 +10,68 @@ This document specifies version 1.0 of the eBPF instruction set. Documentation conventions ========================= -For brevity, this document uses the type notion "u64", "u32", etc. -to mean an unsigned integer whose width is the specified number of bits, -and "s32", etc. to mean a signed integer of the specified number of bits. +For brevity and consistency, this document refers to families +of types using a shorthand syntax and refers to several expository, +mnemonic functions when describing the semantics of opcodes. The range +of valid values for those types and the semantics of those functions +are defined in the following subsections. + +Types +----- +This document refers to integer types with a notation of the form `SN` +that succinctly defines whether their values are signed or unsigned +numbers and the bit widths: + +=== ======= +`S` Meaning +=== ======= +`u` unsigned +`s` signed +=== ======= + +===== ========= +`N` Bit width +===== ========= +`8` 8 bits +`16` 16 bits +`32` 32 bits +`64` 64 bits +`128` 128 bits +===== ========= + +For example, `u32` is a type whose valid values are all the 32-bit unsigned +numbers and `s16` is a types whose valid values are all the 16-bit signed +numbers. + +Functions +--------- +* `htobe16`: Takes an unsigned 16-bit number in host-endian format and + returns the equivalent number as an unsigned 16-bit number in big-endian + format. +* `htobe32`: Takes an unsigned 32-bit number in host-endian format and + returns the equivalent number as an unsigned 32-bit number in big-endian + format. +* `htobe64`: Takes an unsigned 64-bit number in host-endian format and + returns the equivalent number as an unsigned 64-bit number in big-endian + format. +* `htole16`: Takes an unsigned 16-bit number in host-endian format and + returns the equivalent number as an unsigned 16-bit number in little-endian + format. +* `htole32`: Takes an unsigned 32-bit number in host-endian format and + returns the equivalent number as an unsigned 32-bit number in little-endian + format. +* `htole64`: Takes an unsigned 64-bit number in host-endian format and + returns the equivalent number as an unsigned 64-bit number in little-endian + format. +* `bswap16`: Takes an unsigned 16-bit number in either big- or little-endian + format and returns the equivalent number with the same bit width but + opposite endianness. +* `bswap32`: Takes an unsigned 32-bit number in either big- or little-endian + format and returns the equivalent number with the same bit width but + opposite endianness. +* `bswap64`: Takes an unsigned 64-bit number in either big- or little-endian + format and returns the equivalent number with the same bit width but + opposite endianness. Registers and calling convention ================================ @@ -252,19 +311,23 @@ are supported: 16, 32 and 64. Examples: -``BPF_ALU | BPF_TO_LE | BPF_END`` with imm = 16 means:: +``BPF_ALU | BPF_TO_LE | BPF_END`` with imm = 16/32/64 means:: dst = htole16(dst) + dst = htole32(dst) + dst = htole64(dst) -``BPF_ALU | BPF_TO_BE | BPF_END`` with imm = 64 means:: +``BPF_ALU | BPF_TO_BE | BPF_END`` with imm = 16/32/64 means:: + dst = htobe16(dst) + dst = htobe32(dst) dst = htobe64(dst) ``BPF_ALU64 | BPF_TO_LE | BPF_END`` with imm = 16/32/64 means:: - dst = bswap16 dst - dst = bswap32 dst - dst = bswap64 dst + dst = bswap16(dst) + dst = bswap32(dst) + dst = bswap64(dst) Jump instructions -----------------
Give a single place where the shorthand for types are defined, the semantics of helper functions are described, and certain terms can be defined. Signed-off-by: Will Hawkins <hawkinsw@obs.cr> --- .../bpf/standardization/instruction-set.rst | 79 +++++++++++++++++-- 1 file changed, 71 insertions(+), 8 deletions(-) Changelog: v1 -> v2: - Remove change to Sphinx version - Address David's comments - Address Dave's comments v2 -> v3: - Add information about sign extending