diff mbox series

[07/15] ebpf-docs: Fix modulo zero, division by zero, overflow, and underflow

Message ID 20220927185958.14995-7-dthaler1968@googlemail.com (mailing list archive)
State Changes Requested
Delegated to: BPF
Headers show
Series [01/15] ebpf-docs: Move legacy packet instructions to a separate file | expand

Checks

Context Check Description
netdev/tree_selection success Not a local patch
bpf/vmtest-bpf-next-VM_Test-4 success Logs for llvm-toolchain
bpf/vmtest-bpf-next-VM_Test-5 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-2 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-3 success Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-1 success Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-16 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-13 success Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-15 success Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-17 success Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-7 success Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-9 success Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-10 success Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-11 fail Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-12 success Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-14 success Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-8 success Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-PR fail PR summary
bpf/vmtest-bpf-next-VM_Test-6 success Logs for test_maps on s390x with gcc

Commit Message

Dave Thaler Sept. 27, 2022, 6:59 p.m. UTC
From: Dave Thaler <dthaler@microsoft.com>

Signed-off-by: Dave Thaler <dthaler@microsoft.com>
---
 Documentation/bpf/instruction-set.rst | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

Comments

Alexei Starovoitov Sept. 30, 2022, 8:52 p.m. UTC | #1
On Tue, Sep 27, 2022 at 06:59:50PM +0000, dthaler1968@googlemail.com wrote:
> From: Dave Thaler <dthaler@microsoft.com>
> 
> Signed-off-by: Dave Thaler <dthaler@microsoft.com>
> ---
>  Documentation/bpf/instruction-set.rst | 19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst
> index a24bc5d53..3c5a63612 100644
> --- a/Documentation/bpf/instruction-set.rst
> +++ b/Documentation/bpf/instruction-set.rst
> @@ -103,19 +103,26 @@ code      value  description
>  BPF_ADD   0x00   dst += src
>  BPF_SUB   0x10   dst -= src
>  BPF_MUL   0x20   dst \*= src
> -BPF_DIV   0x30   dst /= src
> +BPF_DIV   0x30   dst = (src != 0) ? (dst / src) : 0
>  BPF_OR    0x40   dst \|= src
>  BPF_AND   0x50   dst &= src
>  BPF_LSH   0x60   dst <<= src
>  BPF_RSH   0x70   dst >>= src
>  BPF_NEG   0x80   dst = ~src
> -BPF_MOD   0x90   dst %= src
> +BPF_MOD   0x90   dst = (src != 0) ? (dst % src) : dst
>  BPF_XOR   0xa0   dst ^= src
>  BPF_MOV   0xb0   dst = src
>  BPF_ARSH  0xc0   sign extending shift right
>  BPF_END   0xd0   byte swap operations (see `Byte swap instructions`_ below)
>  ========  =====  ==========================================================
>  
> +Underflow and overflow are allowed during arithmetic operations,
> +meaning the 64-bit or 32-bit value will wrap.  If
> +eBPF program execution would result in division by zero,
> +the destination register is instead set to zero.
> +If execution would result in modulo by zero,
> +the destination register is instead left unchanged.
> +
>  ``BPF_ADD | BPF_X | BPF_ALU`` means::
>  
>    dst_reg = (uint32_t) dst_reg + (uint32_t) src_reg;
> @@ -135,6 +142,14 @@ where '(uint32_t)' indicates truncation to 32 bits.
>    src_reg = src_reg ^ imm32
>  
>  
> +Also note that the modulo operation often varies by language
> +when the dividend or divisor are negative, where Python, Ruby, etc.
> +differ from C, Go, Java, etc. This specification requires that
> +modulo use truncated division (where -13 % 3 == -1) as implemented
> +in C, Go, etc.:
> +
> +   a % n = a - n * trunc(a / n)
> +

Interesting bit of info, but I'm not sure how it relates to the ISA doc.

I think it's more important to say that BPF ISA only supports unsigned div/mod.
There are no instructions for signed div/mod.

>  Byte swap instructions
>  ~~~~~~~~~~~~~~~~~~~~~~
>  
> -- 
> 2.33.4
>
Dave Thaler Sept. 30, 2022, 9:54 p.m. UTC | #2
[...]
> > +Also note that the modulo operation often varies by language when the
> > +dividend or divisor are negative, where Python, Ruby, etc.
> > +differ from C, Go, Java, etc. This specification requires that modulo
> > +use truncated division (where -13 % 3 == -1) as implemented in C, Go,
> > +etc.:
> > +
> > +   a % n = a - n * trunc(a / n)
> > +
> 
> Interesting bit of info, but I'm not sure how it relates to the ISA doc.

It's because there's multiple definitions of modulo out there as the paragraph notes,
which differ in what they do with negative numbers.
The ISA defines the modulo operation as being the specific version above.
If you tried to implement the ISA in say Python and didn't know that,
you'd have a non-compliant implementation.

Dave
Alexei Starovoitov Sept. 30, 2022, 9:59 p.m. UTC | #3
On Fri, Sep 30, 2022 at 09:54:17PM +0000, Dave Thaler wrote:
> [...]
> > > +Also note that the modulo operation often varies by language when the
> > > +dividend or divisor are negative, where Python, Ruby, etc.
> > > +differ from C, Go, Java, etc. This specification requires that modulo
> > > +use truncated division (where -13 % 3 == -1) as implemented in C, Go,
> > > +etc.:
> > > +
> > > +   a % n = a - n * trunc(a / n)
> > > +
> > 
> > Interesting bit of info, but I'm not sure how it relates to the ISA doc.
> 
> It's because there's multiple definitions of modulo out there as the paragraph notes,
> which differ in what they do with negative numbers.
> The ISA defines the modulo operation as being the specific version above.
> If you tried to implement the ISA in say Python and didn't know that,
> you'd have a non-compliant implementation.

Is it because the languages have weird rules to pick between signed vs unsigned mod?
At least from llvm pov the smod and umod have fixed behavior.
Dave Thaler Sept. 30, 2022, 10:41 p.m. UTC | #4
> -----Original Message-----
> From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Sent: Friday, September 30, 2022 2:59 PM
> To: Dave Thaler <dthaler@microsoft.com>
> Cc: dthaler1968@googlemail.com; bpf@vger.kernel.org
> Subject: Re: [PATCH 07/15] ebpf-docs: Fix modulo zero, division by zero,
> overflow, and underflow
> 
> On Fri, Sep 30, 2022 at 09:54:17PM +0000, Dave Thaler wrote:
> > [...]
> > > > +Also note that the modulo operation often varies by language when
> > > > +the dividend or divisor are negative, where Python, Ruby, etc.
> > > > +differ from C, Go, Java, etc. This specification requires that
> > > > +modulo use truncated division (where -13 % 3 == -1) as
> > > > +implemented in C, Go,
> > > > +etc.:
> > > > +
> > > > +   a % n = a - n * trunc(a / n)
> > > > +
> > >
> > > Interesting bit of info, but I'm not sure how it relates to the ISA doc.
> >
> > It's because there's multiple definitions of modulo out there as the
> > paragraph notes, which differ in what they do with negative numbers.
> > The ISA defines the modulo operation as being the specific version above.
> > If you tried to implement the ISA in say Python and didn't know that,
> > you'd have a non-compliant implementation.
> 
> Is it because the languages have weird rules to pick between signed vs
> unsigned mod?
> At least from llvm pov the smod and umod have fixed behavior.

It's because there's different mathematical definitions and different languages have chosen different definitions.  E.g., languages/libraries that follow Knuth use a
different mathematical definition than C uses.  For details see:

https://en.wikipedia.org/wiki/Modulo_operation#Variants_of_the_definition

https://torstencurdt.com/tech/posts/modulo-of-negative-numbers/

Dave
Alexei Starovoitov Sept. 30, 2022, 11:41 p.m. UTC | #5
On Fri, Sep 30, 2022 at 3:41 PM Dave Thaler <dthaler@microsoft.com> wrote:
>
> > -----Original Message-----
> > From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> > Sent: Friday, September 30, 2022 2:59 PM
> > To: Dave Thaler <dthaler@microsoft.com>
> > Cc: dthaler1968@googlemail.com; bpf@vger.kernel.org
> > Subject: Re: [PATCH 07/15] ebpf-docs: Fix modulo zero, division by zero,
> > overflow, and underflow
> >
> > On Fri, Sep 30, 2022 at 09:54:17PM +0000, Dave Thaler wrote:
> > > [...]
> > > > > +Also note that the modulo operation often varies by language when
> > > > > +the dividend or divisor are negative, where Python, Ruby, etc.
> > > > > +differ from C, Go, Java, etc. This specification requires that
> > > > > +modulo use truncated division (where -13 % 3 == -1) as
> > > > > +implemented in C, Go,
> > > > > +etc.:
> > > > > +
> > > > > +   a % n = a - n * trunc(a / n)
> > > > > +
> > > >
> > > > Interesting bit of info, but I'm not sure how it relates to the ISA doc.
> > >
> > > It's because there's multiple definitions of modulo out there as the
> > > paragraph notes, which differ in what they do with negative numbers.
> > > The ISA defines the modulo operation as being the specific version above.
> > > If you tried to implement the ISA in say Python and didn't know that,
> > > you'd have a non-compliant implementation.
> >
> > Is it because the languages have weird rules to pick between signed vs
> > unsigned mod?
> > At least from llvm pov the smod and umod have fixed behavior.
>
> It's because there's different mathematical definitions and different languages have chosen different definitions.  E.g., languages/libraries that follow Knuth use a
> different mathematical definition than C uses.  For details see:
>
> https://en.wikipedia.org/wiki/Modulo_operation#Variants_of_the_definition
>
> https://torstencurdt.com/tech/posts/modulo-of-negative-numbers/

Those differences are in signed div/mod only, right?
Unsigned div/mod doesn't have it, right?
bpf has only unsigned div/mod.
Dave Thaler Oct. 4, 2022, 4:36 p.m. UTC | #6
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
[...]
> > > > > > +Also note that the modulo operation often varies by language
> > > > > > +when the dividend or divisor are negative, where Python, Ruby,
> etc.
> > > > > > +differ from C, Go, Java, etc. This specification requires
> > > > > > +that modulo use truncated division (where -13 % 3 == -1) as
> > > > > > +implemented in C, Go,
> > > > > > +etc.:
> > > > > > +
> > > > > > +   a % n = a - n * trunc(a / n)
> > > > > > +
> > > > >
> > > > > Interesting bit of info, but I'm not sure how it relates to the ISA doc.
[...]
> Those differences are in signed div/mod only, right?
> Unsigned div/mod doesn't have it, right?
> bpf has only unsigned div/mod.

Ah right, will replace.  However since imm is a signed integer, that leaves
an ambiguity that is important to clarify.

What is the expected value for the following 64-bit BPF_DIV operation:
    r0 = 0xFFFFFFFFFFFFFFFF
    r0 /= -10
Is it 0x1 or 0x10000000a?  i.e., is the -10 sign extended to
0xFFFFFFFFFFFFFFF6 or treated as 0xFFFFFFF6 when doing the unsigned
division?

Dave
Alexei Starovoitov Oct. 4, 2022, 5:24 p.m. UTC | #7
On Tue, Oct 4, 2022 at 9:36 AM Dave Thaler <dthaler@microsoft.com> wrote:
> > Those differences are in signed div/mod only, right?
> > Unsigned div/mod doesn't have it, right?
> > bpf has only unsigned div/mod.
>
> Ah right, will replace.  However since imm is a signed integer, that leaves
> an ambiguity that is important to clarify.
>
> What is the expected value for the following 64-bit BPF_DIV operation:
>     r0 = 0xFFFFFFFFFFFFFFFF
>     r0 /= -10
> Is it 0x1 or 0x10000000a?  i.e., is the -10 sign extended to
> 0xFFFFFFFFFFFFFFF6 or treated as 0xFFFFFFF6 when doing the unsigned
> division?

x86 and arm64 JITs treat it as imm32 is zero extended.
But looking at the interpreter:
        ALU64_DIV_K:
                DST = div64_u64(DST, IMM);
it looks like we have a bug there.
But we have a bunch of div_k tests in lib/test_bpf.c
including negative imm32. Hmm.
Dave Thaler Oct. 4, 2022, 6:23 p.m. UTC | #8
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
[...]
> > What is the expected value for the following 64-bit BPF_DIV operation:
> >     r0 = 0xFFFFFFFFFFFFFFFF
> >     r0 /= -10
> > Is it 0x1 or 0x10000000a?  i.e., is the -10 sign extended to
> > 0xFFFFFFFFFFFFFFF6 or treated as 0xFFFFFFF6 when doing the unsigned
> > division?
> 
> x86 and arm64 JITs treat it as imm32 is zero extended.

Alan Jowett just pointed out to me that the question is not limited to DIV.

r0 = 1
r0 += -1

Is the answer 0 or 0x0000000100000000?
Assuming the answer is to zero extend imm32, it contains the latter, which
would be counter-intuitive enough to make it important to point out explicitly.

> But looking at the interpreter:
>         ALU64_DIV_K:
>                 DST = div64_u64(DST, IMM); it looks like we have a bug there.
> But we have a bunch of div_k tests in lib/test_bpf.c including negative
> imm32. Hmm.

Yeah.

"ALU64_DIV_K: 0xffffffffffffffff / (-1) = 0x0000000000000001",
"ALU64_ADD_K: 2147483646 + -2147483647 = -1",
"ALU64_ADD_K: 0 + (-1) = 0xffffffffffffffff",
"ALU64_MUL_K: 1 * -2147483647 = -2147483647",
"ALU64_MUL_K: 1 * (-1) = 0xffffffffffffffff",
"ALU64_AND_K: 0x0000ffffffff0000 & -1 = 0x0000ffffffff0000",
"ALU64_AND_K: 0xffffffffffffffff & -1 = 0xffffffffffffffff",
"ALU64_OR_K: 0x000000000000000 | -1 = 0xffffffffffffffff",

The above assume sign extension not zero extension is the correct behavior
for these operations, if I understand correctly.

Dave
Alexei Starovoitov Oct. 4, 2022, 6:34 p.m. UTC | #9
On Tue, Oct 4, 2022 at 11:23 AM Dave Thaler <dthaler@microsoft.com> wrote:
>
> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> [...]
> > > What is the expected value for the following 64-bit BPF_DIV operation:
> > >     r0 = 0xFFFFFFFFFFFFFFFF
> > >     r0 /= -10
> > > Is it 0x1 or 0x10000000a?  i.e., is the -10 sign extended to
> > > 0xFFFFFFFFFFFFFFF6 or treated as 0xFFFFFFF6 when doing the unsigned
> > > division?
> >
> > x86 and arm64 JITs treat it as imm32 is zero extended.
>
> Alan Jowett just pointed out to me that the question is not limited to DIV.
>
> r0 = 1
> r0 += -1
>
> Is the answer 0 or 0x0000000100000000?
> Assuming the answer is to zero extend imm32, it contains the latter, which
> would be counter-intuitive enough to make it important to point out explicitly.

This is an obvious one. imm32 is _sign_ extended everywhere.

>
> > But looking at the interpreter:
> >         ALU64_DIV_K:
> >                 DST = div64_u64(DST, IMM); it looks like we have a bug there.
> > But we have a bunch of div_k tests in lib/test_bpf.c including negative
> > imm32. Hmm.

Actually I misread the JITs.
/* mov r11, imm32 */
EMIT3_off32(0x49, 0xC7, 0xC3, imm32);

It is sign extending. There is no bug in the interpreter or JIT.

> Yeah.
>
> "ALU64_DIV_K: 0xffffffffffffffff / (-1) = 0x0000000000000001",
> "ALU64_ADD_K: 2147483646 + -2147483647 = -1",
> "ALU64_ADD_K: 0 + (-1) = 0xffffffffffffffff",
> "ALU64_MUL_K: 1 * -2147483647 = -2147483647",
> "ALU64_MUL_K: 1 * (-1) = 0xffffffffffffffff",
> "ALU64_AND_K: 0x0000ffffffff0000 & -1 = 0x0000ffffffff0000",
> "ALU64_AND_K: 0xffffffffffffffff & -1 = 0xffffffffffffffff",
> "ALU64_OR_K: 0x000000000000000 | -1 = 0xffffffffffffffff",
>
> The above assume sign extension not zero extension is the correct behavior
> for these operations, if I understand correctly.
>
> Dave
Dave Thaler Sept. 29, 2023, 9:03 p.m. UTC | #10
In the email discussion below, we concluded it wasn't relevant at the time because
there were no signed modulo instructions.  However, now there is and I believe the
ambiguity in the current spec needs to be addressed.

> -----Original Message-----
> From: Dave Thaler
> Sent: Friday, September 30, 2022 3:42 PM
> To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Cc: dthaler1968@googlemail.com; bpf@vger.kernel.org
> Subject: RE: [PATCH 07/15] ebpf-docs: Fix modulo zero, division by zero,
> overflow, and underflow
> 
> > -----Original Message-----
> > From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> > Sent: Friday, September 30, 2022 2:59 PM
> > To: Dave Thaler <dthaler@microsoft.com>
> > Cc: dthaler1968@googlemail.com; bpf@vger.kernel.org
> > Subject: Re: [PATCH 07/15] ebpf-docs: Fix modulo zero, division by
> > zero, overflow, and underflow
> >
> > On Fri, Sep 30, 2022 at 09:54:17PM +0000, Dave Thaler wrote:
> > > [...]
> > > > > +Also note that the modulo operation often varies by language
> > > > > +when the dividend or divisor are negative, where Python, Ruby, etc.
> > > > > +differ from C, Go, Java, etc. This specification requires that
> > > > > +modulo use truncated division (where -13 % 3 == -1) as
> > > > > +implemented in C, Go,
> > > > > +etc.:
> > > > > +
> > > > > +   a % n = a - n * trunc(a / n)
> > > > > +
> > > >
> > > > Interesting bit of info, but I'm not sure how it relates to the ISA doc.
> > >
> > > It's because there's multiple definitions of modulo out there as the
> > > paragraph notes, which differ in what they do with negative numbers.
> > > The ISA defines the modulo operation as being the specific version above.
> > > If you tried to implement the ISA in say Python and didn't know
> > > that, you'd have a non-compliant implementation.
> >
> > Is it because the languages have weird rules to pick between signed vs
> > unsigned mod?
> > At least from llvm pov the smod and umod have fixed behavior.
> 
> It's because there's different mathematical definitions and different languages
> have chosen different definitions.  E.g., languages/libraries that follow Knuth
> use a different mathematical definition than C uses.  For details see:
> 
> https://en.wikipedia.org/wiki/Modulo_operation#Variants_of_the_definition
> 
> https://torstencurdt.com/tech/posts/modulo-of-negative-numbers/
> 
> Dave

Perhaps text like the proposed snippet quoted in the exchange above should be
added around the new text that now appears in the doc, i.e. the ambiguous text
is currently:
> For signed operations (``BPF_SDIV`` and ``BPF_SMOD``), for ``BPF_ALU``,
> 'imm' is interpreted as a 32-bit signed value. For ``BPF_ALU64``, 'imm'
> is first :term:`sign extended<Sign Extend>` from 32 to 64 bits, and then
> interpreted as a 64-bit signed value.  

Dave
Carsten Bormann Sept. 30, 2023, 6:07 a.m. UTC | #11
I didn’t follow the whole discussion, but it maybe it's worth pointing out that C’s % is not a modulo operator, but a remainder operator.

Grüße, Carsten


> On 29. Sep 2023, at 23:03, Dave Thaler <dthaler=40microsoft.com@dmarc.ietf.org> wrote:
> 
> In the email discussion below, we concluded it wasn't relevant at the time because
> there were no signed modulo instructions.  However, now there is and I believe the
> ambiguity in the current spec needs to be addressed.
> 
>> -----Original Message-----
>> From: Dave Thaler
>> Sent: Friday, September 30, 2022 3:42 PM
>> To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
>> Cc: dthaler1968@googlemail.com; bpf@vger.kernel.org
>> Subject: RE: [PATCH 07/15] ebpf-docs: Fix modulo zero, division by zero,
>> overflow, and underflow
>> 
>>> -----Original Message-----
>>> From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
>>> Sent: Friday, September 30, 2022 2:59 PM
>>> To: Dave Thaler <dthaler@microsoft.com>
>>> Cc: dthaler1968@googlemail.com; bpf@vger.kernel.org
>>> Subject: Re: [PATCH 07/15] ebpf-docs: Fix modulo zero, division by
>>> zero, overflow, and underflow
>>> 
>>> On Fri, Sep 30, 2022 at 09:54:17PM +0000, Dave Thaler wrote:
>>>> [...]
>>>>>> +Also note that the modulo operation often varies by language
>>>>>> +when the dividend or divisor are negative, where Python, Ruby, etc.
>>>>>> +differ from C, Go, Java, etc. This specification requires that
>>>>>> +modulo use truncated division (where -13 % 3 == -1) as
>>>>>> +implemented in C, Go,
>>>>>> +etc.:
>>>>>> +
>>>>>> +   a % n = a - n * trunc(a / n)
>>>>>> +
>>>>> 
>>>>> Interesting bit of info, but I'm not sure how it relates to the ISA doc.
>>>> 
>>>> It's because there's multiple definitions of modulo out there as the
>>>> paragraph notes, which differ in what they do with negative numbers.
>>>> The ISA defines the modulo operation as being the specific version above.
>>>> If you tried to implement the ISA in say Python and didn't know
>>>> that, you'd have a non-compliant implementation.
>>> 
>>> Is it because the languages have weird rules to pick between signed vs
>>> unsigned mod?
>>> At least from llvm pov the smod and umod have fixed behavior.
>> 
>> It's because there's different mathematical definitions and different languages
>> have chosen different definitions.  E.g., languages/libraries that follow Knuth
>> use a different mathematical definition than C uses.  For details see:
>> 
>> https://en.wikipedia.org/wiki/Modulo_operation#Variants_of_the_definition
>> 
>> https://torstencurdt.com/tech/posts/modulo-of-negative-numbers/
>> 
>> Dave
> 
> Perhaps text like the proposed snippet quoted in the exchange above should be
> added around the new text that now appears in the doc, i.e. the ambiguous text
> is currently:
>> For signed operations (``BPF_SDIV`` and ``BPF_SMOD``), for ``BPF_ALU``,
>> 'imm' is interpreted as a 32-bit signed value. For ``BPF_ALU64``, 'imm'
>> is first :term:`sign extended<Sign Extend>` from 32 to 64 bits, and then
>> interpreted as a 64-bit signed value.  
> 
> Dave
> 
> -- 
> Bpf mailing list
> Bpf@ietf.org
> https://www.ietf.org/mailman/listinfo/bpf
Alexei Starovoitov Sept. 30, 2023, 2:48 p.m. UTC | #12
On Fri, Sep 29, 2023 at 2:03 PM Dave Thaler
<dthaler=40microsoft.com@dmarc.ietf.org> wrote:
>
> Perhaps text like the proposed snippet quoted in the exchange above should be
> added around the new text that now appears in the doc, i.e. the ambiguous text
> is currently:
> > For signed operations (``BPF_SDIV`` and ``BPF_SMOD``), for ``BPF_ALU``,
> > 'imm' is interpreted as a 32-bit signed value. For ``BPF_ALU64``, 'imm'
> > is first :term:`sign extended<Sign Extend>` from 32 to 64 bits, and then
> > interpreted as a 64-bit signed value.

That's what we have in the doc and it's a correct description.
Which part is ambiguous?
Eduard Zingerman Oct. 2, 2023, 1:19 p.m. UTC | #13
On Sat, 2023-09-30 at 07:48 -0700, Alexei Starovoitov wrote:
> On Fri, Sep 29, 2023 at 2:03 PM Dave Thaler
> <dthaler=40microsoft.com@dmarc.ietf.org> wrote:
> > 
> > Perhaps text like the proposed snippet quoted in the exchange above should be
> > added around the new text that now appears in the doc, i.e. the ambiguous text
> > is currently:
> > > For signed operations (``BPF_SDIV`` and ``BPF_SMOD``), for ``BPF_ALU``,
> > > 'imm' is interpreted as a 32-bit signed value. For ``BPF_ALU64``, 'imm'
> > > is first :term:`sign extended<Sign Extend>` from 32 to 64 bits, and then
> > > interpreted as a 64-bit signed value.
> 
> That's what we have in the doc and it's a correct description.
> Which part is ambiguous?
> 

As far as I understand Dave suggests to add exact specification for
the SMOD instruction as "signed modulo" might have different definitions [1].

I double checked and current clang implementation generates SMOD for
LLVM's 'srem' operation [2], which follows C semantics and defines
remainder as:

  remainder = a - n * trunc(divident / divisor)

> This instruction returns the remainder of a division (where the
> result is either zero or has the same sign as the dividend, op1)

And this is consistent with interpreter logic in core.c, e.g.:

	ALU64_MOD_K:
		switch (OFF) {
		case 0: ... break;
		case 1:
			AX = div64_s64(DST, IMM); /* implemented as '/' */
			DST = DST - AX * IMM;
			break;
		}
		CONT;

[1] https://en.wikipedia.org/wiki/Modulo
[2] https://llvm.org/docs/LangRef.html#srem-instruction
diff mbox series

Patch

diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst
index a24bc5d53..3c5a63612 100644
--- a/Documentation/bpf/instruction-set.rst
+++ b/Documentation/bpf/instruction-set.rst
@@ -103,19 +103,26 @@  code      value  description
 BPF_ADD   0x00   dst += src
 BPF_SUB   0x10   dst -= src
 BPF_MUL   0x20   dst \*= src
-BPF_DIV   0x30   dst /= src
+BPF_DIV   0x30   dst = (src != 0) ? (dst / src) : 0
 BPF_OR    0x40   dst \|= src
 BPF_AND   0x50   dst &= src
 BPF_LSH   0x60   dst <<= src
 BPF_RSH   0x70   dst >>= src
 BPF_NEG   0x80   dst = ~src
-BPF_MOD   0x90   dst %= src
+BPF_MOD   0x90   dst = (src != 0) ? (dst % src) : dst
 BPF_XOR   0xa0   dst ^= src
 BPF_MOV   0xb0   dst = src
 BPF_ARSH  0xc0   sign extending shift right
 BPF_END   0xd0   byte swap operations (see `Byte swap instructions`_ below)
 ========  =====  ==========================================================
 
+Underflow and overflow are allowed during arithmetic operations,
+meaning the 64-bit or 32-bit value will wrap.  If
+eBPF program execution would result in division by zero,
+the destination register is instead set to zero.
+If execution would result in modulo by zero,
+the destination register is instead left unchanged.
+
 ``BPF_ADD | BPF_X | BPF_ALU`` means::
 
   dst_reg = (uint32_t) dst_reg + (uint32_t) src_reg;
@@ -135,6 +142,14 @@  where '(uint32_t)' indicates truncation to 32 bits.
   src_reg = src_reg ^ imm32
 
 
+Also note that the modulo operation often varies by language
+when the dividend or divisor are negative, where Python, Ruby, etc.
+differ from C, Go, Java, etc. This specification requires that
+modulo use truncated division (where -13 % 3 == -1) as implemented
+in C, Go, etc.:
+
+   a % n = a - n * trunc(a / n)
+
 Byte swap instructions
 ~~~~~~~~~~~~~~~~~~~~~~