diff mbox series

[v5,bpf-next,03/23] bpf: derive smin/smax from umin/max bounds

Message ID: 20231027181346.4019398-4-andrii@kernel.org (mailing list archive)
State: Superseded
Delegated to: BPF
Series: BPF register bounds logic and testing improvements

Checks

Context Check Description
bpf/vmtest-bpf-next-VM_Test-30 success Logs for x86_64-llvm-16 / test (test_progs_no_alu32_parallel, true, 30) / test_progs_no_alu32_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-31 success Logs for x86_64-llvm-16 / test (test_progs_parallel, true, 30) / test_progs_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-32 success Logs for x86_64-llvm-16 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-33 success Logs for x86_64-llvm-16 / veristat
netdev/series_format fail Series longer than 15 patches (and no cover letter)
netdev/tree_selection success Clearly marked for bpf-next, async
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1374 this patch: 1374
netdev/cc_maintainers warning 8 maintainers not CCed: john.fastabend@gmail.com kpsingh@kernel.org song@kernel.org sdf@google.com jolsa@kernel.org martin.lau@linux.dev yonghong.song@linux.dev haoluo@google.com
netdev/build_clang fail Errors and warnings before: 15 this patch: 15
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1399 this patch: 1399
netdev/checkpatch warning WARNING: line length of 81 exceeds 80 columns
netdev/build_clang_rust success Link
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-VM_Test-0 success Logs for Lint
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-2 success Logs for Validate matrix.py
bpf/vmtest-bpf-next-VM_Test-3 success Logs for aarch64-gcc / build / build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-8 success Logs for aarch64-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-4 success Logs for aarch64-gcc / test (test_maps, false, 360) / test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-5 success Logs for aarch64-gcc / test (test_progs, false, 360) / test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-6 success Logs for aarch64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-7 success Logs for aarch64-gcc / test (test_verifier, false, 360) / test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-9 success Logs for s390x-gcc / build / build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-14 success Logs for s390x-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-15 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-16 success Logs for x86_64-gcc / build / build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-17 success Logs for x86_64-gcc / test (test_maps, false, 360) / test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-18 success Logs for x86_64-gcc / test (test_progs, false, 360) / test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-19 success Logs for x86_64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-20 success Logs for x86_64-gcc / test (test_progs_no_alu32_parallel, true, 30) / test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-21 success Logs for x86_64-gcc / test (test_progs_parallel, true, 30) / test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-22 success Logs for x86_64-gcc / test (test_verifier, false, 360) / test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-23 success Logs for x86_64-gcc / veristat / veristat on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-24 success Logs for x86_64-llvm-16 / build / build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-25 success Logs for x86_64-llvm-16 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-26 success Logs for x86_64-llvm-16 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-27 success Logs for x86_64-llvm-16 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-28 success Logs for x86_64-llvm-16 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-29 success Logs for x86_64-llvm-16 / veristat
bpf/vmtest-bpf-next-VM_Test-13 success Logs for s390x-gcc / test (test_verifier, false, 360) / test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-12 success Logs for s390x-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next-VM_Test-10 success Logs for s390x-gcc / test (test_maps, false, 360) / test_maps on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-11 success Logs for s390x-gcc / test (test_progs, false, 360) / test_progs on s390x with gcc

Commit Message

Andrii Nakryiko Oct. 27, 2023, 6:13 p.m. UTC
Add smin/smax derivation from appropriate umin/umax values. Previously the
logic was surprisingly asymmetric, trying to derive umin/umax from smin/smax
(if possible), but not trying to do the same in the other direction. A simple
addition to __reg64_deduce_bounds() fixes this.

Also add a generic comment about u64/s64 ranges and their relationship.
Hopefully that helps readers understand all the bounds deductions
a bit better.

Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/bpf/verifier.c | 70 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 70 insertions(+)
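
For readers who want to experiment with the deduction outside the kernel,
here is a minimal userspace sketch of the step this patch adds. The
`struct bounds64` type and the `deduce_signed_from_unsigned()` name are
hypothetical stand-ins, not kernel identifiers; the patch itself works on
`struct bpf_reg_state` and uses the `max_t()`/`min_t()` macros. The sketch
assumes the usual two's complement wrap when casting u64 to s64.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the four 64-bit bounds kept in bpf_reg_state. */
struct bounds64 {
	int64_t  smin, smax;
	uint64_t umin, umax;
};

/* Tighten the signed bounds from the unsigned ones, but only when the
 * unsigned range does not cross the sign boundary.
 */
static void deduce_signed_from_unsigned(struct bounds64 *b)
{
	assert(b->umin <= b->umax); /* precondition: valid u64 range */

	/* Assumption: u64 -> s64 casts wrap (two's complement). */
	if ((int64_t)b->umin <= (int64_t)b->umax) {
		if ((int64_t)b->umin > b->smin)
			b->smin = (int64_t)b->umin;
		if ((int64_t)b->umax < b->smax)
			b->smax = (int64_t)b->umax;
	}
}
```

With unknown signed bounds, a range such as umin=5, umax=10 should yield
smin=5, smax=10, and a range that sits entirely in the upper half (sign bit
set at both ends) should yield a negative signed range. A range that
straddles 0x7fffffffffffffff/0x8000000000000000 leaves smin/smax untouched.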

Comments

Eduard Zingerman Oct. 31, 2023, 3:37 p.m. UTC | #1
On Fri, 2023-10-27 at 11:13 -0700, Andrii Nakryiko wrote:
> Add smin/smax derivation from appropriate umin/umax values. Previously the
> logic was surprisingly asymmetric, trying to derive umin/umax from smin/smax
> (if possible), but not trying to do the same in the other direction. A simple
> addition to __reg64_deduce_bounds() fixes this.
> 
> Added also generic comment about u64/s64 ranges and their relationship.
> Hopefully that helps readers to understand all the bounds deductions
> a bit better.
> 
> Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

Acked-by: Eduard Zingerman <eddyz87@gmail.com>

Nice comment, thank you. I noticed two typos, see below.

> ---
>  kernel/bpf/verifier.c | 70 +++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 70 insertions(+)
> 
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 857d76694517..bf4193706744 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -2358,6 +2358,76 @@ static void __reg32_deduce_bounds(struct bpf_reg_state *reg)
>  
>  static void __reg64_deduce_bounds(struct bpf_reg_state *reg)
>  {
> +	/* If u64 range forms a valid s64 range (due to matching sign bit),
> +	 * try to learn from that. Let's do a bit of ASCII art to see when
> +	 * this is happening. Let's take u64 range first:
> +	 *
> +	 * 0             0x7fffffffffffffff 0x8000000000000000        U64_MAX
> +	 * |-------------------------------|--------------------------------|
> +	 *
> +	 * Valid u64 range is formed when umin and umax are anywhere in this
> +	 * range [0, U64_MAX] and umin <= umax. u64 is simple and
> +	 * straightforward. Let's where s64 range maps to this simple [0,
> +	 * U64_MAX] range, annotated below the line for comparison:

Nit: this sentence sounds a bit weird, probably some word is missing
     between "let's" and "where".

> +	 *
> +	 * 0             0x7fffffffffffffff 0x8000000000000000        U64_MAX
> +	 * |-------------------------------|--------------------------------|
> +	 * 0                        S64_MAX S64_MIN                        -1
> +	 *
> +	 * So s64 values basically start in the middle and then are contiguous
> +	 * to the right of it, wrapping around from -1 to 0, and then
> +	 * finishing as S64_MAX (0x7fffffffffffffff) right before S64_MIN.
> +	 * We can try drawing more visually continuity of u64 vs s64 values as
> +	 * mapped to just actual hex valued range of values.
> +	 *
> +	 *  u64 start                                               u64 end
> +	 *  _______________________________________________________________
> +	 * /                                                               \
> +	 * 0             0x7fffffffffffffff 0x8000000000000000        U64_MAX
> +	 * |-------------------------------|--------------------------------|
> +	 * 0                        S64_MAX S64_MIN                        -1
> +	 *                                / \
> +	 * >------------------------------   ------------------------------->
> +	 * s64 continues...        s64 end   s64 start          s64 "midpoint"
> +	 *
> +	 * What this means is that in general, we can't always derive
> +	 * something new about u64 from any random s64 range, and vice versa.
> +	 * But we can do that in two particular cases. One is when entire
> +	 * u64/s64 range is *entirely* contained within left half of the above
> +	 * diagram or when it is *entirely* contained in the right half. I.e.:
> +	 *
> +	 * |-------------------------------|--------------------------------|
> +	 *     ^                   ^            ^                 ^
> +	 *     A                   B            C                 D
> +	 *
> +	 * [A, B] and [C, D] are contained entirely in their respective halves
> +	 * and form valid contiguous ranges as both u64 and s64 values. [A, B]
> +	 * will be non-negative both as u64 and s64 (and in fact it will be
> +	 * identical ranges no matter the signedness). [C, D] treated as s64
> +	 * will be a range of negative values, while in u64 it will be
> +	 * non-negative range of values larger than 0x8000000000000000.
> +	 *
> +	 * Now, any other range here can't be represented in both u64 and s64
> +	 * simultaneously. E.g., [A, C], [A, D], [B, C], [B, D] are valid
> +	 * contiguous u64 ranges, but they are discontinuous in s64. [B, C]
> +	 * in s64 would be properly presented as [S64_MIN, C] and [B, S64_MAX],
> +	 * for example. Similarly, valid s64 range [D, A] (going from negative
> +	 * to positive values), would be two separate [D, U64_MAX] and [0, A]
> +	 * ranges as u64. Currently reg_state can't represent two segments per
> +	 * numeric domain, so in such situations we can only derive maximal
> +	 * possible range ([0, U64_MAX] for u64, and [S64_MIN, S64_MAX) for s64).
                                                                  ^
Nit:                                                      missing bracket

> +	 *
> +	 * So we use these facts to derive umin/umax from smin/smax and vice
> +	 * versa only if they stay within the same "half". This is equivalent
> +	 * to checking sign bit: lower half will have sign bit as zero, upper
> +	 * half have sign bit 1. Below in code we simplify this by just
> +	 * casting umin/umax as smin/smax and checking if they form valid
> +	 * range, and vice versa. Those are equivalent checks.
> +	 */
> +	if ((s64)reg->umin_value <= (s64)reg->umax_value) {
> +		reg->smin_value = max_t(s64, reg->smin_value, reg->umin_value);
> +		reg->smax_value = min_t(s64, reg->smax_value, reg->umax_value);
> +	}
>  	/* Learn sign from signed bounds.
>  	 * If we cannot cross the sign boundary, then signed and unsigned bounds
>  	 * are the same, so combine.  This works even in the negative case, e.g.

Andrii Nakryiko Oct. 31, 2023, 5:30 p.m. UTC | #2
On Tue, Oct 31, 2023 at 8:37 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Fri, 2023-10-27 at 11:13 -0700, Andrii Nakryiko wrote:
> > Add smin/smax derivation from appropriate umin/umax values. Previously the
> > logic was surprisingly asymmetric, trying to derive umin/umax from smin/smax
> > (if possible), but not trying to do the same in the other direction. A simple
> > addition to __reg64_deduce_bounds() fixes this.
> >
> > Added also generic comment about u64/s64 ranges and their relationship.
> > Hopefully that helps readers to understand all the bounds deductions
> > a bit better.
> >
> > Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
>
> Acked-by: Eduard Zingerman <eddyz87@gmail.com>
>
> Nice comment, thank you. I noticed two typos, see below.
>
> > ---
> >  kernel/bpf/verifier.c | 70 +++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 70 insertions(+)
> >
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 857d76694517..bf4193706744 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -2358,6 +2358,76 @@ static void __reg32_deduce_bounds(struct bpf_reg_state *reg)
> >
> >  static void __reg64_deduce_bounds(struct bpf_reg_state *reg)
> >  {
> > +     /* If u64 range forms a valid s64 range (due to matching sign bit),
> > +      * try to learn from that. Let's do a bit of ASCII art to see when
> > +      * this is happening. Let's take u64 range first:
> > +      *
> > +      * 0             0x7fffffffffffffff 0x8000000000000000        U64_MAX
> > +      * |-------------------------------|--------------------------------|
> > +      *
> > +      * Valid u64 range is formed when umin and umax are anywhere in this
> > +      * range [0, U64_MAX] and umin <= umax. u64 is simple and
> > +      * straightforward. Let's where s64 range maps to this simple [0,
> > +      * U64_MAX] range, annotated below the line for comparison:
>
> Nit: this sentence sounds a bit weird, probably some word is missing
>      between "let's" and "where".
>

I don't know what's going on here, I wasn't drunk when I wrote this
and I don't remember it being so incoherent :) Will re-read and try to
make it clearer.

> > +      *
> > +      * 0             0x7fffffffffffffff 0x8000000000000000        U64_MAX
> > +      * |-------------------------------|--------------------------------|
> > +      * 0                        S64_MAX S64_MIN                        -1
> > +      *
> > +      * So s64 values basically start in the middle and then are contiguous
> > +      * to the right of it, wrapping around from -1 to 0, and then
> > +      * finishing as S64_MAX (0x7fffffffffffffff) right before S64_MIN.
> > +      * We can try drawing more visually continuity of u64 vs s64 values as
> > +      * mapped to just actual hex valued range of values.
> > +      *
> > +      *  u64 start                                               u64 end
> > +      *  _______________________________________________________________
> > +      * /                                                               \
> > +      * 0             0x7fffffffffffffff 0x8000000000000000        U64_MAX
> > +      * |-------------------------------|--------------------------------|
> > +      * 0                        S64_MAX S64_MIN                        -1
> > +      *                                / \
> > +      * >------------------------------   ------------------------------->
> > +      * s64 continues...        s64 end   s64 start          s64 "midpoint"
> > +      *
> > +      * What this means is that in general, we can't always derive
> > +      * something new about u64 from any random s64 range, and vice versa.
> > +      * But we can do that in two particular cases. One is when entire
> > +      * u64/s64 range is *entirely* contained within left half of the above
> > +      * diagram or when it is *entirely* contained in the right half. I.e.:
> > +      *
> > +      * |-------------------------------|--------------------------------|
> > +      *     ^                   ^            ^                 ^
> > +      *     A                   B            C                 D
> > +      *
> > +      * [A, B] and [C, D] are contained entirely in their respective halves
> > +      * and form valid contiguous ranges as both u64 and s64 values. [A, B]
> > +      * will be non-negative both as u64 and s64 (and in fact it will be
> > +      * identical ranges no matter the signedness). [C, D] treated as s64
> > +      * will be a range of negative values, while in u64 it will be
> > +      * non-negative range of values larger than 0x8000000000000000.
> > +      *
> > +      * Now, any other range here can't be represented in both u64 and s64
> > +      * simultaneously. E.g., [A, C], [A, D], [B, C], [B, D] are valid
> > +      * contiguous u64 ranges, but they are discontinuous in s64. [B, C]
> > +      * in s64 would be properly presented as [S64_MIN, C] and [B, S64_MAX],
> > +      * for example. Similarly, valid s64 range [D, A] (going from negative
> > +      * to positive values), would be two separate [D, U64_MAX] and [0, A]
> > +      * ranges as u64. Currently reg_state can't represent two segments per
> > +      * numeric domain, so in such situations we can only derive maximal
> > +      * possible range ([0, U64_MAX] for u64, and [S64_MIN, S64_MAX) for s64).
>                                                                   ^
> Nit:                                                      missing bracket
>

it's actually a typo, ) -> ], which is now fixed as well, thanks

> > +      *
> > +      * So we use these facts to derive umin/umax from smin/smax and vice
> > +      * versa only if they stay within the same "half". This is equivalent
> > +      * to checking sign bit: lower half will have sign bit as zero, upper
> > +      * half have sign bit 1. Below in code we simplify this by just
> > +      * casting umin/umax as smin/smax and checking if they form valid
> > +      * range, and vice versa. Those are equivalent checks.
> > +      */
> > +     if ((s64)reg->umin_value <= (s64)reg->umax_value) {
> > +             reg->smin_value = max_t(s64, reg->smin_value, reg->umin_value);
> > +             reg->smax_value = min_t(s64, reg->smax_value, reg->umax_value);
> > +     }
> >       /* Learn sign from signed bounds.
> >        * If we cannot cross the sign boundary, then signed and unsigned bounds
> >        * are the same, so combine.  This works even in the negative case, e.g.
>
>
>

Patch

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 857d76694517..bf4193706744 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -2358,6 +2358,76 @@  static void __reg32_deduce_bounds(struct bpf_reg_state *reg)
 
 static void __reg64_deduce_bounds(struct bpf_reg_state *reg)
 {
+	/* If u64 range forms a valid s64 range (due to matching sign bit),
+	 * try to learn from that. Let's do a bit of ASCII art to see when
+	 * this is happening. Let's take u64 range first:
+	 *
+	 * 0             0x7fffffffffffffff 0x8000000000000000        U64_MAX
+	 * |-------------------------------|--------------------------------|
+	 *
+	 * Valid u64 range is formed when umin and umax are anywhere in this
+	 * range [0, U64_MAX] and umin <= umax. u64 is simple and
+	 * straightforward. Let's where s64 range maps to this simple [0,
+	 * U64_MAX] range, annotated below the line for comparison:
+	 *
+	 * 0             0x7fffffffffffffff 0x8000000000000000        U64_MAX
+	 * |-------------------------------|--------------------------------|
+	 * 0                        S64_MAX S64_MIN                        -1
+	 *
+	 * So s64 values basically start in the middle and then are contiguous
+	 * to the right of it, wrapping around from -1 to 0, and then
+	 * finishing as S64_MAX (0x7fffffffffffffff) right before S64_MIN.
+	 * We can try drawing more visually continuity of u64 vs s64 values as
+	 * mapped to just actual hex valued range of values.
+	 *
+	 *  u64 start                                               u64 end
+	 *  _______________________________________________________________
+	 * /                                                               \
+	 * 0             0x7fffffffffffffff 0x8000000000000000        U64_MAX
+	 * |-------------------------------|--------------------------------|
+	 * 0                        S64_MAX S64_MIN                        -1
+	 *                                / \
+	 * >------------------------------   ------------------------------->
+	 * s64 continues...        s64 end   s64 start          s64 "midpoint"
+	 *
+	 * What this means is that in general, we can't always derive
+	 * something new about u64 from any random s64 range, and vice versa.
+	 * But we can do that in two particular cases. One is when entire
+	 * u64/s64 range is *entirely* contained within left half of the above
+	 * diagram or when it is *entirely* contained in the right half. I.e.:
+	 *
+	 * |-------------------------------|--------------------------------|
+	 *     ^                   ^            ^                 ^
+	 *     A                   B            C                 D
+	 *
+	 * [A, B] and [C, D] are contained entirely in their respective halves
+	 * and form valid contiguous ranges as both u64 and s64 values. [A, B]
+	 * will be non-negative both as u64 and s64 (and in fact it will be
+	 * identical ranges no matter the signedness). [C, D] treated as s64
+	 * will be a range of negative values, while in u64 it will be
+	 * non-negative range of values larger than 0x8000000000000000.
+	 *
+	 * Now, any other range here can't be represented in both u64 and s64
+	 * simultaneously. E.g., [A, C], [A, D], [B, C], [B, D] are valid
+	 * contiguous u64 ranges, but they are discontinuous in s64. [B, C]
+	 * in s64 would be properly presented as [S64_MIN, C] and [B, S64_MAX],
+	 * for example. Similarly, valid s64 range [D, A] (going from negative
+	 * to positive values), would be two separate [D, U64_MAX] and [0, A]
+	 * ranges as u64. Currently reg_state can't represent two segments per
+	 * numeric domain, so in such situations we can only derive maximal
+	 * possible range ([0, U64_MAX] for u64, and [S64_MIN, S64_MAX) for s64).
+	 *
+	 * So we use these facts to derive umin/umax from smin/smax and vice
+	 * versa only if they stay within the same "half". This is equivalent
+	 * to checking sign bit: lower half will have sign bit as zero, upper
+	 * half have sign bit 1. Below in code we simplify this by just
+	 * casting umin/umax as smin/smax and checking if they form valid
+	 * range, and vice versa. Those are equivalent checks.
+	 */
+	if ((s64)reg->umin_value <= (s64)reg->umax_value) {
+		reg->smin_value = max_t(s64, reg->smin_value, reg->umin_value);
+		reg->smax_value = min_t(s64, reg->smax_value, reg->umax_value);
+	}
 	/* Learn sign from signed bounds.
 	 * If we cannot cross the sign boundary, then signed and unsigned bounds
 	 * are the same, so combine.  This works even in the negative case, e.g.
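
The comment in the patch claims that casting umin/umax to signed and checking
that they still form a valid range is equivalent to checking that both ends
share a sign bit. The following sketch checks that claim exhaustively on an
8-bit analogue, where brute force is cheap. It is an illustration only: the
function names are hypothetical, the 8-bit width is an assumption chosen for
tractability, and the casts rely on two's complement wrapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when both ends of a valid u8 range have the same sign bit. */
static bool same_half(unsigned int umin, unsigned int umax)
{
	assert(umin <= umax && umax <= UINT8_MAX);
	return (umin >> 7) == (umax >> 7);
}

/* Compare the cast-based check with the sign-bit check for every
 * valid u8 range [umin, umax].
 */
static bool cast_check_matches_sign_bit_check(void)
{
	for (unsigned int umin = 0; umin <= UINT8_MAX; umin++)
		for (unsigned int umax = umin; umax <= UINT8_MAX; umax++)
			if (((int8_t)umin <= (int8_t)umax) != same_half(umin, umax))
				return false;
	return true;
}

/* For every range that stays in one half, confirm that each value in it,
 * reinterpreted as s8, lies within [(s8)umin, (s8)umax].
 */
static bool signed_bounds_cover_all_values(void)
{
	for (unsigned int umin = 0; umin <= UINT8_MAX; umin++) {
		for (unsigned int umax = umin; umax <= UINT8_MAX; umax++) {
			if (!same_half(umin, umax))
				continue;
			for (unsigned int v = umin; v <= umax; v++)
				if ((int8_t)v < (int8_t)umin || (int8_t)v > (int8_t)umax)
					return false;
		}
	}
	return true;
}
```

Both functions are expected to return true. The argument carries over to 64
bits because it depends only on the position of the sign bit, not on the
width.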