[bpf,v2,2/2] bpf: Fix hashtab overflow check on 32-bit arches

Message ID 20240229112250.13723-3-toke@redhat.com (mailing list archive)
State Changes Requested
Delegated to: BPF
Series Fix hashmap overflow checks for 32-bit arches

Checks

Context Check Description
bpf/vmtest-bpf-VM_Test-0 success Logs for Lint
bpf/vmtest-bpf-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-VM_Test-2 success Logs for Unittests
bpf/vmtest-bpf-VM_Test-3 success Logs for Validate matrix.py
bpf/vmtest-bpf-VM_Test-4 success Logs for aarch64-gcc / build / build for aarch64 with gcc
bpf/vmtest-bpf-VM_Test-5 success Logs for aarch64-gcc / build-release
bpf/vmtest-bpf-VM_Test-6 success Logs for aarch64-gcc / test (test_maps, false, 360) / test_maps on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-7 success Logs for aarch64-gcc / test (test_progs, false, 360) / test_progs on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-8 success Logs for aarch64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-9 success Logs for aarch64-gcc / test (test_verifier, false, 360) / test_verifier on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-10 success Logs for aarch64-gcc / veristat
bpf/vmtest-bpf-VM_Test-11 success Logs for s390x-gcc / build / build for s390x with gcc
bpf/vmtest-bpf-VM_Test-12 success Logs for s390x-gcc / build-release
bpf/vmtest-bpf-VM_Test-13 success Logs for s390x-gcc / test (test_maps, false, 360) / test_maps on s390x with gcc
bpf/vmtest-bpf-VM_Test-14 success Logs for s390x-gcc / test (test_progs, false, 360) / test_progs on s390x with gcc
bpf/vmtest-bpf-VM_Test-15 fail Logs for s390x-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-VM_Test-16 success Logs for s390x-gcc / test (test_verifier, false, 360) / test_verifier on s390x with gcc
bpf/vmtest-bpf-VM_Test-17 success Logs for s390x-gcc / veristat
bpf/vmtest-bpf-VM_Test-18 success Logs for set-matrix
bpf/vmtest-bpf-VM_Test-19 success Logs for x86_64-gcc / build / build for x86_64 with gcc
bpf/vmtest-bpf-VM_Test-20 success Logs for x86_64-gcc / build-release
bpf/vmtest-bpf-VM_Test-21 success Logs for x86_64-gcc / test (test_maps, false, 360) / test_maps on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-22 success Logs for x86_64-gcc / test (test_progs, false, 360) / test_progs on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-23 success Logs for x86_64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-24 success Logs for x86_64-gcc / test (test_progs_no_alu32_parallel, true, 30) / test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-25 success Logs for x86_64-gcc / test (test_progs_parallel, true, 30) / test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-26 success Logs for x86_64-gcc / test (test_verifier, false, 360) / test_verifier on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-27 success Logs for x86_64-gcc / veristat / veristat on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-28 success Logs for x86_64-llvm-17 / build / build for x86_64 with llvm-17
bpf/vmtest-bpf-VM_Test-29 success Logs for x86_64-llvm-17 / build-release / build for x86_64 with llvm-17 and -O2 optimization
bpf/vmtest-bpf-VM_Test-30 success Logs for x86_64-llvm-17 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-17
bpf/vmtest-bpf-VM_Test-31 success Logs for x86_64-llvm-17 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-17
bpf/vmtest-bpf-VM_Test-32 success Logs for x86_64-llvm-17 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-17
bpf/vmtest-bpf-VM_Test-33 success Logs for x86_64-llvm-17 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-17
bpf/vmtest-bpf-VM_Test-34 success Logs for x86_64-llvm-17 / veristat
bpf/vmtest-bpf-VM_Test-35 success Logs for x86_64-llvm-18 / build / build for x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-36 success Logs for x86_64-llvm-18 / build-release / build for x86_64 with llvm-18 and -O2 optimization
bpf/vmtest-bpf-VM_Test-37 success Logs for x86_64-llvm-18 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-38 success Logs for x86_64-llvm-18 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-39 success Logs for x86_64-llvm-18 / test (test_progs_cpuv4, false, 360) / test_progs_cpuv4 on x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-40 success Logs for x86_64-llvm-18 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-41 success Logs for x86_64-llvm-18 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-42 success Logs for x86_64-llvm-18 / veristat
bpf/vmtest-bpf-PR fail PR summary
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for bpf
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 968 this patch: 968
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 12 of 12 maintainers
netdev/build_clang success Errors and warnings before: 974 this patch: 974
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 985 this patch: 985
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 24 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Toke Høiland-Jørgensen Feb. 29, 2024, 11:22 a.m. UTC
The hashtab code relies on roundup_pow_of_two() to compute the number of
hash buckets, and guards against overflow by checking whether the resulting
value is 0. However, on 32-bit arches, the roundup code itself can overflow
by doing a 32-bit left-shift of an unsigned long value, which is undefined
behaviour, so it is not guaranteed to truncate neatly. This was triggered
by syzbot on the DEVMAP_HASH type, which contains the same check, copied
from the hashtab code. So apply the same fix to hashtab, by moving the
overflow check to before the roundup.

The hashtab code also contained a check that prevents the total allocation
size for the buckets from overflowing a 32-bit value, but since all the
allocation code uses u64s, this does not really seem to be necessary, so
drop it and keep only the strict overflow check of the n_buckets variable.

Fixes: daaf427c6ab3 ("bpf: fix arraymap NULL deref and missing overflow and zero size checks")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 kernel/bpf/hashtab.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)
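
To make the failure mode concrete, here is a minimal user-space model of
what roundup_pow_of_two() boils down to when unsigned long is 32 bits
wide (a sketch, not the kernel code; the fls-style loop and the helper
name are illustrative stand-ins):

#include <stdio.h>

/* Model of the kernel's __roundup_pow_of_two(), which is essentially
 * 1UL << fls_long(n - 1). Assumes a 32-bit unsigned long, as on
 * arm32; build with -m32 to reproduce. */
static unsigned long roundup_pow_of_two_model(unsigned long n)
{
	unsigned int bits = 0;
	unsigned long v = n - 1;

	while (v) {		/* fls_long(): position of highest set bit */
		bits++;
		v >>= 1;
	}
	/* For n > 0x80000000, bits ends up as 32: shifting a 32-bit
	 * type by its full width is undefined behaviour, so the result
	 * is not guaranteed to be 0 and the "== 0" check can miss. */
	return 1UL << bits;
}

int main(void)
{
	printf("%#lx\n", roundup_pow_of_two_model(0x80000001UL));
	return 0;
}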

Comments

Alexei Starovoitov Feb. 29, 2024, 5:07 p.m. UTC | #1
On Thu, Feb 29, 2024 at 3:23 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> [...]
>         err = -E2BIG;
> -       /* prevent zero size kmalloc and check for u32 overflow */
> -       if (htab->n_buckets == 0 ||
> -           htab->n_buckets > U32_MAX / sizeof(struct bucket))
> +       /* prevent overflow in roundup below */
> +       if (htab->map.max_entries > U32_MAX / 2 + 1)
>                 goto free_htab;

No. We cannot artificially reduce max_entries that will break real users.
Hash table with 4B elements is not that uncommon.

pw-bot: cr
John Fastabend Feb. 29, 2024, 10:21 p.m. UTC | #2
Alexei Starovoitov wrote:
> On Thu, Feb 29, 2024 at 3:23 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> > [...]
> > +       /* prevent overflow in roundup below */
> > +       if (htab->map.max_entries > U32_MAX / 2 + 1)
> >                 goto free_htab;
> 
> No. We cannot artificially reduce max_entries that will break real users.
> Hash table with 4B elements is not that uncommon.

Agree. How about we return E2BIG in these cases (32-bit arch and overflow) and
let the user figure it out? That makes more sense to me.

> 
> pw-bot: cr
Toke Høiland-Jørgensen March 1, 2024, 12:35 p.m. UTC | #3
John Fastabend <john.fastabend@gmail.com> writes:

> Alexei Starovoitov wrote:
>> On Thu, Feb 29, 2024 at 3:23 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> > [...]
>> > -       /* prevent zero size kmalloc and check for u32 overflow */
>> > -       if (htab->n_buckets == 0 ||
>> > -           htab->n_buckets > U32_MAX / sizeof(struct bucket))
>> > +       /* prevent overflow in roundup below */
>> > +       if (htab->map.max_entries > U32_MAX / 2 + 1)
>> >                 goto free_htab;
>> 
>> No. We cannot artificially reduce max_entries that will break real users.
>> Hash table with 4B elements is not that uncommon.

Erm, huh? The existing code has the n_buckets > U32_MAX / sizeof(struct
bucket) check, which limits max_entries to 134M (0x8000000). This patch
is *increasing* the maximum allowable size by a factor of 16 (to 2.1B or
0x80000000).

> Agree. How about we return E2BIG in these cases (32-bit arch and overflow) and
> let the user figure it out? That makes more sense to me.

Isn't that exactly what this patch does? What am I missing here?

-Toke
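
To make the arithmetic above concrete, a small check program (the
sizeof(struct bucket) == 16 value is an assumption inferred from the
134M figure; n_buckets is always rounded up to a power of two):

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint32_t old_cap = UINT32_MAX / 16;	/* old n_buckets cap: 0xfffffff */
	uint32_t old_max = 0x8000000;		/* largest power of two <= old_cap */
	uint32_t new_max = UINT32_MAX / 2 + 1;	/* new max_entries cap */

	assert(old_max <= old_cap && 2ull * old_max > old_cap);
	printf("old max_entries: %#x (~134M)\n", old_max);
	printf("new max_entries: %#x (~2.1B)\n", new_max);
	printf("increase: %ux\n", new_max / old_max);	/* 16 */
	return 0;
}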
Alexei Starovoitov March 1, 2024, 5:15 p.m. UTC | #4
On Fri, Mar 1, 2024 at 4:35 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> >> > [...]
> >> > +       /* prevent overflow in roundup below */
> >> > +       if (htab->map.max_entries > U32_MAX / 2 + 1)
> >> >                 goto free_htab;
> >>
> >> No. We cannot artificially reduce max_entries that will break real users.
> >> Hash table with 4B elements is not that uncommon.
>
> Erm, huh? The existing code has the n_buckets > U32_MAX / sizeof(struct
> bucket) check, which limits max_entries to 134M (0x8000000). This patch
> is *increasing* the maximum allowable size by a factor of 16 (to 2.1B or
> 0x80000000).
>
> > Agree. How about we return E2BIG in these cases (32-bit arch and overflow) and
> > let the user figure it out? That makes more sense to me.
>
> Isn't that exactly what this patch does? What am I missing here?

I see. Then what are you fixing?
roundup_pow_of_two() will return 0 and existing code is fine as-is.
John Fastabend March 1, 2024, 5:21 p.m. UTC | #5
Toke Høiland-Jørgensen wrote:
> John Fastabend <john.fastabend@gmail.com> writes:
> >> > [...]

Acked-by: John Fastabend <john.fastabend@gmail.com>

> >> > [...]
> >> 
> >> No. We cannot artificially reduce max_entries that will break real users.
> >> Hash table with 4B elements is not that uncommon.
> 
> Erm, huh? The existing code has the n_buckets > U32_MAX / sizeof(struct
> bucket) check, which limits max_entries to 134M (0x8000000). This patch
> is *increasing* the maximum allowable size by a factor of 16 (to 2.1B or
> 0x80000000).

Yep. Makes sense from my side, hence the ACK. Maybe put that in the commit
message if it wasn't obvious.


> 
> > Agree. How about we return E2BIG in these cases (32-bit arch and overflow) and
> > let the user figure it out? That makes more sense to me.
> 
> Isn't that exactly what this patch does? What am I missing here?

Nothing, that was me; I must have been tired. Sorry about that.

> 
> -Toke
>
Toke Høiland-Jørgensen March 4, 2024, 1:02 p.m. UTC | #6
Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> [...]
>
> I see. Then what are you fixing?
> roundup_pow_of_two() will return 0 and existing code is fine as-is.

On 64-bit arches it will, yes. On 32-bit arches it ends up doing a
32-bit left-shift (1UL << 32) of a 32-bit type (unsigned long), which is
UB, so there's no guarantee that it truncates down to 0. And it seems at
least on arm32 it does not: syzbot managed to trigger a crash in the
DEVMAP_HASH code by creating a map with more than 0x80000000 entries:

https://lore.kernel.org/r/000000000000ed666a0611af6818@google.com

This patch just preemptively applies the same fix to the hashtab code,
since I could not find any reason why it shouldn't be possible to hit
the same issue there. I haven't actually managed to trigger a crash
there, though (I don't have any arm32 hardware to test this on), so in
that sense it's a bit theoretical for hashtab. So up to you if you want
to take this, but even if you don't, could you please apply the first
patch? That does fix the issue reported by syzbot (cf the
reported-and-tested-by tag).

-Toke
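
The undefined shift is easy to see in isolation (again a sketch, not
kernel code; assumes a 32-bit unsigned long, e.g. gcc -m32):

#include <stdio.h>

int main(void)
{
	volatile unsigned int shift = 32;	/* runtime value, so the
						 * compiler cannot fold it */
	unsigned long v = 1UL << shift;		/* UB: shift == type width */

	/* x86 masks the shift count and will typically print 1; other
	 * arches and compilers may print 0 or something else entirely,
	 * which is why the "result == 0" check cannot be relied on. */
	printf("1UL << 32 = %lu\n", v);
	return 0;
}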
Alexei Starovoitov March 6, 2024, 5:29 a.m. UTC | #7
On Mon, Mar 4, 2024 at 5:02 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> > [...]
> > I see. Then what are you fixing?
> > roundup_pow_of_two() will return 0 and existing code is fine as-is.
>
> On 64-bit arches it will, yes. On 32-bit arches it ends up doing a
> 32-bit left-shift (1UL << 32) of a 32-bit type (unsigned long), which is
> UB, so there's no guarantee that it truncates down to 0. And it seems at
> least on arm32 it does not: syzbot managed to trigger a crash in the
> DEVMAP_HASH code by creating a map with more than 0x80000000 entries:
>
> https://lore.kernel.org/r/000000000000ed666a0611af6818@google.com
>
> This patch just preemptively applies the same fix to the hashtab code,
> since I could not find any reason why it shouldn't be possible to hit
> the same issue there. I haven't actually managed to trigger a crash
> there, though (I don't have any arm32 hardware to test this on), so in
> that sense it's a bit theoretical for hashtab. So up to you if you want
> to take this, but even if you don't, could you please apply the first
> patch? That does fix the issue reported by syzbot (cf the
> reported-and-tested-by tag).

I see.
Since roundup_pow_of_two() is non-deterministic on 32-bit archs,
let's fix them all.

We have at least 5 to fix:

bloom_filter.c: nr_bits = roundup_pow_of_two(nr_bits);
devmap.c:       dtab->n_buckets = roundup_pow_of_two(dtab->map.max_entries);
hashtab.c:      htab->n_buckets = roundup_pow_of_two(htab->map.max_entries);
stackmap.c:     n_buckets = roundup_pow_of_two(attr->max_entries);
hashtab.c:      htab->map.max_entries = roundup(attr->max_entries,
                                                num_possible_cpus());

bloom_filter looks ok as-is,
but stack_map has the same issue as devmap and hashtab.

Let's check for
if (max_entries > (1u << 31))
in 3 maps and that should be enough to cover all 5 cases?

imo 1u << 31 is much easier to visualize than U32_MAX/2+1

and don't touch other checks.
This patch is removing U32_MAX / sizeof(struct bucket) check
and with that introduces overflow just few lines below in bpf_map_area_alloc.
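
(For the record, the two spellings of the limit are the same value; a
trivial check:)

#include <assert.h>
#include <stdint.h>

int main(void)
{
	assert((1u << 31) == UINT32_MAX / 2 + 1);	/* both 0x80000000 */
	return 0;
}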
Toke Høiland-Jørgensen March 6, 2024, 10:32 a.m. UTC | #8
Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> [...]
>
> and don't touch other checks.
> This patch is removing U32_MAX / sizeof(struct bucket) check
> and with that introduces overflow just few lines below in bpf_map_area_alloc.

Are you sure there's an overflow there? I did look at that and concluded
that since bpf_map_area_alloc() uses a u64 for the size that it would
not actually overflow even with n_buckets == 1<<31. There's a check in
__bpf_map_area_alloc() for the size:

	if (size >= SIZE_MAX)
		return NULL;

with

#define SIZE_MAX	(~(size_t)0)

in limits.h. So if sizeof(size_t) == 4, that check against SIZE_MAX
should trip and the allocation will just fail; but there's no overflow
anywhere AFAICT?

Anyway, I'm OK with keeping the check; I'll respin with the changed
constant and add the check to stackmap.c as well.

-Toke
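
A sketch of the guard being discussed (a model, not the kernel code):
with a 32-bit size_t, a request that reaches the function as a
correctly computed u64 is refused rather than wrapped:

#include <stdint.h>
#include <stdio.h>

#define SIZE_MAX_32 0xffffffffULL	/* ~(size_t)0 with a 32-bit size_t */

static void *area_alloc_model(uint64_t size)
{
	if (size >= SIZE_MAX_32)	/* the check quoted above */
		return NULL;
	return (void *)1;		/* stand-in for a real allocation */
}

int main(void)
{
	/* 2^31 buckets times an assumed 16-byte bucket = 2^35 bytes: */
	uint64_t size = (uint64_t)(1u << 31) * 16;

	printf("%p\n", area_alloc_model(size));	/* (nil): refused, no overflow */
	return 0;
}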
Alexei Starovoitov March 6, 2024, 4:53 p.m. UTC | #9
On Wed, Mar 6, 2024 at 2:32 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> > [...]
> > and don't touch other checks.
> > This patch is removing U32_MAX / sizeof(struct bucket) check
> > and with that introduces overflow just few lines below in bpf_map_area_alloc.
>
> Are you sure there's an overflow there? I did look at that and concluded
> that since bpf_map_area_alloc() uses a u64 for the size that it would
> not actually overflow even with n_buckets == 1<<31. There's a check in
> __bpf_map_area_alloc() for the size:
>
>         if (size >= SIZE_MAX)
>                 return NULL;
>
> with
>
> #define SIZE_MAX        (~(size_t)0)
>
> in limits.h. So if sizeof(size_t) == 4, that check against SIZE_MAX
> should trip and the allocation will just fail; but there's no overflow
> anywhere AFAICT?

There is an overflow _before_ it calls into bpf_map_area_alloc().
Here is the line:
        htab->buckets = bpf_map_area_alloc(htab->n_buckets *
                                           sizeof(struct bucket),
                                           htab->map.numa_node);
that's why we have:
if (htab->n_buckets > U32_MAX / sizeof(struct bucket))
before that.


> Anyway, I'm OK with keeping the check; I'll respin with the changed
> constant and add the check to stackmap.c as well.

Thanks!
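
The promotion pitfall is easy to reproduce in isolation (a sketch
assuming a 32-bit size_t, modelled here with uint32_t):

#include <stdint.h>
#include <stdio.h>

static void alloc_model(uint64_t size)
{
	printf("callee sees size = %llu\n", (unsigned long long)size);
}

int main(void)
{
	uint32_t n_buckets = 1u << 31;
	uint32_t bucket_size = 16;	/* stand-in for sizeof(struct bucket)
					 * when size_t is 32 bits */

	/* The multiply is done in 32 bits and wraps to 0 *before* the
	 * implicit conversion to the u64 parameter: */
	alloc_model(n_buckets * bucket_size);

	/* Widening one operand first keeps the full value: */
	alloc_model((uint64_t)n_buckets * bucket_size);
	return 0;
}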
Toke Høiland-Jørgensen March 7, 2024, noon UTC | #10
Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> [...]
>
> There is an overflow _before_ it calls into bpf_map_area_alloc().
> Here is the line:
>         htab->buckets = bpf_map_area_alloc(htab->n_buckets *
>                                            sizeof(struct bucket),
>                                            htab->map.numa_node);
> that's why we have:
> if (htab->n_buckets > U32_MAX / sizeof(struct bucket))
> before that.

Ah, right. I was assuming that the compiler was smart enough to
implicitly convert that into the type of the function parameter before
doing the multiplication, but of course that's not the case. Thanks!

-Toke

Patch

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 03a6a2500b6a..4caf8dab18b0 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -499,8 +499,6 @@  static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 							  num_possible_cpus());
 	}
 
-	/* hash table size must be power of 2 */
-	htab->n_buckets = roundup_pow_of_two(htab->map.max_entries);
 
 	htab->elem_size = sizeof(struct htab_elem) +
 			  round_up(htab->map.key_size, 8);
@@ -510,11 +508,13 @@  static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 		htab->elem_size += round_up(htab->map.value_size, 8);
 
 	err = -E2BIG;
-	/* prevent zero size kmalloc and check for u32 overflow */
-	if (htab->n_buckets == 0 ||
-	    htab->n_buckets > U32_MAX / sizeof(struct bucket))
+	/* prevent overflow in roundup below */
+	if (htab->map.max_entries > U32_MAX / 2 + 1)
 		goto free_htab;
 
+	/* hash table size must be power of 2 */
+	htab->n_buckets = roundup_pow_of_two(htab->map.max_entries);
+
 	err = bpf_map_init_elem_count(&htab->map);
 	if (err)
 		goto free_htab;
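
For completeness, a stand-alone model of the check ordering after this
patch (a sketch, with the round-up open-coded so that nothing ever
shifts a 32-bit type by its full width; the zero case is rejected
elsewhere in the kernel but is guarded here to keep the model total):

#include <stdint.h>
#include <stdio.h>

#define U32_MAX 0xffffffffU

static int n_buckets_for(uint32_t max_entries, uint32_t *n_buckets)
{
	uint32_t v;

	if (max_entries == 0 || max_entries > U32_MAX / 2 + 1)
		return -1;	/* -E2BIG in the kernel */

	v = max_entries - 1;	/* classic bit-smearing round-up */
	v |= v >> 1;  v |= v >> 2;  v |= v >> 4;
	v |= v >> 8;  v |= v >> 16;
	*n_buckets = v + 1;	/* max_entries == 2^31 -> 2^31, no UB */
	return 0;
}

int main(void)
{
	uint32_t n = 0;

	printf("%d\n", n_buckets_for(0x80000000u, &n));	/* 0: ok, n = 2^31 */
	printf("%d\n", n_buckets_for(0x80000001u, &n));	/* -1: rejected */
	return 0;
}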