diff mbox series

[bpf,v2,1/2] bpf: fix a bpf_timer initialization issue

Message ID 20220211152059.1584701-1-yhs@fb.com (mailing list archive)
State Superseded
Delegated to: BPF
Headers show
Series bpf: fix a bpf_timer initialization issue | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for bpf
netdev/fixes_present success Fixes tag present in non-next series
netdev/subject_prefix success Link
netdev/cover_letter success Series has a cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1436 this patch: 1436
netdev/cc_maintainers fail 1 blamed authors not CCed: kafai@fb.com; 8 maintainers not CCed: kpsingh@kernel.org john.fastabend@gmail.com kafai@fb.com songliubraving@fb.com nathan@kernel.org llvm@lists.linux.dev netdev@vger.kernel.org ndesaulniers@google.com
netdev/build_clang success Errors and warnings before: 185 this patch: 185
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 1453 this patch: 1453
netdev/checkpatch warning WARNING: line length of 82 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf success VM_Test
bpf/vmtest-bpf-PR success PR summary

Commit Message

Yonghong Song Feb. 11, 2022, 3:20 p.m. UTC
The patch in [1] intends to fix a bpf_timer related issue,
but the fix caused existing 'timer' selftest to fail with
hang or some random errors. After some debug, I found
an issue with check_and_init_map_value() in the hashtab.c.
More specifically, in hashtab.c, we have code
  l_new = bpf_map_kmalloc_node(&htab->map, ...)
  check_and_init_map_value(&htab->map, l_new...)
Note that bpf_map_kmalloc_node() does not do initialization
so l_new contains random value.

The function check_and_init_map_value() intends to zero the
bpf_spin_lock and bpf_timer if they exist in the map.
But I found bpf_spin_lock is zero'ed but bpf_timer is not zero'ed.
With [1], later copy_map_value() skips copying of
bpf_spin_lock and bpf_timer. The non-zero bpf_timer caused
random failures for 'timer' selftest.
Without [1], for both bpf_spin_lock and bpf_timer case,
bpf_timer will be zero'ed, so 'timer' self test is okay.

For check_and_init_map_value(), why bpf_spin_lock is zero'ed
properly while bpf_timer not. In bpf uapi header, we have
  struct bpf_spin_lock {
        __u32   val;
  };
  struct bpf_timer {
        __u64 :64;
        __u64 :64;
  } __attribute__((aligned(8)));

The initialization code:
  *(struct bpf_spin_lock *)(dst + map->spin_lock_off) =
      (struct bpf_spin_lock){};
  *(struct bpf_timer *)(dst + map->timer_off) =
      (struct bpf_timer){};
It appears the compiler has no obligation to initialize anonymous fields.
For example, let us use clang with bpf target as below:
  $ cat t.c
  struct bpf_timer {
        unsigned long long :64;
  };
  struct bpf_timer2 {
        unsigned long long a;
  };

  void test(struct bpf_timer *t) {
    *t = (struct bpf_timer){};
  }
  void test2(struct bpf_timer2 *t) {
    *t = (struct bpf_timer2){};
  }
  $ clang -target bpf -O2 -c -g t.c
  $ llvm-objdump -d t.o
   ...
   0000000000000000 <test>:
       0:       95 00 00 00 00 00 00 00 exit
   0000000000000008 <test2>:
       1:       b7 02 00 00 00 00 00 00 r2 = 0
       2:       7b 21 00 00 00 00 00 00 *(u64 *)(r1 + 0) = r2
       3:       95 00 00 00 00 00 00 00 exit

To fix the problem, let use memset for bpf_timer case in
check_and_init_map_value(). For consistency, memset is also
used for bpf_spin_lock case.

  [1] https://lore.kernel.org/bpf/20220209070324.1093182-2-memxor@gmail.com/

Fixes: 68134668c17f3 ("bpf: Add map side support for bpf timers.")
Signed-off-by: Yonghong Song <yhs@fb.com>
---
 include/linux/bpf.h | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

Comments

Alexei Starovoitov Feb. 11, 2022, 4:47 p.m. UTC | #1
On Fri, Feb 11, 2022 at 7:21 AM Yonghong Song <yhs@fb.com> wrote:
>
>   struct bpf_spin_lock {
>         __u32   val;
>   };
>   struct bpf_timer {
>         __u64 :64;
>         __u64 :64;
>   } __attribute__((aligned(8)));
>
> The initialization code:
>   *(struct bpf_spin_lock *)(dst + map->spin_lock_off) =
>       (struct bpf_spin_lock){};
>   *(struct bpf_timer *)(dst + map->timer_off) =
>       (struct bpf_timer){};
> It appears the compiler has no obligation to initialize anonymous fields.
> For example, let us use clang with bpf target as below:
>   $ cat t.c
>   struct bpf_timer {
>         unsigned long long :64;
>   };
>   struct bpf_timer2 {
>         unsigned long long a;
>   };
>
>   void test(struct bpf_timer *t) {
>     *t = (struct bpf_timer){};
>   }
>   void test2(struct bpf_timer2 *t) {
>     *t = (struct bpf_timer2){};
>   }
>   $ clang -target bpf -O2 -c -g t.c
>   $ llvm-objdump -d t.o
>    ...
>    0000000000000000 <test>:
>        0:       95 00 00 00 00 00 00 00 exit
>    0000000000000008 <test2>:
>        1:       b7 02 00 00 00 00 00 00 r2 = 0
>        2:       7b 21 00 00 00 00 00 00 *(u64 *)(r1 + 0) = r2
>        3:       95 00 00 00 00 00 00 00 exit

wow!
Is this a clang only behavior or gcc does the same "smart" optimization?

We've seen this issue with padding, but I could have never guessed
that compiler will do so for explicit anon fields.
I wonder what standard says and what other kernel code is broken
by this "optimization".
Yonghong Song Feb. 11, 2022, 5 p.m. UTC | #2
On 2/11/22 8:47 AM, Alexei Starovoitov wrote:
> On Fri, Feb 11, 2022 at 7:21 AM Yonghong Song <yhs@fb.com> wrote:
>>
>>    struct bpf_spin_lock {
>>          __u32   val;
>>    };
>>    struct bpf_timer {
>>          __u64 :64;
>>          __u64 :64;
>>    } __attribute__((aligned(8)));
>>
>> The initialization code:
>>    *(struct bpf_spin_lock *)(dst + map->spin_lock_off) =
>>        (struct bpf_spin_lock){};
>>    *(struct bpf_timer *)(dst + map->timer_off) =
>>        (struct bpf_timer){};
>> It appears the compiler has no obligation to initialize anonymous fields.
>> For example, let us use clang with bpf target as below:
>>    $ cat t.c
>>    struct bpf_timer {
>>          unsigned long long :64;
>>    };
>>    struct bpf_timer2 {
>>          unsigned long long a;
>>    };
>>
>>    void test(struct bpf_timer *t) {
>>      *t = (struct bpf_timer){};
>>    }
>>    void test2(struct bpf_timer2 *t) {
>>      *t = (struct bpf_timer2){};
>>    }
>>    $ clang -target bpf -O2 -c -g t.c
>>    $ llvm-objdump -d t.o
>>     ...
>>     0000000000000000 <test>:
>>         0:       95 00 00 00 00 00 00 00 exit
>>     0000000000000008 <test2>:
>>         1:       b7 02 00 00 00 00 00 00 r2 = 0
>>         2:       7b 21 00 00 00 00 00 00 *(u64 *)(r1 + 0) = r2
>>         3:       95 00 00 00 00 00 00 00 exit
> 
> wow!
> Is this a clang only behavior or gcc does the same "smart" optimization?

gcc seems okay for my above test.

$ /home/yhs/work/gcc2/gcc11-2/install/bin/gcc --version
gcc (GCC) 11.2.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ /home/yhs/work/gcc2/gcc11-2/install/bin/gcc -O2 -c t.c
$ llvm-objdump -d t.o

t.o:    file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <test>:
        0: 48 c7 07 00 00 00 00          movq    $0, (%rdi)
        7: c3                            retq
        8: 0f 1f 84 00 00 00 00 00       nopl    (%rax,%rax)

0000000000000010 <test2>:
       10: 48 c7 07 00 00 00 00          movq    $0, (%rdi)
       17: c3                            retq
[yhs@devbig309.ftw3 ~/tmp2]

> 
> We've seen this issue with padding, but I could have never guessed
> that compiler will do so for explicit anon fields.
> I wonder what standard says and what other kernel code is broken
> by this "optimization".

Not familiar with the standard. Need to dig out.
Yonghong Song Feb. 11, 2022, 5:32 p.m. UTC | #3
On 2/11/22 8:47 AM, Alexei Starovoitov wrote:
> On Fri, Feb 11, 2022 at 7:21 AM Yonghong Song <yhs@fb.com> wrote:
>>
>>    struct bpf_spin_lock {
>>          __u32   val;
>>    };
>>    struct bpf_timer {
>>          __u64 :64;
>>          __u64 :64;
>>    } __attribute__((aligned(8)));
>>
>> The initialization code:
>>    *(struct bpf_spin_lock *)(dst + map->spin_lock_off) =
>>        (struct bpf_spin_lock){};
>>    *(struct bpf_timer *)(dst + map->timer_off) =
>>        (struct bpf_timer){};
>> It appears the compiler has no obligation to initialize anonymous fields.
>> For example, let us use clang with bpf target as below:
>>    $ cat t.c
>>    struct bpf_timer {
>>          unsigned long long :64;
>>    };
>>    struct bpf_timer2 {
>>          unsigned long long a;
>>    };
>>
>>    void test(struct bpf_timer *t) {
>>      *t = (struct bpf_timer){};
>>    }
>>    void test2(struct bpf_timer2 *t) {
>>      *t = (struct bpf_timer2){};
>>    }
>>    $ clang -target bpf -O2 -c -g t.c
>>    $ llvm-objdump -d t.o
>>     ...
>>     0000000000000000 <test>:
>>         0:       95 00 00 00 00 00 00 00 exit
>>     0000000000000008 <test2>:
>>         1:       b7 02 00 00 00 00 00 00 r2 = 0
>>         2:       7b 21 00 00 00 00 00 00 *(u64 *)(r1 + 0) = r2
>>         3:       95 00 00 00 00 00 00 00 exit
> 
> wow!
> Is this a clang only behavior or gcc does the same "smart" optimization?
> 
> We've seen this issue with padding, but I could have never guessed
> that compiler will do so for explicit anon fields.
> I wonder what standard says and what other kernel code is broken
> by this "optimization".

Searched the internet, not sure whether this helps or not.

INTERNATIONAL STANDARD ©ISO/IEC ISO/IEC 9899:201x
Programming languages — C

http://www.open-std.org/Jtc1/sc22/wg14/www/docs/n1547.pdf
page 157:

Except where explicitly stated otherwise, for the purposes of this 
subclause unnamed
members of objects of structure and union type do not participate in 
initialization.
Unnamed members of structure objects have indeterminate value even after 
initialization.
Alexei Starovoitov Feb. 11, 2022, 5:36 p.m. UTC | #4
On Fri, Feb 11, 2022 at 9:32 AM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 2/11/22 8:47 AM, Alexei Starovoitov wrote:
> > On Fri, Feb 11, 2022 at 7:21 AM Yonghong Song <yhs@fb.com> wrote:
> >>
> >>    struct bpf_spin_lock {
> >>          __u32   val;
> >>    };
> >>    struct bpf_timer {
> >>          __u64 :64;
> >>          __u64 :64;
> >>    } __attribute__((aligned(8)));
> >>
> >> The initialization code:
> >>    *(struct bpf_spin_lock *)(dst + map->spin_lock_off) =
> >>        (struct bpf_spin_lock){};
> >>    *(struct bpf_timer *)(dst + map->timer_off) =
> >>        (struct bpf_timer){};
> >> It appears the compiler has no obligation to initialize anonymous fields.
> >> For example, let us use clang with bpf target as below:
> >>    $ cat t.c
> >>    struct bpf_timer {
> >>          unsigned long long :64;
> >>    };
> >>    struct bpf_timer2 {
> >>          unsigned long long a;
> >>    };
> >>
> >>    void test(struct bpf_timer *t) {
> >>      *t = (struct bpf_timer){};
> >>    }
> >>    void test2(struct bpf_timer2 *t) {
> >>      *t = (struct bpf_timer2){};
> >>    }
> >>    $ clang -target bpf -O2 -c -g t.c
> >>    $ llvm-objdump -d t.o
> >>     ...
> >>     0000000000000000 <test>:
> >>         0:       95 00 00 00 00 00 00 00 exit
> >>     0000000000000008 <test2>:
> >>         1:       b7 02 00 00 00 00 00 00 r2 = 0
> >>         2:       7b 21 00 00 00 00 00 00 *(u64 *)(r1 + 0) = r2
> >>         3:       95 00 00 00 00 00 00 00 exit
> >
> > wow!
> > Is this a clang only behavior or gcc does the same "smart" optimization?
> >
> > We've seen this issue with padding, but I could have never guessed
> > that compiler will do so for explicit anon fields.
> > I wonder what standard says and what other kernel code is broken
> > by this "optimization".
>
> Searched the internet, not sure whether this helps or not.
>
> INTERNATIONAL STANDARD ©ISO/IEC ISO/IEC 9899:201x
> Programming languages — C
>
> http://www.open-std.org/Jtc1/sc22/wg14/www/docs/n1547.pdf
> page 157:
>
> Except where explicitly stated otherwise, for the purposes of this
> subclause unnamed
> members of objects of structure and union type do not participate in
> initialization.
> Unnamed members of structure objects have indeterminate value even after
> initialization.

Thanks for clarifying! It means that gcc might do so eventually as well.
diff mbox series

Patch

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index fa517ae604ad..1a4c73742a1f 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -209,11 +209,9 @@  static inline bool map_value_has_timer(const struct bpf_map *map)
 static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
 {
 	if (unlikely(map_value_has_spin_lock(map)))
-		*(struct bpf_spin_lock *)(dst + map->spin_lock_off) =
-			(struct bpf_spin_lock){};
+		memset(dst + map->spin_lock_off, 0, sizeof(struct bpf_spin_lock));
 	if (unlikely(map_value_has_timer(map)))
-		*(struct bpf_timer *)(dst + map->timer_off) =
-			(struct bpf_timer){};
+		memset(dst + map->timer_off, 0, sizeof(struct bpf_timer));
 }
 
 /* copy everything but bpf_spin_lock and bpf_timer. There could be one of each. */