Message ID | 20191030142237.249532-3-glider@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Add KernelMemorySanitizer infrastructure | expand |
On (19/10/30 15:22), glider@google.com wrote: > Clang may replace stackdepot_memcmp() with a call to instrumented bcmp(), > which is exactly what we wanted to avoid creating stackdepot_memcmp(). > Add a compiler barrier() to prevent optimizations. [..] > @@ -163,6 +163,11 @@ int stackdepot_memcmp(const unsigned long *u1, const unsigned long *u2, > unsigned int n) > { > for ( ; n-- ; u1++, u2++) { > + /* > + * Prevent Clang from replacing this function with a bcmp() > + * call. > + */ > + barrier(); > if (*u1 != *u2) > return 1; > } Would 'volatile' do the trick? -ss
On Fri, Nov 1, 2019 at 6:50 AM Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote: > > On (19/10/30 15:22), glider@google.com wrote: > > Clang may replace stackdepot_memcmp() with a call to instrumented bcmp(), > > which is exactly what we wanted to avoid creating stackdepot_memcmp(). > > Add a compiler barrier() to prevent optimizations. > > [..] > > > @@ -163,6 +163,11 @@ int stackdepot_memcmp(const unsigned long *u1, const unsigned long *u2, > > unsigned int n) > > { > > for ( ; n-- ; u1++, u2++) { > > + /* > > + * Prevent Clang from replacing this function with a bcmp() > > + * call. > > + */ > > + barrier(); > > if (*u1 != *u2) > > return 1; > > } > > Would 'volatile' do the trick? It does. I can replace the barrier with a volatile if you think that's better. However this'll add a checkpatch warning, as volatiles are discouraged for synchronization (although in this case there's no synchronization) > > -ss
On (19/11/06 12:43), Alexander Potapenko wrote: > > On (19/10/30 15:22), glider@google.com wrote: > > > @@ -163,6 +163,11 @@ int stackdepot_memcmp(const unsigned long *u1, const unsigned long *u2, > > > unsigned int n) > > > { > > > for ( ; n-- ; u1++, u2++) { > > > + /* > > > + * Prevent Clang from replacing this function with a bcmp() > > > + * call. > > > + */ > > > + barrier(); > > > if (*u1 != *u2) > > > return 1; > > > } > > > > Would 'volatile' do the trick? > It does. I can replace the barrier with a volatile if you think that's better. > However this'll add a checkpatch warning, as volatiles are discouraged > for synchronization (although in this case there's no synchronization) Yeah, 'volatile' in this case will do what it sort of meant to do - prevent compiler optimizations. So, like you said, it's not a synchronization issue and we don't 'volatile' data structures. Do you need to do barrier() on every iteration? Does clang behave if you do one barrier() instead of 'n' barrier()-s? -ss
On Thu, Nov 7, 2019 at 7:08 AM Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote: > > On (19/11/06 12:43), Alexander Potapenko wrote: > > > On (19/10/30 15:22), glider@google.com wrote: > > > > @@ -163,6 +163,11 @@ int stackdepot_memcmp(const unsigned long *u1, const unsigned long *u2, > > > > unsigned int n) > > > > { > > > > for ( ; n-- ; u1++, u2++) { > > > > + /* > > > > + * Prevent Clang from replacing this function with a bcmp() > > > > + * call. > > > > + */ > > > > + barrier(); > > > > if (*u1 != *u2) > > > > return 1; > > > > } > > > > > > Would 'volatile' do the trick? > > It does. I can replace the barrier with a volatile if you think that's better. > > However this'll add a checkpatch warning, as volatiles are discouraged > > for synchronization (although in this case there's no synchronization) > > Yeah, 'volatile' in this case will do what it sort of meant to do - prevent > compiler optimizations. So, like you said, it's not a synchronization issue > and we don't 'volatile' data structures. The normal way to do a volatile access would be READ_ONCE()/WRITE_ONCE(), but that seems stronger than the barrier() here. I'd just stick to adding a barrier. > Do you need to do barrier() on every iteration? Does clang behave if > you do one barrier() instead of 'n' barrier()-s? If it does things right, it would make that a single-byte copy plus a call to bcmp(). I certainly wouldn't want to have an implementation that relies on the compiler making sub-optimal decisions ;-) Arnd
On Thu, Nov 7, 2019 at 10:04 AM Arnd Bergmann <arnd@arndb.de> wrote: > > On Thu, Nov 7, 2019 at 7:08 AM Sergey Senozhatsky > <sergey.senozhatsky.work@gmail.com> wrote: > > > > On (19/11/06 12:43), Alexander Potapenko wrote: > > > > On (19/10/30 15:22), glider@google.com wrote: > > > > > @@ -163,6 +163,11 @@ int stackdepot_memcmp(const unsigned long *u1, const unsigned long *u2, > > > > > unsigned int n) > > > > > { > > > > > for ( ; n-- ; u1++, u2++) { > > > > > + /* > > > > > + * Prevent Clang from replacing this function with a bcmp() > > > > > + * call. > > > > > + */ > > > > > + barrier(); > > > > > if (*u1 != *u2) > > > > > return 1; > > > > > } > > > > > > > > Would 'volatile' do the trick? > > > It does. I can replace the barrier with a volatile if you think that's better. > > > However this'll add a checkpatch warning, as volatiles are discouraged > > > for synchronization (although in this case there's no synchronization) > > > > Yeah, 'volatile' in this case will do what it sort of meant to do - prevent > > compiler optimizations. So, like you said, it's not a synchronization issue > > and we don't 'volatile' data structures. > > The normal way to do a volatile access would be > READ_ONCE()/WRITE_ONCE(), but that seems stronger than > the barrier() here. I'd just stick to adding a barrier. I actually like the READ_ONCE idea more, as READ_ONCE is really a documented way to prevent the compiler from merging reads, which is what we want here. I also thought that the original barrier() statement was just a compiler barrier, which didn't introduce any additional CPU instructions. Turns out I was wrong, and barrier() also serves as a memory barrier. This doesn't really matter in this case, because this memcmp function isn't performance-critical (we are preventing Clang from optimizing it, after all). Still, READ_ONCE and WRITE_ONCE might be even cheaper, as they are just relaxed memory accesses implemented using volatile. > > Do you need to do barrier() on every iteration? Does clang behave if > > you do one barrier() instead of 'n' barrier()-s? > > If it does things right, it would make that a single-byte copy plus a call > to bcmp(). I certainly wouldn't want to have an implementation that relies > on the compiler making sub-optimal decisions ;-) That's a really good point :) > Arnd
On Thu, Nov 7, 2019 at 10:22 AM Alexander Potapenko <glider@google.com> wrote: > On Thu, Nov 7, 2019 at 10:04 AM Arnd Bergmann <arnd@arndb.de> wrote: > > On Thu, Nov 7, 2019 at 7:08 AM Sergey Senozhatsky > > <sergey.senozhatsky.work@gmail.com> wrote: > > > > > > Yeah, 'volatile' in this case will do what it sort of meant to do - prevent > > > compiler optimizations. So, like you said, it's not a synchronization issue > > > and we don't 'volatile' data structures. > > > > The normal way to do a volatile access would be > > READ_ONCE()/WRITE_ONCE(), but that seems stronger than > > the barrier() here. I'd just stick to adding a barrier. > I actually like the READ_ONCE idea more, as READ_ONCE is really a > documented way to prevent the compiler from merging reads, which is > what we want here. Fair enough. > I also thought that the original barrier() statement was just a > compiler barrier, which didn't introduce any additional CPU > instructions. > Turns out I was wrong, and barrier() also serves as a memory barrier. The definition of barrier is #define barrier() __asm__ __volatile__("": : :"memory") which is no actual barrier instruction but is a full barrier to the compiler. Only for the Intel ecc compiler it is defined as an intrinsic that I don't know. Arnd
On Thu, Nov 7, 2019 at 10:29 AM Arnd Bergmann <arnd@arndb.de> wrote: > > On Thu, Nov 7, 2019 at 10:22 AM Alexander Potapenko <glider@google.com> wrote: > > On Thu, Nov 7, 2019 at 10:04 AM Arnd Bergmann <arnd@arndb.de> wrote: > > > On Thu, Nov 7, 2019 at 7:08 AM Sergey Senozhatsky > > > <sergey.senozhatsky.work@gmail.com> wrote: > > > > > > > > Yeah, 'volatile' in this case will do what it sort of meant to do - prevent > > > > compiler optimizations. So, like you said, it's not a synchronization issue > > > > and we don't 'volatile' data structures. > > > > > > The normal way to do a volatile access would be > > > READ_ONCE()/WRITE_ONCE(), but that seems stronger than > > > the barrier() here. I'd just stick to adding a barrier. > > I actually like the READ_ONCE idea more, as READ_ONCE is really a > > documented way to prevent the compiler from merging reads, which is > > what we want here. > > Fair enough. > > > I also thought that the original barrier() statement was just a > > compiler barrier, which didn't introduce any additional CPU > > instructions. > > Turns out I was wrong, and barrier() also serves as a memory barrier. > > The definition of barrier is > > #define barrier() __asm__ __volatile__("": : :"memory") > > which is no actual barrier instruction but is a full barrier to the compiler. And you're right again, shame on me and my reading skills. > Only for the Intel ecc compiler it is defined as an intrinsic that I don't > know. > Arnd
On Thu, Nov 7, 2019 at 10:46 AM Andrey Ryabinin <aryabinin@virtuozzo.com> wrote: > On 11/7/19 12:22 PM, Alexander Potapenko wrote: > > On Thu, Nov 7, 2019 at 10:04 AM Arnd Bergmann <arnd@arndb.de> wrote: > >> On Thu, Nov 7, 2019 at 7:08 AM Sergey Senozhatsky > >> <sergey.senozhatsky.work@gmail.com> wrote: > >> The normal way to do a volatile access would be > >> READ_ONCE()/WRITE_ONCE(), but that seems stronger than > >> the barrier() here. I'd just stick to adding a barrier. > > I actually like the READ_ONCE idea more, as READ_ONCE is really a > > documented way to prevent the compiler from merging reads, which is > > what we want here. > > I would rather go with -fno-builtin-bcmp or maybe even -fno-builtin if that works. The commit message for 5f074f3e192f ("lib/string.c: implement a basic bcmp") mentions that -fno-builtin-bcmp did not work for LTO when the global bcmp() help was added. I don't know whether the same applies here, but my guess is that it's the same issue. Arnd
On Thu, Nov 7, 2019 at 11:19 AM Andrey Ryabinin <aryabinin@virtuozzo.com> wrote: > > > > On 11/7/19 1:00 PM, Arnd Bergmann wrote: > > On Thu, Nov 7, 2019 at 10:46 AM Andrey Ryabinin <aryabinin@virtuozzo.com> wrote: > >> On 11/7/19 12:22 PM, Alexander Potapenko wrote: > >>> On Thu, Nov 7, 2019 at 10:04 AM Arnd Bergmann <arnd@arndb.de> wrote: > >>>> On Thu, Nov 7, 2019 at 7:08 AM Sergey Senozhatsky > >>>> <sergey.senozhatsky.work@gmail.com> wrote: > >>>> The normal way to do a volatile access would be > >>>> READ_ONCE()/WRITE_ONCE(), but that seems stronger than > >>>> the barrier() here. I'd just stick to adding a barrier. > >>> I actually like the READ_ONCE idea more, as READ_ONCE is really a > >>> documented way to prevent the compiler from merging reads, which is > >>> what we want here. > >> > >> I would rather go with -fno-builtin-bcmp or maybe even -fno-builtin if that works. > > > > The commit message for 5f074f3e192f ("lib/string.c: implement a basic bcmp") > > mentions that -fno-builtin-bcmp did not work for LTO when the global bcmp() > > help was added. I don't know whether the same applies here, but my guess is > > that it's the same issue. > > But we don't do LTO. I don't think not doing LTO now is a valid argument, as some distributions may start doing LTO in the future. (Android already does LTO, by the way) Regarding this particular case, -fno-builtin-bcmp is insufficient, as Clang falls back to memcmp() in that case. Building with -fno-builtin-bcmp -fno-builtin-memcmp does the trick, but we'd probably better use -fno-builtin just to avoid future surprises. > -- > You received this message because you are subscribed to the Google Groups "Clang Built Linux" group. > To unsubscribe from this group and stop receiving emails from it, send an email to clang-built-linux+unsubscribe@googlegroups.com. > To view this discussion on the web visit https://groups.google.com/d/msgid/clang-built-linux/6875c6e6-2f1f-f8e6-e5d7-d451c48397ff%40virtuozzo.com.
diff --git a/lib/stackdepot.c b/lib/stackdepot.c index 0d00d2ac0c4b..785839298e08 100644 --- a/lib/stackdepot.c +++ b/lib/stackdepot.c @@ -163,6 +163,11 @@ int stackdepot_memcmp(const unsigned long *u1, const unsigned long *u2, unsigned int n) { for ( ; n-- ; u1++, u2++) { + /* + * Prevent Clang from replacing this function with a bcmp() + * call. + */ + barrier(); if (*u1 != *u2) return 1; }
Clang may replace stackdepot_memcmp() with a call to instrumented bcmp(), which is exactly what we wanted to avoid creating stackdepot_memcmp(). Add a compiler barrier() to prevent optimizations. Signed-off-by: Alexander Potapenko <glider@google.com> To: Alexander Potapenko <glider@google.com> Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: linux-mm@kvack.org --- Change-Id: I4495b617b15c0ab003a61c1f0d54d0026fa8b144 --- lib/stackdepot.c | 5 +++++ 1 file changed, 5 insertions(+)