diff mbox series

[RFC,v2,02/25] stackdepot: prevent Clang from optimizing away stackdepot_memcmp()

Message ID 20191030142237.249532-3-glider@google.com (mailing list archive)
State New, archived
Headers show
Series Add KernelMemorySanitizer infrastructure | expand

Commit Message

Alexander Potapenko Oct. 30, 2019, 2:22 p.m. UTC
Clang may replace stackdepot_memcmp() with a call to instrumented bcmp(),
which is exactly what we wanted to avoid creating stackdepot_memcmp().
Add a compiler barrier() to prevent optimizations.

Signed-off-by: Alexander Potapenko <glider@google.com>
To: Alexander Potapenko <glider@google.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: linux-mm@kvack.org
---

Change-Id: I4495b617b15c0ab003a61c1f0d54d0026fa8b144
---
 lib/stackdepot.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Sergey Senozhatsky Nov. 1, 2019, 5:50 a.m. UTC | #1
On (19/10/30 15:22), glider@google.com wrote:
> Clang may replace stackdepot_memcmp() with a call to instrumented bcmp(),
> which is exactly what we wanted to avoid creating stackdepot_memcmp().
> Add a compiler barrier() to prevent optimizations.

[..]

> @@ -163,6 +163,11 @@ int stackdepot_memcmp(const unsigned long *u1, const unsigned long *u2,
>  			unsigned int n)
>  {
>  	for ( ; n-- ; u1++, u2++) {
> +		/*
> +		 * Prevent Clang from replacing this function with a bcmp()
> +		 * call.
> +		 */
> +		barrier();
>  		if (*u1 != *u2)
>  			return 1;
>  	}

Would 'volatile' do the trick?

	-ss
Alexander Potapenko Nov. 6, 2019, 11:43 a.m. UTC | #2
On Fri, Nov 1, 2019 at 6:50 AM Sergey Senozhatsky
<sergey.senozhatsky.work@gmail.com> wrote:
>
> On (19/10/30 15:22), glider@google.com wrote:
> > Clang may replace stackdepot_memcmp() with a call to instrumented bcmp(),
> > which is exactly what we wanted to avoid creating stackdepot_memcmp().
> > Add a compiler barrier() to prevent optimizations.
>
> [..]
>
> > @@ -163,6 +163,11 @@ int stackdepot_memcmp(const unsigned long *u1, const unsigned long *u2,
> >                       unsigned int n)
> >  {
> >       for ( ; n-- ; u1++, u2++) {
> > +             /*
> > +              * Prevent Clang from replacing this function with a bcmp()
> > +              * call.
> > +              */
> > +             barrier();
> >               if (*u1 != *u2)
> >                       return 1;
> >       }
>
> Would 'volatile' do the trick?
It does. I can replace the barrier with a volatile if you think that's better.
However this'll add a checkpatch warning, as volatiles are discouraged
for synchronization (although in this case there's no synchronization)
>
>         -ss
Sergey Senozhatsky Nov. 7, 2019, 6:08 a.m. UTC | #3
On (19/11/06 12:43), Alexander Potapenko wrote:
> > On (19/10/30 15:22), glider@google.com wrote:
> > > @@ -163,6 +163,11 @@ int stackdepot_memcmp(const unsigned long *u1, const unsigned long *u2,
> > >                       unsigned int n)
> > >  {
> > >       for ( ; n-- ; u1++, u2++) {
> > > +             /*
> > > +              * Prevent Clang from replacing this function with a bcmp()
> > > +              * call.
> > > +              */
> > > +             barrier();
> > >               if (*u1 != *u2)
> > >                       return 1;
> > >       }
> >
> > Would 'volatile' do the trick?
> It does. I can replace the barrier with a volatile if you think that's better.
> However this'll add a checkpatch warning, as volatiles are discouraged
> for synchronization (although in this case there's no synchronization)

Yeah, 'volatile' in this case will do what it sort of meant to do - prevent
compiler optimizations. So, like you said, it's not a synchronization issue
and we don't 'volatile' data structures.

Do you need to do barrier() on every iteration? Does clang behave if
you do one barrier() instead of 'n' barrier()-s?

	-ss
Arnd Bergmann Nov. 7, 2019, 9:04 a.m. UTC | #4
On Thu, Nov 7, 2019 at 7:08 AM Sergey Senozhatsky
<sergey.senozhatsky.work@gmail.com> wrote:
>
> On (19/11/06 12:43), Alexander Potapenko wrote:
> > > On (19/10/30 15:22), glider@google.com wrote:
> > > > @@ -163,6 +163,11 @@ int stackdepot_memcmp(const unsigned long *u1, const unsigned long *u2,
> > > >                       unsigned int n)
> > > >  {
> > > >       for ( ; n-- ; u1++, u2++) {
> > > > +             /*
> > > > +              * Prevent Clang from replacing this function with a bcmp()
> > > > +              * call.
> > > > +              */
> > > > +             barrier();
> > > >               if (*u1 != *u2)
> > > >                       return 1;
> > > >       }
> > >
> > > Would 'volatile' do the trick?
> > It does. I can replace the barrier with a volatile if you think that's better.
> > However this'll add a checkpatch warning, as volatiles are discouraged
> > for synchronization (although in this case there's no synchronization)
>
> Yeah, 'volatile' in this case will do what it sort of meant to do - prevent
> compiler optimizations. So, like you said, it's not a synchronization issue
> and we don't 'volatile' data structures.

The normal way to do a volatile access would be
READ_ONCE()/WRITE_ONCE(), but that seems stronger than
the barrier() here. I'd just stick to adding a barrier.

> Do you need to do barrier() on every iteration? Does clang behave if
> you do one barrier() instead of 'n' barrier()-s?

If it does things right, it would make that a single-byte copy plus a call
to bcmp(). I certainly wouldn't want to have an implementation that relies
on the compiler making sub-optimal decisions ;-)

      Arnd
Alexander Potapenko Nov. 7, 2019, 9:22 a.m. UTC | #5
On Thu, Nov 7, 2019 at 10:04 AM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Thu, Nov 7, 2019 at 7:08 AM Sergey Senozhatsky
> <sergey.senozhatsky.work@gmail.com> wrote:
> >
> > On (19/11/06 12:43), Alexander Potapenko wrote:
> > > > On (19/10/30 15:22), glider@google.com wrote:
> > > > > @@ -163,6 +163,11 @@ int stackdepot_memcmp(const unsigned long *u1, const unsigned long *u2,
> > > > >                       unsigned int n)
> > > > >  {
> > > > >       for ( ; n-- ; u1++, u2++) {
> > > > > +             /*
> > > > > +              * Prevent Clang from replacing this function with a bcmp()
> > > > > +              * call.
> > > > > +              */
> > > > > +             barrier();
> > > > >               if (*u1 != *u2)
> > > > >                       return 1;
> > > > >       }
> > > >
> > > > Would 'volatile' do the trick?
> > > It does. I can replace the barrier with a volatile if you think that's better.
> > > However this'll add a checkpatch warning, as volatiles are discouraged
> > > for synchronization (although in this case there's no synchronization)
> >
> > Yeah, 'volatile' in this case will do what it sort of meant to do - prevent
> > compiler optimizations. So, like you said, it's not a synchronization issue
> > and we don't 'volatile' data structures.
>
> The normal way to do a volatile access would be
> READ_ONCE()/WRITE_ONCE(), but that seems stronger than
> the barrier() here. I'd just stick to adding a barrier.
I actually like the READ_ONCE idea more, as READ_ONCE is really a
documented way to prevent the compiler from merging reads, which is
what we want here.

I also thought that the original barrier() statement was just a
compiler barrier, which didn't introduce any additional CPU
instructions.
Turns out I was wrong, and barrier() also serves as a memory barrier.
This doesn't really matter in this case, because this memcmp function
isn't performance-critical (we are preventing Clang from optimizing
it, after all).
Still, READ_ONCE and WRITE_ONCE might be even cheaper, as they are
just relaxed memory accesses implemented using volatile.
> > Do you need to do barrier() on every iteration? Does clang behave if
> > you do one barrier() instead of 'n' barrier()-s?
>
> If it does things right, it would make that a single-byte copy plus a call
> to bcmp(). I certainly wouldn't want to have an implementation that relies
> on the compiler making sub-optimal decisions ;-)
That's a really good point :)
>       Arnd
Arnd Bergmann Nov. 7, 2019, 9:28 a.m. UTC | #6
On Thu, Nov 7, 2019 at 10:22 AM Alexander Potapenko <glider@google.com> wrote:
> On Thu, Nov 7, 2019 at 10:04 AM Arnd Bergmann <arnd@arndb.de> wrote:
> > On Thu, Nov 7, 2019 at 7:08 AM Sergey Senozhatsky
> > <sergey.senozhatsky.work@gmail.com> wrote:
> > >
> > > Yeah, 'volatile' in this case will do what it sort of meant to do - prevent
> > > compiler optimizations. So, like you said, it's not a synchronization issue
> > > and we don't 'volatile' data structures.
> >
> > The normal way to do a volatile access would be
> > READ_ONCE()/WRITE_ONCE(), but that seems stronger than
> > the barrier() here. I'd just stick to adding a barrier.
> I actually like the READ_ONCE idea more, as READ_ONCE is really a
> documented way to prevent the compiler from merging reads, which is
> what we want here.

Fair enough.

> I also thought that the original barrier() statement was just a
> compiler barrier, which didn't introduce any additional CPU
> instructions.
> Turns out I was wrong, and barrier() also serves as a memory barrier.

The definition of barrier is

#define barrier() __asm__ __volatile__("": : :"memory")

which is no actual barrier instruction but is a full barrier to the compiler.

Only for the Intel ecc compiler it is defined as an intrinsic that I don't
know.

      Arnd
Alexander Potapenko Nov. 7, 2019, 9:43 a.m. UTC | #7
On Thu, Nov 7, 2019 at 10:29 AM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Thu, Nov 7, 2019 at 10:22 AM Alexander Potapenko <glider@google.com> wrote:
> > On Thu, Nov 7, 2019 at 10:04 AM Arnd Bergmann <arnd@arndb.de> wrote:
> > > On Thu, Nov 7, 2019 at 7:08 AM Sergey Senozhatsky
> > > <sergey.senozhatsky.work@gmail.com> wrote:
> > > >
> > > > Yeah, 'volatile' in this case will do what it sort of meant to do - prevent
> > > > compiler optimizations. So, like you said, it's not a synchronization issue
> > > > and we don't 'volatile' data structures.
> > >
> > > The normal way to do a volatile access would be
> > > READ_ONCE()/WRITE_ONCE(), but that seems stronger than
> > > the barrier() here. I'd just stick to adding a barrier.
> > I actually like the READ_ONCE idea more, as READ_ONCE is really a
> > documented way to prevent the compiler from merging reads, which is
> > what we want here.
>
> Fair enough.
>
> > I also thought that the original barrier() statement was just a
> > compiler barrier, which didn't introduce any additional CPU
> > instructions.
> > Turns out I was wrong, and barrier() also serves as a memory barrier.
>
> The definition of barrier is
>
> #define barrier() __asm__ __volatile__("": : :"memory")
>
> which is no actual barrier instruction but is a full barrier to the compiler.
And you're right again, shame on me and my reading skills.

> Only for the Intel ecc compiler it is defined as an intrinsic that I don't
> know.

>       Arnd
Arnd Bergmann Nov. 7, 2019, 10 a.m. UTC | #8
On Thu, Nov 7, 2019 at 10:46 AM Andrey Ryabinin <aryabinin@virtuozzo.com> wrote:
> On 11/7/19 12:22 PM, Alexander Potapenko wrote:
> > On Thu, Nov 7, 2019 at 10:04 AM Arnd Bergmann <arnd@arndb.de> wrote:
> >> On Thu, Nov 7, 2019 at 7:08 AM Sergey Senozhatsky
> >> <sergey.senozhatsky.work@gmail.com> wrote:
> >> The normal way to do a volatile access would be
> >> READ_ONCE()/WRITE_ONCE(), but that seems stronger than
> >> the barrier() here. I'd just stick to adding a barrier.
> > I actually like the READ_ONCE idea more, as READ_ONCE is really a
> > documented way to prevent the compiler from merging reads, which is
> > what we want here.
>
> I would rather go with -fno-builtin-bcmp or maybe even -fno-builtin if that works.

The commit message for 5f074f3e192f ("lib/string.c: implement a basic bcmp")
mentions that  -fno-builtin-bcmp did not work for LTO when the global bcmp()
help was added. I don't know whether the same applies here, but my guess is
that it's the same issue.

      Arnd
Alexander Potapenko Nov. 7, 2019, 10:30 a.m. UTC | #9
On Thu, Nov 7, 2019 at 11:19 AM Andrey Ryabinin <aryabinin@virtuozzo.com> wrote:
>
>
>
> On 11/7/19 1:00 PM, Arnd Bergmann wrote:
> > On Thu, Nov 7, 2019 at 10:46 AM Andrey Ryabinin <aryabinin@virtuozzo.com> wrote:
> >> On 11/7/19 12:22 PM, Alexander Potapenko wrote:
> >>> On Thu, Nov 7, 2019 at 10:04 AM Arnd Bergmann <arnd@arndb.de> wrote:
> >>>> On Thu, Nov 7, 2019 at 7:08 AM Sergey Senozhatsky
> >>>> <sergey.senozhatsky.work@gmail.com> wrote:
> >>>> The normal way to do a volatile access would be
> >>>> READ_ONCE()/WRITE_ONCE(), but that seems stronger than
> >>>> the barrier() here. I'd just stick to adding a barrier.
> >>> I actually like the READ_ONCE idea more, as READ_ONCE is really a
> >>> documented way to prevent the compiler from merging reads, which is
> >>> what we want here.
> >>
> >> I would rather go with -fno-builtin-bcmp or maybe even -fno-builtin if that works.
> >
> > The commit message for 5f074f3e192f ("lib/string.c: implement a basic bcmp")
> > mentions that  -fno-builtin-bcmp did not work for LTO when the global bcmp()
> > help was added. I don't know whether the same applies here, but my guess is
> > that it's the same issue.
>
> But we don't do LTO.
I don't think not doing LTO now is a valid argument, as some
distributions may start doing LTO in the future.
(Android already does LTO, by the way)

Regarding this particular case, -fno-builtin-bcmp is insufficient, as
Clang falls back to memcmp() in that case.
Building with -fno-builtin-bcmp -fno-builtin-memcmp does the trick,
but we'd probably better use -fno-builtin just to avoid future
surprises.
> --
> You received this message because you are subscribed to the Google Groups "Clang Built Linux" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to clang-built-linux+unsubscribe@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/clang-built-linux/6875c6e6-2f1f-f8e6-e5d7-d451c48397ff%40virtuozzo.com.
diff mbox series

Patch

diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 0d00d2ac0c4b..785839298e08 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -163,6 +163,11 @@  int stackdepot_memcmp(const unsigned long *u1, const unsigned long *u2,
 			unsigned int n)
 {
 	for ( ; n-- ; u1++, u2++) {
+		/*
+		 * Prevent Clang from replacing this function with a bcmp()
+		 * call.
+		 */
+		barrier();
 		if (*u1 != *u2)
 			return 1;
 	}