diff mbox series

x86/string: correct memmove()'s forwarding to memcpy()

Message ID a23d148a-25d0-cc5b-6050-88345188ef5a@suse.com (mailing list archive)
State New
Headers show
Series x86/string: correct memmove()'s forwarding to memcpy() | expand

Commit Message

Jan Beulich Feb. 3, 2021, 2:22 p.m. UTC
With memcpy() expanding to the compiler builtin, we may not hand it
overlapping source and destination. We strictly mean to forward to our
own implementation (a few lines up in the same source file).

Fixes: 78825e1c60fa ("x86/string: Clean up x86/string.h")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
An alternative would be to "#undef memcpy" near the top of the file. But
I think the way it's done now is more explicit to the reader. An #undef
would be the only way if the macro was an object-like one.

At least with gcc10 this does alter generated code: The builtin gets
expanded into a tail call, while after this change our memcpy() gets
inlined into memmove(). This would change again once we separate the 3
functions here into their own CUs for placing them in an archive.

Comments

Andrew Cooper Feb. 3, 2021, 3:01 p.m. UTC | #1
On 03/02/2021 14:22, Jan Beulich wrote:
> With memcpy() expanding to the compiler builtin, we may not hand it
> overlapping source and destination. We strictly mean to forward to our
> own implementation (a few lines up in the same source file).
>
> Fixes: 78825e1c60fa ("x86/string: Clean up x86/string.h")
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

I agree that the current logic is buggy, but I'm not sure this is an
improvement.

You've switched from relying on GCC's builtin to operate forwards, to
relying on Xen's implementation operating forwards.

At the very least, can we get a code comment stating something like
"depends on Xen's implementation operating forwards" ?

> ---
> An alternative would be to "#undef memcpy" near the top of the file. But
> I think the way it's done now is more explicit to the reader. An #undef
> would be the only way if the macro was an object-like one.

I chose not to use undef's to avoid impacting the optimisation of other
functions in this file.  I can't remember if this made a difference in
practice.

> At least with gcc10 this does alter generated code: The builtin gets
> expanded into a tail call, while after this change our memcpy() gets
> inlined into memmove(). This would change again once we separate the 3
> functions here into their own CUs for placing them in an archive.

As (perhaps) a tangent, how do we plan to provide x86-optimised versions
in combination with the library work?  We're long overdue needing to
refresh our fast-strings support to include fast rep-mov/stosb.

~Andrew

>
> --- a/xen/arch/x86/string.c
> +++ b/xen/arch/x86/string.c
> @@ -43,7 +43,7 @@ void *(memmove)(void *dest, const void *
>          return dest;
>  
>      if ( dest < src )
> -        return memcpy(dest, src, n);
> +        return (memcpy)(dest, src, n);
>  
>      asm volatile (
>          "   std         ; "
Jan Beulich Feb. 3, 2021, 3:31 p.m. UTC | #2
On 03.02.2021 16:01, Andrew Cooper wrote:
> On 03/02/2021 14:22, Jan Beulich wrote:
>> With memcpy() expanding to the compiler builtin, we may not hand it
>> overlapping source and destination. We strictly mean to forward to our
>> own implementation (a few lines up in the same source file).
>>
>> Fixes: 78825e1c60fa ("x86/string: Clean up x86/string.h")
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> I agree that the current logic is buggy, but I'm not sure this is an
> improvement.
> 
> You've switched from relying on GCC's builtin to operate forwards, to
> relying on Xen's implementation operating forwards.

Is there such a guarantee for the compiler builtin? If so - no
need for this patch indeed. But I couldn't find any doc saying
so.

> At the very least, can we get a code comment stating something like
> "depends on Xen's implementation operating forwards" ?

No problem at all.

>> ---
>> An alternative would be to "#undef memcpy" near the top of the file. But
>> I think the way it's done now is more explicit to the reader. An #undef
>> would be the only way if the macro was an object-like one.
> 
> I chose not to use undef's to avoid impacting the optimisation of other
> functions in this file.  I can't remember if this made a difference in
> practice.
> 
>> At least with gcc10 this does alter generated code: The builtin gets
>> expanded into a tail call, while after this change our memcpy() gets
>> inlined into memmove(). This would change again once we separate the 3
>> functions here into their own CUs for placing them in an archive.
> 
> As (perhaps) a tangent, how do we plan to provide x86-optimised versions
> in combination with the library work?

By specifying the per-arch lib.a first.

>  We're long overdue needing to
> refresh our fast-strings support to include fast rep-mov/stosb.

That's pretty much orthogonal to the code movement though.

Jan
Andrew Cooper Feb. 3, 2021, 3:36 p.m. UTC | #3
On 03/02/2021 15:31, Jan Beulich wrote:
> On 03.02.2021 16:01, Andrew Cooper wrote:
>> On 03/02/2021 14:22, Jan Beulich wrote:
>>> With memcpy() expanding to the compiler builtin, we may not hand it
>>> overlapping source and destination. We strictly mean to forward to our
>>> own implementation (a few lines up in the same source file).
>>>
>>> Fixes: 78825e1c60fa ("x86/string: Clean up x86/string.h")
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> I agree that the current logic is buggy, but I'm not sure this is an
>> improvement.
>>
>> You've switched from relying on GCC's builtin to operate forwards, to
>> relying on Xen's implementation operating forwards.
> Is there such a guarantee for the compiler builtin? If so - no
> need for this patch indeed. But I couldn't find any doc saying
> so.

I've never seen it emit anything which isn't a forwards operation (i.e.
I think the compiled result tended to be safe in practice), but C's
flexibility does explicitly permit a backwards implementation.

>
>> At the very least, can we get a code comment stating something like
>> "depends on Xen's implementation operating forwards" ?
> No problem at all.

In which case Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> to
avoid a round trip.

>
>>> ---
>>> An alternative would be to "#undef memcpy" near the top of the file. But
>>> I think the way it's done now is more explicit to the reader. An #undef
>>> would be the only way if the macro was an object-like one.
>> I chose not to use undef's to avoid impacting the optimisation of other
>> functions in this file.  I can't remember if this made a difference in
>> practice.
>>
>>> At least with gcc10 this does alter generated code: The builtin gets
>>> expanded into a tail call, while after this change our memcpy() gets
>>> inlined into memmove(). This would change again once we separate the 3
>>> functions here into their own CUs for placing them in an archive.
>> As (perhaps) a tangent, how do we plan to provide x86-optimised versions
>> in combination with the library work?
> By specifying the per-arch lib.a first.

Ok - good to hear.

>>   We're long overdue needing to
>> refresh our fast-strings support to include fast rep-mov/stosb.
> That's pretty much orthogonal to the code movement though.

Yes, but it does need doing.  We're perpetually playing catchup, and
there are meaningful improvements to be had for logic such as
clear_page() which is fairly poor, optimisation wise atm.

~Andrew
diff mbox series

Patch

--- a/xen/arch/x86/string.c
+++ b/xen/arch/x86/string.c
@@ -43,7 +43,7 @@  void *(memmove)(void *dest, const void *
         return dest;
 
     if ( dest < src )
-        return memcpy(dest, src, n);
+        return (memcpy)(dest, src, n);
 
     asm volatile (
         "   std         ; "