[v1,0/2] x86: Avoid using INC and DEC instructions on hot paths

Message ID 20220307114558.1234494-1-ammarfaizi2@gnuweeb.org (mailing list archive)

Message

Ammar Faizi March 7, 2022, 11:45 a.m. UTC
Hi,

In order to take maximum advantage of out-of-order execution,
avoid using INC/DEC instructions when appropriate. INC/DEC only
writes to part of the flags register, which can cause a partial
flag register stall. This series replaces INC/DEC with ADD/SUB.

Agner Fog's optimization manual says [1]:
"""
  The INC and DEC instructions are inefficient on some CPUs because they
  write to only part of the flags register (excluding the carry flag).
  Use ADD or SUB instead to avoid false dependences or inefficient
  splitting of the flags register, especially if they are followed by
  an instruction that reads the flags.
"""
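
To make the "followed by an instruction that reads the flags" case
concrete, here is a minimal sketch of a countdown loop written with
SUB instead of DEC, where JNZ reads ZF right after the counter update.
The function names are hypothetical and this is not code from the
series; it assumes GCC/Clang extended asm and falls back to plain C on
other architectures.

```c
#include <assert.h>

/*
 * Hypothetical helper, not taken from the patches: run a loop n times
 * and return the number of iterations executed.
 */
static unsigned int count_iterations(unsigned int n)
{
	unsigned int iters = 0;

	if (n == 0)
		return 0;

#if defined(__x86_64__) || defined(__i386__)
	__asm__ volatile(
		"1:\n\t"
		"addl	$1, %0\n\t"	/* instead of: incl %0 */
		"subl	$1, %1\n\t"	/* instead of: decl %1 */
		"jnz	1b\n\t"		/* reads ZF written by SUB */
		: "+r"(iters), "+r"(n)
		:
		: "cc");
#else
	/* Portable fallback with the same result. */
	while (n--)
		iters++;
#endif
	return iters;
}

/* Small usage check for the sketch above. */
static void loop_self_check(void)
{
	assert(count_iterations(0) == 0);
	assert(count_iterations(3) == 3);
}
```

The result is the same with either instruction pair; the only
difference is that SUB writes all of the status flags before JNZ reads
ZF, while DEC would leave CF untouched.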

Intel's optimization manual 3.5.1.1 says [2]:
"""
  The INC and DEC instructions modify only a subset of the bits in the
  flag register. This creates a dependence on all previous writes of
  the flag register. This is especially problematic when these
  instructions are on the critical path because they are used to change
  an address for a load on which many other instructions depend.

  Assembly/Compiler Coding Rule 33. (M impact, H generality) INC and DEC
  instructions should be replaced with ADD or SUB instructions, because
  ADD and SUB overwrite all flags, whereas INC and DEC do not, therefore
  creating false dependencies on earlier instructions that set the flags.
"""
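
The partial flag write both manuals describe can be modeled in plain
C. The sketch below is only an illustration under simplifying
assumptions: the struct and function names are made up, PF and AF are
left out, and it models architectural flag results, not pipeline
timing. It shows that ADD overwrites CF while INC leaves the old CF in
place, which is why INC carries a dependence on whatever wrote CF
last.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a few status flags; not kernel code. */
struct flags_model {
	bool cf;	/* carry */
	bool zf;	/* zero */
	bool sf;	/* sign */
	bool of;	/* signed overflow */
};

/* Flags written by both ADD and INC (PF and AF omitted for brevity). */
static void set_result_flags(struct flags_model *f, uint32_t a,
			     uint32_t b, uint32_t r)
{
	f->zf = (r == 0);
	f->sf = (r >> 31) & 1;
	/* Overflow: both operands share a sign that the result lacks. */
	f->of = ((~(a ^ b) & (a ^ r)) >> 31) & 1;
}

/* ADD: writes every modeled flag, including CF. */
static uint32_t model_add(struct flags_model *f, uint32_t a, uint32_t imm)
{
	uint32_t r = a + imm;

	set_result_flags(f, a, imm, r);
	f->cf = (r < a);	/* unsigned wrap-around means carry out */
	return r;
}

/* INC: like ADD with 1, except that CF keeps its previous value. */
static uint32_t model_inc(struct flags_model *f, uint32_t a)
{
	uint32_t r = a + 1;

	set_result_flags(f, a, 1, r);
	/* CF is deliberately left untouched here. */
	return r;
}

/* Small usage check for the sketch above. */
static void flags_self_check(void)
{
	struct flags_model f = { .cf = true };

	model_inc(&f, 5);
	assert(f.cf);		/* stale CF survives INC */
	model_add(&f, 5, 1);
	assert(!f.cf);		/* ADD replaces it */
}
```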

Newer compilers also do this when targeting a generic x86-64 CPU
(https://godbolt.org/z/rjsfbdx54).
# C code:

  int fy_inc(int a, int b, int c)
  {
      a++; b++; c++;
      return a * b * c;
  }

# ASM
## GCC 4.1.2 and older use INC (old).
fy_inc:
    incl    %edi
    incl    %esi
    leal    1(%rdx), %eax
    imull   %esi, %edi
    imull   %edi, %eax
    ret

## GCC 4.4.7 to GCC 11.2 use ADD (new).
fy_inc:
    addl    $1, %edi
    addl    $1, %esi
    addl    $1, %edx
    imull   %esi, %edi
    movl    %edi, %eax
    imull   %edx, %eax
    ret

## Clang 5.0.2 and older use INC (old).
fy_inc:
    incl    %edi
    leal    1(%rsi), %eax
    imull   %edi, %eax
    incl    %edx
    imull   %edx, %eax
    retq

## Clang 6.0.0 to Clang 13.0.1 use ADD (new).
fy_inc:
    addl    $1, %edi
    leal    1(%rsi), %eax
    imull   %edi, %eax
    addl    $1, %edx
    imull   %edx, %eax
    retq

[1]: https://www.agner.org/optimize/optimizing_assembly.pdf
[2]: https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf

Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---
Ammar Faizi (2):
  x86/include/asm: Avoid using INC and DEC instructions on hot paths
  x86/lib: Avoid using INC and DEC instructions on hot paths

 arch/x86/include/asm/xor_32.h | 16 ++++++++--------
 arch/x86/lib/copy_mc_64.S     | 14 +++++++-------
 arch/x86/lib/copy_user_64.S   | 26 +++++++++++++-------------
 arch/x86/lib/memset_64.S      |  6 +++---
 arch/x86/lib/string_32.c      | 20 ++++++++++----------
 arch/x86/lib/strstr_32.c      |  4 ++--
 arch/x86/lib/usercopy_64.c    | 12 ++++++------
 7 files changed, 49 insertions(+), 49 deletions(-)


base-commit: ffb217a13a2eaf6d5bd974fc83036a53ca69f1e2

Comments

Borislav Petkov March 7, 2022, 12:38 p.m. UTC | #1
On Mon, Mar 07, 2022 at 06:45:56PM +0700, Ammar Faizi wrote:
> In order to take maximum advantage of out-of-order execution,
> avoid using INC/DEC instructions when appropriate. INC/DEC only
> writes to part of the flags register, which can cause a partial
> flag register stall. This series replaces INC/DEC with ADD/SUB.

"Improvements" like that need to show in benchmark runs - not
microbenchmark - that they bring anything. Just by looking at them, I'd
say they won't show any difference. But I'm always open to surprises.

Btw, you don't have to send all your patches directly to me - there are
other x86 maintainers. IOW, you can use scripts/get_maintainer.pl to
figure out who to send them to.

Also, I'd advise going over Documentation/process/ if you're new to this.
Especially Documentation/process/submitting-patches.rst.

Thx.
Ammar Faizi March 7, 2022, 1:37 p.m. UTC | #2
On 3/7/22 7:38 PM, Borislav Petkov wrote:
> On Mon, Mar 07, 2022 at 06:45:56PM +0700, Ammar Faizi wrote:
>> In order to take maximum advantage of out-of-order execution,
>> avoid using INC/DEC instructions when appropriate. INC/DEC only
>> writes to part of the flags register, which can cause a partial
>> flag register stall. This series replaces INC/DEC with ADD/SUB.
> 
> "Improvements" like that need to show in benchmark runs - not
> microbenchmark - that they bring anything. Just by looking at them, I'd
> say they won't show any difference. But I'm always open to surprises.

OK, thanks for taking a look. I will play a bit more with this. I'm not
sure how much visible improvement there will be. If I can get some
numbers (I probably can't), I will come back to this thread.

> Btw, you don't have to send all your patches directly to me - there are
> other x86 maintainers. IOW, you can use scripts/get_maintainer.pl to
> figure out who to send them to.

I did, actually; I took the whole CC list here from that script. I will
try to give the other maintainers a turn next time.

> Also, I'd advise going over Documentation/process/ if you're new to this.
> Especially Documentation/process/submitting-patches.rst.
I might've missed the benchmark backup part. Will review those documents again.
Borislav Petkov March 9, 2022, 9:33 a.m. UTC | #3
On Mon, Mar 07, 2022 at 08:37:59PM +0700, Ammar Faizi wrote:
> > Also, I'd advise going over Documentation/process/ if you're new to this.
> > Especially Documentation/process/submitting-patches.rst.
> I might've missed the benchmark backup part. Will review those documents again.

The "Describe your changes" section in the abovementioned file has some
good explanations on what to pay attention to.