Message ID | 20221208053649.540891-1-almasrymina@google.com (mailing list archive)
---|---
State | New, archived
Series | [v1] arch: Enable function alignment for arm64
On Wed, Dec 7, 2022 at 9:36 PM Mina Almasry <almasrymina@google.com> wrote:
>
> We recently ran into a double-digit percentage hackbench regression
> when backporting commit 12df140f0bdf ("mm,hugetlb: take hugetlb_lock
> before decrementing h->resv_huge_pages") to an older kernel. This was
> surprising since hackbench does not use hugetlb pages at all and the
> modified code is not invoked. After some debugging we found that the
> regression can be fixed by back-porting commit d49a0626216b ("arch:
> Introduce CONFIG_FUNCTION_ALIGNMENT") and enabling function alignment
> for arm64. I suggest enabling it by default for arm64 if possible.
>
> Tested by examining function alignment on a compiled object file
> before/after:
>
> After this patch:
>
> $ ~/is-aligned.sh mm/hugetlb.o 16
> file=mm/hugetlb.o, alignment=16
> total number of functions: 146
> total number of unaligned: 0
>
> Before this patch:
>
> $ ~/is-aligned.sh mm/hugetlb.o 16
> file=mm/hugetlb.o, alignment=16
> total number of functions: 146
> total number of unaligned: 94
>
> Cc: Peter Zijlstra <peterz@infradead.org>

I missed the Signed-off-by: Mina Almasry <almasrymina@google.com> tag here. I can add it with the next version.

> ---
>  arch/arm64/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index cf6d1cd8b6dc..bcc9e1578937 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -235,6 +235,7 @@ config ARM64
>  	select TRACE_IRQFLAGS_SUPPORT
>  	select TRACE_IRQFLAGS_NMI_SUPPORT
>  	select HAVE_SOFTIRQ_ON_OWN_STACK
> +	select FUNCTION_ALIGNMENT_16B
>  	help
>  	  ARM 64-bit (AArch64) Linux support.
>
> --
> 2.39.0.rc0.267.gcb52ba06e7-goog
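The `is-aligned.sh` helper referenced above is not included in the thread. As an illustration only, the check it appears to perform (counting the text-section function symbols in `nm` output whose address is not a multiple of the requested alignment) could be sketched in Python like this; the symbol names and addresses below are fabricated:

```python
# Illustrative reconstruction of what is-aligned.sh appears to check:
# count function symbols (type 't'/'T' in `nm <object>` output) whose
# address is not a multiple of the requested alignment.

def count_unaligned(nm_output: str, alignment: int):
    total = unaligned = 0
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) < 3 or parts[1].lower() != "t":
            continue  # keep only text-section (function) symbols
        total += 1
        if int(parts[0], 16) % alignment != 0:
            unaligned += 1
    return total, unaligned

# Fabricated nm-style lines, for illustration only:
sample = """\
0000000000000000 T alloc_huge_page
0000000000000124 t __unmap_hugepage_range
0000000000000360 T hugetlb_fault"""

print(count_unaligned(sample, 16))  # (3, 1): only 0x124 is unaligned
```

In the real script the input would come from something like `nm mm/hugetlb.o`, and the two printed totals would correspond to the "total number of functions" and "total number of unaligned" lines above.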
On Wed, Dec 07, 2022 at 09:36:48PM -0800, Mina Almasry wrote:
> We recently ran into a double-digit percentage hackbench regression
> when backporting commit 12df140f0bdf ("mm,hugetlb: take hugetlb_lock
> before decrementing h->resv_huge_pages") to an older kernel. This was
> surprising since hackbench does not use hugetlb pages at all and the
> modified code is not invoked. After some debugging we found that the
> regression can be fixed by back-porting commit d49a0626216b ("arch:
> Introduce CONFIG_FUNCTION_ALIGNMENT") and enabling function alignment
> for arm64. I suggest enabling it by default for arm64 if possible.
>
> Tested by examining function alignment on a compiled object file
> before/after:
>
> After this patch:
>
> $ ~/is-aligned.sh mm/hugetlb.o 16
> file=mm/hugetlb.o, alignment=16
> total number of functions: 146
> total number of unaligned: 0
>
> Before this patch:
>
> $ ~/is-aligned.sh mm/hugetlb.o 16
> file=mm/hugetlb.o, alignment=16
> total number of functions: 146
> total number of unaligned: 94
>
> Cc: Peter Zijlstra <peterz@infradead.org>
> ---
>  arch/arm64/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index cf6d1cd8b6dc..bcc9e1578937 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -235,6 +235,7 @@ config ARM64
>  	select TRACE_IRQFLAGS_SUPPORT
>  	select TRACE_IRQFLAGS_NMI_SUPPORT
>  	select HAVE_SOFTIRQ_ON_OWN_STACK
> +	select FUNCTION_ALIGNMENT_16B
>  	help
>  	  ARM 64-bit (AArch64) Linux support.

This increases the size of .text for a defconfig build by ~2%, so I think it
would be nice to have some real numbers for the performance uplift. Are you
able to elaborate beyond "double-digit percentage hackbench regression"?

In general, however, I'm supportive of the patch (and it seems that x86
does the same thing) so:

Acked-by: Will Deacon <will@kernel.org>

Will
From: Will Deacon <will@kernel.org>
> Sent: 24 January 2023 12:09
>
> On Wed, Dec 07, 2022 at 09:36:48PM -0800, Mina Almasry wrote:
> > We recently ran into a double-digit percentage hackbench regression
> > when backporting commit 12df140f0bdf ("mm,hugetlb: take hugetlb_lock
> > before decrementing h->resv_huge_pages") to an older kernel. This was
> > surprising since hackbench does not use hugetlb pages at all and the
> > modified code is not invoked. After some debugging we found that the
> > regression can be fixed by back-porting commit d49a0626216b ("arch:
> > Introduce CONFIG_FUNCTION_ALIGNMENT") and enabling function alignment
> > for arm64. I suggest enabling it by default for arm64 if possible.
>
> ...
>
> This increases the size of .text for a defconfig build by ~2%, so I think it
> would be nice to have some real numbers for the performance uplift. Are you
> able to elaborate beyond "double-digit percentage hackbench regression"?
>
> In general, however, I'm supportive of the patch (and it seems that x86
> does the same thing) so:

I bet it just changes the alignment of the code so that more
functions are using different cache lines.

All sorts of other random changes are likely to have a similar effect.

Cache-line aligning the start of a function probably reduces the
number of cache lines the function needs - but that isn't guaranteed.
It also slightly reduces the delay on a cache miss - but they are so
slow it probably makes almost no difference.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
On Tue, Jan 24, 2023 at 4:09 AM Will Deacon <will@kernel.org> wrote:
>
> On Wed, Dec 07, 2022 at 09:36:48PM -0800, Mina Almasry wrote:
> > We recently ran into a double-digit percentage hackbench regression
> > when backporting commit 12df140f0bdf ("mm,hugetlb: take hugetlb_lock
> > before decrementing h->resv_huge_pages") to an older kernel. This was
> > surprising since hackbench does not use hugetlb pages at all and the
> > modified code is not invoked. After some debugging we found that the
> > regression can be fixed by back-porting commit d49a0626216b ("arch:
> > Introduce CONFIG_FUNCTION_ALIGNMENT") and enabling function alignment
> > for arm64. I suggest enabling it by default for arm64 if possible.
> >
> > Tested by examining function alignment on a compiled object file
> > before/after:
> >
> > After this patch:
> >
> > $ ~/is-aligned.sh mm/hugetlb.o 16
> > file=mm/hugetlb.o, alignment=16
> > total number of functions: 146
> > total number of unaligned: 0
> >
> > Before this patch:
> >
> > $ ~/is-aligned.sh mm/hugetlb.o 16
> > file=mm/hugetlb.o, alignment=16
> > total number of functions: 146
> > total number of unaligned: 94
> >
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > ---
> >  arch/arm64/Kconfig | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index cf6d1cd8b6dc..bcc9e1578937 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -235,6 +235,7 @@ config ARM64
> >  	select TRACE_IRQFLAGS_SUPPORT
> >  	select TRACE_IRQFLAGS_NMI_SUPPORT
> >  	select HAVE_SOFTIRQ_ON_OWN_STACK
> > +	select FUNCTION_ALIGNMENT_16B
> >  	help
> >  	  ARM 64-bit (AArch64) Linux support.
>
> This increases the size of .text for a defconfig build by ~2%, so I think it
> would be nice to have some real numbers for the performance uplift. Are you
> able to elaborate beyond "double-digit percentage hackbench regression"?
>

(Sorry for the late reply)

The full story is already in the commit message. The only details I
omitted are the exact regression numbers we saw:

-16% on hackbench_process_pipes_234 (which should be `hackbench -pipe 234 process 1000`)
-23% on hackbench_process_sockets_234 (which should be `hackbench 234 process 1000`)

Like the commit message says, it doesn't make much sense that
cherry-picking 12df140f0bdf ("mm,hugetlb: take hugetlb_lock before
decrementing h->resv_huge_pages") to our kernel causes such a huge
regression, because hackbench doesn't use hugetlb at all.

> In general, however, I'm supportive of the patch (and it seems that x86
> does the same thing) so:
>
> Acked-by: Will Deacon <will@kernel.org>
>
> Will
On Wed, Jan 25, 2023 at 3:16 AM David Laight <David.Laight@aculab.com> wrote:
>
> From: Will Deacon <will@kernel.org>
> > Sent: 24 January 2023 12:09
> >
> > On Wed, Dec 07, 2022 at 09:36:48PM -0800, Mina Almasry wrote:
> > > We recently ran into a double-digit percentage hackbench regression
> > > when backporting commit 12df140f0bdf ("mm,hugetlb: take hugetlb_lock
> > > before decrementing h->resv_huge_pages") to an older kernel. This was
> > > surprising since hackbench does not use hugetlb pages at all and the
> > > modified code is not invoked. After some debugging we found that the
> > > regression can be fixed by back-porting commit d49a0626216b ("arch:
> > > Introduce CONFIG_FUNCTION_ALIGNMENT") and enabling function alignment
> > > for arm64. I suggest enabling it by default for arm64 if possible.
> >
> > ...
> >
> > This increases the size of .text for a defconfig build by ~2%, so I think it
> > would be nice to have some real numbers for the performance uplift. Are you
> > able to elaborate beyond "double-digit percentage hackbench regression"?
> >
> > In general, however, I'm supportive of the patch (and it seems that x86
> > does the same thing) so:
>
> I bet it just changes the alignment of the code so that more
> functions are using different cache lines.
>
> All sorts of other random changes are likely to have a similar effect.
>
> Cache-line aligning the start of a function probably reduces the
> number of cache lines the function needs - but that isn't guaranteed.
> It also slightly reduces the delay on a cache miss - but they are so
> slow it probably makes almost no difference.
>

David, my understanding is similar to yours. I.e. without explicit alignment:

1. Random changes to the code can cause critical path functions to
become aligned or unaligned, which will cause perf
regressions/improvements.
2. Random changes to the code can cause critical path functions to be
placed near a cache line boundary, causing one more cache line to be
loaded when they are run, which will cause perf regressions.

So for these very reasons function alignment is a good thing.

> David
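The arithmetic behind point 2 is easy to make concrete. A small sketch, assuming 64-byte cache lines (typical for arm64) and made-up function sizes:

```python
def cache_lines_touched(start: int, size: int, line: int = 64) -> int:
    """Number of cache lines a function of `size` bytes spans when its
    first byte sits at address `start`."""
    first = start // line
    last = (start + size - 1) // line
    return last - first + 1

# A 100-byte function starting on a line boundary spans two lines,
# but the same function starting 60 bytes into a line spans three.
print(cache_lines_touched(0, 100))   # 2
print(cache_lines_touched(60, 100))  # 3
```

This also illustrates David's caveat: 16-byte alignment does not guarantee the minimum number of lines, since a function can still straddle a 64-byte boundary it would have avoided at some other unaligned offset.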
From: Mina Almasry
> Sent: 22 February 2023 22:16
>
> On Wed, Jan 25, 2023 at 3:16 AM David Laight <David.Laight@aculab.com> wrote:
> >
> > From: Will Deacon <will@kernel.org>
> > > Sent: 24 January 2023 12:09
> > >
> > > On Wed, Dec 07, 2022 at 09:36:48PM -0800, Mina Almasry wrote:
> > > > We recently ran into a double-digit percentage hackbench regression
> > > > when backporting commit 12df140f0bdf ("mm,hugetlb: take hugetlb_lock
> > > > before decrementing h->resv_huge_pages") to an older kernel. This was
> > > > surprising since hackbench does not use hugetlb pages at all and the
> > > > modified code is not invoked. After some debugging we found that the
> > > > regression can be fixed by back-porting commit d49a0626216b ("arch:
> > > > Introduce CONFIG_FUNCTION_ALIGNMENT") and enabling function alignment
> > > > for arm64. I suggest enabling it by default for arm64 if possible.
> > >
> > > ...
> > >
> > > This increases the size of .text for a defconfig build by ~2%, so I think it
> > > would be nice to have some real numbers for the performance uplift. Are you
> > > able to elaborate beyond "double-digit percentage hackbench regression"?
> > >
> > > In general, however, I'm supportive of the patch (and it seems that x86
> > > does the same thing) so:
> >
> > I bet it just changes the alignment of the code so that more
> > functions are using different cache lines.
> >
> > All sorts of other random changes are likely to have a similar effect.
> >
> > Cache-line aligning the start of a function probably reduces the
> > number of cache lines the function needs - but that isn't guaranteed.
> > It also slightly reduces the delay on a cache miss - but they are so
> > slow it probably makes almost no difference.
>
> David, my understanding is similar to yours. I.e. without explicit alignment:
>
> 1. Random changes to the code can cause critical path functions to
> become aligned or unaligned, which will cause perf
> regressions/improvements.
> 2. Random changes to the code can cause critical path functions to be
> placed near a cache line boundary, causing one more cache line to be
> loaded when they are run, which will cause perf regressions.
>
> So for these very reasons function alignment is a good thing.

Except that aligning functions doesn't necessarily improve things.

Even within a function the alignment of the top of a loop (that is
executed a lot) might matter more than the alignment of the function
itself.

Any change will affect which code 'shares' cache lines, so it can stop
the working set of some test (or code loop with deep function calls)
from fitting in the I-cache, making it much slower.

Changing the size also affects where the TLB boundaries are
(especially if not using very large pages). If the number of pages
exceeds the number of TLB entries, things will slow down.

I think one version of gcc used to align most labels. While the code
might be slightly faster, the bloat actually made it a net loss.

So aligning functions might help, but it might just make things worse.

	David
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index cf6d1cd8b6dc..bcc9e1578937 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -235,6 +235,7 @@ config ARM64
 	select TRACE_IRQFLAGS_SUPPORT
 	select TRACE_IRQFLAGS_NMI_SUPPORT
 	select HAVE_SOFTIRQ_ON_OWN_STACK
+	select FUNCTION_ALIGNMENT_16B
 	help
 	  ARM 64-bit (AArch64) Linux support.
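For context: once CONFIG_FUNCTION_ALIGNMENT is available (commit d49a0626216b), the top-level kernel Makefile turns the selected value into a compiler flag, roughly as follows (paraphrased from memory, not the exact upstream hunk):

```make
# Paraphrase of the top-level Makefile logic: a non-zero
# CONFIG_FUNCTION_ALIGNMENT becomes -falign-functions=N for all objects,
# so FUNCTION_ALIGNMENT_16B yields -falign-functions=16.
ifneq ($(CONFIG_FUNCTION_ALIGNMENT),0)
KBUILD_CFLAGS += -falign-functions=$(CONFIG_FUNCTION_ALIGNMENT)
endif
```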