[RFC] makefile: add debug option to enable function aligned on 32 bytes

Message ID 1595475001-90945-1-git-send-email-feng.tang@intel.com (mailing list archive)
State New, archived

Commit Message

Feng Tang July 23, 2020, 3:30 a.m. UTC
Recently 0day reported many strange performance changes (regressions
or improvements) in which, at first look, there was no obvious relation
between the culprit commit and the benchmark, which leads people to
suspect that the test itself is wrong.

Upon further checking, many of these cases turn out to be caused by
changes to the alignment of kernel text or data: since the whole
text/data of the kernel is linked together, a change in one domain may
affect the alignment of other domains.

gcc has an option '-falign-functions=n' to force text alignment, and
with that option enabled, some of those performance changes go away,
like [1][2][3].

Add this option so that developers and 0day can easily find performance
bumps caused by text alignment changes, as tracking down these strange
bumps is quite time consuming. It can't help in other cases, though,
such as data alignment changes like [4].

Following is some size data for v5.7 kernel built with a RHEL config
used in 0day:

      text      data      bss       dec  filename
  19738771  13292906  5554236  38585913  vmlinux.noalign
  19758591  13297002  5529660  38585253  vmlinux.align32

Raw vmlinux size in bytes:

	v5.7		v5.7+align32
	253950832	254018000	+0.02%
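
The size deltas can be recomputed from the figures above. This is a minimal sketch in Python; the dictionary layout and the helper name `pct_change` are illustrative assumptions, and only the numbers come from the tables.

```python
# Figures copied from the `size` output and raw vmlinux sizes quoted above.
# The helper name and data layout are illustrative, not part of the patch.
NOALIGN = {"text": 19738771, "data": 13292906, "bss": 5554236}
ALIGN32 = {"text": 19758591, "data": 13297002, "bss": 5529660}
RAW_NOALIGN, RAW_ALIGN32 = 253950832, 254018000


def pct_change(old: int, new: int) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


if __name__ == "__main__":
    for section in NOALIGN:
        old, new = NOALIGN[section], ALIGN32[section]
        print(f"{section:5s} {new - old:+9d} bytes  {pct_change(old, new):+.3f}%")
    print(f"raw   {RAW_ALIGN32 - RAW_NOALIGN:+9d} bytes  "
          f"{pct_change(RAW_NOALIGN, RAW_ALIGN32):+.3f}%")
```

With these inputs, text grows by 19820 bytes (about 0.1%), data grows by 4096 bytes, bss shrinks by 24576 bytes, and the raw vmlinux grows by 67168 bytes.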

Some benchmark data; most results show no big change:

  * hackbench:		[ -1.8%,  +0.5%]

  * fsmark:		[ -3.2%,  +3.4%]  # ext4/xfs/btrfs

  * kbuild:		[ -2.0%,  +0.9%]

  * will-it-scale:	[ -0.5%,  +1.8%]  # mmap1/pagefault3

  * netperf:
    - TCP_CRR		[+16.6%, +97.4%]
    - TCP_RR		[-18.5%,  -1.8%]
    - TCP_STREAM	[ -1.1%,  +1.9%]

[1] https://lore.kernel.org/lkml/20200114085637.GA29297@shao2-debian/
[2] https://lore.kernel.org/lkml/20200330011254.GA14393@feng-iot/
[3] https://lore.kernel.org/lkml/1d98d1f0-fe84-6df7-f5bd-f4cb2cdb7f45@intel.com/
[4] https://lore.kernel.org/lkml/20200205123216.GO12867@shao2-debian/

Signed-off-by: Feng Tang <feng.tang@intel.com>
---
 Makefile          |  4 ++++
 lib/Kconfig.debug | 11 +++++++++++
 2 files changed, 15 insertions(+)

Comments

Andrew Morton July 23, 2020, 3:39 a.m. UTC | #1
On Thu, 23 Jul 2020 11:30:01 +0800 Feng Tang <feng.tang@intel.com> wrote:

> Recently 0day reported many strange performance changes (regression
> or improvement), in which there was no obvious relation between
> the culprit commit and the benchmark at the first look, and it causes
> people to doubt the test itself is wrong.
> 
> Upon further check, many of these cases are caused by the change
> to the alignment of kernel text or data, as whole text/data of kernel
> are linked together, change in one domain may affect alignments of
> other domains.
> 
> gcc has an option '-falign-functions=n' to force text aligned, and with
> that option enabled, some of those performance changes will be gone,
> like [1][2][3].
> 
> Add this option so that developers and 0day can easily find performance
> bump caused by text alignment change,

Would they use it this way, or would they simply always enable the
option to reduce the variability?

It makes sense, but is it actually known that this does reduce the
variability?

> as tracking these strange bump
> is quite time consuming. Though it can't help in other cases like data
> alignment changes like [4].
> 
> Following is some size data for v5.7 kernel built with a RHEL config
> used in 0day:
> 
>     text      data      bss	 dec	   filename
>   19738771  13292906  5554236  38585913	 vmlinux.noalign
>   19758591  13297002  5529660  38585253	 vmlinux.align32
> 
> Raw vmlinux size in bytes:
> 
> 	v5.7		v5.7+align32
> 	253950832	254018000	+0.02%
> 
> Some benchmark data, most of them have no big change:
> 
>   * hackbench:		[ -1.8%,  +0.5%]
> 
>   * fsmark:		[ -3.2%,  +3.4%]  # ext4/xfs/btrfs
> 
>   * kbuild:		[ -2.0%,  +0.9%]
> 
>   * will-it-scale:	[ -0.5%,  +1.8%]  # mmap1/pagefault3
> 
>   * netperf:
>     - TCP_CRR		[+16.6%, +97.4%]
>     - TCP_RR		[-18.5%,  -1.8%]
>     - TCP_STREAM	[ -1.1%,  +1.9%]

What do the numbers in [] mean?  The TCP_CRR results look remarkable?

Feng Tang July 23, 2020, 5:13 a.m. UTC | #2
Hi Andrew,

Thanks for the review.

On Wed, Jul 22, 2020 at 08:39:19PM -0700, Andrew Morton wrote:
> On Thu, 23 Jul 2020 11:30:01 +0800 Feng Tang <feng.tang@intel.com> wrote:
> 
> > Recently 0day reported many strange performance changes (regression
> > or improvement), in which there was no obvious relation between
> > the culprit commit and the benchmark at the first look, and it causes
> > people to doubt the test itself is wrong.
> > 
> > Upon further check, many of these cases are caused by the change
> > to the alignment of kernel text or data, as whole text/data of kernel
> > are linked together, change in one domain may affect alignments of
> > other domains.
> > 
> > gcc has an option '-falign-functions=n' to force text aligned, and with
> > that option enabled, some of those performance changes will be gone,
> > like [1][2][3].
> > 
> > Add this option so that developers and 0day can easily find performance
> > bump caused by text alignment change,
> 
> Would they use it this way, or would they simply always enable the
> option to reduce the variability

We had concerns about side effects: the increased kernel size may not be
acceptable for embedded systems, and i-cache usage/contention may increase.

I've also only done limited benchmark testing, so I thought it would be
safer to keep it off by default, though my bold thought was that it could
be on by default :)

> It makes sense, but is it actually known that this does reduce the
> variability?

Yes, at least for the strange performance bumps reported by 0day, like
in [1][2][3].

> > as tracking these strange bump
> > is quite time consuming. Though it can't help in other cases like data
> > alignment changes like [4].
> > 
> > Following is some size data for v5.7 kernel built with a RHEL config
> > used in 0day:
> > 
> >     text      data      bss	 dec	   filename
> >   19738771  13292906  5554236  38585913	 vmlinux.noalign
> >   19758591  13297002  5529660  38585253	 vmlinux.align32
> > 
> > Raw vmlinux size in bytes:
> > 
> > 	v5.7		v5.7+align32
> > 	253950832	254018000	+0.02%
> > 
> > Some benchmark data, most of them have no big change:
> > 
> >   * hackbench:		[ -1.8%,  +0.5%]
> > 
> >   * fsmark:		[ -3.2%,  +3.4%]  # ext4/xfs/btrfs
> > 
> >   * kbuild:		[ -2.0%,  +0.9%]
> > 
> >   * will-it-scale:	[ -0.5%,  +1.8%]  # mmap1/pagefault3
> > 
> >   * netperf:
> >     - TCP_CRR		[+16.6%, +97.4%]
> >     - TCP_RR		[-18.5%,  -1.8%]
> >     - TCP_STREAM	[ -1.1%,  +1.9%]
> 
> What do the numbers in [] mean?  The TCP_CRR results look remarkable?
 
For each of the benchmarks listed above, I took 2 or 3 test platforms
and ran it with different parameters. So each benchmark has several
test cases, and [] lists the lowest and highest results among them.

For the netperf/TCP_CRR case, the lowest is +16.6% on a Skylake server
with 16 testing threads, and the highest is +97.4% on a Cascadelake
server with 96 testing threads.
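
The bracket notation described above can be sketched as follows. The helper name `result_range` and the case labels are illustrative assumptions; the two values are the TCP_CRR endpoints quoted in this reply.

```python
def result_range(changes_pct):
    """Format the lowest and highest per-case change as '[lo, hi]'."""
    lo, hi = min(changes_pct), max(changes_pct)
    return f"[{lo:+.1f}%, {hi:+.1f}%]"


# Per-case performance change in percent. Only the two endpoints given in
# this reply are listed; the case labels are made up for readability.
tcp_crr = {
    "skylake, 16 threads": 16.6,
    "cascadelake, 96 threads": 97.4,
}

print(result_range(tcp_crr.values()))
```

Any further test cases for the same benchmark would fall between the two printed bounds and would not change the summary.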

Thanks,
Feng

Feng Tang July 23, 2020, 6:29 a.m. UTC | #3
On Wed, Jul 22, 2020 at 08:39:19PM -0700, Andrew Morton wrote:
> On Thu, 23 Jul 2020 11:30:01 +0800 Feng Tang <feng.tang@intel.com> wrote:
> 
> > Recently 0day reported many strange performance changes (regression
> > or improvement), in which there was no obvious relation between
> > the culprit commit and the benchmark at the first look, and it causes
> > people to doubt the test itself is wrong.
> > 
> > Upon further check, many of these cases are caused by the change
> > to the alignment of kernel text or data, as whole text/data of kernel
> > are linked together, change in one domain may affect alignments of
> > other domains.
> > 
> > gcc has an option '-falign-functions=n' to force text aligned, and with
> > that option enabled, some of those performance changes will be gone,
> > like [1][2][3].
> > 
> > Add this option so that developers and 0day can easily find performance
> > bump caused by text alignment change,
> 
> Would they use it this way, or would they simply always enable the
> option to reduce the variability?

I may have misunderstood the question in my last reply. If you are asking
how developers and 0day will use this option: I've talked with the 0day
folks, and they may just enable it by default, as 0day cares more about
the performance delta caused by a commit (adding Philip and Rong from
0day).

Thanks,
Feng


Andrew Morton July 24, 2020, 12:57 a.m. UTC | #4
On Thu, 23 Jul 2020 14:29:33 +0800 Feng Tang <feng.tang@intel.com> wrote:

> > > gcc has an option '-falign-functions=n' to force text aligned, and with
> > > that option enabled, some of those performance changes will be gone,
> > > like [1][2][3].
> > > 
> > > Add this option so that developers and 0day can easily find performance
> > > bump caused by text alignment change,
> > 
> > Would they use it this way, or would they simply always enable the
> > option to reduce the variability?
> 
> I may mis-understood it in my last reply. If you are asking about how
> will developers and 0day use this option, for 0day, I've talked with
> 0day folks, they may just enable it by default, as 0day cares more about
> the performance delta caused by a commit (Adding Philip and Rong from
> 0day).

OK, thanks, I suspected as much.

The patch is so simple and probably-will-work, I guess we toss it in
there and see.

However it would be good if the 0day people could use it for a while
and then provide some feedback on whether it is actually proving
useful.  If not, we get to remove some stuff.
Feng Tang July 24, 2020, 1:06 a.m. UTC | #5
On Thu, Jul 23, 2020 at 05:57:04PM -0700, Andrew Morton wrote:
> On Thu, 23 Jul 2020 14:29:33 +0800 Feng Tang <feng.tang@intel.com> wrote:
> 
> > > > gcc has an option '-falign-functions=n' to force text aligned, and with
> > > > that option enabled, some of those performance changes will be gone,
> > > > like [1][2][3].
> > > > 
> > > > Add this option so that developers and 0day can easily find performance
> > > > bump caused by text alignment change,
> > > 
> > > Would they use it this way, or would they simply always enable the
> > > option to reduce the variability?
> > 
> > I may mis-understood it in my last reply. If you are asking about how
> > will developers and 0day use this option, for 0day, I've talked with
> > 0day folks, they may just enable it by default, as 0day cares more about
> > the performance delta caused by a commit (Adding Philip and Rong from
> > 0day).
> 
> OK, thanks, I suspected as much.
> 
> The patch is so simple and probably-will-work, I guess we toss it in
> there and see.

Thanks!

> However it would be good if the 0day people could use it for a while
> and then provide some feedback on whether it is actually proving
> useful.  If not, we get to remove some stuff.

Yes, 0day is a good user to try this.

Thanks,
Feng

Patch

diff --git a/Makefile b/Makefile
index 249a51d25c63..a59105e6f573 100644
--- a/Makefile
+++ b/Makefile
@@ -886,6 +886,10 @@  KBUILD_CFLAGS	+= $(CC_FLAGS_SCS)
 export CC_FLAGS_SCS
 endif
 
+ifdef CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_32B
+KBUILD_CFLAGS += -falign-functions=32
+endif
+
 # arch Makefile may override CC so keep this after arch Makefile is included
 NOSTDINC_FLAGS += -nostdinc -isystem $(shell $(CC) -print-file-name=include)
 
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 9ad9210d70a1..c1d52c4f120f 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -365,6 +365,17 @@  config SECTION_MISMATCH_WARN_ONLY
 
 	  If unsure, say Y.
 
+config DEBUG_FORCE_FUNCTION_ALIGN_32B
+	bool "Force all function address 32B aligned" if EXPERT
+	help
+	  There are cases where a commit in one domain changes the function
+	  address alignment of other domains and causes an unexplained
+	  performance bump (regression or improvement). Enabling this option
+	  helps verify whether the bump is caused by function alignment
+	  changes, at a slight cost in kernel size and icache usage.
+
+	  It is mainly for debug and performance tuning use.
+
 #
 # Select this config option from the architecture Kconfig, if it
 # is preferred to always offer frame pointers as a config
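
One way to check the effect of the option on a built kernel is to look for text symbols whose address is not a multiple of 32. The following is a minimal sketch, not part of the patch: the script name `check_align.py` and the function name `misaligned_functions` are hypothetical, and it assumes the default `nm` output format of address, type, and name.

```python
import sys


def misaligned_functions(nm_lines, align=32):
    """Return (name, address) for text symbols not aligned to `align` bytes.

    Expects lines in the default `nm` format: "<hex address> <type> <name>".
    """
    bad = []
    for line in nm_lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # undefined symbols have no address column
        addr, sym_type, name = parts[0], parts[1], parts[2]
        if sym_type not in ("T", "t"):
            continue  # keep only symbols in the text section
        if int(addr, 16) % align != 0:
            bad.append((name, addr))
    return bad


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: nm vmlinux > syms.txt && python3 check_align.py syms.txt
    with open(sys.argv[1]) as f:
        for name, addr in misaligned_functions(f):
            print(f"{addr} {name}")
```

Some misaligned entries are to be expected even with the option enabled: `nm` also lists text symbols that are not compiler-generated functions (for example, code written in assembly), and `-falign-functions` only applies to functions the compiler emits.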