kbuild: Add support for Clang's polyhedral loop optimizer.

Message ID 20210120174146.12287-1-lazerl0rd@thezest.dev (mailing list archive)
State New, archived
Series kbuild: Add support for Clang's polyhedral loop optimizer.

Commit Message

Diab Neiroukh Jan. 20, 2021, 5:41 p.m. UTC
Polly is able to optimize various loops throughout the kernel for cache
locality. A mathematical representation of the program, based on
polyhedra, is analysed to find opportunities to optimise memory access
patterns, which then lead to loop transformations.

Polly is not built with LLVM by default, and requires LLVM to be compiled
with the Polly "project". This can be done by adding Polly to
-DLLVM_ENABLE_PROJECTS, for example:

-DLLVM_ENABLE_PROJECTS="clang;libcxx;libcxxabi;polly"
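
Since Polly is optional, it can help to check whether a given clang
binary was built with it before enabling anything. The sketch below is
one way to do that, assuming a POSIX shell; has_polly is a hypothetical
helper name, not part of this patch or of kbuild. It does roughly what a
Kconfig cc-option test for "-mllvm -polly" would do, namely compile an
empty input with the flag and look at the exit status:

```shell
#!/bin/sh
# has_polly CC: succeed if compiler CC accepts "-mllvm -polly".
# Hypothetical helper for illustration only.
has_polly() {
    cc=${1:-clang}
    # Compile an empty C input and discard the result. An LLVM built
    # without Polly does not know the -polly argument, so the compiler
    # exits with a non-zero status.
    "$cc" -Werror -mllvm -polly -S -x c /dev/null -o /dev/null >/dev/null 2>&1
}

if has_polly "${CC:-clang}"; then
    echo "Polly available"
else
    echo "Polly not available"
fi
```

Setting CC to another compiler path before running the script probes
that toolchain instead of the default clang.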

Preliminary benchmarking shows an improvement of roughly 0.5 to 2.3
percent across the benchmarks below:

Benchmark                         | Control    | Polly
--------------------------------------------------------
bonnie++ -x 2 -s 4096 -r 0        | 12.610s    | 12.547s
perf bench futex requeue          | 33.553s    | 33.094s
perf bench futex wake             |  1.032s    |  1.021s
perf bench futex wake-parallel    |  1.049s    |  1.025s
perf bench futex requeue          |  1.037s    |  1.020s
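
The relative change per row can be recomputed from the table above. A
minimal sketch, assuming a POSIX shell with awk; pct_faster is a
hypothetical helper name, and the timings are copied from the table
as-is:

```shell
#!/bin/sh
# pct_faster CONTROL POLLY: print the relative reduction in run time,
# in percent, going from CONTROL seconds to POLLY seconds.
pct_faster() {
    LC_ALL=C awk -v c="$1" -v p="$2" \
        'BEGIN { printf "%.2f\n", (c - p) / c * 100 }'
}

# Timings from the table above, in seconds, in table order.
pct_faster 12.610 12.547   # bonnie++ -x 2 -s 4096 -r 0
pct_faster 33.553 33.094   # perf bench futex requeue (first row)
pct_faster  1.032  1.021   # perf bench futex wake
pct_faster  1.049  1.025   # perf bench futex wake-parallel
pct_faster  1.037  1.020   # perf bench futex requeue (second row)
```

These are single-run figures as posted, so small differences between
rows should not be over-interpreted.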

Furthermore, Polly does not produce a much larger image, which makes it
close to a "free" optimisation. A comparison of a bzImage for a kernel
with and without Polly is shown below:

bzImage        | stat --printf="%s\n"
-------------------------------------
Control        | 9333728
Polly          | 9345792
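
The size difference can be worked out from the two byte counts above. A
small sketch, again assuming a POSIX shell with awk; image_growth is a
hypothetical helper name:

```shell
#!/bin/sh
# image_growth CONTROL POLLY: print the growth in bytes and in percent
# going from CONTROL bytes to POLLY bytes.
image_growth() {
    LC_ALL=C awk -v a="$1" -v b="$2" \
        'BEGIN { printf "%d bytes (%.2f%%)\n", b - a, (b - a) / a * 100 }'
}

# Byte counts from the bzImage table above.
image_growth 9333728 9345792
```

For the figures in the table this comes to 12064 bytes, or about 0.13
percent of the control image.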

Compile times differed by one percent at most, which is well within
the range of noise. Therefore, I can say with certainty that Polly has
a minimal effect on compile times, if any.

Suggested-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: Diab Neiroukh <lazerl0rd@thezest.dev>
---
 Makefile     | 16 ++++++++++++++++
 init/Kconfig | 13 +++++++++++++
 2 files changed, 29 insertions(+)

Comments

Masahiro Yamada Feb. 18, 2021, 2:23 a.m. UTC | #1
On Thu, Jan 21, 2021 at 2:42 AM 'Diab Neiroukh' via Clang Built Linux
<clang-built-linux@googlegroups.com> wrote:
>
> Polly is able to optimize various loops throughout the kernel for cache
> locality. A mathematical representation of the program, based on
> polyhedra, is analysed to find opportunities to optimise memory access
> patterns, which then lead to loop transformations.
>
> Polly is not built with LLVM by default, and requires LLVM to be compiled
> with the Polly "project". This can be done by adding Polly to
> -DLLVM_ENABLE_PROJECTS, for example:
>
> -DLLVM_ENABLE_PROJECTS="clang;libcxx;libcxxabi;polly"
>
> Preliminary benchmarking shows an improvement of roughly 0.5 to 2.3
> percent across the benchmarks below:
>
> Benchmark                         | Control    | Polly
> --------------------------------------------------------
> bonnie++ -x 2 -s 4096 -r 0        | 12.610s    | 12.547s
> perf bench futex requeue          | 33.553s    | 33.094s
> perf bench futex wake             |  1.032s    |  1.021s
> perf bench futex wake-parallel    |  1.049s    |  1.025s
> perf bench futex requeue          |  1.037s    |  1.020s
>
> Furthermore, Polly does not produce a much larger image, which makes it
> close to a "free" optimisation. A comparison of a bzImage for a kernel
> with and without Polly is shown below:
>
> bzImage        | stat --printf="%s\n"
> -------------------------------------
> Control        | 9333728
> Polly          | 9345792
>
> Compile times differed by one percent at most, which is well within
> the range of noise. Therefore, I can say with certainty that Polly has
> a minimal effect on compile times, if any.
>
> Suggested-by: Danny Lin <danny@kdrag0n.dev>
> Signed-off-by: Diab Neiroukh <lazerl0rd@thezest.dev>



This patch was correctly sent to clang-built-linux ML,
but did not get any attention.


I did not evaluate this patch, but
it just adds several flags.

Please try to collect Reviewed-by, Tested-by, etc.
from Clang folks if you want this merged.


Just a minor comment:

Typos in the Makefile changes.
"beyound" -> "beyond"
"perfom" -> "perform"




> ---
>  Makefile     | 16 ++++++++++++++++
>  init/Kconfig | 13 +++++++++++++
>  2 files changed, 29 insertions(+)
>
> diff --git a/Makefile b/Makefile
> index b9d3a47c57cf..00f15bde5f8b 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -740,6 +740,22 @@ else ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
>  KBUILD_CFLAGS += -Os
>  endif
>
> +ifdef CONFIG_POLLY_CLANG
> +KBUILD_CFLAGS  += -mllvm -polly \
> +                  -mllvm -polly-ast-use-context \
> +                  -mllvm -polly-invariant-load-hoisting \
> +                  -mllvm -polly-opt-fusion=max \
> +                  -mllvm -polly-run-inliner \
> +                  -mllvm -polly-vectorizer=stripmine
> +# Polly may optimise loops with dead paths beyound what the linker
> +# can understand. This may negate the effect of the linker's DCE
> +# so we tell Polly to perfom proven DCE on the loops it optimises
> +# in order to preserve the overall effect of the linker's DCE.
> +ifdef CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
> +KBUILD_CFLAGS  += -mllvm -polly-run-dce
> +endif
> +endif
> +
>  # Tell gcc to never replace conditional load with a non-conditional one
>  KBUILD_CFLAGS  += $(call cc-option,--param=allow-store-data-races=0)
>  KBUILD_CFLAGS  += $(call cc-option,-fno-allow-store-data-races)
> diff --git a/init/Kconfig b/init/Kconfig
> index 05131b3ad0f2..266d7d03ccd1 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -177,6 +177,19 @@ config BUILD_SALT
>           This is mostly useful for distributions which want to ensure the
>           build is unique between builds. It's safe to leave the default.
>
> +config POLLY_CLANG
> +       bool "Use Clang Polly optimizations"
> +       depends on CC_IS_CLANG && $(cc-option,-mllvm -polly)
> +       depends on !COMPILE_TEST
> +       help
> +         This option enables Clang's polyhedral loop optimizer known as
> +         Polly. Polly is able to optimize various loops throughout the
> +         kernel for cache locality. This requires a Clang toolchain
> +         compiled with support for Polly. More information can be found
> +         from Polly's website:
> +
> +           https://polly.llvm.org
> +
>  config HAVE_KERNEL_GZIP
>         bool
>
> --
> 2.30.0
>

Patch

diff --git a/Makefile b/Makefile
index b9d3a47c57cf..00f15bde5f8b 100644
--- a/Makefile
+++ b/Makefile
@@ -740,6 +740,22 @@  else ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
 KBUILD_CFLAGS += -Os
 endif
 
+ifdef CONFIG_POLLY_CLANG
+KBUILD_CFLAGS	+= -mllvm -polly \
+		   -mllvm -polly-ast-use-context \
+		   -mllvm -polly-invariant-load-hoisting \
+		   -mllvm -polly-opt-fusion=max \
+		   -mllvm -polly-run-inliner \
+		   -mllvm -polly-vectorizer=stripmine
+# Polly may optimise loops with dead paths beyound what the linker
+# can understand. This may negate the effect of the linker's DCE
+# so we tell Polly to perfom proven DCE on the loops it optimises
+# in order to preserve the overall effect of the linker's DCE.
+ifdef CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
+KBUILD_CFLAGS	+= -mllvm -polly-run-dce
+endif
+endif
+
 # Tell gcc to never replace conditional load with a non-conditional one
 KBUILD_CFLAGS	+= $(call cc-option,--param=allow-store-data-races=0)
 KBUILD_CFLAGS	+= $(call cc-option,-fno-allow-store-data-races)
diff --git a/init/Kconfig b/init/Kconfig
index 05131b3ad0f2..266d7d03ccd1 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -177,6 +177,19 @@  config BUILD_SALT
 	  This is mostly useful for distributions which want to ensure the
 	  build is unique between builds. It's safe to leave the default.
 
+config POLLY_CLANG
+	bool "Use Clang Polly optimizations"
+	depends on CC_IS_CLANG && $(cc-option,-mllvm -polly)
+	depends on !COMPILE_TEST
+	help
+	  This option enables Clang's polyhedral loop optimizer known as
+	  Polly. Polly is able to optimize various loops throughout the
+	  kernel for cache locality. This requires a Clang toolchain
+	  compiled with support for Polly. More information can be found
+	  from Polly's website:
+
+	    https://polly.llvm.org
+
 config HAVE_KERNEL_GZIP
 	bool