diff mbox series

[net-next,v2,1/9] unroll: add generic loop unroll helpers

Message ID 20240819223442.48013-2-anthony.l.nguyen@intel.com (mailing list archive)
State Rejected
Delegated to: Netdev Maintainers
Headers show
Series idpf: XDP chapter II: convert Tx completion to libeth | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next, async
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 16 this patch: 16
netdev/build_tools success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers fail 5 maintainers not CCed: justinstitt@google.com llvm@lists.linux.dev morbo@google.com ndesaulniers@google.com nathan@kernel.org
netdev/build_clang success Errors and warnings before: 17 this patch: 17
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 22 this patch: 22
netdev/checkpatch warning WARNING: Argument 'x' is not used in function-like macro WARNING: Argument 'y' is not used in function-like macro WARNING: Co-developed-by and Signed-off-by: name/email do not match WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-08-20--09-00 (tests: 712)

Commit Message

Tony Nguyen Aug. 19, 2024, 10:34 p.m. UTC
From: Alexander Lobakin <aleksander.lobakin@intel.com>

There are cases when we need to explicitly unroll loops. For example,
cache operations, filling DMA descriptors on very high speeds etc.
Add compiler-specific attribute macros to give the compiler a hint
that we'd like to unroll a loop.
Example usage:

 #define UNROLL_BATCH 8

	unrolled_count(UNROLL_BATCH)
	for (u32 i = 0; i < UNROLL_BATCH; i++)
		op(priv, i);

Note that sometimes the compilers won't unroll loops if they think this
would have worse optimization and perf than without unrolling, and that
unroll attributes are available only starting GCC 8. For older compiler
versions, no hints/attributes will be applied.
For better unrolling/parallelization, don't have any variables that
interfere between iterations except for the iterator itself.

Co-developed-by: Jose E. Marchesi <jose.marchesi@oracle.com> # pragmas
Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 include/linux/unroll.h | 50 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)
 create mode 100644 include/linux/unroll.h

Comments

Jakub Kicinski Aug. 21, 2024, 12:55 a.m. UTC | #1
On Mon, 19 Aug 2024 15:34:33 -0700 Tony Nguyen wrote:
> There are cases when we need to explicitly unroll loops. For example,
> cache operations, filling DMA descriptors on very high speeds etc.
> Add compiler-specific attribute macros to give the compiler a hint
> that we'd like to unroll a loop.
> Example usage:
> 
>  #define UNROLL_BATCH 8
> 
> 	unrolled_count(UNROLL_BATCH)
> 	for (u32 i = 0; i < UNROLL_BATCH; i++)
> 		op(priv, i);
> 
> Note that sometimes the compilers won't unroll loops if they think this
> would have worse optimization and perf than without unrolling, and that
> unroll attributes are available only starting GCC 8. For older compiler
> versions, no hints/attributes will be applied.
> For better unrolling/parallelization, don't have any variables that
> interfere between iterations except for the iterator itself.

Please run the submissions thru get_maintainers
Alexander Lobakin Aug. 22, 2024, 3:15 p.m. UTC | #2
From: Jakub Kicinski <kuba@kernel.org>
Date: Tue, 20 Aug 2024 17:55:39 -0700

> On Mon, 19 Aug 2024 15:34:33 -0700 Tony Nguyen wrote:
>> There are cases when we need to explicitly unroll loops. For example,
>> cache operations, filling DMA descriptors on very high speeds etc.
>> Add compiler-specific attribute macros to give the compiler a hint
>> that we'd like to unroll a loop.
>> Example usage:
>>
>>  #define UNROLL_BATCH 8
>>
>> 	unrolled_count(UNROLL_BATCH)
>> 	for (u32 i = 0; i < UNROLL_BATCH; i++)
>> 		op(priv, i);
>>
>> Note that sometimes the compilers won't unroll loops if they think this
>> would have worse optimization and perf than without unrolling, and that
>> unroll attributes are available only starting GCC 8. For older compiler
>> versions, no hints/attributes will be applied.
>> For better unrolling/parallelization, don't have any variables that
>> interfere between iterations except for the iterator itself.
> 
> Please run the submissions thru get_maintainers

I always do that. get_maintainers.pl gives nobody for linux/unroll.h.

Thanks,
Olek
Jakub Kicinski Aug. 22, 2024, 10:59 p.m. UTC | #3
On Thu, 22 Aug 2024 17:15:25 +0200 Alexander Lobakin wrote:
> > Please run the submissions thru get_maintainers  
> 
> I always do that. get_maintainers.pl gives nobody for linux/unroll.h.

You gotta feed it the *patch*, not the path. For keyword matching on the
contents. I wanted to print a warning when people use get_maintainer
with a path but Linus blocked it. I'm convinced 99% of such uses are
misguided.

But TBH I was directing the message at Tony as well. Please just feed
the patches to get_maintainer when posting.
Tony Nguyen Aug. 22, 2024, 11:35 p.m. UTC | #4
On 8/22/2024 3:59 PM, Jakub Kicinski wrote:
> On Thu, 22 Aug 2024 17:15:25 +0200 Alexander Lobakin wrote:
>>> Please run the submissions thru get_maintainers
>>
>> I always do that. get_maintainers.pl gives nobody for linux/unroll.h.
> 
> You gotta feed it the *patch*, not the path. For keyword matching on the
> contents. I wanted to print a warning when people use get_maintainer
> with a path but Linus blocked it. I'm convinced 99% of such uses are
> misguided.
> 
> But TBH I was directing the message at Tony as well. Please just feed
> the patches to get_maintainer when posting.

Ack.

Thanks,
Tony
Alexander Lobakin Aug. 23, 2024, 12:37 p.m. UTC | #5
From: Jakub Kicinski <kuba@kernel.org>
Date: Thu, 22 Aug 2024 15:59:46 -0700

> On Thu, 22 Aug 2024 17:15:25 +0200 Alexander Lobakin wrote:
>>> Please run the submissions thru get_maintainers  
>>
>> I always do that. get_maintainers.pl gives nobody for linux/unroll.h.
> 
> You gotta feed it the *patch*, not the path. For keyword matching on the

Oops, sorry. I always do that for patches, not paths, but here I wanted
to use a shortcut not thinking that it may give completely different
results =\

> contents. I wanted to print a warning when people use get_maintainer
> with a path but Linus blocked it. I'm convinced 99% of such uses are
> misguided.
> 
> But TBH I was directing the message at Tony as well. Please just feed
> the patches to get_maintainer when posting.

Thanks,
Olek
diff mbox series

Patch

diff --git a/include/linux/unroll.h b/include/linux/unroll.h
new file mode 100644
index 000000000000..e305d155faa6
--- /dev/null
+++ b/include/linux/unroll.h
@@ -0,0 +1,50 @@ 
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (C) 2024 Intel Corporation */
+
+#ifndef _LINUX_UNROLL_H
+#define _LINUX_UNROLL_H
+
+#ifdef CONFIG_CC_IS_CLANG
+#define __pick_unrolled(x, y)	_Pragma(#x)
+#elif CONFIG_GCC_VERSION >= 80000
+#define __pick_unrolled(x, y)	_Pragma(#y)
+#else
+#define __pick_unrolled(x, y)	/* not supported */
+#endif
+
+/**
+ * unrolled - loop attributes to ask the compiler to unroll it
+ *
+ * Usage:
+ *
+ * #define BATCH 4
+ *	unrolled_count(BATCH)
+ *	for (u32 i = 0; i < BATCH; i++)
+ *		// loop body without cross-iteration dependencies
+ *
+ * This is only a hint and the compiler is free to disable unrolling if it
+ * thinks the count is suboptimal and may hurt performance and/or hugely
+ * increase object code size.
+ * Not having any cross-iteration dependencies (i.e. when iter x + 1 depends
+ * on what iter x will do with variables) is not a strict requirement, but
+ * provides best performance and object code size.
+ * Available only on Clang and GCC 8.x onwards.
+ */
+
+/* Ask the compiler to pick an optimal unroll count, Clang only */
+#define unrolled							    \
+	__pick_unrolled(clang loop unroll(enable), /* nothing */)
+
+/* Unroll each @n iterations of a loop */
+#define unrolled_count(n)						    \
+	__pick_unrolled(clang loop unroll_count(n), GCC unroll n)
+
+/* Unroll the whole loop */
+#define unrolled_full							    \
+	__pick_unrolled(clang loop unroll(full), GCC unroll 65534)
+
+/* Never unroll a loop */
+#define unrolled_none							    \
+	__pick_unrolled(clang loop unroll(disable), GCC unroll 1)
+
+#endif /* _LINUX_UNROLL_H */