diff mbox series

[net-next,1/4] unroll: add generic loop unroll helpers

Message ID 20250206182630.3914318-2-aleksander.lobakin@intel.com (mailing list archive)
State Accepted
Commit c6594d64271704b335378e7b74c39fe4d4fcc777
Delegated to: Netdev Maintainers
Headers show
Series xsk: the lost bits from Chapter III | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 11 this patch: 11
netdev/build_tools success Errors and warnings before: 26 (+1) this patch: 26 (+1)
netdev/cc_maintainers warning 6 maintainers not CCed: ndesaulniers@google.com morbo@google.com john.johansen@canonical.com kpsingh@kernel.org llvm@lists.linux.dev justinstitt@google.com
netdev/build_clang success Errors and warnings before: 294 this patch: 294
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 177 this patch: 177
netdev/checkpatch warning WARNING: Argument 'x' is not used in function-like macro WARNING: Argument 'y' is not used in function-like macro WARNING: Co-developed-by and Signed-off-by: name/email do not match
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2025-02-07--21-00 (tests: 890)

Commit Message

Alexander Lobakin Feb. 6, 2025, 6:26 p.m. UTC
There are cases when we need to explicitly unroll loops. For example,
cache operations, filling DMA descriptors on very high speeds etc.
Add compiler-specific attribute macros to give the compiler a hint
that we'd like to unroll a loop.
Example usage:

 #define UNROLL_BATCH 8

	unrolled_count(UNROLL_BATCH)
	for (u32 i = 0; i < UNROLL_BATCH; i++)
		op(priv, i);

Note that sometimes the compilers won't unroll loops if they think this
would have worse optimization and perf than without unrolling, and that
unroll attributes are available only starting GCC 8. For older compiler
versions, no hints/attributes will be applied.
For better unrolling/parallelization, don't have any variables that
interfere between iterations except for the iterator itself.

Co-developed-by: Jose E. Marchesi <jose.marchesi@oracle.com> # pragmas
Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 include/linux/unroll.h | 44 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

Comments

Simon Horman Feb. 9, 2025, 11:07 a.m. UTC | #1
On Thu, Feb 06, 2025 at 07:26:26PM +0100, Alexander Lobakin wrote:
> There are cases when we need to explicitly unroll loops. For example,
> cache operations, filling DMA descriptors on very high speeds etc.
> Add compiler-specific attribute macros to give the compiler a hint
> that we'd like to unroll a loop.
> Example usage:
> 
>  #define UNROLL_BATCH 8
> 
> 	unrolled_count(UNROLL_BATCH)
> 	for (u32 i = 0; i < UNROLL_BATCH; i++)
> 		op(priv, i);
> 
> Note that sometimes the compilers won't unroll loops if they think this
> would have worse optimization and perf than without unrolling, and that
> unroll attributes are available only starting GCC 8. For older compiler
> versions, no hints/attributes will be applied.
> For better unrolling/parallelization, don't have any variables that
> interfere between iterations except for the iterator itself.
> 
> Co-developed-by: Jose E. Marchesi <jose.marchesi@oracle.com> # pragmas
> Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>

Hi Alexander,

This patch adds four variants of the unrolled helper.  But as far as I can
tell the patch-set only makes use of one of them, unrolled_count().

I think it would be best if this patch only added helpers that are used.

...
Maciej Fijalkowski Feb. 10, 2025, 12:28 p.m. UTC | #2
On Sun, Feb 09, 2025 at 11:07:25AM +0000, Simon Horman wrote:
> On Thu, Feb 06, 2025 at 07:26:26PM +0100, Alexander Lobakin wrote:
> > There are cases when we need to explicitly unroll loops. For example,
> > cache operations, filling DMA descriptors on very high speeds etc.
> > Add compiler-specific attribute macros to give the compiler a hint
> > that we'd like to unroll a loop.
> > Example usage:
> > 
> >  #define UNROLL_BATCH 8
> > 
> > 	unrolled_count(UNROLL_BATCH)
> > 	for (u32 i = 0; i < UNROLL_BATCH; i++)
> > 		op(priv, i);
> > 
> > Note that sometimes the compilers won't unroll loops if they think this
> > would have worse optimization and perf than without unrolling, and that
> > unroll attributes are available only starting GCC 8. For older compiler
> > versions, no hints/attributes will be applied.
> > For better unrolling/parallelization, don't have any variables that
> > interfere between iterations except for the iterator itself.
> > 
> > Co-developed-by: Jose E. Marchesi <jose.marchesi@oracle.com> # pragmas
> > Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
> > Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
> > Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
> 
> Hi Alexander,
> 
> This patch adds four variants of the unrolled helper.  But as far as I can
> tell the patch-set only makes use of one of them, unrolled_count().
> 
> I think it would be best if this patch only added helpers that are used.

That is debatable but I think I tend to agree here. If we add say 3 unused
macros then nothing stops someone from coming up with a patch that deletes
them because they are unused, right?

> 
> ...
Alexander Lobakin Feb. 10, 2025, 3:49 p.m. UTC | #3
From: Simon Horman <horms@kernel.org>
Date: Sun, 9 Feb 2025 11:07:25 +0000

> On Thu, Feb 06, 2025 at 07:26:26PM +0100, Alexander Lobakin wrote:
>> There are cases when we need to explicitly unroll loops. For example,
>> cache operations, filling DMA descriptors on very high speeds etc.
>> Add compiler-specific attribute macros to give the compiler a hint
>> that we'd like to unroll a loop.
>> Example usage:
>>
>>  #define UNROLL_BATCH 8
>>
>> 	unrolled_count(UNROLL_BATCH)
>> 	for (u32 i = 0; i < UNROLL_BATCH; i++)
>> 		op(priv, i);
>>
>> Note that sometimes the compilers won't unroll loops if they think this
>> would have worse optimization and perf than without unrolling, and that
>> unroll attributes are available only starting GCC 8. For older compiler
>> versions, no hints/attributes will be applied.
>> For better unrolling/parallelization, don't have any variables that
>> interfere between iterations except for the iterator itself.
>>
>> Co-developed-by: Jose E. Marchesi <jose.marchesi@oracle.com> # pragmas
>> Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
>> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
>> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
> 
> Hi Alexander,
> 
> This patch adds four variants of the unrolled helper.  But as far as I can
> tell the patch-set only makes use of one of them, unrolled_count().
> 
> I think it would be best if this patch only added helpers that are used.

I thought they might help people in future.
I can remove them if you insist. BTW the original patch from Jose also
added several variants.

Thanks,
Olek
Simon Horman Feb. 10, 2025, 9:08 p.m. UTC | #4
On Mon, Feb 10, 2025 at 04:49:14PM +0100, Alexander Lobakin wrote:
> From: Simon Horman <horms@kernel.org>
> Date: Sun, 9 Feb 2025 11:07:25 +0000
> 
> > On Thu, Feb 06, 2025 at 07:26:26PM +0100, Alexander Lobakin wrote:
> >> There are cases when we need to explicitly unroll loops. For example,
> >> cache operations, filling DMA descriptors on very high speeds etc.
> >> Add compiler-specific attribute macros to give the compiler a hint
> >> that we'd like to unroll a loop.
> >> Example usage:
> >>
> >>  #define UNROLL_BATCH 8
> >>
> >> 	unrolled_count(UNROLL_BATCH)
> >> 	for (u32 i = 0; i < UNROLL_BATCH; i++)
> >> 		op(priv, i);
> >>
> >> Note that sometimes the compilers won't unroll loops if they think this
> >> would have worse optimization and perf than without unrolling, and that
> >> unroll attributes are available only starting GCC 8. For older compiler
> >> versions, no hints/attributes will be applied.
> >> For better unrolling/parallelization, don't have any variables that
> >> interfere between iterations except for the iterator itself.
> >>
> >> Co-developed-by: Jose E. Marchesi <jose.marchesi@oracle.com> # pragmas
> >> Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
> >> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
> >> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
> > 
> > Hi Alexander,
> > 
> > This patch adds four variants of the unrolled helper.  But as far as I can
> > tell the patch-set only makes use of one of them, unrolled_count().
> > 
> > I think it would be best if this patch only added helpers that are used.
> 
> I thought they might help people in future.
> I can remove them if you insist. BTW the original patch from Jose also
> added several variants.

I do slightly prefer only adding what is used.
Jakub Kicinski Feb. 11, 2025, 1:56 a.m. UTC | #5
On Mon, 10 Feb 2025 21:08:19 +0000 Simon Horman wrote:
> > > This patch adds four variants of the unrolled helper.  But as far as I can
> > > tell the patch-set only makes use of one of them, unrolled_count().
> > > 
> > > I think it would be best if this patch only added helpers that are used.  
> > 
> > I thought they might help people in future.
> > I can remove them if you insist. BTW the original patch from Jose also
> > added several variants.  
> 
> I do slightly prefer only adding what is used.

Hm, I'm a bit on the fence. IDK how trivial it is to figure out how 
to get the equivalent behavior from the compilers... 

Let's keep it, if someone feels strongly I guess they could post 
a patch to delete the unused variants.
diff mbox series

Patch

diff --git a/include/linux/unroll.h b/include/linux/unroll.h
index d42fd6366373..863fb69f6a7e 100644
--- a/include/linux/unroll.h
+++ b/include/linux/unroll.h
@@ -9,6 +9,50 @@ 
 
 #include <linux/args.h>
 
+#ifdef CONFIG_CC_IS_CLANG
+#define __pick_unrolled(x, y)		_Pragma(#x)
+#elif CONFIG_GCC_VERSION >= 80000
+#define __pick_unrolled(x, y)		_Pragma(#y)
+#else
+#define __pick_unrolled(x, y)		/* not supported */
+#endif
+
+/**
+ * unrolled - loop attributes to ask the compiler to unroll it
+ *
+ * Usage:
+ *
+ * #define BATCH 8
+ *
+ *	unrolled_count(BATCH)
+ *	for (u32 i = 0; i < BATCH; i++)
+ *		// loop body without cross-iteration dependencies
+ *
+ * This is only a hint and the compiler is free to disable unrolling if it
+ * thinks the count is suboptimal and may hurt performance and/or hugely
+ * increase object code size.
+ * Not having any cross-iteration dependencies (i.e. when iter x + 1 depends
+ * on what iter x will do with variables) is not a strict requirement, but
+ * provides best performance and object code size.
+ * Available only on Clang and GCC 8.x onwards.
+ */
+
+/* Ask the compiler to pick an optimal unroll count, Clang only */
+#define unrolled							\
+	__pick_unrolled(clang loop unroll(enable), /* nothing */)
+
+/* Unroll each @n iterations of the loop */
+#define unrolled_count(n)						\
+	__pick_unrolled(clang loop unroll_count(n), GCC unroll n)
+
+/* Unroll the whole loop */
+#define unrolled_full							\
+	__pick_unrolled(clang loop unroll(full), GCC unroll 65534)
+
+/* Never unroll the loop */
+#define unrolled_none							\
+	__pick_unrolled(clang loop unroll(disable), GCC unroll 1)
+
 #define UNROLL(N, MACRO, args...) CONCATENATE(__UNROLL_, N)(MACRO, args)
 
 #define __UNROLL_0(MACRO, args...)