Message ID | 20250206182630.3914318-2-aleksander.lobakin@intel.com (mailing list archive) |
---|---|
State | Accepted |
Commit | c6594d64271704b335378e7b74c39fe4d4fcc777 |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | xsk: the lost bits from Chapter III | expand |
On Thu, Feb 06, 2025 at 07:26:26PM +0100, Alexander Lobakin wrote: > There are cases when we need to explicitly unroll loops. For example, > cache operations, filling DMA descriptors on very high speeds etc. > Add compiler-specific attribute macros to give the compiler a hint > that we'd like to unroll a loop. > Example usage: > > #define UNROLL_BATCH 8 > > unrolled_count(UNROLL_BATCH) > for (u32 i = 0; i < UNROLL_BATCH; i++) > op(priv, i); > > Note that sometimes the compilers won't unroll loops if they think this > would have worse optimization and perf than without unrolling, and that > unroll attributes are available only starting GCC 8. For older compiler > versions, no hints/attributes will be applied. > For better unrolling/parallelization, don't have any variables that > interfere between iterations except for the iterator itself. > > Co-developed-by: Jose E. Marchesi <jose.marchesi@oracle.com> # pragmas > Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com> > Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> > Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Hi Alexander, This patch adds four variants of the unrolled helper. But as far as I can tell the patch-set only makes use of one of them, unrolled_count(). I think it would be best if this patch only added helpers that are used. ...
On Sun, Feb 09, 2025 at 11:07:25AM +0000, Simon Horman wrote: > On Thu, Feb 06, 2025 at 07:26:26PM +0100, Alexander Lobakin wrote: > > There are cases when we need to explicitly unroll loops. For example, > > cache operations, filling DMA descriptors on very high speeds etc. > > Add compiler-specific attribute macros to give the compiler a hint > > that we'd like to unroll a loop. > > Example usage: > > > > #define UNROLL_BATCH 8 > > > > unrolled_count(UNROLL_BATCH) > > for (u32 i = 0; i < UNROLL_BATCH; i++) > > op(priv, i); > > > > Note that sometimes the compilers won't unroll loops if they think this > > would have worse optimization and perf than without unrolling, and that > > unroll attributes are available only starting GCC 8. For older compiler > > versions, no hints/attributes will be applied. > > For better unrolling/parallelization, don't have any variables that > > interfere between iterations except for the iterator itself. > > > > Co-developed-by: Jose E. Marchesi <jose.marchesi@oracle.com> # pragmas > > Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com> > > Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> > > Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> > > Hi Alexander, > > This patch adds four variants of the unrolled helper. But as far as I can > tell the patch-set only makes use of one of them, unrolled_count(). > > I think it would be best if this patch only added helpers that are used. That is debatable but I think I tend to agree here. If we add say 3 unused macros then nothing stops someone from coming up with a patch that deletes them because they are unused, right? > > ...
From: Simon Horman <horms@kernel.org> Date: Sun, 9 Feb 2025 11:07:25 +0000 > On Thu, Feb 06, 2025 at 07:26:26PM +0100, Alexander Lobakin wrote: >> There are cases when we need to explicitly unroll loops. For example, >> cache operations, filling DMA descriptors on very high speeds etc. >> Add compiler-specific attribute macros to give the compiler a hint >> that we'd like to unroll a loop. >> Example usage: >> >> #define UNROLL_BATCH 8 >> >> unrolled_count(UNROLL_BATCH) >> for (u32 i = 0; i < UNROLL_BATCH; i++) >> op(priv, i); >> >> Note that sometimes the compilers won't unroll loops if they think this >> would have worse optimization and perf than without unrolling, and that >> unroll attributes are available only starting GCC 8. For older compiler >> versions, no hints/attributes will be applied. >> For better unrolling/parallelization, don't have any variables that >> interfere between iterations except for the iterator itself. >> >> Co-developed-by: Jose E. Marchesi <jose.marchesi@oracle.com> # pragmas >> Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com> >> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> >> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> > > Hi Alexander, > > This patch adds four variants of the unrolled helper. But as far as I can > tell the patch-set only makes use of one of them, unrolled_count(). > > I think it would be best if this patch only added helpers that are used. I thought they might help people in future. I can remove them if you insist. BTW the original patch from Jose also added several variants. Thanks, Olek
On Mon, Feb 10, 2025 at 04:49:14PM +0100, Alexander Lobakin wrote: > From: Simon Horman <horms@kernel.org> > Date: Sun, 9 Feb 2025 11:07:25 +0000 > > > On Thu, Feb 06, 2025 at 07:26:26PM +0100, Alexander Lobakin wrote: > >> There are cases when we need to explicitly unroll loops. For example, > >> cache operations, filling DMA descriptors on very high speeds etc. > >> Add compiler-specific attribute macros to give the compiler a hint > >> that we'd like to unroll a loop. > >> Example usage: > >> > >> #define UNROLL_BATCH 8 > >> > >> unrolled_count(UNROLL_BATCH) > >> for (u32 i = 0; i < UNROLL_BATCH; i++) > >> op(priv, i); > >> > >> Note that sometimes the compilers won't unroll loops if they think this > >> would have worse optimization and perf than without unrolling, and that > >> unroll attributes are available only starting GCC 8. For older compiler > >> versions, no hints/attributes will be applied. > >> For better unrolling/parallelization, don't have any variables that > >> interfere between iterations except for the iterator itself. > >> > >> Co-developed-by: Jose E. Marchesi <jose.marchesi@oracle.com> # pragmas > >> Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com> > >> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> > >> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> > > > > Hi Alexander, > > > > This patch adds four variants of the unrolled helper. But as far as I can > > tell the patch-set only makes use of one of them, unrolled_count(). > > > > I think it would be best if this patch only added helpers that are used. > > I thought they might help people in future. > I can remove them if you insist. BTW the original patch from Jose also > added several variants. I do slightly prefer only adding what is used.
On Mon, 10 Feb 2025 21:08:19 +0000 Simon Horman wrote: > > > This patch adds four variants of the unrolled helper. But as far as I can > > > tell the patch-set only makes use of one of them, unrolled_count(). > > > > > > I think it would be best if this patch only added helpers that are used. > > > > I thought they might help people in future. > > I can remove them if you insist. BTW the original patch from Jose also > > added several variants. > > I do slightly prefer only adding what is used. Hm, I'm a bit on the fence. IDK how trivial it is to figure out how to get the equivalent behavior from the compilers... Let's keep it, if someone feels strongly I guess they could post a patch to delete the unused variants.
diff --git a/include/linux/unroll.h b/include/linux/unroll.h index d42fd6366373..863fb69f6a7e 100644 --- a/include/linux/unroll.h +++ b/include/linux/unroll.h @@ -9,6 +9,50 @@ #include <linux/args.h> +#ifdef CONFIG_CC_IS_CLANG +#define __pick_unrolled(x, y) _Pragma(#x) +#elif CONFIG_GCC_VERSION >= 80000 +#define __pick_unrolled(x, y) _Pragma(#y) +#else +#define __pick_unrolled(x, y) /* not supported */ +#endif + +/** + * unrolled - loop attributes to ask the compiler to unroll it + * + * Usage: + * + * #define BATCH 8 + * + * unrolled_count(BATCH) + * for (u32 i = 0; i < BATCH; i++) + * // loop body without cross-iteration dependencies + * + * This is only a hint and the compiler is free to disable unrolling if it + * thinks the count is suboptimal and may hurt performance and/or hugely + * increase object code size. + * Not having any cross-iteration dependencies (i.e. when iter x + 1 depends + * on what iter x will do with variables) is not a strict requirement, but + * provides best performance and object code size. + * Available only on Clang and GCC 8.x onwards. + */ + +/* Ask the compiler to pick an optimal unroll count, Clang only */ +#define unrolled \ + __pick_unrolled(clang loop unroll(enable), /* nothing */) + +/* Unroll each @n iterations of the loop */ +#define unrolled_count(n) \ + __pick_unrolled(clang loop unroll_count(n), GCC unroll n) + +/* Unroll the whole loop */ +#define unrolled_full \ + __pick_unrolled(clang loop unroll(full), GCC unroll 65534) + +/* Never unroll the loop */ +#define unrolled_none \ + __pick_unrolled(clang loop unroll(disable), GCC unroll 1) + #define UNROLL(N, MACRO, args...) CONCATENATE(__UNROLL_, N)(MACRO, args) #define __UNROLL_0(MACRO, args...)