From patchwork Wed Dec 11 17:26:49 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alexander Lobakin <aleksander.lobakin@intel.com>
X-Patchwork-Id: 13904007
X-Patchwork-Delegate: kuba@kernel.org
Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.8])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id B68E4225A5E;
	Wed, 11 Dec 2024 17:29:35 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=192.198.163.8
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1733938177; cv=none;
 b=f81sewMlP75qWBPYCnG/9Bg/Ue8FeGf7SKYWBSG0qkivwETeEXD3ov8PoagsL/JpvOMlckw+PjN0nHPML8FY8P/SEiBxDOtQZd/PUPjeeCVHTKnndoxLTzztNEq6nhhStro2gm2kFUnabDe8Sz+KAWqYL2oUzE/9UHbsvQBTLe0=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1733938177; c=relaxed/simple;
	bh=KHzL8QEqHZUQrHH+gxbt09/SaI0VfgXWHYbRV+8/Ixg=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version;
 b=IVN5CKo7M2xPztWU9S62pr6cjAUGUY86dfHJX4MVn9wt/YP/Yx2jtLT7UJX5hbLpE9dzcbpA3zX4tA8ZSWw9pFCWWZUoYFEmiMvCT2vfWIzikvAlK7cQptFeYDjyRufrBGAFtTXr3NRDTMyf6unlMYOgXCbyjSeMXuDrPAxcs0w=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dmarc=pass (p=none dis=none) header.from=intel.com;
 spf=pass smtp.mailfrom=intel.com;
 dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com
 header.b=HMbpwB7a; arc=none smtp.client-ip=192.198.163.8
Authentication-Results: smtp.subspace.kernel.org;
 dmarc=pass (p=none dis=none) header.from=intel.com
Authentication-Results: smtp.subspace.kernel.org;
 spf=pass smtp.mailfrom=intel.com
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com
 header.b="HMbpwB7a"
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
  d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
  t=1733938176; x=1765474176;
  h=from:to:cc:subject:date:message-id:in-reply-to:
   references:mime-version:content-transfer-encoding;
  bh=KHzL8QEqHZUQrHH+gxbt09/SaI0VfgXWHYbRV+8/Ixg=;
  b=HMbpwB7aBw3LZ+PRKhaaJ0+qGR3IccXJPC5HHc1wqHPGy6GCVA/p5qO+
   Pwn8szHGVW17QhM4FS8Q/MXvMSFS3YLzqMmrA9kGDvtpR0EeEv1IfA8Zq
   queMdkIvmIUuklpAfBeHTVrWNDSTQ+SMTRIlS4F0gvMbQOhEj+S/09SnQ
   FJ9Ifm1kfd4OXSg5kS2stgy+CnNCMCWA6/W6UfIw78/p7Mx/nrOcaLY4y
   viMhDp1R0nCbau+TllnTtZQ+ZFUtJWIxyS+RTsNt+Jx26RA2UgxBTVWvk
   W4bATmxJgKgiVB7XsnJ3fThwI7PrJ1gy2nMTX+18zJvBGYuBA5+aQYZDv
   A==;
X-CSE-ConnectionGUID: GoCxghfLRTuxiuEut/f1eQ==
X-CSE-MsgGUID: Te8BwVVyQB24o+ivG7F7JA==
X-IronPort-AV: E=McAfee;i="6700,10204,11283"; a="51859718"
X-IronPort-AV: E=Sophos;i="6.12,226,1728975600";
   d="scan'208";a="51859718"
Received: from fmviesa002.fm.intel.com ([10.60.135.142])
  by fmvoesa102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 11 Dec 2024 09:29:35 -0800
X-CSE-ConnectionGUID: zG45zWfkTGWCrfVGa+iRjw==
X-CSE-MsgGUID: gsX/4hw6SW2Y9O8yYzyNnQ==
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="6.12,224,1728975600";
   d="scan'208";a="119122423"
Received: from newjersey.igk.intel.com ([10.102.20.203])
  by fmviesa002.fm.intel.com with ESMTP; 11 Dec 2024 09:29:30 -0800
From: Alexander Lobakin <aleksander.lobakin@intel.com>
To: Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>,
	Paolo Abeni <pabeni@redhat.com>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>,
 Alexei Starovoitov <ast@kernel.org>, Daniel Borkmann <daniel@iogearbox.net>,
 John Fastabend <john.fastabend@gmail.com>,
 Andrii Nakryiko <andrii@kernel.org>, Peter Zijlstra <peterz@infradead.org>,
 Josh Poimboeuf <jpoimboe@kernel.org>,
 "Jose E. Marchesi" <jose.marchesi@oracle.com>, =?utf-8?q?Toke_H=C3=B8iland-?=
	=?utf-8?q?J=C3=B8rgensen?= <toke@redhat.com>,
 Magnus Karlsson <magnus.karlsson@intel.com>,
 Maciej Fijalkowski <maciej.fijalkowski@intel.com>,
 Przemek Kitszel <przemyslaw.kitszel@intel.com>,
 Jason Baron <jbaron@akamai.com>, Casey Schaufler <casey@schaufler-ca.com>,
 Nathan Chancellor <nathan@kernel.org>,
 nex.sw.ncis.osdt.itp.upstreaming@intel.com, bpf@vger.kernel.org,
 netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH net-next 12/12] unroll: add generic loop unroll helpers
Date: Wed, 11 Dec 2024 18:26:49 +0100
Message-ID: <20241211172649.761483-13-aleksander.lobakin@intel.com>
X-Mailer: git-send-email 2.47.1
In-Reply-To: <20241211172649.761483-1-aleksander.lobakin@intel.com>
References: <20241211172649.761483-1-aleksander.lobakin@intel.com>
Precedence: bulk
X-Mailing-List: bpf@vger.kernel.org
List-Id: <bpf.vger.kernel.org>
List-Subscribe: <mailto:bpf+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:bpf+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
X-Patchwork-Delegate: kuba@kernel.org

There are cases when we need to explicitly unroll loops. For example,
cache operations, filling DMA descriptors on very high speeds etc.
Add compiler-specific attribute macros to give the compiler a hint
that we'd like to unroll a loop.
Example usage:

 #define UNROLL_BATCH 8

	unrolled_count(UNROLL_BATCH)
	for (u32 i = 0; i < UNROLL_BATCH; i++)
		op(priv, i);

Note that sometimes the compilers won't unroll loops if they think this
would have worse optimization and perf than without unrolling, and that
unroll attributes are available only starting GCC 8. For older compiler
versions, no hints/attributes will be applied.
For better unrolling/parallelization, don't have any variables that
interfere between iterations except for the iterator itself.

Co-developed-by: Jose E. Marchesi <jose.marchesi@oracle.com> # pragmas
Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 include/linux/unroll.h | 44 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/include/linux/unroll.h b/include/linux/unroll.h
index d42fd6366373..69b6ea74d7c1 100644
--- a/include/linux/unroll.h
+++ b/include/linux/unroll.h
@@ -9,6 +9,50 @@
 
 #include <linux/args.h>
 
+#ifdef CONFIG_CC_IS_CLANG
+#define __pick_unrolled(x, y)		_Pragma(#x)
+#elif CONFIG_GCC_VERSION >= 80000
+#define __pick_unrolled(x, y)		_Pragma(#y)
+#else
+#define __pick_unrolled(x, y)		/* not supported */
+#endif
+
+/**
+ * unrolled - loop attributes to ask the compiler to unroll it
+ *
+ * Usage:
+ *
+ * #define BATCH 8
+ *
+ *	unrolled_count(BATCH)
+ *	for (u32 i = 0; i < BATCH; i++)
+ *		// loop body without cross-iteration dependencies
+ *
+ * This is only a hint and the compiler is free to disable unrolling if it
+ * thinks the count is suboptimal and may hurt performance and/or hugely
+ * increase object code size.
+ * Not having any cross-iteration dependencies (i.e. when iter x + 1 depends
+ * on what iter x will do with variables) is not a strict requirement, but
+ * provides best performance and object code size.
+ * Available only on Clang and GCC 8.x onwards.
+ */
+
+/* Ask the compiler to pick an optimal unroll count, Clang only */
+#define unrolled							    \
+	__pick_unrolled(clang loop unroll(enable), /* nothing */)
+
+/* Unroll each @n iterations of a loop */
+#define unrolled_count(n)						    \
+	__pick_unrolled(clang loop unroll_count(n), GCC unroll n)
+
+/* Unroll the whole loop */
+#define unrolled_full							    \
+	__pick_unrolled(clang loop unroll(full), GCC unroll 65534)
+
+/* Never unroll a loop */
+#define unrolled_none							    \
+	__pick_unrolled(clang loop unroll(disable), GCC unroll 1)
+
 #define UNROLL(N, MACRO, args...) CONCATENATE(__UNROLL_, N)(MACRO, args)
 
 #define __UNROLL_0(MACRO, args...)