From patchwork Thu Feb 6 18:26:26 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alexander Lobakin
X-Patchwork-Id: 13963539
X-Patchwork-Delegate: kuba@kernel.org
Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.11])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8EB0A1A2381;
	Thu, 6 Feb 2025 18:30:30 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
	arc=none smtp.client-ip=198.175.65.11
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1738866632; cv=none;
	b=ZFCh3RIFXhgxnEA83/DkGqJgTVkdgIkSnX5HRh9fnireOVVMHDshPd4Rg7E6Qc/6ULaog1kJG/JJIWZbTt8hi0bPNHEKaGdPpO+kZhkAIP7+/cYCEVO4ZZmWZkD+xLlH+VUE0mTcB9wh3jr8DV0mEGfHXWOA2Ys+kwPsk7cju3c=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1738866632; c=relaxed/simple;
	bh=eT8n979ZpwPw5e/QeuHsD+kqpySQIoGeQuEIuBIocjs=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version;
	b=cZ8veOieeBUXV5vOAnDZnCiSLbit6RBujKOdbKXKqPZs6dx8Nk7KnQIh/r23brg1uBF2m7XEdWD8lCYfW+K1uJXCfQLBYgJRfc9Jb0H5zPEbWWfMNqfasCPynRzeTIJyLiwwYVvCeynLSUrDGcDhiiR1qJFi3YO/c84zlUI5TFI=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
	dmarc=pass (p=none dis=none) header.from=intel.com;
	spf=pass smtp.mailfrom=intel.com;
	dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=Q075m2xd;
	arc=none smtp.client-ip=198.175.65.11
Authentication-Results: smtp.subspace.kernel.org;
	dmarc=pass (p=none dis=none) header.from=intel.com
Authentication-Results: smtp.subspace.kernel.org;
	spf=pass smtp.mailfrom=intel.com
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="Q075m2xd"
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
	d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
	t=1738866630; x=1770402630;
	h=from:to:cc:subject:date:message-id:in-reply-to:
	 references:mime-version:content-transfer-encoding;
	bh=eT8n979ZpwPw5e/QeuHsD+kqpySQIoGeQuEIuBIocjs=;
	b=Q075m2xdCjkhWyh78xFUrV0jtlM3GaJ6dxiF+K95ti/vdebh8SXJmuX5
	 laLovz7ujm8DcunMD0QclXvPwzbhX10KPjbR3435FXWeB+j9/4OaGie+3
	 mpAE/DABOIMOf9Zaz9T/yqqfJdqycUPJDYNW1spG9EQ6D1vlo6qQbAY82
	 wXc4ziIuicNg1w1Q55sxp71QJw8qpwwL40GLoQPV7KVQ3jj3LKuixFkLp
	 0MH+I3PleDe5jcqF36QF0tm2Hj8yctaCuXF0np0NflYWqBLCG1Rs9geyK
	 gUagnfx5VCjgGgv7RgRO1HEFam6RZskKafHp9gEgcn/gV3zhS6qzPoNwD
	 Q==;
X-CSE-ConnectionGUID: sUPoJAzSSeyW0ICul4P+zQ==
X-CSE-MsgGUID: VuH3xO97QcqWyvE7/sFUSg==
X-IronPort-AV: E=McAfee;i="6700,10204,11336"; a="49734441"
X-IronPort-AV: E=Sophos;i="6.13,264,1732608000";
	d="scan'208";a="49734441"
Received: from orviesa009.jf.intel.com ([10.64.159.149])
	by orvoesa103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
	06 Feb 2025 10:30:30 -0800
X-CSE-ConnectionGUID: ijf33W/vTv2rmI7xLcDLWQ==
X-CSE-MsgGUID: xCFex8EfQqeSQDK9gI+bsA==
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="6.13,264,1732608000";
	d="scan'208";a="111065875"
Received: from newjersey.igk.intel.com ([10.102.20.203])
	by orviesa009.jf.intel.com with ESMTP; 06 Feb 2025 10:30:25 -0800
From: Alexander Lobakin
To: Andrew Lunn,
	"David S. Miller",
	Eric Dumazet,
	Jakub Kicinski,
	Paolo Abeni
Cc: Alexander Lobakin,
	Alexei Starovoitov,
	Daniel Borkmann,
	John Fastabend,
	Andrii Nakryiko,
	"Jose E.
	Marchesi",
	=?utf-8?q?Toke_H=C3=B8iland-?= =?utf-8?q?J=C3=B8rgensen?=,
	Magnus Karlsson,
	Maciej Fijalkowski,
	Przemek Kitszel,
	Jason Baron,
	Casey Schaufler,
	Nathan Chancellor,
	bpf@vger.kernel.org,
	netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH net-next 1/4] unroll: add generic loop unroll helpers
Date: Thu, 6 Feb 2025 19:26:26 +0100
Message-ID: <20250206182630.3914318-2-aleksander.lobakin@intel.com>
X-Mailer: git-send-email 2.48.1
In-Reply-To: <20250206182630.3914318-1-aleksander.lobakin@intel.com>
References: <20250206182630.3914318-1-aleksander.lobakin@intel.com>
Precedence: bulk
X-Mailing-List: bpf@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
X-Patchwork-Delegate: kuba@kernel.org

There are cases when we need to explicitly unroll loops, for example
cache operations or filling DMA descriptors at very high speeds.
Add compiler-specific attribute macros to give the compiler a hint
that we'd like to unroll a loop.
Example usage:

 #define UNROLL_BATCH 8

	unrolled_count(UNROLL_BATCH)
	for (u32 i = 0; i < UNROLL_BATCH; i++)
		op(priv, i);

Note that compilers sometimes won't unroll a loop if they think that
unrolling would result in worse optimization and performance than
leaving the loop as is, and that unroll attributes are available only
starting with GCC 8. For older compiler versions, no hints/attributes
will be applied.
For better unrolling/parallelization, avoid variables that interfere
between iterations, except for the iterator itself.

Co-developed-by: Jose E. Marchesi # pragmas
Signed-off-by: Jose E. Marchesi
Reviewed-by: Przemek Kitszel
Signed-off-by: Alexander Lobakin
---
 include/linux/unroll.h | 44 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/include/linux/unroll.h b/include/linux/unroll.h
index d42fd6366373..863fb69f6a7e 100644
--- a/include/linux/unroll.h
+++ b/include/linux/unroll.h
@@ -9,6 +9,50 @@
 
 #include
 
+#ifdef CONFIG_CC_IS_CLANG
+#define __pick_unrolled(x, y)		_Pragma(#x)
+#elif CONFIG_GCC_VERSION >= 80000
+#define __pick_unrolled(x, y)		_Pragma(#y)
+#else
+#define __pick_unrolled(x, y)		/* not supported */
+#endif
+
+/**
+ * unrolled - loop attributes to ask the compiler to unroll it
+ *
+ * Usage:
+ *
+ * #define BATCH 8
+ *
+ *	unrolled_count(BATCH)
+ *	for (u32 i = 0; i < BATCH; i++)
+ *		// loop body without cross-iteration dependencies
+ *
+ * This is only a hint and the compiler is free to disable unrolling if it
+ * thinks the count is suboptimal and may hurt performance and/or hugely
+ * increase object code size.
+ * Not having any cross-iteration dependencies (i.e. when iter x + 1 depends
+ * on what iter x will do with variables) is not a strict requirement, but
+ * provides best performance and object code size.
+ * Available only on Clang and GCC 8.x onwards.
+ */
+
+/* Ask the compiler to pick an optimal unroll count, Clang only */
+#define unrolled							\
+	__pick_unrolled(clang loop unroll(enable), /* nothing */)
+
+/* Unroll each @n iterations of the loop */
+#define unrolled_count(n)						\
+	__pick_unrolled(clang loop unroll_count(n), GCC unroll n)
+
+/* Unroll the whole loop */
+#define unrolled_full							\
+	__pick_unrolled(clang loop unroll(full), GCC unroll 65534)
+
+/* Never unroll the loop */
+#define unrolled_none							\
+	__pick_unrolled(clang loop unroll(disable), GCC unroll 1)
+
 #define UNROLL(N, MACRO, args...) CONCATENATE(__UNROLL_, N)(MACRO, args)
 
 #define __UNROLL_0(MACRO, args...)
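
For trying the hint outside the kernel tree, here is a minimal userspace
sketch of the same pragma-selection idea. It is not part of the patch: it
assumes the compiler's predefined __clang__ and __GNUC__ macros as
stand-ins for CONFIG_CC_IS_CLANG and CONFIG_GCC_VERSION, and the names
demo_pick_unrolled, demo_unrolled_count, DEMO_BATCH and demo_double_batch
are hypothetical. The loop body has no cross-iteration dependencies, as
the kernel-doc comment above recommends.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumption: compiler-predefined macros replace the kernel's
 * CONFIG_CC_IS_CLANG / CONFIG_GCC_VERSION checks in userspace.
 */
#if defined(__clang__)
#define demo_pick_unrolled(x, y)	_Pragma(#x)
#elif defined(__GNUC__) && __GNUC__ >= 8
#define demo_pick_unrolled(x, y)	_Pragma(#y)
#else
#define demo_pick_unrolled(x, y)	/* not supported */
#endif

/* Hypothetical userspace counterpart of unrolled_count(n) */
#define demo_unrolled_count(n)						\
	demo_pick_unrolled(clang loop unroll_count(n), GCC unroll n)

#define DEMO_BATCH 8

/*
 * Double each of DEMO_BATCH elements. Every iteration touches only its
 * own dst[i]/src[i], so nothing is carried between iterations except
 * the iterator itself.
 */
static void demo_double_batch(uint32_t *dst, const uint32_t *src)
{
	assert(dst && src);

	demo_unrolled_count(DEMO_BATCH)
	for (uint32_t i = 0; i < DEMO_BATCH; i++)
		dst[i] = src[i] * 2;
}
```

The pragma is only a hint, so whether the loop actually gets unrolled
should be checked in the generated assembly rather than assumed.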