From patchwork Sat Apr  1 15:17:55 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: =?utf-8?b?T25kcmVqIE1vc27DocSNZWs=?=
 <omosnacek@gmail.com>
X-Patchwork-Id: 9658075
X-Patchwork-Delegate: herbert@gondor.apana.org.au
From: Ondrej Mosnacek <omosnacek@gmail.com>
To: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>,
	linux-crypto@vger.kernel.org, Jeffrey Walton <noloader@gmail.com>,
	Milan Broz <gmazyland@gmail.com>, Ondrej Mosnacek <omosnacek@gmail.com>,
	Eric Biggers <ebiggers@google.com>
Subject: [PATCH v4] crypto: gf128mul - define gf128mul_x_* in gf128mul.h
Date: Sat,  1 Apr 2017 17:17:55 +0200
Message-Id: <20170401151755.11875-1-omosnacek@gmail.com>
X-Mailer: git-send-email 2.9.3

The gf128mul_x_ble function is currently defined in gf128mul.c, because
it depends on the gf128mul_table_be multiplication table.

However, the function is very small and only ever reads two values from
the table (gf128mul_table_be[0] and gf128mul_table_be[1], since the
index is a single bit), so it is better defined as an inline function in
gf128mul.h. That way, the compiler can inline it at each call site for
better performance.
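As a rough illustration of why only two table entries matter, here is a
standalone userspace sketch of the ble multiply-by-x step on plain
host-order words. The struct u128_words type, the ble_fold array and the
function names are hypothetical; ble_fold stands in for entries 0 and 1
of gf128mul_table_be (0x00 and 0x87, the values the new inline code
uses), and the byte-order conversion done by le64_to_cpu() is assumed to
have already happened.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two entries of gf128mul_table_be that
 * gf128mul_x_ble reads: index 0 (no carry out) and index 1 (carry out). */
static const uint64_t ble_fold[2] = { 0x00, 0x87 };

/* 128-bit element as two host-order words: lo holds bits 0..63 and
 * hi holds bits 64..127 (x->a and x->b after le64_to_cpu()). */
struct u128_words {
	uint64_t lo;
	uint64_t hi;
};

/* Multiply by x: shift left by one bit across both words and, if bit 127
 * falls off, fold x^128 back in as x^7 + x^2 + x + 1 (0x87). */
static struct u128_words mul_x_ble_table(struct u128_words v)
{
	struct u128_words r;
	uint64_t tt = ble_fold[v.hi >> 63]; /* index is always 0 or 1 */

	r.lo = (v.lo << 1) ^ tt;
	r.hi = (v.hi << 1) | (v.lo >> 63);
	return r;
}

static void mul_x_ble_selftest(void)
{
	struct u128_words one = { 1, 0 };
	struct u128_words top = { 0, UINT64_C(1) << 63 };
	struct u128_words r;

	r = mul_x_ble_table(one);	/* 1 * x = x */
	assert(r.lo == 2 && r.hi == 0);
	r = mul_x_ble_table(top);	/* x^127 * x = x^128 -> 0x87 */
	assert(r.lo == 0x87 && r.hi == 0);
}
```

Because the index is a single bit, the lookup degenerates to a choice
between 0 and 0x87, which is what makes a table-free inline version
straightforward.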

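For consistency, the other gf128mul_x_* functions are also moved to the
header file. In addition, the code is rewritten to be constant-time: the
table lookups indexed by a bit of the field element are replaced with a
bit mask derived from that bit using only shifts.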
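The constant-time trick can be checked in isolation. The sketch below
compares a branchy reference against the shift-based mask used by
gf128mul_mask_from_bit(); the helper names are hypothetical, and like the
kernel code it assumes that converting an out-of-range value to int64_t
wraps and that signed right shift is arithmetic (both
implementation-defined in ISO C, but true for GCC and Clang).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branchy reference: all-ones if bit 'which' of x is set, else zero.
 * The conditional may compile to a data-dependent branch. */
static uint64_t mask_from_bit_branchy(uint64_t x, int which)
{
	return (x & ((uint64_t)1 << which)) ? ~(uint64_t)0 : 0;
}

/* Same technique as gf128mul_mask_from_bit() in the patch: move the chosen
 * bit into the sign position, then arithmetic-shift it across the word.
 * Valid for which in 0..63. */
static uint64_t mask_from_bit_ct(uint64_t x, int which)
{
	return (uint64_t)((int64_t)(x << (63 - which)) >> 63);
}

/* Exhaustively compare both versions on a few sample words. */
static void mask_selftest(void)
{
	const uint64_t samples[] = {
		0, 1, 0x87, UINT64_C(0x8000000000000000),
		UINT64_C(0xdeadbeefcafef00d), ~UINT64_C(0)
	};

	for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		for (int bit = 0; bit < 64; bit++)
			assert(mask_from_bit_ct(samples[i], bit) ==
			       mask_from_bit_branchy(samples[i], bit));
}
```

ANDing the mask with the reduction constant (for example
mask_from_bit_ct(b, 63) & 0x87) then yields either 0 or the constant
without a memory access whose address depends on the data.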

After this change, the speed of the generic 'xts(aes)' implementation
increased from ~225 MiB/s to ~235 MiB/s (measured using 'cryptsetup
benchmark -c aes-xts-plain64' on an Intel system with CRYPTO_AES_X86_64
and CRYPTO_AES_NI_INTEL disabled).

Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com>
Cc: Eric Biggers <ebiggers@google.com>
---
v3 -> v4: a faster version of gf128mul_x_lle
v2 -> v3: constant-time implementation
v1 -> v2: move all _x_ functions to the header, not just gf128mul_x_ble
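For the v4 gf128mul_x_lle change, a small userspace sketch can confirm
that the mask-based fold matches the old lookup. It relies on the comment
in the new header, which states that gf128mul_table_le[(b << 7) & 0xff]
<< 48 equals the (u64)0xe1 << 56 constant; the only index besides 0 that
the old code could read is 0x80, so the hypothetical le_subset array
below carries just that entry (0xe100). Words are host order, i.e. after
be64_to_cpu(), and the function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of gf128mul_table_le: only indices 0 and 0x80 are
 * ever read by the old gf128mul_x_lle, and index 0 is zero. */
static const uint64_t le_subset[256] = { [0x80] = 0xe100 };

/* Old, table-based multiply by x in lle format (x^0 is bit 63 of a). */
static void mul_x_lle_table(uint64_t a, uint64_t b, uint64_t *ra, uint64_t *rb)
{
	uint64_t tt = le_subset[(b << 7) & 0xff];

	*rb = (b >> 1) | (a << 63);
	*ra = (a >> 1) ^ (tt << 48);
}

/* New, mask-based version: spread bit 0 of b (x^127) across the word and
 * use it to select the reduction constant 0xe1 << 56. Assumes wrapping
 * int64_t conversion and arithmetic right shift, as GCC/Clang provide. */
static void mul_x_lle_ct(uint64_t a, uint64_t b, uint64_t *ra, uint64_t *rb)
{
	uint64_t tt = (uint64_t)((int64_t)(b << 63) >> 63) &
		      (UINT64_C(0xe1) << 56);

	*rb = (b >> 1) | (a << 63);
	*ra = (a >> 1) ^ tt;
}

/* Compare both versions on all pairs of a few sample words. */
static void lle_selftest(void)
{
	const uint64_t s[] = { 0, 1, 2, UINT64_C(1) << 63,
			       UINT64_C(0x0123456789abcdef), ~UINT64_C(0) };
	const size_t n = sizeof(s) / sizeof(s[0]);

	for (size_t i = 0; i < n; i++)
		for (size_t j = 0; j < n; j++) {
			uint64_t ta, tb, ca, cb;

			mul_x_lle_table(s[i], s[j], &ta, &tb);
			mul_x_lle_ct(s[i], s[j], &ca, &cb);
			assert(ta == ca && tb == cb);
		}
}
```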

 crypto/gf128mul.c         | 33 +---------------------------
 include/crypto/gf128mul.h | 55 +++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 54 insertions(+), 34 deletions(-)

diff --git a/crypto/gf128mul.c b/crypto/gf128mul.c
index 04facc0..dc01212 100644
--- a/crypto/gf128mul.c
+++ b/crypto/gf128mul.c
@@ -130,43 +130,12 @@ static const u16 gf128mul_table_le[256] = gf128mul_dat(xda_le);
 static const u16 gf128mul_table_be[256] = gf128mul_dat(xda_be);
 
 /*
- * The following functions multiply a field element by x or by x^8 in
+ * The following functions multiply a field element by x^8 in
  * the polynomial field representation.  They use 64-bit word operations
  * to gain speed but compensate for machine endianness and hence work
  * correctly on both styles of machine.
  */
 
-static void gf128mul_x_lle(be128 *r, const be128 *x)
-{
-	u64 a = be64_to_cpu(x->a);
-	u64 b = be64_to_cpu(x->b);
-	u64 _tt = gf128mul_table_le[(b << 7) & 0xff];
-
-	r->b = cpu_to_be64((b >> 1) | (a << 63));
-	r->a = cpu_to_be64((a >> 1) ^ (_tt << 48));
-}
-
-static void gf128mul_x_bbe(be128 *r, const be128 *x)
-{
-	u64 a = be64_to_cpu(x->a);
-	u64 b = be64_to_cpu(x->b);
-	u64 _tt = gf128mul_table_be[a >> 63];
-
-	r->a = cpu_to_be64((a << 1) | (b >> 63));
-	r->b = cpu_to_be64((b << 1) ^ _tt);
-}
-
-void gf128mul_x_ble(be128 *r, const be128 *x)
-{
-	u64 a = le64_to_cpu(x->a);
-	u64 b = le64_to_cpu(x->b);
-	u64 _tt = gf128mul_table_be[b >> 63];
-
-	r->a = cpu_to_le64((a << 1) ^ _tt);
-	r->b = cpu_to_le64((b << 1) | (a >> 63));
-}
-EXPORT_SYMBOL(gf128mul_x_ble);
-
 static void gf128mul_x8_lle(be128 *x)
 {
 	u64 a = be64_to_cpu(x->a);
diff --git a/include/crypto/gf128mul.h b/include/crypto/gf128mul.h
index 0bc9b5f..35ced9d 100644
--- a/include/crypto/gf128mul.h
+++ b/include/crypto/gf128mul.h
@@ -49,6 +49,7 @@
 #ifndef _CRYPTO_GF128MUL_H
 #define _CRYPTO_GF128MUL_H
 
+#include <asm/byteorder.h>
 #include <crypto/b128ops.h>
 #include <linux/slab.h>
 
@@ -163,8 +164,58 @@ void gf128mul_lle(be128 *a, const be128 *b);
 
 void gf128mul_bbe(be128 *a, const be128 *b);
 
-/* multiply by x in ble format, needed by XTS */
-void gf128mul_x_ble(be128 *a, const be128 *b);
+/*
+ * The following functions multiply a field element by x in
+ * the polynomial field representation.  They use 64-bit word operations
+ * to gain speed but compensate for machine endianness and hence work
+ * correctly on both styles of machine.
+ *
+ * They are defined here for performance.
+ */
+
+static inline u64 gf128mul_mask_from_bit(u64 x, int which)
+{
+	/* a constant-time version of 'x & ((u64)1 << which) ? (u64)-1 : 0' */
+	return ((s64)(x << (63 - which)) >> 63);
+}
+
+static inline void gf128mul_x_lle(be128 *r, const be128 *x)
+{
+	u64 a = be64_to_cpu(x->a);
+	u64 b = be64_to_cpu(x->b);
+
+	/* equivalent to gf128mul_table_le[(b << 7) & 0xff] << 48
+	 * (see crypto/gf128mul.c): */
+	u64 _tt = gf128mul_mask_from_bit(b, 0) & ((u64)0xe1 << 56);
+
+	r->b = cpu_to_be64((b >> 1) | (a << 63));
+	r->a = cpu_to_be64((a >> 1) ^ _tt);
+}
+
+static inline void gf128mul_x_bbe(be128 *r, const be128 *x)
+{
+	u64 a = be64_to_cpu(x->a);
+	u64 b = be64_to_cpu(x->b);
+
+	/* equivalent to gf128mul_table_be[a >> 63] (see crypto/gf128mul.c): */
+	u64 _tt = gf128mul_mask_from_bit(a, 63) & 0x87;
+
+	r->a = cpu_to_be64((a << 1) | (b >> 63));
+	r->b = cpu_to_be64((b << 1) ^ _tt);
+}
+
+/* needed by XTS */
+static inline void gf128mul_x_ble(be128 *r, const be128 *x)
+{
+	u64 a = le64_to_cpu(x->a);
+	u64 b = le64_to_cpu(x->b);
+
+	/* equivalent to gf128mul_table_be[b >> 63] (see crypto/gf128mul.c): */
+	u64 _tt = gf128mul_mask_from_bit(b, 63) & 0x87;
+
+	r->a = cpu_to_le64((a << 1) ^ _tt);
+	r->b = cpu_to_le64((b << 1) | (a >> 63));
+}
 
 /* 4k table optimization */