From patchwork Thu Mar 30 22:04:42 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: =?utf-8?b?T25kcmVqIE1vc27DocSNZWs=?=
 <omosnacek@gmail.com>
X-Patchwork-Id: 9655177
X-Patchwork-Delegate: herbert@gondor.apana.org.au
From: Ondrej Mosnacek <omosnacek@gmail.com>
To: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>,
	linux-crypto@vger.kernel.org, Milan Broz <gmazyland@gmail.com>,
	Ondrej Mosnacek <omosnacek@gmail.com>, Eric Biggers <ebiggers@google.com>
Subject: [PATCH v2] crypto: gf128mul - define gf128mul_x_* in gf128mul.h
Date: Fri, 31 Mar 2017 00:04:42 +0200
Message-Id: <20170330220442.11012-1-omosnacek@gmail.com>
X-Mailer: git-send-email 2.9.3

The gf128mul_x_ble function is currently defined in gf128mul.c because
it depends on the gf128mul_table_be multiplication table.

However, since the function is very small and uses only two values from
the table (gf128mul_table_be[0] == 0x00 and gf128mul_table_be[1] ==
0x87), it is better defined as an inline function in gf128mul.h, where
the compiler can inline it into its callers for better performance.
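A minimal sketch of why the table can go: the index b >> 63 is either 0 or 1, so the lookup only ever chooses between 0x00 and 0x87, and a conditional expression yields the same constant without a memory access. It assumes a hypothetical host-order struct blk128 in place of be128 (the le64_to_cpu()/cpu_to_le64() conversions are omitted); the helper names are illustrative, not kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for be128 with both 64-bit halves already in CPU
 * byte order (le64_to_cpu()/cpu_to_le64() are omitted). As in
 * gf128mul_x_ble(), 'a' is the low-order half and 'b' the high-order one. */
struct blk128 {
	uint64_t a;
	uint64_t b;
};

/* The only entries of gf128mul_table_be that b >> 63 can select. */
static const uint64_t two_entries[2] = { 0x00, 0x87 };

/* Table-driven form, mirroring the old out-of-line gf128mul_x_ble(). */
static void x_ble_table(struct blk128 *r, const struct blk128 *x)
{
	uint64_t a = x->a;
	uint64_t b = x->b;
	uint64_t tt = two_entries[b >> 63];

	r->a = (a << 1) ^ tt;
	r->b = (b << 1) | (a >> 63);
}

/* Table-free form, mirroring the new inline gf128mul_x_ble(). */
static void x_ble_inline(struct blk128 *r, const struct blk128 *x)
{
	uint64_t a = x->a;
	uint64_t b = x->b;
	uint64_t tt = (b & ((uint64_t)1 << 63)) ? 0x87 : 0x00;

	r->a = (a << 1) ^ tt;
	r->b = (b << 1) | (a >> 63);
}

/* Doubles 'start' repeatedly in place (r == x) with both forms and
 * asserts that they never diverge. */
static void check_x_ble_forms(struct blk128 start, int rounds)
{
	struct blk128 t1 = start;
	struct blk128 t2 = start;

	for (int i = 0; i < rounds; i++) {
		x_ble_table(&t1, &t1);
		x_ble_inline(&t2, &t2);
		assert(t1.a == t2.a && t1.b == t2.b);
	}
}
```

Both forms read the input halves into locals before writing, so calling them with r == x is safe, matching how the kernel helper can be used.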

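For consistency, the other gf128mul_x_* functions (gf128mul_x_lle and
gf128mul_x_bbe) are also moved to the header file, with their table
lookups replaced by the equivalent constants (0xe1 and 0x87).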
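The two constants differ because lle stores the polynomial coefficients in reversed bit order (the convention GHASH uses), while bbe stores them in natural order; 0xe1 is 0x87 with its eight bits reversed. A hedged sketch, again with a hypothetical host-order struct blk128 (here 'a' is the high-order word, as after be64_to_cpu()), checks that multiplying by x in lle equals reversing all 128 bits, multiplying by x in bbe, and reversing back.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-order stand-in for be128: 'a' is the high-order
 * 64-bit word and 'b' the low-order word. */
struct blk128 {
	uint64_t a;
	uint64_t b;
};

/* Multiply by x in bbe format: shift left and reduce with 0x87 when
 * bit 127 falls off. */
static struct blk128 x_bbe(struct blk128 x)
{
	uint64_t tt = (x.a & ((uint64_t)1 << 63)) ? 0x87 : 0x00;
	struct blk128 r = { (x.a << 1) | (x.b >> 63), (x.b << 1) ^ tt };
	return r;
}

/* Multiply by x in lle format: the bit order is reversed, so shift right
 * and reduce with the bit-reversed constant 0xe1 in the top byte. */
static struct blk128 x_lle(struct blk128 x)
{
	uint64_t tt = (x.b & 1) ? 0xe1 : 0x00;
	struct blk128 r = { (x.a >> 1) ^ (tt << 56), (x.b >> 1) | (x.a << 63) };
	return r;
}

/* Reverse the bit order of a 64-bit word. */
static uint64_t rev64(uint64_t v)
{
	uint64_t r = 0;

	for (int i = 0; i < 64; i++) {
		r = (r << 1) | (v & 1);
		v >>= 1;
	}
	return r;
}

/* Reverse all 128 bits: the new high word is the reversed low word. */
static struct blk128 rev128(struct blk128 x)
{
	struct blk128 r = { rev64(x.b), rev64(x.a) };
	return r;
}

/* Asserts x_lle(v) == rev128(x_bbe(rev128(v))) along a chain of doublings. */
static void check_lle_vs_bbe(struct blk128 v, int rounds)
{
	for (int i = 0; i < rounds; i++) {
		struct blk128 l = x_lle(v);
		struct blk128 m = rev128(x_bbe(rev128(v)));

		assert(l.a == m.a && l.b == m.b);
		v = l;
	}
}
```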

After this change, the speed of the generic 'xts(aes)' implementation
increased from ~225 MiB/s to ~235 MiB/s (measured using 'cryptsetup
benchmark -c aes-xts-plain64' on an Intel system with CRYPTO_AES_X86_64
and CRYPTO_AES_NI_INTEL disabled).

Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com>
Cc: Eric Biggers <ebiggers@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
---
 crypto/gf128mul.c         | 33 +------------------------------
 include/crypto/gf128mul.h | 49 +++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 48 insertions(+), 34 deletions(-)
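
gf128mul_x_ble() reads both halves with le64_to_cpu(), so it treats the 16-byte XTS tweak as a little-endian 128-bit integer. As a hedged cross-check of the inlined version, the sketch below compares a word-based doubling (with hypothetical byte-wise load_le64()/store_le64() helpers standing in for the kernel's byte-order macros) against the byte-at-a-time doubling commonly written from the IEEE 1619 XTS description.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: read/write a 64-bit little-endian value byte by
 * byte, playing the role of le64_to_cpu()/cpu_to_le64() on any host. */
static uint64_t load_le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

static void store_le64(uint8_t *p, uint64_t v)
{
	for (int i = 0; i < 8; i++) {
		p[i] = (uint8_t)v;
		v >>= 8;
	}
}

/* Word-based doubling, mirroring the inline gf128mul_x_ble(). */
static void tweak_double_words(uint8_t t[16])
{
	uint64_t a = load_le64(t);     /* bytes 0..7: low-order half */
	uint64_t b = load_le64(t + 8); /* bytes 8..15: high-order half */
	uint64_t tt = (b & ((uint64_t)1 << 63)) ? 0x87 : 0x00;

	store_le64(t, (a << 1) ^ tt);
	store_le64(t + 8, (b << 1) | (a >> 63));
}

/* Byte-wise doubling: shift the little-endian value left by one bit and,
 * if a bit falls off the top, XOR 0x87 into the lowest byte. */
static void tweak_double_bytes(uint8_t t[16])
{
	uint8_t carry = 0;

	for (int i = 0; i < 16; i++) {
		uint8_t next = t[i] >> 7;

		t[i] = (uint8_t)((t[i] << 1) | carry);
		carry = next;
	}
	if (carry)
		t[0] ^= 0x87;
}

/* Asserts that both doublings agree over 'rounds' successive steps. */
static void check_tweak_doubling(const uint8_t seed[16], int rounds)
{
	uint8_t w[16], y[16];

	memcpy(w, seed, 16);
	memcpy(y, seed, 16);
	for (int i = 0; i < rounds; i++) {
		tweak_double_words(w);
		tweak_double_bytes(y);
		assert(memcmp(w, y, 16) == 0);
	}
}
```

The word-based form does two 64-bit shifts per doubling instead of sixteen byte shifts, which is the same trade-off the header comment describes for the gf128mul_x_* helpers.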

diff --git a/crypto/gf128mul.c b/crypto/gf128mul.c
index 04facc0..dc01212 100644
--- a/crypto/gf128mul.c
+++ b/crypto/gf128mul.c
@@ -130,43 +130,12 @@ static const u16 gf128mul_table_le[256] = gf128mul_dat(xda_le);
 static const u16 gf128mul_table_be[256] = gf128mul_dat(xda_be);
 
 /*
- * The following functions multiply a field element by x or by x^8 in
+ * The following functions multiply a field element by x^8 in
  * the polynomial field representation.  They use 64-bit word operations
  * to gain speed but compensate for machine endianness and hence work
  * correctly on both styles of machine.
  */
 
-static void gf128mul_x_lle(be128 *r, const be128 *x)
-{
-	u64 a = be64_to_cpu(x->a);
-	u64 b = be64_to_cpu(x->b);
-	u64 _tt = gf128mul_table_le[(b << 7) & 0xff];
-
-	r->b = cpu_to_be64((b >> 1) | (a << 63));
-	r->a = cpu_to_be64((a >> 1) ^ (_tt << 48));
-}
-
-static void gf128mul_x_bbe(be128 *r, const be128 *x)
-{
-	u64 a = be64_to_cpu(x->a);
-	u64 b = be64_to_cpu(x->b);
-	u64 _tt = gf128mul_table_be[a >> 63];
-
-	r->a = cpu_to_be64((a << 1) | (b >> 63));
-	r->b = cpu_to_be64((b << 1) ^ _tt);
-}
-
-void gf128mul_x_ble(be128 *r, const be128 *x)
-{
-	u64 a = le64_to_cpu(x->a);
-	u64 b = le64_to_cpu(x->b);
-	u64 _tt = gf128mul_table_be[b >> 63];
-
-	r->a = cpu_to_le64((a << 1) ^ _tt);
-	r->b = cpu_to_le64((b << 1) | (a >> 63));
-}
-EXPORT_SYMBOL(gf128mul_x_ble);
-
 static void gf128mul_x8_lle(be128 *x)
 {
 	u64 a = be64_to_cpu(x->a);
diff --git a/include/crypto/gf128mul.h b/include/crypto/gf128mul.h
index 0bc9b5f..2a24553 100644
--- a/include/crypto/gf128mul.h
+++ b/include/crypto/gf128mul.h
@@ -49,6 +49,7 @@
 #ifndef _CRYPTO_GF128MUL_H
 #define _CRYPTO_GF128MUL_H
 
+#include <asm/byteorder.h>
 #include <crypto/b128ops.h>
 #include <linux/slab.h>
 
@@ -163,8 +164,52 @@ void gf128mul_lle(be128 *a, const be128 *b);
 
 void gf128mul_bbe(be128 *a, const be128 *b);
 
-/* multiply by x in ble format, needed by XTS */
-void gf128mul_x_ble(be128 *a, const be128 *b);
+/*
+ * The following functions multiply a field element by x in
+ * the polynomial field representation.  They use 64-bit word operations
+ * to gain speed but compensate for machine endianness and hence work
+ * correctly on both styles of machine.
+ *
+ * They are defined here for performance.
+ */
+
+static inline void gf128mul_x_lle(be128 *r, const be128 *x)
+{
+	u64 a = be64_to_cpu(x->a);
+	u64 b = be64_to_cpu(x->b);
+
+	/* equivalent to gf128mul_table_le[(b << 7) & 0xff] >> 8
+	 * (see crypto/gf128mul.c): */
+	u64 _tt = (b & (u64)1) ? 0xe1 : 0x00;
+
+	r->b = cpu_to_be64((b >> 1) | (a << 63));
+	r->a = cpu_to_be64((a >> 1) ^ (_tt << 56));
+}
+
+static inline void gf128mul_x_bbe(be128 *r, const be128 *x)
+{
+	u64 a = be64_to_cpu(x->a);
+	u64 b = be64_to_cpu(x->b);
+
+	/* equivalent to gf128mul_table_be[a >> 63] (see crypto/gf128mul.c): */
+	u64 _tt = (a & ((u64)1 << 63)) ? 0x87 : 0x00;
+
+	r->a = cpu_to_be64((a << 1) | (b >> 63));
+	r->b = cpu_to_be64((b << 1) ^ _tt);
+}
+
+/* needed by XTS */
+static inline void gf128mul_x_ble(be128 *r, const be128 *x)
+{
+	u64 a = le64_to_cpu(x->a);
+	u64 b = le64_to_cpu(x->b);
+
+	/* equivalent to gf128mul_table_be[b >> 63] (see crypto/gf128mul.c): */
+	u64 _tt = (b & ((u64)1 << 63)) ? 0x87 : 0x00;
+
+	r->a = cpu_to_le64((a << 1) ^ _tt);
+	r->b = cpu_to_le64((b << 1) | (a >> 63));
+}
 
 /* 4k table optimization */