From patchwork Fri Mar 31 09:27:03 2017
X-Patchwork-Submitter: Ondrej Mosnáček
X-Patchwork-Id: 9655781
X-Patchwork-Delegate: herbert@gondor.apana.org.au
From: Ondrej Mosnacek
To: Herbert Xu
Miller" , linux-crypto@vger.kernel.org, Jeffrey Walton , Milan Broz , Ondrej Mosnacek , Eric Biggers Subject: [PATCH v3] crypto: gf128mul - define gf128mul_x_* in gf128mul.h Date: Fri, 31 Mar 2017 11:27:03 +0200 Message-Id: <20170331092703.2520-1-omosnacek@gmail.com> X-Mailer: git-send-email 2.9.3 Sender: linux-crypto-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The gf128mul_x_ble function is currently defined in gf128mul.c, because it depends on the gf128mul_table_be multiplication table. However, since the function is very small and only uses two values from the table, it is better for it to be defined as inline function in gf128mul.h. That way, the function can be inlined by the compiler for better performance. For consistency, the other gf128mul_x_* functions are also moved to the header file. In addition, the code is rewritten to be constant-time. After this change, the speed of the generic 'xts(aes)' implementation increased from ~225 MiB/s to ~235 MiB/s (measured using 'cryptsetup benchmark -c aes-xts-plain64' on an Intel system with CRYPTO_AES_X86_64 and CRYPTO_AES_NI_INTEL disabled). Signed-off-by: Ondrej Mosnacek Cc: Eric Biggers Reviewed-by: Eric Biggers --- v2 -> v3: constant-time implementation v1 -> v2: move all _x_ functions to the header, not just gf128mul_x_ble crypto/gf128mul.c | 33 +--------------------------- include/crypto/gf128mul.h | 55 +++++++++++++++++++++++++++++++++++++++++++++-- 2 files changed, 54 insertions(+), 34 deletions(-) diff --git a/crypto/gf128mul.c b/crypto/gf128mul.c index 04facc0..dc01212 100644 --- a/crypto/gf128mul.c +++ b/crypto/gf128mul.c @@ -130,43 +130,12 @@ static const u16 gf128mul_table_le[256] = gf128mul_dat(xda_le); static const u16 gf128mul_table_be[256] = gf128mul_dat(xda_be); /* - * The following functions multiply a field element by x or by x^8 in + * The following functions multiply a field element by x^8 in * the polynomial field representation. They use 64-bit word operations * to gain speed but compensate for machine endianness and hence work * correctly on both styles of machine. 
  */
 
-static void gf128mul_x_lle(be128 *r, const be128 *x)
-{
-	u64 a = be64_to_cpu(x->a);
-	u64 b = be64_to_cpu(x->b);
-	u64 _tt = gf128mul_table_le[(b << 7) & 0xff];
-
-	r->b = cpu_to_be64((b >> 1) | (a << 63));
-	r->a = cpu_to_be64((a >> 1) ^ (_tt << 48));
-}
-
-static void gf128mul_x_bbe(be128 *r, const be128 *x)
-{
-	u64 a = be64_to_cpu(x->a);
-	u64 b = be64_to_cpu(x->b);
-	u64 _tt = gf128mul_table_be[a >> 63];
-
-	r->a = cpu_to_be64((a << 1) | (b >> 63));
-	r->b = cpu_to_be64((b << 1) ^ _tt);
-}
-
-void gf128mul_x_ble(be128 *r, const be128 *x)
-{
-	u64 a = le64_to_cpu(x->a);
-	u64 b = le64_to_cpu(x->b);
-	u64 _tt = gf128mul_table_be[b >> 63];
-
-	r->a = cpu_to_le64((a << 1) ^ _tt);
-	r->b = cpu_to_le64((b << 1) | (a >> 63));
-}
-EXPORT_SYMBOL(gf128mul_x_ble);
-
 static void gf128mul_x8_lle(be128 *x)
 {
 	u64 a = be64_to_cpu(x->a);
diff --git a/include/crypto/gf128mul.h b/include/crypto/gf128mul.h
index 0bc9b5f..6e43be5 100644
--- a/include/crypto/gf128mul.h
+++ b/include/crypto/gf128mul.h
@@ -49,6 +49,7 @@
 #ifndef _CRYPTO_GF128MUL_H
 #define _CRYPTO_GF128MUL_H
 
+#include <asm/byteorder.h>
 #include <crypto/b128ops.h>
 #include <linux/slab.h>
 
@@ -163,8 +164,58 @@
 void gf128mul_lle(be128 *a, const be128 *b);
 
 void gf128mul_bbe(be128 *a, const be128 *b);
 
-/* multiply by x in ble format, needed by XTS */
-void gf128mul_x_ble(be128 *a, const be128 *b);
+/*
+ * The following functions multiply a field element by x in
+ * the polynomial field representation. They use 64-bit word operations
+ * to gain speed but compensate for machine endianness and hence work
+ * correctly on both styles of machine.
+ *
+ * They are defined here for performance.
+ */
+
+static inline u64 gf128mul_mask_from_bit(u64 x, int which)
+{
+	/* a constant-time version of 'x & ((u64)1 << which) ? (u64)-1 : 0' */
+	return ((s64)(x << (63 - which)) >> 63);
+}
+
+static inline void gf128mul_x_lle(be128 *r, const be128 *x)
+{
+	u64 a = be64_to_cpu(x->a);
+	u64 b = be64_to_cpu(x->b);
+
+	/* equivalent to gf128mul_table_le[(b << 7) & 0xff] >> 8
+	 * (see crypto/gf128mul.c): */
+	u64 _tt = gf128mul_mask_from_bit(b, 0) & 0xe1;
+
+	r->b = cpu_to_be64((b >> 1) | (a << 63));
+	r->a = cpu_to_be64((a >> 1) ^ (_tt << 56));
+}
+
+static inline void gf128mul_x_bbe(be128 *r, const be128 *x)
+{
+	u64 a = be64_to_cpu(x->a);
+	u64 b = be64_to_cpu(x->b);
+
+	/* equivalent to gf128mul_table_be[a >> 63] (see crypto/gf128mul.c): */
+	u64 _tt = gf128mul_mask_from_bit(a, 63) & 0x87;
+
+	r->a = cpu_to_be64((a << 1) | (b >> 63));
+	r->b = cpu_to_be64((b << 1) ^ _tt);
+}
+
+/* needed by XTS */
+static inline void gf128mul_x_ble(be128 *r, const be128 *x)
+{
+	u64 a = le64_to_cpu(x->a);
+	u64 b = le64_to_cpu(x->b);
+
+	/* equivalent to gf128mul_table_be[b >> 63] (see crypto/gf128mul.c): */
+	u64 _tt = gf128mul_mask_from_bit(b, 63) & 0x87;
+
+	r->a = cpu_to_le64((a << 1) ^ _tt);
+	r->b = cpu_to_le64((b << 1) | (a >> 63));
+}
 
 /* 4k table optimization */
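
---

For reference, the constant-time trick above can be checked in isolation. The following is a minimal userspace sketch (not part of the patch): mask_from_bit and table_be are hypothetical stand-ins for the kernel's gf128mul_mask_from_bit and the two gf128mul_table_be entries the patch replaces (index 0 -> 0, index 1 -> 0x87). It assumes the compiler implements '>>' on a negative signed value as an arithmetic shift, which gcc and clang do (and the kernel relies on), although the C standard leaves it implementation-defined.

	#include <stdint.h>
	#include <stdio.h>

	/* Smear bit 'which' of x across the whole word: returns all-ones
	 * if the bit is set, zero otherwise, with no data-dependent branch
	 * or table lookup. */
	static uint64_t mask_from_bit(uint64_t x, int which)
	{
		return (uint64_t)((int64_t)(x << (63 - which)) >> 63);
	}

	int main(void)
	{
		/* the only two table entries gf128mul_x_ble ever reads */
		static const uint64_t table_be[2] = { 0x00, 0x87 };
		uint64_t b;

		for (b = 0; b < 4; b++) {
			uint64_t hi = b << 62;	/* put test bits at the top */
			uint64_t lookup = table_be[hi >> 63];
			uint64_t masked = mask_from_bit(hi, 63) & 0x87;

			printf("b=%llu: table=0x%02llx mask=0x%02llx %s\n",
			       (unsigned long long)b,
			       (unsigned long long)lookup,
			       (unsigned long long)masked,
			       lookup == masked ? "OK" : "MISMATCH");
		}
		return 0;
	}

Both variants compute the same reduction constant, but the masked form touches no memory indexed by secret data, which is what makes the new header implementation constant-time.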