From patchwork Tue Oct 20 20:39:52 2020
X-Patchwork-Submitter: Arvind Sankar
X-Patchwork-Id: 11847871
X-Patchwork-Delegate: herbert@gondor.apana.org.au
From: Arvind Sankar
To: Herbert Xu, "David S. Miller", "linux-crypto@vger.kernel.org", David Laight
Cc: linux-kernel@vger.kernel.org
Subject: [PATCH v2 1/6] crypto: Use memzero_explicit() for clearing state
Date: Tue, 20 Oct 2020 16:39:52 -0400
Message-Id: <20201020203957.3512851-2-nivedita@alum.mit.edu>
In-Reply-To: <20201020203957.3512851-1-nivedita@alum.mit.edu>
References: <20201020203957.3512851-1-nivedita@alum.mit.edu>

Without the barrier_data() inside memzero_explicit(), the compiler may
optimize away the state-clearing if it can tell that the state is not
used afterwards. At least in lib/crypto/sha256.c:__sha256_final(), the
function can get inlined into sha256(), in which case the memset is
optimized away.
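To illustrate the failure mode, here is a minimal sketch (simplified from
lib/string.c and include/linux/compiler.h, not a verbatim copy of either)
of why a plain memset() or struct assignment can be elided while
memzero_explicit() cannot:

	static void naive_clear(struct sha256_state *sctx)
	{
		/*
		 * The object is dead after this store, so the compiler is
		 * allowed to drop the memset under dead-store elimination,
		 * leaving the hash state behind in memory.
		 */
		memset(sctx, 0, sizeof(*sctx));
	}

	void memzero_explicit(void *s, size_t count)
	{
		memset(s, 0, count);
		/*
		 * barrier_data() is roughly
		 *   asm volatile("" : : "r" (s) : "memory");
		 * the empty asm is assumed to read *s, so the preceding
		 * stores cannot be treated as dead and must be emitted.
		 */
		barrier_data(s);
	}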
Signed-off-by: Arvind Sankar
Reviewed-by: Eric Biggers
---
 include/crypto/sha1_base.h   | 3 ++-
 include/crypto/sha256_base.h | 3 ++-
 include/crypto/sha512_base.h | 3 ++-
 include/crypto/sm3_base.h    | 3 ++-
 lib/crypto/sha256.c          | 2 +-
 5 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/include/crypto/sha1_base.h b/include/crypto/sha1_base.h
index 20fd1f7468af..a5d6033efef7 100644
--- a/include/crypto/sha1_base.h
+++ b/include/crypto/sha1_base.h
@@ -12,6 +12,7 @@
 #include
 #include
 #include
+#include
 #include
@@ -101,7 +102,7 @@ static inline int sha1_base_finish(struct shash_desc *desc, u8 *out)
 	for (i = 0; i < SHA1_DIGEST_SIZE / sizeof(__be32); i++)
 		put_unaligned_be32(sctx->state[i], digest++);
-	*sctx = (struct sha1_state){};
+	memzero_explicit(sctx, sizeof(*sctx));
 	return 0;
 }
diff --git a/include/crypto/sha256_base.h b/include/crypto/sha256_base.h
index 6ded110783ae..93f9fd21cc06 100644
--- a/include/crypto/sha256_base.h
+++ b/include/crypto/sha256_base.h
@@ -12,6 +12,7 @@
 #include
 #include
 #include
+#include
 #include
@@ -105,7 +106,7 @@ static inline int sha256_base_finish(struct shash_desc *desc, u8 *out)
 	for (i = 0; digest_size > 0; i++, digest_size -= sizeof(__be32))
 		put_unaligned_be32(sctx->state[i], digest++);
-	*sctx = (struct sha256_state){};
+	memzero_explicit(sctx, sizeof(*sctx));
 	return 0;
 }
diff --git a/include/crypto/sha512_base.h b/include/crypto/sha512_base.h
index fb19c77494dc..93ab73baa38e 100644
--- a/include/crypto/sha512_base.h
+++ b/include/crypto/sha512_base.h
@@ -12,6 +12,7 @@
 #include
 #include
 #include
+#include
 #include
@@ -126,7 +127,7 @@ static inline int sha512_base_finish(struct shash_desc *desc, u8 *out)
 	for (i = 0; digest_size > 0; i++, digest_size -= sizeof(__be64))
 		put_unaligned_be64(sctx->state[i], digest++);
-	*sctx = (struct sha512_state){};
+	memzero_explicit(sctx, sizeof(*sctx));
 	return 0;
 }
diff --git a/include/crypto/sm3_base.h b/include/crypto/sm3_base.h
index 1cbf9aa1fe52..2f3a32ab97bb 100644
--- a/include/crypto/sm3_base.h
+++ b/include/crypto/sm3_base.h
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include
 #include

 typedef void (sm3_block_fn)(struct sm3_state *sst, u8 const *src, int blocks);
@@ -104,7 +105,7 @@ static inline int sm3_base_finish(struct shash_desc *desc, u8 *out)
 	for (i = 0; i < SM3_DIGEST_SIZE / sizeof(__be32); i++)
 		put_unaligned_be32(sctx->state[i], digest++);
-	*sctx = (struct sm3_state){};
+	memzero_explicit(sctx, sizeof(*sctx));
 	return 0;
 }
diff --git a/lib/crypto/sha256.c b/lib/crypto/sha256.c
index 2321f6cb322f..d43bc39ab05e 100644
--- a/lib/crypto/sha256.c
+++ b/lib/crypto/sha256.c
@@ -265,7 +265,7 @@ static void __sha256_final(struct sha256_state *sctx, u8 *out, int digest_words)
 		put_unaligned_be32(sctx->state[i], &dst[i]);

 	/* Zeroize sensitive information. */
-	memset(sctx, 0, sizeof(*sctx));
+	memzero_explicit(sctx, sizeof(*sctx));
 }

 void sha256_final(struct sha256_state *sctx, u8 *out)

From patchwork Tue Oct 20 20:39:53 2020
X-Patchwork-Submitter: Arvind Sankar
X-Patchwork-Id: 11847869
X-Patchwork-Delegate: herbert@gondor.apana.org.au
From: Arvind Sankar
To: Herbert Xu, "David S. Miller", "linux-crypto@vger.kernel.org", David Laight
Cc: linux-kernel@vger.kernel.org
Subject: [PATCH v2 2/6] crypto: lib/sha256 - Don't clear temporary variables
Date: Tue, 20 Oct 2020 16:39:53 -0400
Message-Id: <20201020203957.3512851-3-nivedita@alum.mit.edu>
In-Reply-To: <20201020203957.3512851-1-nivedita@alum.mit.edu>
References: <20201020203957.3512851-1-nivedita@alum.mit.edu>

The assignments to clear a through h and t1/t2 are optimized out by the
compiler because they are unused after the assignments.

These variables shouldn't be very sensitive: t1/t2 can be calculated
from a through h, so they don't reveal any additional information.
Knowing a through h is equivalent to knowing one 64-byte block's SHA256
hash (with non-standard initial value) which, assuming SHA256 is
secure, doesn't reveal any information about the input.
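To make that concrete, the round structure in sha256_transform() already
publishes t1 and t2 into the working variables, so anyone who can observe
a through h can reconstruct them (illustration only, using the round as
written in this file):

	/*
	 * t1 = h + e1(e) + Ch(e, f, g) + K[i] + W[i];
	 * t2 = e0(a) + Maj(a, b, c);
	 * d += t1;        so t1 = d_new - d_old
	 * h = t1 + t2;    so t2 = h_new - t1
	 *
	 * Clearing t1/t2 therefore adds nothing once a through h are
	 * taken care of.
	 */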
Signed-off-by: Arvind Sankar
---
 lib/crypto/sha256.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/lib/crypto/sha256.c b/lib/crypto/sha256.c
index d43bc39ab05e..099cd11f83c1 100644
--- a/lib/crypto/sha256.c
+++ b/lib/crypto/sha256.c
@@ -202,7 +202,6 @@ static void sha256_transform(u32 *state, const u8 *input)
 	state[4] += e; state[5] += f; state[6] += g; state[7] += h;

 	/* clear any sensitive info... */
-	a = b = c = d = e = f = g = h = t1 = t2 = 0;
 	memzero_explicit(W, 64 * sizeof(u32));
 }
Miller" , "linux-crypto@vger.kernel.org" , David Laight Cc: linux-kernel@vger.kernel.org Subject: [PATCH v2 3/6] crypto: lib/sha256 - Clear W[] in sha256_update() instead of sha256_transform() Date: Tue, 20 Oct 2020 16:39:54 -0400 Message-Id: <20201020203957.3512851-4-nivedita@alum.mit.edu> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20201020203957.3512851-1-nivedita@alum.mit.edu> References: <20201020203957.3512851-1-nivedita@alum.mit.edu> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org The temporary W[] array is currently zeroed out once every call to sha256_transform(), i.e. once every 64 bytes of input data. Moving it to sha256_update() instead so that it is cleared only once per update can save about 2-3% of the total time taken to compute the digest, with a reasonable memset() implementation, and considerably more (~20%) with a bad one (eg the x86 purgatory currently uses a memset() coded in C). Signed-off-by: Arvind Sankar Reviewed-by: Eric Biggers --- lib/crypto/sha256.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/lib/crypto/sha256.c b/lib/crypto/sha256.c index 099cd11f83c1..c6bfeacc5b81 100644 --- a/lib/crypto/sha256.c +++ b/lib/crypto/sha256.c @@ -43,10 +43,9 @@ static inline void BLEND_OP(int I, u32 *W) W[I] = s1(W[I-2]) + W[I-7] + s0(W[I-15]) + W[I-16]; } -static void sha256_transform(u32 *state, const u8 *input) +static void sha256_transform(u32 *state, const u8 *input, u32 *W) { u32 a, b, c, d, e, f, g, h, t1, t2; - u32 W[64]; int i; /* load the input */ @@ -200,15 +199,13 @@ static void sha256_transform(u32 *state, const u8 *input) state[0] += a; state[1] += b; state[2] += c; state[3] += d; state[4] += e; state[5] += f; state[6] += g; state[7] += h; - - /* clear any sensitive info... 
-	memzero_explicit(W, 64 * sizeof(u32));
 }

 void sha256_update(struct sha256_state *sctx, const u8 *data, unsigned int len)
 {
 	unsigned int partial, done;
 	const u8 *src;
+	u32 W[64];

 	partial = sctx->count & 0x3f;
 	sctx->count += len;
@@ -223,11 +220,13 @@ void sha256_update(struct sha256_state *sctx, const u8 *data, unsigned int len)
 		}

 		do {
-			sha256_transform(sctx->state, src);
+			sha256_transform(sctx->state, src, W);
 			done += 64;
 			src = data + done;
 		} while (done + 63 < len);

+		memzero_explicit(W, sizeof(W));
+
 		partial = 0;
 	}
 	memcpy(sctx->buf + partial, src, len - done);
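To put rough numbers on the saving: W[] is 64 u32 words, i.e. 256 bytes.
Before this change a 4 KiB sha256_update() call clears it once per
64-byte block, 64 times x 256 bytes = 16 KiB of stores; afterwards it is
cleared once, 256 bytes per update (ignoring the final partial block),
which is roughly where the quoted 2-3% comes from.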
Miller" , "linux-crypto@vger.kernel.org" , David Laight Cc: linux-kernel@vger.kernel.org Subject: [PATCH v2 4/6] crypto: lib/sha256 - Unroll SHA256 loop 8 times intead of 64 Date: Tue, 20 Oct 2020 16:39:55 -0400 Message-Id: <20201020203957.3512851-5-nivedita@alum.mit.edu> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20201020203957.3512851-1-nivedita@alum.mit.edu> References: <20201020203957.3512851-1-nivedita@alum.mit.edu> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org This reduces code size substantially (on x86_64 with gcc-10 the size of sha256_update() goes from 7593 bytes to 1952 bytes including the new SHA256_K array), and on x86 is slightly faster than the full unroll (tesed on Broadwell Xeon). Signed-off-by: Arvind Sankar --- lib/crypto/sha256.c | 166 ++++++++------------------------------------ 1 file changed, 30 insertions(+), 136 deletions(-) diff --git a/lib/crypto/sha256.c b/lib/crypto/sha256.c index c6bfeacc5b81..5efd390706c6 100644 --- a/lib/crypto/sha256.c +++ b/lib/crypto/sha256.c @@ -18,6 +18,17 @@ #include #include +static const u32 SHA256_K[] = { + 0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5, + 0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174, + 0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da, + 0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967, + 0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85, + 0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070, + 0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3, + 0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2, +}; + static inline u32 Ch(u32 x, u32 y, u32 z) { return z ^ (x & (y ^ z)); @@ -43,9 +54,17 @@ static inline void BLEND_OP(int I, u32 *W) W[I] = s1(W[I-2]) + W[I-7] + s0(W[I-15]) + W[I-16]; } +#define SHA256_ROUND(i, a, b, c, d, e, f, g, h) do { \ + u32 t1, t2; \ + t1 = h + e1(e) + Ch(e, f, g) + SHA256_K[i] + W[i]; \ + t2 = e0(a) + Maj(a, b, c); \ + d += t1; \ + h = t1 + t2; \ +} while (0) + static void sha256_transform(u32 *state, const u8 *input, u32 *W) { - u32 a, b, c, d, e, f, g, h, t1, t2; + u32 a, b, c, d, e, f, g, h; int i; /* load the input */ @@ -61,141 +80,16 @@ static void sha256_transform(u32 *state, const u8 *input, u32 *W) e = state[4]; f = state[5]; g = state[6]; h = state[7]; /* now iterate */ - t1 = h + e1(e) + Ch(e, f, g) + 0x428a2f98 + W[0]; - t2 = e0(a) + Maj(a, b, c); d += t1; h = t1 + t2; - t1 = g + e1(d) + Ch(d, e, f) + 0x71374491 + W[1]; - t2 = e0(h) + Maj(h, a, b); c += t1; g = t1 + t2; - t1 = f + e1(c) + Ch(c, d, e) + 0xb5c0fbcf + W[2]; - t2 = e0(g) + Maj(g, h, a); b += t1; f = t1 + t2; - t1 = e + e1(b) + Ch(b, c, d) + 0xe9b5dba5 + W[3]; - t2 = e0(f) + Maj(f, g, h); a += t1; e = t1 + t2; - t1 = d + e1(a) + Ch(a, b, c) + 0x3956c25b + W[4]; - t2 = e0(e) + Maj(e, f, g); h += t1; d = t1 + t2; - t1 = c + e1(h) + Ch(h, a, b) + 0x59f111f1 + W[5]; - t2 = e0(d) + Maj(d, e, f); g += t1; c = t1 + t2; - t1 = b + e1(g) + Ch(g, h, a) + 0x923f82a4 + W[6]; - t2 = e0(c) + Maj(c, d, e); f += t1; b = t1 + t2; - t1 = a + e1(f) + Ch(f, g, h) + 0xab1c5ed5 + W[7]; - t2 = e0(b) + Maj(b, c, d); e += t1; a = t1 + t2; - - t1 = h + e1(e) + Ch(e, f, g) + 0xd807aa98 + W[8]; - t2 = e0(a) + Maj(a, b, c); d += t1; h = t1 + 
t2; - t1 = g + e1(d) + Ch(d, e, f) + 0x12835b01 + W[9]; - t2 = e0(h) + Maj(h, a, b); c += t1; g = t1 + t2; - t1 = f + e1(c) + Ch(c, d, e) + 0x243185be + W[10]; - t2 = e0(g) + Maj(g, h, a); b += t1; f = t1 + t2; - t1 = e + e1(b) + Ch(b, c, d) + 0x550c7dc3 + W[11]; - t2 = e0(f) + Maj(f, g, h); a += t1; e = t1 + t2; - t1 = d + e1(a) + Ch(a, b, c) + 0x72be5d74 + W[12]; - t2 = e0(e) + Maj(e, f, g); h += t1; d = t1 + t2; - t1 = c + e1(h) + Ch(h, a, b) + 0x80deb1fe + W[13]; - t2 = e0(d) + Maj(d, e, f); g += t1; c = t1 + t2; - t1 = b + e1(g) + Ch(g, h, a) + 0x9bdc06a7 + W[14]; - t2 = e0(c) + Maj(c, d, e); f += t1; b = t1 + t2; - t1 = a + e1(f) + Ch(f, g, h) + 0xc19bf174 + W[15]; - t2 = e0(b) + Maj(b, c, d); e += t1; a = t1 + t2; - - t1 = h + e1(e) + Ch(e, f, g) + 0xe49b69c1 + W[16]; - t2 = e0(a) + Maj(a, b, c); d += t1; h = t1 + t2; - t1 = g + e1(d) + Ch(d, e, f) + 0xefbe4786 + W[17]; - t2 = e0(h) + Maj(h, a, b); c += t1; g = t1 + t2; - t1 = f + e1(c) + Ch(c, d, e) + 0x0fc19dc6 + W[18]; - t2 = e0(g) + Maj(g, h, a); b += t1; f = t1 + t2; - t1 = e + e1(b) + Ch(b, c, d) + 0x240ca1cc + W[19]; - t2 = e0(f) + Maj(f, g, h); a += t1; e = t1 + t2; - t1 = d + e1(a) + Ch(a, b, c) + 0x2de92c6f + W[20]; - t2 = e0(e) + Maj(e, f, g); h += t1; d = t1 + t2; - t1 = c + e1(h) + Ch(h, a, b) + 0x4a7484aa + W[21]; - t2 = e0(d) + Maj(d, e, f); g += t1; c = t1 + t2; - t1 = b + e1(g) + Ch(g, h, a) + 0x5cb0a9dc + W[22]; - t2 = e0(c) + Maj(c, d, e); f += t1; b = t1 + t2; - t1 = a + e1(f) + Ch(f, g, h) + 0x76f988da + W[23]; - t2 = e0(b) + Maj(b, c, d); e += t1; a = t1 + t2; - - t1 = h + e1(e) + Ch(e, f, g) + 0x983e5152 + W[24]; - t2 = e0(a) + Maj(a, b, c); d += t1; h = t1 + t2; - t1 = g + e1(d) + Ch(d, e, f) + 0xa831c66d + W[25]; - t2 = e0(h) + Maj(h, a, b); c += t1; g = t1 + t2; - t1 = f + e1(c) + Ch(c, d, e) + 0xb00327c8 + W[26]; - t2 = e0(g) + Maj(g, h, a); b += t1; f = t1 + t2; - t1 = e + e1(b) + Ch(b, c, d) + 0xbf597fc7 + W[27]; - t2 = e0(f) + Maj(f, g, h); a += t1; e = t1 + t2; - t1 = d + e1(a) + Ch(a, b, c) + 0xc6e00bf3 + W[28]; - t2 = e0(e) + Maj(e, f, g); h += t1; d = t1 + t2; - t1 = c + e1(h) + Ch(h, a, b) + 0xd5a79147 + W[29]; - t2 = e0(d) + Maj(d, e, f); g += t1; c = t1 + t2; - t1 = b + e1(g) + Ch(g, h, a) + 0x06ca6351 + W[30]; - t2 = e0(c) + Maj(c, d, e); f += t1; b = t1 + t2; - t1 = a + e1(f) + Ch(f, g, h) + 0x14292967 + W[31]; - t2 = e0(b) + Maj(b, c, d); e += t1; a = t1 + t2; - - t1 = h + e1(e) + Ch(e, f, g) + 0x27b70a85 + W[32]; - t2 = e0(a) + Maj(a, b, c); d += t1; h = t1 + t2; - t1 = g + e1(d) + Ch(d, e, f) + 0x2e1b2138 + W[33]; - t2 = e0(h) + Maj(h, a, b); c += t1; g = t1 + t2; - t1 = f + e1(c) + Ch(c, d, e) + 0x4d2c6dfc + W[34]; - t2 = e0(g) + Maj(g, h, a); b += t1; f = t1 + t2; - t1 = e + e1(b) + Ch(b, c, d) + 0x53380d13 + W[35]; - t2 = e0(f) + Maj(f, g, h); a += t1; e = t1 + t2; - t1 = d + e1(a) + Ch(a, b, c) + 0x650a7354 + W[36]; - t2 = e0(e) + Maj(e, f, g); h += t1; d = t1 + t2; - t1 = c + e1(h) + Ch(h, a, b) + 0x766a0abb + W[37]; - t2 = e0(d) + Maj(d, e, f); g += t1; c = t1 + t2; - t1 = b + e1(g) + Ch(g, h, a) + 0x81c2c92e + W[38]; - t2 = e0(c) + Maj(c, d, e); f += t1; b = t1 + t2; - t1 = a + e1(f) + Ch(f, g, h) + 0x92722c85 + W[39]; - t2 = e0(b) + Maj(b, c, d); e += t1; a = t1 + t2; - - t1 = h + e1(e) + Ch(e, f, g) + 0xa2bfe8a1 + W[40]; - t2 = e0(a) + Maj(a, b, c); d += t1; h = t1 + t2; - t1 = g + e1(d) + Ch(d, e, f) + 0xa81a664b + W[41]; - t2 = e0(h) + Maj(h, a, b); c += t1; g = t1 + t2; - t1 = f + e1(c) + Ch(c, d, e) + 0xc24b8b70 + W[42]; - t2 = e0(g) + Maj(g, h, a); b += t1; f = t1 + t2; - t1 = 
e + e1(b) + Ch(b, c, d) + 0xc76c51a3 + W[43];
-	t2 = e0(f) + Maj(f, g, h); a += t1; e = t1 + t2;
-	t1 = d + e1(a) + Ch(a, b, c) + 0xd192e819 + W[44];
-	t2 = e0(e) + Maj(e, f, g); h += t1; d = t1 + t2;
-	t1 = c + e1(h) + Ch(h, a, b) + 0xd6990624 + W[45];
-	t2 = e0(d) + Maj(d, e, f); g += t1; c = t1 + t2;
-	t1 = b + e1(g) + Ch(g, h, a) + 0xf40e3585 + W[46];
-	t2 = e0(c) + Maj(c, d, e); f += t1; b = t1 + t2;
-	t1 = a + e1(f) + Ch(f, g, h) + 0x106aa070 + W[47];
-	t2 = e0(b) + Maj(b, c, d); e += t1; a = t1 + t2;
-
-	t1 = h + e1(e) + Ch(e, f, g) + 0x19a4c116 + W[48];
-	t2 = e0(a) + Maj(a, b, c); d += t1; h = t1 + t2;
-	t1 = g + e1(d) + Ch(d, e, f) + 0x1e376c08 + W[49];
-	t2 = e0(h) + Maj(h, a, b); c += t1; g = t1 + t2;
-	t1 = f + e1(c) + Ch(c, d, e) + 0x2748774c + W[50];
-	t2 = e0(g) + Maj(g, h, a); b += t1; f = t1 + t2;
-	t1 = e + e1(b) + Ch(b, c, d) + 0x34b0bcb5 + W[51];
-	t2 = e0(f) + Maj(f, g, h); a += t1; e = t1 + t2;
-	t1 = d + e1(a) + Ch(a, b, c) + 0x391c0cb3 + W[52];
-	t2 = e0(e) + Maj(e, f, g); h += t1; d = t1 + t2;
-	t1 = c + e1(h) + Ch(h, a, b) + 0x4ed8aa4a + W[53];
-	t2 = e0(d) + Maj(d, e, f); g += t1; c = t1 + t2;
-	t1 = b + e1(g) + Ch(g, h, a) + 0x5b9cca4f + W[54];
-	t2 = e0(c) + Maj(c, d, e); f += t1; b = t1 + t2;
-	t1 = a + e1(f) + Ch(f, g, h) + 0x682e6ff3 + W[55];
-	t2 = e0(b) + Maj(b, c, d); e += t1; a = t1 + t2;
-
-	t1 = h + e1(e) + Ch(e, f, g) + 0x748f82ee + W[56];
-	t2 = e0(a) + Maj(a, b, c); d += t1; h = t1 + t2;
-	t1 = g + e1(d) + Ch(d, e, f) + 0x78a5636f + W[57];
-	t2 = e0(h) + Maj(h, a, b); c += t1; g = t1 + t2;
-	t1 = f + e1(c) + Ch(c, d, e) + 0x84c87814 + W[58];
-	t2 = e0(g) + Maj(g, h, a); b += t1; f = t1 + t2;
-	t1 = e + e1(b) + Ch(b, c, d) + 0x8cc70208 + W[59];
-	t2 = e0(f) + Maj(f, g, h); a += t1; e = t1 + t2;
-	t1 = d + e1(a) + Ch(a, b, c) + 0x90befffa + W[60];
-	t2 = e0(e) + Maj(e, f, g); h += t1; d = t1 + t2;
-	t1 = c + e1(h) + Ch(h, a, b) + 0xa4506ceb + W[61];
-	t2 = e0(d) + Maj(d, e, f); g += t1; c = t1 + t2;
-	t1 = b + e1(g) + Ch(g, h, a) + 0xbef9a3f7 + W[62];
-	t2 = e0(c) + Maj(c, d, e); f += t1; b = t1 + t2;
-	t1 = a + e1(f) + Ch(f, g, h) + 0xc67178f2 + W[63];
-	t2 = e0(b) + Maj(b, c, d); e += t1; a = t1 + t2;
+	for (i = 0; i < 64; i += 8) {
+		SHA256_ROUND(i + 0, a, b, c, d, e, f, g, h);
+		SHA256_ROUND(i + 1, h, a, b, c, d, e, f, g);
+		SHA256_ROUND(i + 2, g, h, a, b, c, d, e, f);
+		SHA256_ROUND(i + 3, f, g, h, a, b, c, d, e);
+		SHA256_ROUND(i + 4, e, f, g, h, a, b, c, d);
+		SHA256_ROUND(i + 5, d, e, f, g, h, a, b, c);
+		SHA256_ROUND(i + 6, c, d, e, f, g, h, a, b);
+		SHA256_ROUND(i + 7, b, c, d, e, f, g, h, a);
+	}

 	state[0] += a; state[1] += b; state[2] += c; state[3] += d;
 	state[4] += e; state[5] += f; state[6] += g; state[7] += h;
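For readers checking that the rolled-up loop really performs the same 64
rounds: the permuted arguments in the eight SHA256_ROUND() calls are the
usual trick of rotating the variable names instead of shifting the data.
A minimal sketch of the same computation with an explicit index follows;
sha256_rounds_indexed() is a hypothetical illustration that reuses
e0/e1/Ch/Maj and SHA256_K from this file, not something the patch adds,
and the indexed form typically keeps the state in memory rather than in
registers, which is why the patch rotates names instead:

	static void sha256_rounds_indexed(u32 *state, const u32 *W)
	{
		u32 s[8];	/* s[0..7] play the roles a..h at round 0 */
		int i;

		for (i = 0; i < 8; i++)
			s[i] = state[i];

		for (i = 0; i < 64; i++) {
			/* the variable playing role r at round i is s[(r - i) mod 8] */
			u32 a = s[(0 + 64 - i) & 7], b = s[(1 + 64 - i) & 7];
			u32 c = s[(2 + 64 - i) & 7], e = s[(4 + 64 - i) & 7];
			u32 f = s[(5 + 64 - i) & 7], g = s[(6 + 64 - i) & 7];
			u32 t1 = s[(7 + 64 - i) & 7] + e1(e) + Ch(e, f, g) +
				 SHA256_K[i] + W[i];
			u32 t2 = e0(a) + Maj(a, b, c);

			s[(3 + 64 - i) & 7] += t1;	/* d += t1     */
			s[(7 + 64 - i) & 7] = t1 + t2;	/* h = t1 + t2 */
		}

		for (i = 0; i < 8; i++)
			state[i] += s[i];
	}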
From patchwork Tue Oct 20 20:39:56 2020
X-Patchwork-Submitter: Arvind Sankar
X-Patchwork-Id: 11847873
X-Patchwork-Delegate: herbert@gondor.apana.org.au
From: Arvind Sankar
To: Herbert Xu, "David S. Miller", "linux-crypto@vger.kernel.org", David Laight
Cc: linux-kernel@vger.kernel.org
Subject: [PATCH v2 5/6] crypto: lib/sha256 - Unroll LOAD and BLEND loops
Date: Tue, 20 Oct 2020 16:39:56 -0400
Message-Id: <20201020203957.3512851-6-nivedita@alum.mit.edu>
In-Reply-To: <20201020203957.3512851-1-nivedita@alum.mit.edu>
References: <20201020203957.3512851-1-nivedita@alum.mit.edu>

Unrolling the LOAD and BLEND loops improves performance by ~8% on x86_64
(tested on Broadwell Xeon) while not increasing code size too much.
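As an aside, a similar effect can sometimes be had by asking the compiler
to unroll instead of doing it by hand. A hypothetical variant, not what
this patch does, and it requires a compiler that understands the unroll
pragma (GCC 8 or later), which is less portable than manual unrolling:

	#pragma GCC unroll 8
	for (i = 0; i < 16; i++)
		LOAD_OP(i, W, input);

	#pragma GCC unroll 8
	for (i = 16; i < 64; i++)
		BLEND_OP(i, W);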
Signed-off-by: Arvind Sankar
Reviewed-by: Eric Biggers
---
 lib/crypto/sha256.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/lib/crypto/sha256.c b/lib/crypto/sha256.c
index 5efd390706c6..3a8802d5f747 100644
--- a/lib/crypto/sha256.c
+++ b/lib/crypto/sha256.c
@@ -68,12 +68,28 @@ static void sha256_transform(u32 *state, const u8 *input, u32 *W)
 	int i;

 	/* load the input */
-	for (i = 0; i < 16; i++)
-		LOAD_OP(i, W, input);
+	for (i = 0; i < 16; i += 8) {
+		LOAD_OP(i + 0, W, input);
+		LOAD_OP(i + 1, W, input);
+		LOAD_OP(i + 2, W, input);
+		LOAD_OP(i + 3, W, input);
+		LOAD_OP(i + 4, W, input);
+		LOAD_OP(i + 5, W, input);
+		LOAD_OP(i + 6, W, input);
+		LOAD_OP(i + 7, W, input);
+	}

 	/* now blend */
-	for (i = 16; i < 64; i++)
-		BLEND_OP(i, W);
+	for (i = 16; i < 64; i += 8) {
+		BLEND_OP(i + 0, W);
+		BLEND_OP(i + 1, W);
+		BLEND_OP(i + 2, W);
+		BLEND_OP(i + 3, W);
+		BLEND_OP(i + 4, W);
+		BLEND_OP(i + 5, W);
+		BLEND_OP(i + 6, W);
+		BLEND_OP(i + 7, W);
+	}

 	/* load the state into our registers */
 	a = state[0]; b = state[1]; c = state[2]; d = state[3];
From patchwork Tue Oct 20 20:39:57 2020
X-Patchwork-Submitter: Arvind Sankar
X-Patchwork-Id: 11847879
X-Patchwork-Delegate: herbert@gondor.apana.org.au
From: Arvind Sankar
To: Herbert Xu, "David S. Miller", "linux-crypto@vger.kernel.org", David Laight
Cc: linux-kernel@vger.kernel.org
Subject: [PATCH v2 6/6] crypto: lib/sha - Combine round constants and message schedule
Date: Tue, 20 Oct 2020 16:39:57 -0400
Message-Id: <20201020203957.3512851-7-nivedita@alum.mit.edu>
In-Reply-To: <20201020203957.3512851-1-nivedita@alum.mit.edu>
References: <20201020203957.3512851-1-nivedita@alum.mit.edu>

Putting the round constants and the message schedule arrays together in
one structure saves one register, which can be a significant benefit on
register-constrained architectures. On x86-32 (tested on Broadwell
Xeon), this gives a 10% performance benefit.

Signed-off-by: Arvind Sankar
Suggested-by: David Laight
---
 lib/crypto/sha256.c | 49 ++++++++++++++++++++++++++-------------------
 1 file changed, 28 insertions(+), 21 deletions(-)

diff --git a/lib/crypto/sha256.c b/lib/crypto/sha256.c
index 3a8802d5f747..985cd0560d79 100644
--- a/lib/crypto/sha256.c
+++ b/lib/crypto/sha256.c
@@ -29,6 +29,11 @@ static const u32 SHA256_K[] = {
 	0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
 };

+struct KW {
+	u32 K[64];
+	u32 W[64];
+};
+
 static inline u32 Ch(u32 x, u32 y, u32 z)
 {
 	return z ^ (x & (y ^ z));
@@ -56,39 +61,39 @@ static inline void BLEND_OP(int I, u32 *W)

 #define SHA256_ROUND(i, a, b, c, d, e, f, g, h) do {		\
 	u32 t1, t2;						\
-	t1 = h + e1(e) + Ch(e, f, g) + SHA256_K[i] + W[i];	\
+	t1 = h + e1(e) + Ch(e, f, g) + KW->K[i] + KW->W[i];	\
 	t2 = e0(a) + Maj(a, b, c);				\
 	d += t1;						\
 	h = t1 + t2;						\
 } while (0)

-static void sha256_transform(u32 *state, const u8 *input, u32 *W)
+static void sha256_transform(u32 *state, const u8 *input, struct KW *KW)
 {
 	u32 a, b, c, d, e, f, g, h;
 	int i;

 	/* load the input */
 	for (i = 0; i < 16; i += 8) {
-		LOAD_OP(i + 0, W, input);
-		LOAD_OP(i + 1, W, input);
-		LOAD_OP(i + 2, W, input);
-		LOAD_OP(i + 3, W, input);
-		LOAD_OP(i + 4, W, input);
-		LOAD_OP(i + 5, W, input);
-		LOAD_OP(i + 6, W, input);
-		LOAD_OP(i + 7, W, input);
+		LOAD_OP(i + 0, KW->W, input);
+		LOAD_OP(i + 1, KW->W, input);
+		LOAD_OP(i + 2, KW->W, input);
+		LOAD_OP(i + 3, KW->W, input);
+		LOAD_OP(i + 4, KW->W, input);
+		LOAD_OP(i + 5, KW->W, input);
+		LOAD_OP(i + 6, KW->W, input);
+		LOAD_OP(i + 7, KW->W, input);
 	}

 	/* now blend */
 	for (i = 16; i < 64; i += 8) {
-		BLEND_OP(i + 0, W);
-		BLEND_OP(i + 1, W);
-		BLEND_OP(i + 2, W);
-		BLEND_OP(i + 3, W);
-		BLEND_OP(i + 4, W);
-		BLEND_OP(i + 5, W);
-		BLEND_OP(i + 6, W);
-		BLEND_OP(i + 7, W);
+		BLEND_OP(i + 0, KW->W);
+		BLEND_OP(i + 1, KW->W);
+		BLEND_OP(i + 2, KW->W);
+		BLEND_OP(i + 3, KW->W);
+		BLEND_OP(i + 4, KW->W);
+		BLEND_OP(i + 5, KW->W);
+		BLEND_OP(i + 6, KW->W);
+		BLEND_OP(i + 7, KW->W);
 	}

 	/* load the state into our registers */
@@ -115,7 +120,7 @@ void sha256_update(struct sha256_state *sctx, const u8 *data, unsigned int len)
 {
 	unsigned int partial, done;
 	const u8 *src;
-	u32 W[64];
+	struct KW KW;

 	partial = sctx->count & 0x3f;
 	sctx->count += len;
@@ -129,13 +134,15 @@ void sha256_update(struct sha256_state *sctx, const u8 *data, unsigned int len)
 		src = sctx->buf;
 	}

+	memcpy(KW.K, SHA256_K, sizeof(KW.K));
+
 	do {
-		sha256_transform(sctx->state, src, W);
+		sha256_transform(sctx->state, src, &KW);
 		done += 64;
 		src = data + done;
 	} while (done + 63 < len);

-	memzero_explicit(W, sizeof(W));
+	memzero_explicit(KW.W, sizeof(KW.W));

 	partial = 0;
 }
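To illustrate where the saved register comes from, here is a sketch of
the addressing only, with assumed offsets for the u32 K[64]/W[64] layout;
it is not code from the patch:

	/*
	 * With separate arrays, each round keeps two base pointers live:
	 *
	 *	... + SHA256_K[i]    load from K_base + 4*i
	 *	... + W[i]           load from W_base + 4*i
	 *
	 * With struct KW { u32 K[64]; u32 W[64]; }, one base pointer and
	 * constant offsets reach both:
	 *
	 *	... + KW->K[i]       load from KW_base + 4*i
	 *	... + KW->W[i]       load from KW_base + 256 + 4*i
	 *
	 * On register-starved targets such as x86-32, the freed register
	 * can hold one of the working variables instead, which is
	 * presumably where the ~10% improvement comes from.
	 */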