From patchwork Fri Oct 23 19:21:59 2020
X-Patchwork-Submitter: Arvind Sankar
X-Patchwork-Id: 11854449
X-Patchwork-Delegate: herbert@gondor.apana.org.au
From: Arvind Sankar
To: Herbert Xu, "David S. Miller", "linux-crypto@vger.kernel.org",
    Eric Biggers, David Laight
Cc: linux-kernel@vger.kernel.org
Subject: [PATCH v3 1/5] crypto: Use memzero_explicit() for clearing state
Date: Fri, 23 Oct 2020 15:21:59 -0400
Message-Id: <20201023192203.400040-2-nivedita@alum.mit.edu>
In-Reply-To: <20201023192203.400040-1-nivedita@alum.mit.edu>
References: <20201023192203.400040-1-nivedita@alum.mit.edu>

Without the barrier_data() inside memzero_explicit(), the compiler may
optimize away the state-clearing if it can tell that the state is not
used afterwards. At least in lib/crypto/sha256.c:__sha256_final(), the
function can get inlined into sha256(), in which case the memset is
optimized away.
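[Editorial aside: a minimal, GCC/Clang-style sketch of the idea this patch
relies on follows. The function name zeroize() is invented here for
illustration and is not the kernel API; the kernel's memzero_explicit()
achieves the same effect via its barrier_data() call. The empty asm
statement takes the pointer as an input and clobbers memory, so the
compiler must assume the zeroed bytes may still be read and cannot drop
the memset() as a dead store.]

#include <stddef.h>
#include <string.h>

/* Sketch only, not the kernel implementation. */
static inline void zeroize(void *p, size_t len)
{
	memset(p, 0, len);
	/* Pretend the zeroed buffer escapes, so the memset() must stay. */
	__asm__ __volatile__("" : : "r" (p) : "memory");
}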
Signed-off-by: Arvind Sankar
---
 arch/arm64/crypto/ghash-ce-glue.c | 2 +-
 arch/arm64/crypto/poly1305-glue.c | 2 +-
 arch/arm64/crypto/sha3-ce-glue.c  | 2 +-
 arch/x86/crypto/poly1305_glue.c   | 2 +-
 include/crypto/sha1_base.h        | 3 ++-
 include/crypto/sha256_base.h      | 3 ++-
 include/crypto/sha512_base.h      | 3 ++-
 include/crypto/sm3_base.h         | 3 ++-
 lib/crypto/sha256.c               | 2 +-
 9 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/crypto/ghash-ce-glue.c b/arch/arm64/crypto/ghash-ce-glue.c
index 8536008e3e35..2427e2f3a9a1 100644
--- a/arch/arm64/crypto/ghash-ce-glue.c
+++ b/arch/arm64/crypto/ghash-ce-glue.c
@@ -168,7 +168,7 @@ static int ghash_final(struct shash_desc *desc, u8 *dst)
 	put_unaligned_be64(ctx->digest[1], dst);
 	put_unaligned_be64(ctx->digest[0], dst + 8);
 
-	*ctx = (struct ghash_desc_ctx){};
+	memzero_explicit(ctx, sizeof(*ctx));
 
 	return 0;
 }
diff --git a/arch/arm64/crypto/poly1305-glue.c b/arch/arm64/crypto/poly1305-glue.c
index f33ada70c4ed..683de671741a 100644
--- a/arch/arm64/crypto/poly1305-glue.c
+++ b/arch/arm64/crypto/poly1305-glue.c
@@ -177,7 +177,7 @@ void poly1305_final_arch(struct poly1305_desc_ctx *dctx, u8 *dst)
 	}
 
 	poly1305_emit(&dctx->h, dst, dctx->s);
-	*dctx = (struct poly1305_desc_ctx){};
+	memzero_explicit(dctx, sizeof(*dctx));
 }
 EXPORT_SYMBOL(poly1305_final_arch);
diff --git a/arch/arm64/crypto/sha3-ce-glue.c b/arch/arm64/crypto/sha3-ce-glue.c
index 9a4bbfc45f40..e5a2936f0886 100644
--- a/arch/arm64/crypto/sha3-ce-glue.c
+++ b/arch/arm64/crypto/sha3-ce-glue.c
@@ -94,7 +94,7 @@ static int sha3_final(struct shash_desc *desc, u8 *out)
 	if (digest_size & 4)
 		put_unaligned_le32(sctx->st[i], (__le32 *)digest);
 
-	*sctx = (struct sha3_state){};
+	memzero_explicit(sctx, sizeof(*sctx));
 
 	return 0;
 }
diff --git a/arch/x86/crypto/poly1305_glue.c b/arch/x86/crypto/poly1305_glue.c
index e508dbd91813..64d09520d279 100644
--- a/arch/x86/crypto/poly1305_glue.c
+++ b/arch/x86/crypto/poly1305_glue.c
@@ -209,7 +209,7 @@ void poly1305_final_arch(struct poly1305_desc_ctx *dctx, u8 *dst)
 	}
 
 	poly1305_simd_emit(&dctx->h, dst, dctx->s);
-	*dctx = (struct poly1305_desc_ctx){};
+	memzero_explicit(dctx, sizeof(*dctx));
 }
 EXPORT_SYMBOL(poly1305_final_arch);
diff --git a/include/crypto/sha1_base.h b/include/crypto/sha1_base.h
index 20fd1f7468af..a5d6033efef7 100644
--- a/include/crypto/sha1_base.h
+++ b/include/crypto/sha1_base.h
@@ -12,6 +12,7 @@
 #include
 #include
 #include
+#include
 
 #include
@@ -101,7 +102,7 @@ static inline int sha1_base_finish(struct shash_desc *desc, u8 *out)
 	for (i = 0; i < SHA1_DIGEST_SIZE / sizeof(__be32); i++)
 		put_unaligned_be32(sctx->state[i], digest++);
 
-	*sctx = (struct sha1_state){};
+	memzero_explicit(sctx, sizeof(*sctx));
 
 	return 0;
 }
diff --git a/include/crypto/sha256_base.h b/include/crypto/sha256_base.h
index 6ded110783ae..93f9fd21cc06 100644
--- a/include/crypto/sha256_base.h
+++ b/include/crypto/sha256_base.h
@@ -12,6 +12,7 @@
 #include
 #include
 #include
+#include
 
 #include
@@ -105,7 +106,7 @@ static inline int sha256_base_finish(struct shash_desc *desc, u8 *out)
 	for (i = 0; digest_size > 0; i++, digest_size -= sizeof(__be32))
 		put_unaligned_be32(sctx->state[i], digest++);
 
-	*sctx = (struct sha256_state){};
+	memzero_explicit(sctx, sizeof(*sctx));
 
 	return 0;
 }
diff --git a/include/crypto/sha512_base.h b/include/crypto/sha512_base.h
index fb19c77494dc..93ab73baa38e 100644
--- a/include/crypto/sha512_base.h
+++ b/include/crypto/sha512_base.h
@@ -12,6 +12,7 @@
 #include
 #include
 #include
+#include
 
 #include
@@ -126,7 +127,7 @@ static inline int sha512_base_finish(struct shash_desc *desc, u8 *out)
 	for (i = 0; digest_size > 0; i++, digest_size -= sizeof(__be64))
 		put_unaligned_be64(sctx->state[i], digest++);
 
-	*sctx = (struct sha512_state){};
+	memzero_explicit(sctx, sizeof(*sctx));
 
 	return 0;
 }
diff --git a/include/crypto/sm3_base.h b/include/crypto/sm3_base.h
index 1cbf9aa1fe52..2f3a32ab97bb 100644
--- a/include/crypto/sm3_base.h
+++ b/include/crypto/sm3_base.h
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include
 #include
 
 typedef void (sm3_block_fn)(struct sm3_state *sst, u8 const *src, int blocks);
@@ -104,7 +105,7 @@ static inline int sm3_base_finish(struct shash_desc *desc, u8 *out)
 	for (i = 0; i < SM3_DIGEST_SIZE / sizeof(__be32); i++)
 		put_unaligned_be32(sctx->state[i], digest++);
 
-	*sctx = (struct sm3_state){};
+	memzero_explicit(sctx, sizeof(*sctx));
 
 	return 0;
 }
diff --git a/lib/crypto/sha256.c b/lib/crypto/sha256.c
index 2321f6cb322f..d43bc39ab05e 100644
--- a/lib/crypto/sha256.c
+++ b/lib/crypto/sha256.c
@@ -265,7 +265,7 @@ static void __sha256_final(struct sha256_state *sctx, u8 *out, int digest_words)
 		put_unaligned_be32(sctx->state[i], &dst[i]);
 
 	/* Zeroize sensitive information. */
-	memset(sctx, 0, sizeof(*sctx));
+	memzero_explicit(sctx, sizeof(*sctx));
 }
 
 void sha256_final(struct sha256_state *sctx, u8 *out)

From patchwork Fri Oct 23 19:22:00 2020
X-Patchwork-Submitter: Arvind Sankar
X-Patchwork-Id: 11854443
X-Patchwork-Delegate: herbert@gondor.apana.org.au
From: Arvind Sankar
To: Herbert Xu, "David S. Miller", "linux-crypto@vger.kernel.org",
    Eric Biggers, David Laight
Cc: linux-kernel@vger.kernel.org
Subject: [PATCH v3 2/5] crypto: lib/sha256 - Don't clear temporary variables
Date: Fri, 23 Oct 2020 15:22:00 -0400
Message-Id: <20201023192203.400040-3-nivedita@alum.mit.edu>
In-Reply-To: <20201023192203.400040-1-nivedita@alum.mit.edu>
References: <20201023192203.400040-1-nivedita@alum.mit.edu>

The assignments to clear a through h and t1/t2 are optimized out by the
compiler because they are unused after the assignments. Clearing
individual scalar variables is unlikely to be useful, as they may have
been assigned to registers, and even if stack spilling was required,
there may be compiler-generated temporaries that are impossible to
clear in any case. So drop the clearing of a through h and t1/t2.

Signed-off-by: Arvind Sankar
Reviewed-by: Eric Biggers
---
 lib/crypto/sha256.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/lib/crypto/sha256.c b/lib/crypto/sha256.c
index d43bc39ab05e..099cd11f83c1 100644
--- a/lib/crypto/sha256.c
+++ b/lib/crypto/sha256.c
@@ -202,7 +202,6 @@ static void sha256_transform(u32 *state, const u8 *input)
 	state[4] += e; state[5] += f; state[6] += g; state[7] += h;
 
 	/* clear any sensitive info... */
-	a = b = c = d = e = f = g = h = t1 = t2 = 0;
 	memzero_explicit(W, 64 * sizeof(u32));
 }

From patchwork Fri Oct 23 19:22:01 2020
X-Patchwork-Submitter: Arvind Sankar
X-Patchwork-Id: 11854447
X-Patchwork-Delegate: herbert@gondor.apana.org.au
From: Arvind Sankar
To: Herbert Xu, "David S. Miller", "linux-crypto@vger.kernel.org",
    Eric Biggers, David Laight
Cc: linux-kernel@vger.kernel.org, Eric Biggers
Subject: [PATCH v3 3/5] crypto: lib/sha256 - Clear W[] in sha256_update() instead of sha256_transform()
Date: Fri, 23 Oct 2020 15:22:01 -0400
Message-Id: <20201023192203.400040-4-nivedita@alum.mit.edu>
In-Reply-To: <20201023192203.400040-1-nivedita@alum.mit.edu>
References: <20201023192203.400040-1-nivedita@alum.mit.edu>

The temporary W[] array is currently zeroed out once every call to
sha256_transform(), i.e. once every 64 bytes of input data. Moving it
to sha256_update() instead so that it is cleared only once per update
can save about 2-3% of the total time taken to compute the digest, with
a reasonable memset() implementation, and considerably more (~20%) with
a bad one (e.g. the x86 purgatory currently uses a memset() coded in C).

Signed-off-by: Arvind Sankar
Reviewed-by: Eric Biggers
---
 lib/crypto/sha256.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/lib/crypto/sha256.c b/lib/crypto/sha256.c
index 099cd11f83c1..c6bfeacc5b81 100644
--- a/lib/crypto/sha256.c
+++ b/lib/crypto/sha256.c
@@ -43,10 +43,9 @@ static inline void BLEND_OP(int I, u32 *W)
 	W[I] = s1(W[I-2]) + W[I-7] + s0(W[I-15]) + W[I-16];
 }
 
-static void sha256_transform(u32 *state, const u8 *input)
+static void sha256_transform(u32 *state, const u8 *input, u32 *W)
 {
 	u32 a, b, c, d, e, f, g, h, t1, t2;
-	u32 W[64];
 	int i;
 
 	/* load the input */
@@ -200,15 +199,13 @@ static void sha256_transform(u32 *state, const u8 *input, u32 *W)
 	state[0] += a; state[1] += b; state[2] += c; state[3] += d;
 	state[4] += e; state[5] += f; state[6] += g; state[7] += h;
-
-	/* clear any sensitive info... */
-	memzero_explicit(W, 64 * sizeof(u32));
 }
 
 void sha256_update(struct sha256_state *sctx, const u8 *data, unsigned int len)
 {
 	unsigned int partial, done;
 	const u8 *src;
+	u32 W[64];
 
 	partial = sctx->count & 0x3f;
 	sctx->count += len;
@@ -223,11 +220,13 @@ void sha256_update(struct sha256_state *sctx, const u8 *data, unsigned int len)
 		}
 
 		do {
-			sha256_transform(sctx->state, src);
+			sha256_transform(sctx->state, src, W);
 			done += 64;
 			src = data + done;
 		} while (done + 63 < len);
 
+		memzero_explicit(W, sizeof(W));
+
 		partial = 0;
 	}
 	memcpy(sctx->buf + partial, src, len - done);

From patchwork Fri Oct 23 19:22:02 2020
X-Patchwork-Submitter: Arvind Sankar
X-Patchwork-Id: 11854451
X-Patchwork-Delegate: herbert@gondor.apana.org.au
From: Arvind Sankar
To: Herbert Xu, "David S. Miller", "linux-crypto@vger.kernel.org",
    Eric Biggers, David Laight
Cc: linux-kernel@vger.kernel.org
Subject: [PATCH v3 4/5] crypto: lib/sha256 - Unroll SHA256 loop 8 times instead of 64
Date: Fri, 23 Oct 2020 15:22:02 -0400
Message-Id: <20201023192203.400040-5-nivedita@alum.mit.edu>
In-Reply-To: <20201023192203.400040-1-nivedita@alum.mit.edu>
References: <20201023192203.400040-1-nivedita@alum.mit.edu>

This reduces code size substantially (on x86_64 with gcc-10 the size of
sha256_update() goes from 7593 bytes to 1952 bytes including the new
SHA256_K array), and on x86 is slightly faster than the full unroll
(tested on Broadwell Xeon).

Signed-off-by: Arvind Sankar
Reviewed-by: Eric Biggers
---
 lib/crypto/sha256.c | 174 ++++++++++----------------------------------
 1 file changed, 38 insertions(+), 136 deletions(-)

diff --git a/lib/crypto/sha256.c b/lib/crypto/sha256.c
index c6bfeacc5b81..e2e29d9b0ccd 100644
--- a/lib/crypto/sha256.c
+++ b/lib/crypto/sha256.c
@@ -18,6 +18,25 @@
 #include
 #include
 
+static const u32 SHA256_K[] = {
+	0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5,
+	0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
+	0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3,
+	0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
+	0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc,
+	0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
+	0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7,
+	0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
+	0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13,
+	0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
+	0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3,
+	0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
+	0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5,
+	0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
+	0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208,
+	0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
+};
+
 static inline u32 Ch(u32 x, u32 y, u32 z)
 {
 	return z ^ (x & (y ^ z));
@@ -43,9 +62,17 @@ static inline void BLEND_OP(int I, u32 *W)
 	W[I] = s1(W[I-2]) + W[I-7] + s0(W[I-15]) + W[I-16];
 }
 
+#define SHA256_ROUND(i, a, b, c, d, e, f, g, h) do {		\
+	u32 t1, t2;						\
+	t1 = h + e1(e) + Ch(e, f, g) + SHA256_K[i] + W[i];	\
+	t2 = e0(a) + Maj(a, b, c);				\
+	d += t1;						\
+	h = t1 + t2;						\
+} while (0)
+
 static void sha256_transform(u32 *state, const u8 *input, u32 *W)
 {
-	u32 a, b, c, d, e, f, g, h, t1, t2;
+	u32 a, b, c, d, e, f, g, h;
 	int i;
 
 	/* load the input */
@@ -61,141 +88,16 @@ static void sha256_transform(u32 *state, const u8 *input, u32 *W)
 	e = state[4]; f = state[5]; g = state[6]; h = state[7];
 
 	/* now iterate */
-	t1 = h + e1(e) + Ch(e, f, g) + 0x428a2f98 + W[0];
-	t2 = e0(a) + Maj(a, b, c); d += t1; h = t1 + t2;
-	t1 = g + e1(d) + Ch(d, e, f) + 0x71374491 + W[1];
-	t2 = e0(h) + Maj(h, a, b); c += t1; g = t1 + t2;
-	t1 = f + e1(c) + Ch(c, d, e) + 0xb5c0fbcf + W[2];
-	t2 = e0(g) + Maj(g, h, a); b += t1; f = t1 + t2;
-	t1 = e + e1(b) + Ch(b, c, d) + 0xe9b5dba5 + W[3];
-	t2 = e0(f) + Maj(f, g, h); a += t1; e = t1 + t2;
-	t1 = d + e1(a) + Ch(a, b, c) + 0x3956c25b + W[4];
-	t2 = e0(e) + Maj(e, f, g); h += t1; d = t1 + t2;
-	t1 = c + e1(h) + Ch(h, a, b) + 0x59f111f1 + W[5];
-	t2 = e0(d) + Maj(d, e, f); g += t1; c = t1 + t2;
-	t1 = b + e1(g) + Ch(g, h, a) + 0x923f82a4 + W[6];
-	t2 = e0(c) + Maj(c, d, e); f += t1; b = t1 + t2;
-	t1 = a + e1(f) + Ch(f, g, h) + 0xab1c5ed5 + W[7];
-	t2 = e0(b) + Maj(b, c, d); e += t1; a = t1 + t2;
-
-	t1 = h + e1(e) + Ch(e, f, g) + 0xd807aa98 + W[8];
-	t2 = e0(a) + Maj(a, b, c); d += t1; h = t1 + t2;
-	t1 = g + e1(d) + Ch(d, e, f) + 0x12835b01 + W[9];
-	t2 = e0(h) + Maj(h, a, b); c += t1; g = t1 + t2;
-	t1 = f + e1(c) + Ch(c, d, e) + 0x243185be + W[10];
-	t2 = e0(g) + Maj(g, h, a); b += t1; f = t1 + t2;
-	t1 = e + e1(b) + Ch(b, c, d) + 0x550c7dc3 + W[11];
-	t2 = e0(f) + Maj(f, g, h); a += t1; e = t1 + t2;
-	t1 = d + e1(a) + Ch(a, b, c) + 0x72be5d74 + W[12];
-	t2 = e0(e) + Maj(e, f, g); h += t1; d = t1 + t2;
-	t1 = c + e1(h) + Ch(h, a, b) + 0x80deb1fe + W[13];
-	t2 = e0(d) + Maj(d, e, f); g += t1; c = t1 + t2;
-	t1 = b + e1(g) + Ch(g, h, a) + 0x9bdc06a7 + W[14];
-	t2 = e0(c) + Maj(c, d, e); f += t1; b = t1 + t2;
-	t1 = a + e1(f) + Ch(f, g, h) + 0xc19bf174 + W[15];
-	t2 = e0(b) + Maj(b, c, d); e += t1; a = t1 + t2;
-
-	t1 = h + e1(e) + Ch(e, f, g) + 0xe49b69c1 + W[16];
-	t2 = e0(a) + Maj(a, b, c); d += t1; h = t1 + t2;
-	t1 = g + e1(d) + Ch(d, e, f) + 0xefbe4786 + W[17];
-	t2 = e0(h) + Maj(h, a, b); c += t1; g = t1 + t2;
-	t1 = f + e1(c) + Ch(c, d, e) + 0x0fc19dc6 + W[18];
-	t2 = e0(g) + Maj(g, h, a); b += t1; f = t1 + t2;
-	t1 = e + e1(b) + Ch(b, c, d) + 0x240ca1cc + W[19];
-	t2 = e0(f) + Maj(f, g, h); a += t1; e = t1 + t2;
-	t1 = d + e1(a) + Ch(a, b, c) + 0x2de92c6f + W[20];
-	t2 = e0(e) + Maj(e, f, g); h += t1; d = t1 + t2;
-	t1 = c + e1(h) + Ch(h, a, b) + 0x4a7484aa + W[21];
-	t2 = e0(d) + Maj(d, e, f); g += t1; c = t1 + t2;
-	t1 = b + e1(g) + Ch(g, h, a) + 0x5cb0a9dc + W[22];
-	t2 = e0(c) + Maj(c, d, e); f += t1; b = t1 + t2;
-	t1 = a + e1(f) + Ch(f, g, h) + 0x76f988da + W[23];
-	t2 = e0(b) + Maj(b, c, d); e += t1; a = t1 + t2;
-
-	t1 = h + e1(e) + Ch(e, f, g) + 0x983e5152 + W[24];
-	t2 = e0(a) + Maj(a, b, c); d += t1; h = t1 + t2;
-	t1 = g + e1(d) + Ch(d, e, f) + 0xa831c66d + W[25];
-	t2 = e0(h) + Maj(h, a, b); c += t1; g = t1 + t2;
-	t1 = f + e1(c) + Ch(c, d, e) + 0xb00327c8 + W[26];
-	t2 = e0(g) + Maj(g, h, a); b += t1; f = t1 + t2;
-	t1 = e + e1(b) + Ch(b, c, d) + 0xbf597fc7 + W[27];
-	t2 = e0(f) + Maj(f, g, h); a += t1; e = t1 + t2;
-	t1 = d + e1(a) + Ch(a, b, c) + 0xc6e00bf3 + W[28];
-	t2 = e0(e) + Maj(e, f, g); h += t1; d = t1 + t2;
-	t1 = c + e1(h) + Ch(h, a, b) + 0xd5a79147 + W[29];
-	t2 = e0(d) + Maj(d, e, f); g += t1; c = t1 + t2;
-	t1 = b + e1(g) + Ch(g, h, a) + 0x06ca6351 + W[30];
-	t2 = e0(c) + Maj(c, d, e); f += t1; b = t1 + t2;
-	t1 = a + e1(f) + Ch(f, g, h) + 0x14292967 + W[31];
-	t2 = e0(b) + Maj(b, c, d); e += t1; a = t1 + t2;
-
-	t1 = h + e1(e) + Ch(e, f, g) + 0x27b70a85 + W[32];
-	t2 = e0(a) + Maj(a, b, c); d += t1; h = t1 + t2;
-	t1 = g + e1(d) + Ch(d, e, f) + 0x2e1b2138 + W[33];
-	t2 = e0(h) + Maj(h, a, b); c += t1; g = t1 + t2;
-	t1 = f + e1(c) + Ch(c, d, e) + 0x4d2c6dfc + W[34];
-	t2 = e0(g) + Maj(g, h, a); b += t1; f = t1 + t2;
-	t1 = e + e1(b) + Ch(b, c, d) + 0x53380d13 + W[35];
-	t2 = e0(f) + Maj(f, g, h); a += t1; e = t1 + t2;
-	t1 = d + e1(a) + Ch(a, b, c) + 0x650a7354 + W[36];
-	t2 = e0(e) + Maj(e, f, g); h += t1; d = t1 + t2;
-	t1 = c + e1(h) + Ch(h, a, b) + 0x766a0abb + W[37];
-	t2 = e0(d) + Maj(d, e, f); g += t1; c = t1 + t2;
-	t1 = b + e1(g) + Ch(g, h, a) + 0x81c2c92e + W[38];
-	t2 = e0(c) + Maj(c, d, e); f += t1; b = t1 + t2;
-	t1 = a + e1(f) + Ch(f, g, h) + 0x92722c85 + W[39];
-	t2 = e0(b) + Maj(b, c, d); e += t1; a = t1 + t2;
-
-	t1 = h + e1(e) + Ch(e, f, g) + 0xa2bfe8a1 + W[40];
-	t2 = e0(a) + Maj(a, b, c); d += t1; h = t1 + t2;
-	t1 = g + e1(d) + Ch(d, e, f) + 0xa81a664b + W[41];
-	t2 = e0(h) + Maj(h, a, b); c += t1; g = t1 + t2;
-	t1 = f + e1(c) + Ch(c, d, e) + 0xc24b8b70 + W[42];
-	t2 = e0(g) + Maj(g, h, a); b += t1; f = t1 + t2;
-	t1 = e + e1(b) + Ch(b, c, d) + 0xc76c51a3 + W[43];
-	t2 = e0(f) + Maj(f, g, h); a += t1; e = t1 + t2;
-	t1 = d + e1(a) + Ch(a, b, c) + 0xd192e819 + W[44];
-	t2 = e0(e) + Maj(e, f, g); h += t1; d = t1 + t2;
-	t1 = c + e1(h) + Ch(h, a, b) + 0xd6990624 + W[45];
-	t2 = e0(d) + Maj(d, e, f); g += t1; c = t1 + t2;
-	t1 = b + e1(g) + Ch(g, h, a) + 0xf40e3585 + W[46];
-	t2 = e0(c) + Maj(c, d, e); f += t1; b = t1 + t2;
-	t1 = a + e1(f) + Ch(f, g, h) + 0x106aa070 + W[47];
-	t2 = e0(b) + Maj(b, c, d); e += t1; a = t1 + t2;
-
-	t1 = h + e1(e) + Ch(e, f, g) + 0x19a4c116 + W[48];
-	t2 = e0(a) + Maj(a, b, c); d += t1; h = t1 + t2;
-	t1 = g + e1(d) + Ch(d, e, f) + 0x1e376c08 + W[49];
-	t2 = e0(h) + Maj(h, a, b); c += t1; g = t1 + t2;
-	t1 = f + e1(c) + Ch(c, d, e) + 0x2748774c + W[50];
-	t2 = e0(g) + Maj(g, h, a); b += t1; f = t1 + t2;
-	t1 = e + e1(b) + Ch(b, c, d) + 0x34b0bcb5 + W[51];
-	t2 = e0(f) + Maj(f, g, h); a += t1; e = t1 + t2;
-	t1 = d + e1(a) + Ch(a, b, c) + 0x391c0cb3 + W[52];
-	t2 = e0(e) + Maj(e, f, g); h += t1; d = t1 + t2;
-	t1 = c + e1(h) + Ch(h, a, b) + 0x4ed8aa4a + W[53];
-	t2 = e0(d) + Maj(d, e, f); g += t1; c = t1 + t2;
-	t1 = b + e1(g) + Ch(g, h, a) + 0x5b9cca4f + W[54];
-	t2 = e0(c) + Maj(c, d, e); f += t1; b = t1 + t2;
-	t1 = a + e1(f) + Ch(f, g, h) + 0x682e6ff3 + W[55];
-	t2 = e0(b) + Maj(b, c, d); e += t1; a = t1 + t2;
-
-	t1 = h + e1(e) + Ch(e, f, g) + 0x748f82ee + W[56];
-	t2 = e0(a) + Maj(a, b, c); d += t1; h = t1 + t2;
-	t1 = g + e1(d) + Ch(d, e, f) + 0x78a5636f + W[57];
-	t2 = e0(h) + Maj(h, a, b); c += t1; g = t1 + t2;
-	t1 = f + e1(c) + Ch(c, d, e) + 0x84c87814 + W[58];
-	t2 = e0(g) + Maj(g, h, a); b += t1; f = t1 + t2;
-	t1 = e + e1(b) + Ch(b, c, d) + 0x8cc70208 + W[59];
-	t2 = e0(f) + Maj(f, g, h); a += t1; e = t1 + t2;
-	t1 = d + e1(a) + Ch(a, b, c) + 0x90befffa + W[60];
-	t2 = e0(e) + Maj(e, f, g); h += t1; d = t1 + t2;
-	t1 = c + e1(h) + Ch(h, a, b) + 0xa4506ceb + W[61];
-	t2 = e0(d) + Maj(d, e, f); g += t1; c = t1 + t2;
-	t1 = b + e1(g) + Ch(g, h, a) + 0xbef9a3f7 + W[62];
-	t2 = e0(c) + Maj(c, d, e); f += t1; b = t1 + t2;
-	t1 = a + e1(f) + Ch(f, g, h) + 0xc67178f2 + W[63];
-	t2 = e0(b) + Maj(b, c, d); e += t1; a = t1 + t2;
+	for (i = 0; i < 64; i += 8) {
+		SHA256_ROUND(i + 0, a, b, c, d, e, f, g, h);
+		SHA256_ROUND(i + 1, h, a, b, c, d, e, f, g);
+		SHA256_ROUND(i + 2, g, h, a, b, c, d, e, f);
+		SHA256_ROUND(i + 3, f, g, h, a, b, c, d, e);
+		SHA256_ROUND(i + 4, e, f, g, h, a, b, c, d);
+		SHA256_ROUND(i + 5, d, e, f, g, h, a, b, c);
+		SHA256_ROUND(i + 6, c, d, e, f, g, h, a, b);
+		SHA256_ROUND(i + 7, b, c, d, e, f, g, h, a);
+	}
 
 	state[0] += a; state[1] += b; state[2] += c; state[3] += d;
 	state[4] += e; state[5] += f; state[6] += g; state[7] += h;

From patchwork Fri Oct 23 19:22:03 2020
X-Patchwork-Submitter: Arvind Sankar
X-Patchwork-Id: 11854445
X-Patchwork-Delegate: herbert@gondor.apana.org.au
From: Arvind Sankar
To: Herbert Xu, "David S. Miller", "linux-crypto@vger.kernel.org",
    Eric Biggers, David Laight
Cc: linux-kernel@vger.kernel.org, Eric Biggers
Subject: [PATCH v3 5/5] crypto: lib/sha256 - Unroll LOAD and BLEND loops
Date: Fri, 23 Oct 2020 15:22:03 -0400
Message-Id: <20201023192203.400040-6-nivedita@alum.mit.edu>
In-Reply-To: <20201023192203.400040-1-nivedita@alum.mit.edu>
References: <20201023192203.400040-1-nivedita@alum.mit.edu>

Unrolling the LOAD and BLEND loops improves performance by ~8% on
x86_64 (tested on Broadwell Xeon) while not increasing code size too
much.
Signed-off-by: Arvind Sankar
Reviewed-by: Eric Biggers
---
 lib/crypto/sha256.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/lib/crypto/sha256.c b/lib/crypto/sha256.c
index e2e29d9b0ccd..cdef37c05972 100644
--- a/lib/crypto/sha256.c
+++ b/lib/crypto/sha256.c
@@ -76,12 +76,28 @@ static void sha256_transform(u32 *state, const u8 *input, u32 *W)
 	int i;
 
 	/* load the input */
-	for (i = 0; i < 16; i++)
-		LOAD_OP(i, W, input);
+	for (i = 0; i < 16; i += 8) {
+		LOAD_OP(i + 0, W, input);
+		LOAD_OP(i + 1, W, input);
+		LOAD_OP(i + 2, W, input);
+		LOAD_OP(i + 3, W, input);
+		LOAD_OP(i + 4, W, input);
+		LOAD_OP(i + 5, W, input);
+		LOAD_OP(i + 6, W, input);
+		LOAD_OP(i + 7, W, input);
+	}
 
 	/* now blend */
-	for (i = 16; i < 64; i++)
-		BLEND_OP(i, W);
+	for (i = 16; i < 64; i += 8) {
+		BLEND_OP(i + 0, W);
+		BLEND_OP(i + 1, W);
+		BLEND_OP(i + 2, W);
+		BLEND_OP(i + 3, W);
+		BLEND_OP(i + 4, W);
+		BLEND_OP(i + 5, W);
+		BLEND_OP(i + 6, W);
+		BLEND_OP(i + 7, W);
+	}
 
 	/* load the state into our registers */
 	a = state[0]; b = state[1]; c = state[2]; d = state[3];