From patchwork Sun Oct 27 09:45:30 2024
X-Patchwork-Submitter: Herbert Xu
X-Patchwork-Id: 13852462
Date: Sun, 27 Oct 2024 17:45:30 +0800
From: Herbert Xu
Subject: [PATCH 1/6] crypto: ahash - Only save callback and data in ahash_save_req
To: Linux Crypto Mailing List
Cc: Eric Biggers, Ard Biesheuvel, Megha Dey, Tim Chen
X-Mailing-List: linux-crypto@vger.kernel.org

As unaligned operations are supported by the underlying algorithm,
ahash_save_req and ahash_restore_req can be greatly simplified to only
preserve the callback and data.
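For reference, the saved state now reduces to the completion callback and
its data; the save/restore pair amounts to the following sketch, which
simply mirrors the diff below:

	struct ahash_save_req_state {
		struct ahash_request *req;
		crypto_completion_t compl;
		void *data;
	};

	/* Divert the completion callback, remembering the original. */
	static int ahash_save_req_sketch(struct ahash_request *req,
					 crypto_completion_t cplt, gfp_t gfp)
	{
		struct ahash_save_req_state *state = kmalloc(sizeof(*state), gfp);

		if (!state)
			return -ENOMEM;

		state->compl = req->base.complete;
		state->data = req->base.data;
		state->req = req;
		req->base.complete = cplt;
		req->base.data = state;
		return 0;
	}

	/* Put the original callback back and drop the state. */
	static void ahash_restore_req_sketch(struct ahash_request *req)
	{
		struct ahash_save_req_state *state = req->base.data;

		req->base.complete = state->compl;
		req->base.data = state->data;
		kfree(state);
	}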
Signed-off-by: Herbert Xu --- crypto/ahash.c | 97 ++++++++++++++++--------------------------- include/crypto/hash.h | 3 -- 2 files changed, 35 insertions(+), 65 deletions(-) diff --git a/crypto/ahash.c b/crypto/ahash.c index bcd9de009a91..c8e7327c6949 100644 --- a/crypto/ahash.c +++ b/crypto/ahash.c @@ -27,6 +27,12 @@ #define CRYPTO_ALG_TYPE_AHASH_MASK 0x0000000e +struct ahash_save_req_state { + struct ahash_request *req; + crypto_completion_t compl; + void *data; +}; + /* * For an ahash tfm that is using an shash algorithm (instead of an ahash * algorithm), this returns the underlying shash tfm. @@ -262,67 +268,34 @@ int crypto_ahash_init(struct ahash_request *req) } EXPORT_SYMBOL_GPL(crypto_ahash_init); -static int ahash_save_req(struct ahash_request *req, crypto_completion_t cplt, - bool has_state) +static int ahash_save_req(struct ahash_request *req, crypto_completion_t cplt) { - struct crypto_ahash *tfm = crypto_ahash_reqtfm(req); - unsigned int ds = crypto_ahash_digestsize(tfm); - struct ahash_request *subreq; - unsigned int subreq_size; - unsigned int reqsize; - u8 *result; + struct ahash_save_req_state *state; gfp_t gfp; u32 flags; - subreq_size = sizeof(*subreq); - reqsize = crypto_ahash_reqsize(tfm); - reqsize = ALIGN(reqsize, crypto_tfm_ctx_alignment()); - subreq_size += reqsize; - subreq_size += ds; - flags = ahash_request_flags(req); gfp = (flags & CRYPTO_TFM_REQ_MAY_SLEEP) ? GFP_KERNEL : GFP_ATOMIC; - subreq = kmalloc(subreq_size, gfp); - if (!subreq) + state = kmalloc(sizeof(*state), gfp); + if (!state) return -ENOMEM; - ahash_request_set_tfm(subreq, tfm); - ahash_request_set_callback(subreq, flags, cplt, req); - - result = (u8 *)(subreq + 1) + reqsize; - - ahash_request_set_crypt(subreq, req->src, result, req->nbytes); - - if (has_state) { - void *state; - - state = kmalloc(crypto_ahash_statesize(tfm), gfp); - if (!state) { - kfree(subreq); - return -ENOMEM; - } - - crypto_ahash_export(req, state); - crypto_ahash_import(subreq, state); - kfree_sensitive(state); - } - - req->priv = subreq; + state->compl = req->base.complete; + state->data = req->base.data; + req->base.complete = cplt; + req->base.data = state; + state->req = req; return 0; } -static void ahash_restore_req(struct ahash_request *req, int err) +static void ahash_restore_req(struct ahash_request *req) { - struct ahash_request *subreq = req->priv; + struct ahash_save_req_state *state = req->base.data; - if (!err) - memcpy(req->result, subreq->result, - crypto_ahash_digestsize(crypto_ahash_reqtfm(req))); - - req->priv = NULL; - - kfree_sensitive(subreq); + req->base.complete = state->compl; + req->base.data = state->data; + kfree(state); } int crypto_ahash_update(struct ahash_request *req) @@ -374,51 +347,51 @@ EXPORT_SYMBOL_GPL(crypto_ahash_digest); static void ahash_def_finup_done2(void *data, int err) { - struct ahash_request *areq = data; + struct ahash_save_req_state *state = data; + struct ahash_request *areq = state->req; if (err == -EINPROGRESS) return; - ahash_restore_req(areq, err); - + ahash_restore_req(areq); ahash_request_complete(areq, err); } static int ahash_def_finup_finish1(struct ahash_request *req, int err) { - struct ahash_request *subreq = req->priv; - if (err) goto out; - subreq->base.complete = ahash_def_finup_done2; + req->base.complete = ahash_def_finup_done2; - err = crypto_ahash_alg(crypto_ahash_reqtfm(req))->final(subreq); + err = crypto_ahash_alg(crypto_ahash_reqtfm(req))->final(req); if (err == -EINPROGRESS || err == -EBUSY) return err; out: - ahash_restore_req(req, err); + 
ahash_restore_req(req); return err; } static void ahash_def_finup_done1(void *data, int err) { - struct ahash_request *areq = data; - struct ahash_request *subreq; + struct ahash_save_req_state *state0 = data; + struct ahash_save_req_state state; + struct ahash_request *areq; + state = *state0; + areq = state.req; if (err == -EINPROGRESS) goto out; - subreq = areq->priv; - subreq->base.flags &= CRYPTO_TFM_REQ_MAY_BACKLOG; + areq->base.flags &= ~CRYPTO_TFM_REQ_MAY_SLEEP; err = ahash_def_finup_finish1(areq, err); if (err == -EINPROGRESS || err == -EBUSY) return; out: - ahash_request_complete(areq, err); + state.compl(state.data, err); } static int ahash_def_finup(struct ahash_request *req) @@ -426,11 +399,11 @@ static int ahash_def_finup(struct ahash_request *req) struct crypto_ahash *tfm = crypto_ahash_reqtfm(req); int err; - err = ahash_save_req(req, ahash_def_finup_done1, true); + err = ahash_save_req(req, ahash_def_finup_done1); if (err) return err; - err = crypto_ahash_alg(tfm)->update(req->priv); + err = crypto_ahash_alg(tfm)->update(req); if (err == -EINPROGRESS || err == -EBUSY) return err; diff --git a/include/crypto/hash.h b/include/crypto/hash.h index 2d5ea9f9ff43..9c1f8ca59a77 100644 --- a/include/crypto/hash.h +++ b/include/crypto/hash.h @@ -55,9 +55,6 @@ struct ahash_request { struct scatterlist *src; u8 *result; - /* This field may only be used by the ahash API code. */ - void *priv; - void *__ctx[] CRYPTO_MINALIGN_ATTR; }; From patchwork Sun Oct 27 09:45:33 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Herbert Xu X-Patchwork-Id: 13852463 Received: from abb.hmeau.com (abb.hmeau.com [144.6.53.87]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 419CF4084D for ; Sun, 27 Oct 2024 09:45:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=144.6.53.87 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730022345; cv=none; b=YXPhz4bawyLsHYH+ZQokfRO12Fv4sMb92A69/o+eTjI00Qhy+1rpG3o1O5m4DUic07+3k9sicMMy0R/mV7tvgKeFqE+uaJqp6s4nIpaXiomZlcHFzdcVb4FaBMKTJZjkt7Svvrtr9AGXV28fFn39D+58wMzFq8xAx3ShQsc8wXQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730022345; c=relaxed/simple; bh=QJSZs9UMapbx18kSep+zmmpDhNclqEJYsEjWd8+X0Mw=; h=Date:Message-Id:In-Reply-To:References:From:Subject:To:Cc; b=sIzsCtNDNwVOXRgOnw0d+FAo6omuEiXHB4FY1x+55CJEQjItFKZBDAFiW/CR7ex0lZ7lV53BlkpYT44laUITlUFQIsdr7IbuTpJuTUzpy1Knhtty0Awi2U1SWeoQ1g/ASqGmxnnRegGK/ekMR96uIMp65OSWGq9iNdTZqRDExb4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=gondor.apana.org.au; spf=pass smtp.mailfrom=gondor.apana.org.au; dkim=pass (2048-bit key) header.d=hmeau.com header.i=@hmeau.com header.b=G1wXUfc2; arc=none smtp.client-ip=144.6.53.87 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=gondor.apana.org.au Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gondor.apana.org.au Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=hmeau.com header.i=@hmeau.com header.b="G1wXUfc2" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=hmeau.com; s=formenos; h=Cc:To:Subject:From:References:In-Reply-To:Message-Id:Date: Sender:Reply-To:MIME-Version:Content-Type:Content-Transfer-Encoding: 
Content-ID:Content-Description:Resent-Date:Resent-From:Resent-Sender: Resent-To:Resent-Cc:Resent-Message-ID:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=bHYOxGG19WOiabIewZ8ziRLrERDG2qgvvx3gSXuuMO4=; b=G1wXUfc2tsfHY1gM6NtrrAKhMc RbUAC/OL92gAeo22oWM3aODZYi0s7f3sZB7Qj+pSFl7G/+T3jXNEBMmCqijSuqDlWIJAR3FzpqLXw hwq6LR7aj2goxVVLDVZwb5rv8KPXBZINw4nQKDm3rO7QfhR/1FLujwqUutE61blUlL+BxsFCqcFdY XzuEqD+sG2tPgYugtlda/sYvwHu9EhIzwcRVwdViVjgsAEROJrzVPsxUvfsy2nyhUYEIrtyOH7g0j RGSX0SEYEts/SA8GexuDLRgMVtIxxfcHCGf8pp4GMhZiUseTCw5ksDJ0Bp9ImdO/c+MYLxgj6p2ae TRtATXxw==; Received: from loth.rohan.me.apana.org.au ([192.168.167.2]) by formenos.hmeau.com with smtp (Exim 4.96 #2 (Debian)) id 1t4zq1-00CRG3-0e; Sun, 27 Oct 2024 17:45:34 +0800 Received: by loth.rohan.me.apana.org.au (sSMTP sendmail emulation); Sun, 27 Oct 2024 17:45:33 +0800 Date: Sun, 27 Oct 2024 17:45:33 +0800 Message-Id: <677614fbdc70b31df2e26483c8d2cd1510c8af91.1730021644.git.herbert@gondor.apana.org.au> In-Reply-To: References: From: Herbert Xu Subject: [PATCH 2/6] crypto: hash - Add request chaining API To: Linux Crypto Mailing List Cc: Eric Biggers , Ard Biesheuvel , Megha Dey , Tim Chen Precedence: bulk X-Mailing-List: linux-crypto@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: This adds request chaining to the ahash interface. Request chaining allows multiple requests to be submitted in one shot. An algorithm can elect to receive chained requests by setting the flag CRYPTO_ALG_REQ_CHAIN. If this bit is not set, the API will break up chained requests and submit them one-by-one. A new err field is added to struct crypto_async_request to record the return value for each individual request. Signed-off-by: Herbert Xu --- crypto/ahash.c | 259 +++++++++++++++++++++++++++++---- include/crypto/algapi.h | 10 ++ include/crypto/hash.h | 25 ++++ include/crypto/internal/hash.h | 10 ++ include/linux/crypto.h | 26 ++++ 5 files changed, 304 insertions(+), 26 deletions(-) diff --git a/crypto/ahash.c b/crypto/ahash.c index c8e7327c6949..e1ee18deca67 100644 --- a/crypto/ahash.c +++ b/crypto/ahash.c @@ -28,11 +28,19 @@ #define CRYPTO_ALG_TYPE_AHASH_MASK 0x0000000e struct ahash_save_req_state { - struct ahash_request *req; + struct list_head head; + struct ahash_request *req0; + struct ahash_request *cur; + int (*op)(struct ahash_request *req); crypto_completion_t compl; void *data; }; +static void ahash_reqchain_done(void *data, int err); +static int ahash_save_req(struct ahash_request *req, crypto_completion_t cplt); +static void ahash_restore_req(struct ahash_request *req); +static int ahash_def_finup(struct ahash_request *req); + /* * For an ahash tfm that is using an shash algorithm (instead of an ahash * algorithm), this returns the underlying shash tfm. 
@@ -256,24 +264,150 @@ int crypto_ahash_setkey(struct crypto_ahash *tfm, const u8 *key, } EXPORT_SYMBOL_GPL(crypto_ahash_setkey); +static int ahash_reqchain_finish(struct ahash_save_req_state *state, + int err, u32 mask) +{ + struct ahash_request *req0 = state->req0; + struct ahash_request *req = state->cur; + struct ahash_request *n; + + req->base.err = err; + + if (req == req0) + INIT_LIST_HEAD(&req->base.list); + else + list_add_tail(&req->base.list, &req0->base.list); + + list_for_each_entry_safe(req, n, &state->head, base.list) { + list_del_init(&req->base.list); + + req->base.flags &= mask; + req->base.complete = ahash_reqchain_done; + req->base.data = state; + state->cur = req; + err = state->op(req); + + if (err == -EINPROGRESS) { + if (!list_empty(&state->head)) + err = -EBUSY; + goto out; + } + + if (err == -EBUSY) + goto out; + + req->base.err = err; + list_add_tail(&req->base.list, &req0->base.list); + } + + ahash_restore_req(req0); + +out: + return err; +} + +static void ahash_reqchain_done(void *data, int err) +{ + struct ahash_save_req_state *state = data; + crypto_completion_t compl = state->compl; + + data = state->data; + + if (err == -EINPROGRESS) { + if (!list_empty(&state->head)) + return; + goto notify; + } + + err = ahash_reqchain_finish(state, err, CRYPTO_TFM_REQ_MAY_BACKLOG); + if (err == -EBUSY) + return; + +notify: + compl(data, err); +} + +static int ahash_do_req_chain(struct ahash_request *req, + int (*op)(struct ahash_request *req)) +{ + struct crypto_ahash *tfm = crypto_ahash_reqtfm(req); + struct ahash_save_req_state *state; + struct ahash_save_req_state state0; + int err; + + if (!ahash_request_chained(req) || list_empty(&req->base.list) || + crypto_ahash_req_chain(tfm)) + return op(req); + + state = &state0; + + if (ahash_is_async(tfm)) { + err = ahash_save_req(req, ahash_reqchain_done); + if (err) { + struct ahash_request *r2; + + req->base.err = err; + list_for_each_entry(r2, &req->base.list, base.list) + r2->base.err = err; + + return err; + } + + state = req->base.data; + } + + state->op = op; + state->cur = req; + INIT_LIST_HEAD(&state->head); + list_splice(&req->base.list, &state->head); + + err = op(req); + if (err == -EBUSY || err == -EINPROGRESS) + return -EBUSY; + + return ahash_reqchain_finish(state, err, ~0); +} + int crypto_ahash_init(struct ahash_request *req) { struct crypto_ahash *tfm = crypto_ahash_reqtfm(req); - if (likely(tfm->using_shash)) - return crypto_shash_init(prepare_shash_desc(req, tfm)); + if (likely(tfm->using_shash)) { + struct ahash_request *r2; + int err; + + err = crypto_shash_init(prepare_shash_desc(req, tfm)); + req->base.err = err; + if (!ahash_request_chained(req)) + return err; + + list_for_each_entry(r2, &req->base.list, base.list) { + struct shash_desc *desc; + + desc = prepare_shash_desc(r2, tfm); + r2->base.err = crypto_shash_init(desc); + } + + return 0; + } + if (crypto_ahash_get_flags(tfm) & CRYPTO_TFM_NEED_KEY) return -ENOKEY; - return crypto_ahash_alg(tfm)->init(req); + + return ahash_do_req_chain(req, crypto_ahash_alg(tfm)->init); } EXPORT_SYMBOL_GPL(crypto_ahash_init); static int ahash_save_req(struct ahash_request *req, crypto_completion_t cplt) { + struct crypto_ahash *tfm = crypto_ahash_reqtfm(req); struct ahash_save_req_state *state; gfp_t gfp; u32 flags; + if (!ahash_is_async(tfm)) + return 0; + flags = ahash_request_flags(req); gfp = (flags & CRYPTO_TFM_REQ_MAY_SLEEP) ? 
GFP_KERNEL : GFP_ATOMIC; state = kmalloc(sizeof(*state), gfp); @@ -284,14 +418,20 @@ static int ahash_save_req(struct ahash_request *req, crypto_completion_t cplt) state->data = req->base.data; req->base.complete = cplt; req->base.data = state; - state->req = req; + state->req0 = req; return 0; } static void ahash_restore_req(struct ahash_request *req) { - struct ahash_save_req_state *state = req->base.data; + struct crypto_ahash *tfm = crypto_ahash_reqtfm(req); + struct ahash_save_req_state *state; + + if (!ahash_is_async(tfm)) + return; + + state = req->base.data; req->base.complete = state->compl; req->base.data = state->data; @@ -302,10 +442,26 @@ int crypto_ahash_update(struct ahash_request *req) { struct crypto_ahash *tfm = crypto_ahash_reqtfm(req); - if (likely(tfm->using_shash)) - return shash_ahash_update(req, ahash_request_ctx(req)); + if (likely(tfm->using_shash)) { + struct ahash_request *r2; + int err; - return crypto_ahash_alg(tfm)->update(req); + err = shash_ahash_update(req, ahash_request_ctx(req)); + req->base.err = err; + if (!ahash_request_chained(req)) + return err; + + list_for_each_entry(r2, &req->base.list, base.list) { + struct shash_desc *desc; + + desc = ahash_request_ctx(r2); + r2->base.err = shash_ahash_update(r2, desc); + } + + return 0; + } + + return ahash_do_req_chain(req, crypto_ahash_alg(tfm)->update); } EXPORT_SYMBOL_GPL(crypto_ahash_update); @@ -313,10 +469,26 @@ int crypto_ahash_final(struct ahash_request *req) { struct crypto_ahash *tfm = crypto_ahash_reqtfm(req); - if (likely(tfm->using_shash)) - return crypto_shash_final(ahash_request_ctx(req), req->result); + if (likely(tfm->using_shash)) { + struct ahash_request *r2; + int err; - return crypto_ahash_alg(tfm)->final(req); + err = crypto_shash_final(ahash_request_ctx(req), req->result); + req->base.err = err; + if (!ahash_request_chained(req)) + return err; + + list_for_each_entry(r2, &req->base.list, base.list) { + struct shash_desc *desc; + + desc = ahash_request_ctx(r2); + r2->base.err = crypto_shash_final(desc, r2->result); + } + + return 0; + } + + return ahash_do_req_chain(req, crypto_ahash_alg(tfm)->final); } EXPORT_SYMBOL_GPL(crypto_ahash_final); @@ -324,10 +496,29 @@ int crypto_ahash_finup(struct ahash_request *req) { struct crypto_ahash *tfm = crypto_ahash_reqtfm(req); - if (likely(tfm->using_shash)) - return shash_ahash_finup(req, ahash_request_ctx(req)); + if (likely(tfm->using_shash)) { + struct ahash_request *r2; + int err; - return crypto_ahash_alg(tfm)->finup(req); + err = shash_ahash_finup(req, ahash_request_ctx(req)); + req->base.err = err; + if (!ahash_request_chained(req)) + return err; + + list_for_each_entry(r2, &req->base.list, base.list) { + struct shash_desc *desc; + + desc = ahash_request_ctx(r2); + r2->base.err = shash_ahash_finup(r2, desc); + } + + return 0; + } + + if (!crypto_ahash_alg(tfm)->finup) + return ahash_def_finup(req); + + return ahash_do_req_chain(req, crypto_ahash_alg(tfm)->finup); } EXPORT_SYMBOL_GPL(crypto_ahash_finup); @@ -335,20 +526,36 @@ int crypto_ahash_digest(struct ahash_request *req) { struct crypto_ahash *tfm = crypto_ahash_reqtfm(req); - if (likely(tfm->using_shash)) - return shash_ahash_digest(req, prepare_shash_desc(req, tfm)); + if (likely(tfm->using_shash)) { + struct ahash_request *r2; + int err; + + err = shash_ahash_digest(req, prepare_shash_desc(req, tfm)); + req->base.err = err; + if (!ahash_request_chained(req)) + return err; + + list_for_each_entry(r2, &req->base.list, base.list) { + struct shash_desc *desc; + + desc = 
prepare_shash_desc(r2, tfm); + r2->base.err = shash_ahash_digest(r2, desc); + } + + return 0; + } if (crypto_ahash_get_flags(tfm) & CRYPTO_TFM_NEED_KEY) return -ENOKEY; - return crypto_ahash_alg(tfm)->digest(req); + return ahash_do_req_chain(req, crypto_ahash_alg(tfm)->digest); } EXPORT_SYMBOL_GPL(crypto_ahash_digest); static void ahash_def_finup_done2(void *data, int err) { struct ahash_save_req_state *state = data; - struct ahash_request *areq = state->req; + struct ahash_request *areq = state->req0; if (err == -EINPROGRESS) return; @@ -359,12 +566,15 @@ static void ahash_def_finup_done2(void *data, int err) static int ahash_def_finup_finish1(struct ahash_request *req, int err) { + struct crypto_ahash *tfm = crypto_ahash_reqtfm(req); + if (err) goto out; - req->base.complete = ahash_def_finup_done2; + if (ahash_is_async(tfm)) + req->base.complete = ahash_def_finup_done2; - err = crypto_ahash_alg(crypto_ahash_reqtfm(req))->final(req); + err = crypto_ahash_final(req); if (err == -EINPROGRESS || err == -EBUSY) return err; @@ -380,7 +590,7 @@ static void ahash_def_finup_done1(void *data, int err) struct ahash_request *areq; state = *state0; - areq = state.req; + areq = state.req0; if (err == -EINPROGRESS) goto out; @@ -396,14 +606,13 @@ static void ahash_def_finup_done1(void *data, int err) static int ahash_def_finup(struct ahash_request *req) { - struct crypto_ahash *tfm = crypto_ahash_reqtfm(req); int err; err = ahash_save_req(req, ahash_def_finup_done1); if (err) return err; - err = crypto_ahash_alg(tfm)->update(req); + err = crypto_ahash_update(req); if (err == -EINPROGRESS || err == -EBUSY) return err; @@ -618,8 +827,6 @@ static int ahash_prepare_alg(struct ahash_alg *alg) base->cra_type = &crypto_ahash_type; base->cra_flags |= CRYPTO_ALG_TYPE_AHASH; - if (!alg->finup) - alg->finup = ahash_def_finup; if (!alg->setkey) alg->setkey = ahash_nosetkey; diff --git a/include/crypto/algapi.h b/include/crypto/algapi.h index 156de41ca760..c5df380c7d08 100644 --- a/include/crypto/algapi.h +++ b/include/crypto/algapi.h @@ -271,4 +271,14 @@ static inline u32 crypto_tfm_alg_type(struct crypto_tfm *tfm) return tfm->__crt_alg->cra_flags & CRYPTO_ALG_TYPE_MASK; } +static inline bool crypto_request_chained(struct crypto_async_request *req) +{ + return req->flags & CRYPTO_TFM_REQ_CHAIN; +} + +static inline bool crypto_tfm_req_chain(struct crypto_tfm *tfm) +{ + return tfm->__crt_alg->cra_flags & CRYPTO_ALG_REQ_CHAIN; +} + #endif /* _CRYPTO_ALGAPI_H */ diff --git a/include/crypto/hash.h b/include/crypto/hash.h index 9c1f8ca59a77..de5e5dcd0c95 100644 --- a/include/crypto/hash.h +++ b/include/crypto/hash.h @@ -621,6 +621,7 @@ static inline void ahash_request_set_callback(struct ahash_request *req, { req->base.complete = compl; req->base.data = data; + flags &= ~CRYPTO_TFM_REQ_CHAIN; req->base.flags = flags; } @@ -646,6 +647,20 @@ static inline void ahash_request_set_crypt(struct ahash_request *req, req->result = result; } +static inline void ahash_reqchain_init(struct ahash_request *req, + u32 flags, crypto_completion_t compl, + void *data) +{ + ahash_request_set_callback(req, flags, compl, data); + crypto_reqchain_init(&req->base); +} + +static inline void ahash_request_chain(struct ahash_request *req, + struct ahash_request *head) +{ + crypto_request_chain(&req->base, &head->base); +} + /** * DOC: Synchronous Message Digest API * @@ -947,4 +962,14 @@ static inline void shash_desc_zero(struct shash_desc *desc) sizeof(*desc) + crypto_shash_descsize(desc->tfm)); } +static inline int ahash_request_err(struct 
ahash_request *req) +{ + return req->base.err; +} + +static inline bool ahash_is_async(struct crypto_ahash *tfm) +{ + return crypto_tfm_is_async(&tfm->base); +} + #endif /* _CRYPTO_HASH_H */ diff --git a/include/crypto/internal/hash.h b/include/crypto/internal/hash.h index 58967593b6b4..81542a48587e 100644 --- a/include/crypto/internal/hash.h +++ b/include/crypto/internal/hash.h @@ -270,5 +270,15 @@ static inline struct crypto_shash *__crypto_shash_cast(struct crypto_tfm *tfm) return container_of(tfm, struct crypto_shash, base); } +static inline bool ahash_request_chained(struct ahash_request *req) +{ + return crypto_request_chained(&req->base); +} + +static inline bool crypto_ahash_req_chain(struct crypto_ahash *tfm) +{ + return crypto_tfm_req_chain(&tfm->base); +} + #endif /* _CRYPTO_INTERNAL_HASH_H */ diff --git a/include/linux/crypto.h b/include/linux/crypto.h index b164da5e129e..6126c57b8452 100644 --- a/include/linux/crypto.h +++ b/include/linux/crypto.h @@ -13,6 +13,8 @@ #define _LINUX_CRYPTO_H #include +#include +#include #include #include #include @@ -124,6 +126,9 @@ */ #define CRYPTO_ALG_FIPS_INTERNAL 0x00020000 +/* Set if the algorithm supports request chains. */ +#define CRYPTO_ALG_REQ_CHAIN 0x00040000 + /* * Transform masks and values (for crt_flags). */ @@ -133,6 +138,7 @@ #define CRYPTO_TFM_REQ_FORBID_WEAK_KEYS 0x00000100 #define CRYPTO_TFM_REQ_MAY_SLEEP 0x00000200 #define CRYPTO_TFM_REQ_MAY_BACKLOG 0x00000400 +#define CRYPTO_TFM_REQ_CHAIN 0x00000800 /* * Miscellaneous stuff. @@ -174,6 +180,7 @@ struct crypto_async_request { struct crypto_tfm *tfm; u32 flags; + int err; }; /** @@ -540,5 +547,24 @@ int crypto_comp_decompress(struct crypto_comp *tfm, const u8 *src, unsigned int slen, u8 *dst, unsigned int *dlen); +static inline void crypto_reqchain_init(struct crypto_async_request *req) +{ + req->err = -EINPROGRESS; + req->flags |= CRYPTO_TFM_REQ_CHAIN; + INIT_LIST_HEAD(&req->list); +} + +static inline void crypto_request_chain(struct crypto_async_request *req, + struct crypto_async_request *head) +{ + req->err = -EINPROGRESS; + list_add_tail(&req->list, &head->list); +} + +static inline bool crypto_tfm_is_async(struct crypto_tfm *tfm) +{ + return tfm->__crt_alg->cra_flags & CRYPTO_ALG_ASYNC; +} + #endif /* _LINUX_CRYPTO_H */ From patchwork Sun Oct 27 09:45:35 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Herbert Xu X-Patchwork-Id: 13852461 Received: from abb.hmeau.com (abb.hmeau.com [144.6.53.87]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6C28C26AD9 for ; Sun, 27 Oct 2024 09:45:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=144.6.53.87 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730022342; cv=none; b=LLCYqKp+JETP/R3gBsByu3spEUAEDRvxCCpdwVuYejV1DYNlvxjAXrdzBsWlScfmulXkEkZ9WsatlTkuNThtXdLtW9yJcHhFBOoIE9Kh1K4toe8KlAaoJ6YeFqJjwc6pK8fhtINcd7siNMqJ2M25wO5JOL6PJdL8yAxAZ+IJmzs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730022342; c=relaxed/simple; bh=99djdOC8D17HEhY9Rc4AhPgvRlZ8ekBNlImQ1Yu//iE=; h=Date:Message-Id:In-Reply-To:References:From:Subject:To:Cc; b=ehTlqFQZ9+cNIlA6/fBDyIvpZCyH8WvI4zYjGcuFweOyOMf0nNdA0wLXxt36kM4f/hr3TOssuJjnGvl4pFE0AvjywyYScItnjfJN1bw2o741fzEQfIuP4cTQuTDTdS/NyLOUpyKZCXDPCGzY/c25nHoNZrFH8zZaSZVLao3MaV8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; 
Date: Sun, 27 Oct 2024 17:45:35 +0800
Message-Id: <658b601600b2c817ffba72a3c54a37ad12442674.1730021644.git.herbert@gondor.apana.org.au>
From: Herbert Xu
Subject: [PATCH 3/6] crypto: tcrypt - Restore multibuffer ahash tests
To: Linux Crypto Mailing List
Cc: Eric Biggers, Ard Biesheuvel, Megha Dey, Tim Chen
X-Mailing-List: linux-crypto@vger.kernel.org

This patch is a revert of commit 388ac25efc8ce3bf9768ce7bf24268d6fac285d5.
As multibuffer ahash is coming back in the form of request chaining,
restore the multibuffer ahash tests using the new interface.
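For reference, the restored test drives a batch through the chaining
interface from the previous patch roughly as follows (condensed sketch;
reqs[] stands in for the test's data[].req array, each request's buffers
are assumed to have been set with ahash_request_set_crypt, and error
handling is trimmed):

	static int mb_digest_sketch(struct ahash_request **reqs,
				    unsigned int num_mb)
	{
		struct crypto_wait wait;
		unsigned int i;
		int err;

		crypto_init_wait(&wait);

		/* Requests 1..num_mb-1 are chained behind reqs[0]. */
		ahash_reqchain_init(reqs[0], 0, crypto_req_done, &wait);
		for (i = 1; i < num_mb; i++)
			ahash_request_chain(reqs[i], reqs[0]);

		/* A single submission fires the whole chain. */
		err = crypto_wait_req(crypto_ahash_digest(reqs[0]), &wait);

		/* Each request records its own completion status. */
		for (i = 0; !err && i < num_mb; i++)
			err = ahash_request_err(reqs[i]);

		return err;
	}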
Signed-off-by: Herbert Xu --- crypto/tcrypt.c | 227 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 227 insertions(+) diff --git a/crypto/tcrypt.c b/crypto/tcrypt.c index e9e7dceb606e..f996e5e2c83a 100644 --- a/crypto/tcrypt.c +++ b/crypto/tcrypt.c @@ -716,6 +716,203 @@ static inline int do_one_ahash_op(struct ahash_request *req, int ret) return crypto_wait_req(ret, wait); } +struct test_mb_ahash_data { + struct scatterlist sg[XBUFSIZE]; + char result[64]; + struct ahash_request *req; + struct crypto_wait wait; + char *xbuf[XBUFSIZE]; +}; + +static inline int do_mult_ahash_op(struct test_mb_ahash_data *data, u32 num_mb, + int *rc) +{ + int i, err; + + /* Fire up a bunch of concurrent requests */ + err = crypto_ahash_digest(data[0].req); + + /* Wait for all requests to finish */ + err = crypto_wait_req(err, &data[0].wait); + + for (i = 0; i < num_mb; i++) { + rc[i] = ahash_request_err(data[i].req); + if (rc[i]) { + pr_info("concurrent request %d error %d\n", i, rc[i]); + err = rc[i]; + } + } + + return err; +} + +static int test_mb_ahash_jiffies(struct test_mb_ahash_data *data, int blen, + int secs, u32 num_mb) +{ + unsigned long start, end; + int bcount; + int ret = 0; + int *rc; + + rc = kcalloc(num_mb, sizeof(*rc), GFP_KERNEL); + if (!rc) + return -ENOMEM; + + for (start = jiffies, end = start + secs * HZ, bcount = 0; + time_before(jiffies, end); bcount++) { + ret = do_mult_ahash_op(data, num_mb, rc); + if (ret) + goto out; + } + + pr_cont("%d operations in %d seconds (%llu bytes)\n", + bcount * num_mb, secs, (u64)bcount * blen * num_mb); + +out: + kfree(rc); + return ret; +} + +static int test_mb_ahash_cycles(struct test_mb_ahash_data *data, int blen, + u32 num_mb) +{ + unsigned long cycles = 0; + int ret = 0; + int i; + int *rc; + + rc = kcalloc(num_mb, sizeof(*rc), GFP_KERNEL); + if (!rc) + return -ENOMEM; + + /* Warm-up run. */ + for (i = 0; i < 4; i++) { + ret = do_mult_ahash_op(data, num_mb, rc); + if (ret) + goto out; + } + + /* The real thing. 
*/ + for (i = 0; i < 8; i++) { + cycles_t start, end; + + start = get_cycles(); + ret = do_mult_ahash_op(data, num_mb, rc); + end = get_cycles(); + + if (ret) + goto out; + + cycles += end - start; + } + + pr_cont("1 operation in %lu cycles (%d bytes)\n", + (cycles + 4) / (8 * num_mb), blen); + +out: + kfree(rc); + return ret; +} + +static void test_mb_ahash_speed(const char *algo, unsigned int secs, + struct hash_speed *speed, u32 num_mb) +{ + struct test_mb_ahash_data *data; + struct crypto_ahash *tfm; + unsigned int i, j, k; + int ret; + + data = kcalloc(num_mb, sizeof(*data), GFP_KERNEL); + if (!data) + return; + + tfm = crypto_alloc_ahash(algo, 0, 0); + if (IS_ERR(tfm)) { + pr_err("failed to load transform for %s: %ld\n", + algo, PTR_ERR(tfm)); + goto free_data; + } + + for (i = 0; i < num_mb; ++i) { + if (testmgr_alloc_buf(data[i].xbuf)) + goto out; + + crypto_init_wait(&data[i].wait); + + data[i].req = ahash_request_alloc(tfm, GFP_KERNEL); + if (!data[i].req) { + pr_err("alg: hash: Failed to allocate request for %s\n", + algo); + goto out; + } + + if (i) + ahash_request_chain(data[i].req, data[0].req); + else + ahash_reqchain_init(data[i].req, 0, crypto_req_done, + &data[i].wait); + + sg_init_table(data[i].sg, XBUFSIZE); + for (j = 0; j < XBUFSIZE; j++) { + sg_set_buf(data[i].sg + j, data[i].xbuf[j], PAGE_SIZE); + memset(data[i].xbuf[j], 0xff, PAGE_SIZE); + } + } + + pr_info("\ntesting speed of multibuffer %s (%s)\n", algo, + get_driver_name(crypto_ahash, tfm)); + + for (i = 0; speed[i].blen != 0; i++) { + /* For some reason this only tests digests. */ + if (speed[i].blen != speed[i].plen) + continue; + + if (speed[i].blen > XBUFSIZE * PAGE_SIZE) { + pr_err("template (%u) too big for tvmem (%lu)\n", + speed[i].blen, XBUFSIZE * PAGE_SIZE); + goto out; + } + + if (klen) + crypto_ahash_setkey(tfm, tvmem[0], klen); + + for (k = 0; k < num_mb; k++) + ahash_request_set_crypt(data[k].req, data[k].sg, + data[k].result, speed[i].blen); + + pr_info("test%3u " + "(%5u byte blocks,%5u bytes per update,%4u updates): ", + i, speed[i].blen, speed[i].plen, + speed[i].blen / speed[i].plen); + + if (secs) { + ret = test_mb_ahash_jiffies(data, speed[i].blen, secs, + num_mb); + cond_resched(); + } else { + ret = test_mb_ahash_cycles(data, speed[i].blen, num_mb); + } + + + if (ret) { + pr_err("At least one hashing failed ret=%d\n", ret); + break; + } + } + +out: + for (k = 0; k < num_mb; ++k) + ahash_request_free(data[k].req); + + for (k = 0; k < num_mb; ++k) + testmgr_free_buf(data[k].xbuf); + + crypto_free_ahash(tfm); + +free_data: + kfree(data); +} + static int test_ahash_jiffies_digest(struct ahash_request *req, int blen, char *out, int secs) { @@ -2395,6 +2592,36 @@ static int do_test(const char *alg, u32 type, u32 mask, int m, u32 num_mb) test_ahash_speed("sm3", sec, generic_hash_speed_template); if (mode > 400 && mode < 500) break; fallthrough; + case 450: + test_mb_ahash_speed("sha1", sec, generic_hash_speed_template, + num_mb); + if (mode > 400 && mode < 500) break; + fallthrough; + case 451: + test_mb_ahash_speed("sha256", sec, generic_hash_speed_template, + num_mb); + if (mode > 400 && mode < 500) break; + fallthrough; + case 452: + test_mb_ahash_speed("sha512", sec, generic_hash_speed_template, + num_mb); + if (mode > 400 && mode < 500) break; + fallthrough; + case 453: + test_mb_ahash_speed("sm3", sec, generic_hash_speed_template, + num_mb); + if (mode > 400 && mode < 500) break; + fallthrough; + case 454: + test_mb_ahash_speed("streebog256", sec, + generic_hash_speed_template, num_mb); + 
if (mode > 400 && mode < 500) break; + fallthrough; + case 455: + test_mb_ahash_speed("streebog512", sec, + generic_hash_speed_template, num_mb); + if (mode > 400 && mode < 500) break; + fallthrough; case 499: break;

From patchwork Sun Oct 27 09:45:37 2024
X-Patchwork-Submitter: Herbert Xu
X-Patchwork-Id: 13852464
Date: Sun, 27 Oct 2024 17:45:37 +0800
From: Herbert Xu
Subject: [PATCH 4/6] crypto: ahash - Add virtual address support
To: Linux Crypto Mailing List
Cc: Eric Biggers, Ard Biesheuvel, Megha Dey, Tim Chen
X-Mailing-List: linux-crypto@vger.kernel.org

This patch adds virtual address support to ahash.
Virtual addresses were previously only supported through shash. The user may choose to use virtual addresses with ahash by calling ahash_request_set_virt instead of ahash_request_set_crypt. The API will take care of translating this to an SG list if necessary, unless the algorithm declares that it supports chaining. Therefore in order for an ahash algorithm to support chaining, it must also support virtual addresses directly. Signed-off-by: Herbert Xu --- crypto/ahash.c | 282 +++++++++++++++++++++++++++++---- include/crypto/hash.h | 39 ++++- include/crypto/internal/hash.h | 7 +- include/linux/crypto.h | 2 +- 4 files changed, 294 insertions(+), 36 deletions(-) diff --git a/crypto/ahash.c b/crypto/ahash.c index e1ee18deca67..b1c468797990 100644 --- a/crypto/ahash.c +++ b/crypto/ahash.c @@ -34,11 +34,17 @@ struct ahash_save_req_state { int (*op)(struct ahash_request *req); crypto_completion_t compl; void *data; + struct scatterlist sg; + const u8 *src; + u8 *page; + unsigned int offset; + unsigned int nbytes; }; static void ahash_reqchain_done(void *data, int err); static int ahash_save_req(struct ahash_request *req, crypto_completion_t cplt); -static void ahash_restore_req(struct ahash_request *req); +static void ahash_restore_req(struct ahash_save_req_state *state); +static void ahash_def_finup_done1(void *data, int err); static int ahash_def_finup(struct ahash_request *req); /* @@ -100,6 +106,10 @@ int shash_ahash_digest(struct ahash_request *req, struct shash_desc *desc) unsigned int offset; int err; + if (ahash_request_isvirt(req)) + return crypto_shash_digest(desc, req->svirt, nbytes, + req->result); + if (nbytes && (sg = req->src, offset = sg->offset, nbytes <= min(sg->length, ((unsigned int)(PAGE_SIZE)) - offset))) { @@ -182,6 +192,9 @@ static int hash_walk_new_entry(struct crypto_hash_walk *walk) int crypto_hash_walk_done(struct crypto_hash_walk *walk, int err) { + if ((walk->flags & CRYPTO_AHASH_REQ_VIRT)) + return err; + walk->data -= walk->offset; kunmap_local(walk->data); @@ -209,14 +222,20 @@ int crypto_hash_walk_first(struct ahash_request *req, struct crypto_hash_walk *walk) { walk->total = req->nbytes; + walk->entrylen = 0; - if (!walk->total) { - walk->entrylen = 0; + if (!walk->total) return 0; + + walk->flags = req->base.flags; + + if (ahash_request_isvirt(req)) { + walk->data = req->svirt; + walk->total = 0; + return req->nbytes; } walk->sg = req->src; - walk->flags = req->base.flags; return hash_walk_new_entry(walk); } @@ -264,20 +283,85 @@ int crypto_ahash_setkey(struct crypto_ahash *tfm, const u8 *key, } EXPORT_SYMBOL_GPL(crypto_ahash_setkey); +static bool ahash_request_hasvirt(struct ahash_request *req) +{ + struct ahash_request *r2; + + if (ahash_request_isvirt(req)) + return true; + + if (!ahash_request_chained(req)) + return false; + + list_for_each_entry(r2, &req->base.list, base.list) + if (ahash_request_isvirt(r2)) + return true; + + return false; +} + +static int ahash_reqchain_virt(struct ahash_save_req_state *state, + int err, u32 mask) +{ + struct ahash_request *req = state->cur; + + for (;;) { + unsigned len = state->nbytes; + + req->base.err = err; + + if (!state->offset) + break; + + if (state->offset == len || err) { + u8 *result = req->result; + + ahash_request_set_virt(req, state->src, result, len); + state->offset = 0; + break; + } + + len -= state->offset; + + len = min(PAGE_SIZE, len); + memcpy(state->page, state->src + state->offset, len); + state->offset += len; + req->nbytes = len; + + err = state->op(req); + if (err == -EINPROGRESS) { + if 
(!list_empty(&state->head) || + state->offset < state->nbytes) + err = -EBUSY; + break; + } + + if (err == -EBUSY) + break; + } + + return err; +} + static int ahash_reqchain_finish(struct ahash_save_req_state *state, int err, u32 mask) { struct ahash_request *req0 = state->req0; struct ahash_request *req = state->cur; + struct crypto_ahash *tfm; struct ahash_request *n; + bool update; - req->base.err = err; + err = ahash_reqchain_virt(state, err, mask); if (req == req0) INIT_LIST_HEAD(&req->base.list); else list_add_tail(&req->base.list, &req0->base.list); + tfm = crypto_ahash_reqtfm(req); + update = state->op == crypto_ahash_alg(tfm)->update; + list_for_each_entry_safe(req, n, &state->head, base.list) { list_del_init(&req->base.list); @@ -285,10 +369,27 @@ static int ahash_reqchain_finish(struct ahash_save_req_state *state, req->base.complete = ahash_reqchain_done; req->base.data = state; state->cur = req; + + if (update && ahash_request_isvirt(req) && req->nbytes) { + unsigned len = req->nbytes; + u8 *result = req->result; + + state->src = req->svirt; + state->nbytes = len; + + len = min(PAGE_SIZE, len); + + memcpy(state->page, req->svirt, len); + state->offset = len; + + ahash_request_set_crypt(req, &state->sg, result, len); + } + err = state->op(req); if (err == -EINPROGRESS) { - if (!list_empty(&state->head)) + if (!list_empty(&state->head) || + state->offset < state->nbytes) err = -EBUSY; goto out; } @@ -296,11 +397,14 @@ static int ahash_reqchain_finish(struct ahash_save_req_state *state, if (err == -EBUSY) goto out; - req->base.err = err; + err = ahash_reqchain_virt(state, err, mask); + if (err == -EINPROGRESS || err == -EBUSY) + goto out; + list_add_tail(&req->base.list, &req0->base.list); } - ahash_restore_req(req0); + ahash_restore_req(state); out: return err; @@ -314,7 +418,7 @@ static void ahash_reqchain_done(void *data, int err) data = state->data; if (err == -EINPROGRESS) { - if (!list_empty(&state->head)) + if (!list_empty(&state->head) || state->offset < state->nbytes) return; goto notify; } @@ -331,41 +435,84 @@ static int ahash_do_req_chain(struct ahash_request *req, int (*op)(struct ahash_request *req)) { struct crypto_ahash *tfm = crypto_ahash_reqtfm(req); + bool update = op == crypto_ahash_alg(tfm)->update; struct ahash_save_req_state *state; struct ahash_save_req_state state0; + struct ahash_request *r2; + u8 *page = NULL; int err; - if (!ahash_request_chained(req) || list_empty(&req->base.list) || - crypto_ahash_req_chain(tfm)) + if (crypto_ahash_req_chain(tfm) || + ((!ahash_request_chained(req) || list_empty(&req->base.list)) && + (!update || !ahash_request_isvirt(req)))) return op(req); - state = &state0; + if (update && ahash_request_hasvirt(req)) { + gfp_t gfp; + u32 flags; + flags = ahash_request_flags(req); + gfp = (flags & CRYPTO_TFM_REQ_MAY_SLEEP) ? 
+ GFP_KERNEL : GFP_ATOMIC; + page = (void *)__get_free_page(gfp); + err = -ENOMEM; + if (!page) + goto out_set_chain; + } + + state = &state0; if (ahash_is_async(tfm)) { err = ahash_save_req(req, ahash_reqchain_done); - if (err) { - struct ahash_request *r2; - - req->base.err = err; - list_for_each_entry(r2, &req->base.list, base.list) - r2->base.err = err; - - return err; - } + if (err) + goto out_free_page; state = req->base.data; } state->op = op; state->cur = req; + state->page = page; + state->offset = 0; + state->nbytes = 0; INIT_LIST_HEAD(&state->head); list_splice(&req->base.list, &state->head); + if (page) + sg_init_one(&state->sg, page, PAGE_SIZE); + + if (update && ahash_request_isvirt(req) && req->nbytes) { + unsigned len = req->nbytes; + u8 *result = req->result; + + state->src = req->svirt; + state->nbytes = len; + + len = min(PAGE_SIZE, len); + + memcpy(page, req->svirt, len); + state->offset = len; + + ahash_request_set_crypt(req, &state->sg, result, len); + } + err = op(req); if (err == -EBUSY || err == -EINPROGRESS) return -EBUSY; return ahash_reqchain_finish(state, err, ~0); + +out_free_page: + if (page) { + memset(page, 0, PAGE_SIZE); + free_page((unsigned long)page); + } + +out_set_chain: + req->base.err = err; + list_for_each_entry(r2, &req->base.list, base.list) + r2->base.err = err; + + return err; } int crypto_ahash_init(struct ahash_request *req) @@ -419,15 +566,19 @@ static int ahash_save_req(struct ahash_request *req, crypto_completion_t cplt) req->base.complete = cplt; req->base.data = state; state->req0 = req; + state->page = NULL; return 0; } -static void ahash_restore_req(struct ahash_request *req) +static void ahash_restore_req(struct ahash_save_req_state *state) { - struct crypto_ahash *tfm = crypto_ahash_reqtfm(req); - struct ahash_save_req_state *state; + struct ahash_request *req = state->req0; + struct crypto_ahash *tfm; + free_page((unsigned long)state->page); + + tfm = crypto_ahash_reqtfm(req); if (!ahash_is_async(tfm)) return; @@ -515,13 +666,74 @@ int crypto_ahash_finup(struct ahash_request *req) return 0; } - if (!crypto_ahash_alg(tfm)->finup) + if (!crypto_ahash_alg(tfm)->finup || + (!crypto_ahash_req_chain(tfm) && ahash_request_hasvirt(req))) return ahash_def_finup(req); return ahash_do_req_chain(req, crypto_ahash_alg(tfm)->finup); } EXPORT_SYMBOL_GPL(crypto_ahash_finup); +static int ahash_def_digest_finish(struct ahash_save_req_state *state, int err) +{ + struct ahash_request *req = state->req0; + struct crypto_ahash *tfm; + + if (err) + goto out; + + tfm = crypto_ahash_reqtfm(req); + if (ahash_is_async(tfm)) + req->base.complete = ahash_def_finup_done1; + + err = crypto_ahash_update(req); + if (err == -EINPROGRESS || err == -EBUSY) + return err; + +out: + ahash_restore_req(state); + return err; +} + +static void ahash_def_digest_done(void *data, int err) +{ + struct ahash_save_req_state *state0 = data; + struct ahash_save_req_state state; + struct ahash_request *areq; + + state = *state0; + areq = state.req0; + if (err == -EINPROGRESS) + goto out; + + areq->base.flags &= ~CRYPTO_TFM_REQ_MAY_SLEEP; + + err = ahash_def_digest_finish(state0, err); + if (err == -EINPROGRESS || err == -EBUSY) + return; + +out: + state.compl(state.data, err); +} + +static int ahash_def_digest(struct ahash_request *req) +{ + struct ahash_save_req_state *state; + int err; + + err = ahash_save_req(req, ahash_def_digest_done); + if (err) + return err; + + state = req->base.data; + + err = crypto_ahash_init(req); + if (err == -EINPROGRESS || err == -EBUSY) + return err; 
+ + return ahash_def_digest_finish(state, err); +} + int crypto_ahash_digest(struct ahash_request *req) { struct crypto_ahash *tfm = crypto_ahash_reqtfm(req); @@ -545,6 +757,9 @@ int crypto_ahash_digest(struct ahash_request *req) return 0; } + if (!crypto_ahash_req_chain(tfm) && ahash_request_hasvirt(req)) + return ahash_def_digest(req); + if (crypto_ahash_get_flags(tfm) & CRYPTO_TFM_NEED_KEY) return -ENOKEY; @@ -560,17 +775,19 @@ static void ahash_def_finup_done2(void *data, int err) if (err == -EINPROGRESS) return; - ahash_restore_req(areq); + ahash_restore_req(state); ahash_request_complete(areq, err); } -static int ahash_def_finup_finish1(struct ahash_request *req, int err) +static int ahash_def_finup_finish1(struct ahash_save_req_state *state, int err) { - struct crypto_ahash *tfm = crypto_ahash_reqtfm(req); + struct ahash_request *req = state->req0; + struct crypto_ahash *tfm; if (err) goto out; + tfm = crypto_ahash_reqtfm(req); if (ahash_is_async(tfm)) req->base.complete = ahash_def_finup_done2; @@ -579,7 +796,7 @@ static int ahash_def_finup_finish1(struct ahash_request *req, int err) return err; out: - ahash_restore_req(req); + ahash_restore_req(state); return err; } @@ -596,7 +813,7 @@ static void ahash_def_finup_done1(void *data, int err) areq->base.flags &= ~CRYPTO_TFM_REQ_MAY_SLEEP; - err = ahash_def_finup_finish1(areq, err); + err = ahash_def_finup_finish1(state0, err); if (err == -EINPROGRESS || err == -EBUSY) return; @@ -606,17 +823,20 @@ static void ahash_def_finup_done1(void *data, int err) static int ahash_def_finup(struct ahash_request *req) { + struct ahash_save_req_state *state; int err; err = ahash_save_req(req, ahash_def_finup_done1); if (err) return err; + state = req->base.data; + err = crypto_ahash_update(req); if (err == -EINPROGRESS || err == -EBUSY) return err; - return ahash_def_finup_finish1(req, err); + return ahash_def_finup_finish1(state, err); } int crypto_ahash_export(struct ahash_request *req, void *out) diff --git a/include/crypto/hash.h b/include/crypto/hash.h index de5e5dcd0c95..0cdadd48d068 100644 --- a/include/crypto/hash.h +++ b/include/crypto/hash.h @@ -12,6 +12,9 @@ #include #include +/* Set this bit for virtual address instead of SG list. 
*/ +#define CRYPTO_AHASH_REQ_VIRT 0x00000001 + struct crypto_ahash; /** @@ -52,7 +55,10 @@ struct ahash_request { struct crypto_async_request base; unsigned int nbytes; - struct scatterlist *src; + union { + struct scatterlist *src; + const u8 *svirt; + }; u8 *result; void *__ctx[] CRYPTO_MINALIGN_ATTR; @@ -619,10 +625,13 @@ static inline void ahash_request_set_callback(struct ahash_request *req, crypto_completion_t compl, void *data) { + u32 keep = CRYPTO_TFM_REQ_CHAIN | CRYPTO_AHASH_REQ_VIRT; + req->base.complete = compl; req->base.data = data; - flags &= ~CRYPTO_TFM_REQ_CHAIN; - req->base.flags = flags; + flags &= ~keep; + req->base.flags &= ~keep; + req->base.flags |= flags; } /** @@ -645,6 +654,30 @@ static inline void ahash_request_set_crypt(struct ahash_request *req, req->src = src; req->nbytes = nbytes; req->result = result; + req->base.flags &= ~CRYPTO_AHASH_REQ_VIRT; +} + +/** + * ahash_request_set_virt() - set virtual address data buffers + * @req: ahash_request handle to be updated + * @src: source virtual address + * @result: buffer that is filled with the message digest -- the caller must + * ensure that the buffer has sufficient space by, for example, calling + * crypto_ahash_digestsize() + * @nbytes: number of bytes to process from the source virtual address + * + * By using this call, the caller references the source virtual address. + * The source virtual address points to the data the message digest is to + * be calculated for. + */ +static inline void ahash_request_set_virt(struct ahash_request *req, + const u8 *src, u8 *result, + unsigned int nbytes) +{ + req->svirt = src; + req->nbytes = nbytes; + req->result = result; + req->base.flags |= CRYPTO_AHASH_REQ_VIRT; } static inline void ahash_reqchain_init(struct ahash_request *req, diff --git a/include/crypto/internal/hash.h b/include/crypto/internal/hash.h index 81542a48587e..195d6aeeede3 100644 --- a/include/crypto/internal/hash.h +++ b/include/crypto/internal/hash.h @@ -15,7 +15,7 @@ struct ahash_request; struct scatterlist; struct crypto_hash_walk { - char *data; + const char *data; unsigned int offset; unsigned int flags; @@ -275,6 +275,11 @@ static inline bool ahash_request_chained(struct ahash_request *req) return crypto_request_chained(&req->base); } +static inline bool ahash_request_isvirt(struct ahash_request *req) +{ + return req->base.flags & CRYPTO_AHASH_REQ_VIRT; +} + static inline bool crypto_ahash_req_chain(struct crypto_ahash *tfm) { return crypto_tfm_req_chain(&tfm->base); diff --git a/include/linux/crypto.h b/include/linux/crypto.h index 6126c57b8452..55fd77658b3e 100644 --- a/include/linux/crypto.h +++ b/include/linux/crypto.h @@ -126,7 +126,7 @@ */ #define CRYPTO_ALG_FIPS_INTERNAL 0x00020000 -/* Set if the algorithm supports request chains. */ +/* Set if the algorithm supports request chains and virtual addresses. 
*/ #define CRYPTO_ALG_REQ_CHAIN 0x00040000 /*

From patchwork Sun Oct 27 09:45:40 2024
X-Patchwork-Submitter: Herbert Xu
X-Patchwork-Id: 13852465
Date: Sun, 27 Oct 2024 17:45:40 +0800
Message-Id: <7e696eecae80b88dd374b75d5f59a8b24debc575.1730021644.git.herbert@gondor.apana.org.au>
From: Herbert Xu
Subject: [PATCH 5/6] crypto: ahash - Set default reqsize from ahash_alg
To: Linux Crypto Mailing List
Cc: Eric Biggers, Ard Biesheuvel, Megha Dey, Tim Chen
X-Mailing-List: linux-crypto@vger.kernel.org

Add a reqsize field to struct ahash_alg and use it to set the default
reqsize so that algorithms with a static reqsize are not forced
to create an init_tfm function. Signed-off-by: Herbert Xu --- crypto/ahash.c | 4 ++++ include/crypto/hash.h | 3 +++ 2 files changed, 7 insertions(+) diff --git a/crypto/ahash.c b/crypto/ahash.c index b1c468797990..52dd45d6aeab 100644 --- a/crypto/ahash.c +++ b/crypto/ahash.c @@ -875,6 +875,7 @@ static int crypto_ahash_init_tfm(struct crypto_tfm *tfm) struct ahash_alg *alg = crypto_ahash_alg(hash); crypto_ahash_set_statesize(hash, alg->halg.statesize); + crypto_ahash_set_reqsize(hash, alg->reqsize); if (tfm->__crt_alg->cra_type == &crypto_shash_type) return crypto_init_ahash_using_shash(tfm); @@ -1040,6 +1041,9 @@ static int ahash_prepare_alg(struct ahash_alg *alg) if (alg->halg.statesize == 0) return -EINVAL; + if (alg->reqsize && alg->reqsize < alg->halg.statesize) + return -EINVAL; + err = hash_prepare_alg(&alg->halg); if (err) return err; diff --git a/include/crypto/hash.h b/include/crypto/hash.h index 0cdadd48d068..db232070a317 100644 --- a/include/crypto/hash.h +++ b/include/crypto/hash.h @@ -135,6 +135,7 @@ struct ahash_request { * This is a counterpart to @init_tfm, used to remove * various changes set in @init_tfm. * @clone_tfm: Copy transform into new object, may allocate memory. + * @reqsize: Size of the request context. * @halg: see struct hash_alg_common */ struct ahash_alg { @@ -151,6 +152,8 @@ struct ahash_alg { void (*exit_tfm)(struct crypto_ahash *tfm); int (*clone_tfm)(struct crypto_ahash *dst, struct crypto_ahash *src); + unsigned int reqsize; + struct hash_alg_common halg; };

From patchwork Sun Oct 27 09:45:42 2024
X-Patchwork-Submitter: Herbert Xu
X-Patchwork-Id: 13852466
Date: Sun, 27 Oct 2024 17:45:42 +0800
Message-Id: <6fc95eb867115e898fb6cca4a9470d147a5587bd.1730021644.git.herbert@gondor.apana.org.au>
From: Herbert Xu
Subject: [PATCH 6/6] crypto: x86/sha2 - Restore multibuffer AVX2 support
To: Linux Crypto Mailing List
Cc: Eric Biggers, Ard Biesheuvel, Megha Dey, Tim Chen

Resurrect the old multibuffer AVX2 code removed by commit ab8085c130ed ("crypto: x86 - remove SHA multibuffer routines and mcryptd") using the new request chaining interface. This is purely a proof of concept and only meant to illustrate the utility of the new API rather than a serious attempt at improving the performance. However, it is interesting to note that with x8 multibuffer the performance of AVX2 is on par with SHA-NI.
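As a rough cross-check of that claim, taking the 8192-byte figures below at face value (cycles per operation divided by bytes per operation): 11980 / 8192 is about 1.46 cycles/byte for x8 multibuffer AVX2, versus 10724 / 8192 or about 1.31 cycles/byte for SHA-NI and 40042 / 8192 or about 4.9 cycles/byte for single-buffer AVX2. That is roughly a 3.3x speedup over the single-buffer AVX2 path and within about 12% of SHA-NI.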
testing speed of multibuffer sha256 (sha256-avx2)
tcrypt: test 0 ( 16 byte blocks, 16 bytes per update, 1 updates): 1 operation in 184 cycles (16 bytes)
tcrypt: test 2 ( 64 byte blocks, 64 bytes per update, 1 updates): 1 operation in 165 cycles (64 bytes)
tcrypt: test 5 ( 256 byte blocks, 256 bytes per update, 1 updates): 1 operation in 444 cycles (256 bytes)
tcrypt: test 8 ( 1024 byte blocks, 1024 bytes per update, 1 updates): 1 operation in 1549 cycles (1024 bytes)
tcrypt: test 12 ( 2048 byte blocks, 2048 bytes per update, 1 updates): 1 operation in 3060 cycles (2048 bytes)
tcrypt: test 16 ( 4096 byte blocks, 4096 bytes per update, 1 updates): 1 operation in 5983 cycles (4096 bytes)
tcrypt: test 21 ( 8192 byte blocks, 8192 bytes per update, 1 updates): 1 operation in 11980 cycles (8192 bytes)
tcrypt: testing speed of async sha256 (sha256-avx2)
tcrypt: test 0 ( 16 byte blocks, 16 bytes per update, 1 updates): 475 cycles/operation, 29 cycles/byte
tcrypt: test 2 ( 64 byte blocks, 64 bytes per update, 1 updates): 780 cycles/operation, 12 cycles/byte
tcrypt: test 5 ( 256 byte blocks, 256 bytes per update, 1 updates): 1872 cycles/operation, 7 cycles/byte
tcrypt: test 8 ( 1024 byte blocks, 1024 bytes per update, 1 updates): 5416 cycles/operation, 5 cycles/byte
tcrypt: test 12 ( 2048 byte blocks, 2048 bytes per update, 1 updates): 10339 cycles/operation, 5 cycles/byte
tcrypt: test 16 ( 4096 byte blocks, 4096 bytes per update, 1 updates): 20214 cycles/operation, 4 cycles/byte
tcrypt: test 21 ( 8192 byte blocks, 8192 bytes per update, 1 updates): 40042 cycles/operation, 4 cycles/byte
tcrypt: testing speed of async sha256-ni (sha256-ni)
tcrypt: test 0 ( 16 byte blocks, 16 bytes per update, 1 updates): 207 cycles/operation, 12 cycles/byte
tcrypt: test 2 ( 64 byte blocks, 64 bytes per update, 1 updates): 299 cycles/operation, 4 cycles/byte
tcrypt: test 5 ( 256 byte blocks, 256 bytes per update, 1 updates): 543 cycles/operation, 2 cycles/byte
tcrypt: test 8 ( 1024 byte blocks, 1024 bytes per update, 1 updates): 1523 cycles/operation, 1 cycles/byte
tcrypt: test 12 ( 2048 byte blocks, 2048 bytes per update, 1 updates): 2835 cycles/operation, 1 cycles/byte
tcrypt: test 16 ( 4096 byte blocks, 4096 bytes per update, 1 updates): 5459 cycles/operation, 1 cycles/byte
tcrypt: test 21 ( 8192 byte blocks, 8192 bytes per update, 1 updates): 10724 cycles/operation, 1 cycles/byte

Signed-off-by: Herbert Xu --- arch/x86/crypto/Makefile | 2 +- arch/x86/crypto/sha256_mb_mgr_datastruct.S | 304 +++++++++++ arch/x86/crypto/sha256_ssse3_glue.c | 523 ++++++++++++++++-- arch/x86/crypto/sha256_x8_avx2.S | 596 +++++++++++++++++++++ 4 files changed, 1382 insertions(+), 43 deletions(-) create mode 100644 arch/x86/crypto/sha256_mb_mgr_datastruct.S create mode 100644 arch/x86/crypto/sha256_x8_avx2.S diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile index 53b4a277809e..e60abbfb6467 100644 --- a/arch/x86/crypto/Makefile +++ b/arch/x86/crypto/Makefile @@ -60,7 +60,7 @@ sha1-ssse3-y := sha1_avx2_x86_64_asm.o sha1_ssse3_asm.o sha1_ssse3_glue.o sha1-ssse3-$(CONFIG_AS_SHA1_NI) += sha1_ni_asm.o obj-$(CONFIG_CRYPTO_SHA256_SSSE3) += sha256-ssse3.o -sha256-ssse3-y := sha256-ssse3-asm.o sha256-avx-asm.o sha256-avx2-asm.o sha256_ssse3_glue.o +sha256-ssse3-y := sha256-ssse3-asm.o sha256-avx-asm.o sha256-avx2-asm.o sha256_ssse3_glue.o sha256_x8_avx2.o sha256-ssse3-$(CONFIG_AS_SHA256_NI) += sha256_ni_asm.o obj-$(CONFIG_CRYPTO_SHA512_SSSE3) += sha512-ssse3.o diff --git a/arch/x86/crypto/sha256_mb_mgr_datastruct.S
b/arch/x86/crypto/sha256_mb_mgr_datastruct.S new file mode 100644 index 000000000000..5c377bac21d0 --- /dev/null +++ b/arch/x86/crypto/sha256_mb_mgr_datastruct.S @@ -0,0 +1,304 @@ +/* + * Header file for multi buffer SHA256 algorithm data structure + * + * This file is provided under a dual BSD/GPLv2 license. When using or + * redistributing this file, you may do so under either license. + * + * GPL LICENSE SUMMARY + * + * Copyright(c) 2016 Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of version 2 of the GNU General Public License as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * Contact Information: + * Megha Dey + * + * BSD LICENSE + * + * Copyright(c) 2016 Intel Corporation. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +# Macros for defining data structures + +# Usage example + +#START_FIELDS # JOB_AES +### name size align +#FIELD _plaintext, 8, 8 # pointer to plaintext +#FIELD _ciphertext, 8, 8 # pointer to ciphertext +#FIELD _IV, 16, 8 # IV +#FIELD _keys, 8, 8 # pointer to keys +#FIELD _len, 4, 4 # length in bytes +#FIELD _status, 4, 4 # status enumeration +#FIELD _user_data, 8, 8 # pointer to user data +#UNION _union, size1, align1, \ +# size2, align2, \ +# size3, align3, \ +# ... +#END_FIELDS +#%assign _JOB_AES_size _FIELD_OFFSET +#%assign _JOB_AES_align _STRUCT_ALIGN + +######################################################################### + +# Alternate "struc-like" syntax: +# STRUCT job_aes2 +# RES_Q .plaintext, 1 +# RES_Q .ciphertext, 1 +# RES_DQ .IV, 1 +# RES_B .nested, _JOB_AES_SIZE, _JOB_AES_ALIGN +# RES_U .union, size1, align1, \ +# size2, align2, \ +# ... 
+# ENDSTRUCT +# # Following only needed if nesting +# %assign job_aes2_size _FIELD_OFFSET +# %assign job_aes2_align _STRUCT_ALIGN +# +# RES_* macros take a name, a count and an optional alignment. +# The count in in terms of the base size of the macro, and the +# default alignment is the base size. +# The macros are: +# Macro Base size +# RES_B 1 +# RES_W 2 +# RES_D 4 +# RES_Q 8 +# RES_DQ 16 +# RES_Y 32 +# RES_Z 64 +# +# RES_U defines a union. It's arguments are a name and two or more +# pairs of "size, alignment" +# +# The two assigns are only needed if this structure is being nested +# within another. Even if the assigns are not done, one can still use +# STRUCT_NAME_size as the size of the structure. +# +# Note that for nesting, you still need to assign to STRUCT_NAME_size. +# +# The differences between this and using "struc" directly are that each +# type is implicitly aligned to its natural length (although this can be +# over-ridden with an explicit third parameter), and that the structure +# is padded at the end to its overall alignment. +# + +######################################################################### + +#ifndef _DATASTRUCT_ASM_ +#define _DATASTRUCT_ASM_ + +#define SZ8 8*SHA256_DIGEST_WORD_SIZE +#define ROUNDS 64*SZ8 +#define PTR_SZ 8 +#define SHA256_DIGEST_WORD_SIZE 4 +#define MAX_SHA256_LANES 8 +#define SHA256_DIGEST_WORDS 8 +#define SHA256_DIGEST_ROW_SIZE (MAX_SHA256_LANES * SHA256_DIGEST_WORD_SIZE) +#define SHA256_DIGEST_SIZE (SHA256_DIGEST_ROW_SIZE * SHA256_DIGEST_WORDS) +#define SHA256_BLK_SZ 64 + +# START_FIELDS +.macro START_FIELDS + _FIELD_OFFSET = 0 + _STRUCT_ALIGN = 0 +.endm + +# FIELD name size align +.macro FIELD name size align + _FIELD_OFFSET = (_FIELD_OFFSET + (\align) - 1) & (~ ((\align)-1)) + \name = _FIELD_OFFSET + _FIELD_OFFSET = _FIELD_OFFSET + (\size) +.if (\align > _STRUCT_ALIGN) + _STRUCT_ALIGN = \align +.endif +.endm + +# END_FIELDS +.macro END_FIELDS + _FIELD_OFFSET = (_FIELD_OFFSET + _STRUCT_ALIGN-1) & (~ (_STRUCT_ALIGN-1)) +.endm + +######################################################################## + +.macro STRUCT p1 +START_FIELDS +.struc \p1 +.endm + +.macro ENDSTRUCT + tmp = _FIELD_OFFSET + END_FIELDS + tmp = (_FIELD_OFFSET - %%tmp) +.if (tmp > 0) + .lcomm tmp +.endif +.endstruc +.endm + +## RES_int name size align +.macro RES_int p1 p2 p3 + name = \p1 + size = \p2 + align = .\p3 + + _FIELD_OFFSET = (_FIELD_OFFSET + (align) - 1) & (~ ((align)-1)) +.align align +.lcomm name size + _FIELD_OFFSET = _FIELD_OFFSET + (size) +.if (align > _STRUCT_ALIGN) + _STRUCT_ALIGN = align +.endif +.endm + +# macro RES_B name, size [, align] +.macro RES_B _name, _size, _align=1 +RES_int _name _size _align +.endm + +# macro RES_W name, size [, align] +.macro RES_W _name, _size, _align=2 +RES_int _name 2*(_size) _align +.endm + +# macro RES_D name, size [, align] +.macro RES_D _name, _size, _align=4 +RES_int _name 4*(_size) _align +.endm + +# macro RES_Q name, size [, align] +.macro RES_Q _name, _size, _align=8 +RES_int _name 8*(_size) _align +.endm + +# macro RES_DQ name, size [, align] +.macro RES_DQ _name, _size, _align=16 +RES_int _name 16*(_size) _align +.endm + +# macro RES_Y name, size [, align] +.macro RES_Y _name, _size, _align=32 +RES_int _name 32*(_size) _align +.endm + +# macro RES_Z name, size [, align] +.macro RES_Z _name, _size, _align=64 +RES_int _name 64*(_size) _align +.endm + +#endif + + +######################################################################## +#### Define SHA256 Out Of Order Data Structures 
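+#### (the field and stack-frame offsets defined below are referenced by sha256_x8_avx2.S)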
+######################################################################## + +START_FIELDS # LANE_DATA +### name size align +FIELD _job_in_lane, 8, 8 # pointer to job object +END_FIELDS + + _LANE_DATA_size = _FIELD_OFFSET + _LANE_DATA_align = _STRUCT_ALIGN + +######################################################################## + +START_FIELDS # SHA256_ARGS_X4 +### name size align +FIELD _digest, 4*8*8, 4 # transposed digest +FIELD _data_ptr, 8*8, 8 # array of pointers to data +END_FIELDS + + _SHA256_ARGS_X4_size = _FIELD_OFFSET + _SHA256_ARGS_X4_align = _STRUCT_ALIGN + _SHA256_ARGS_X8_size = _FIELD_OFFSET + _SHA256_ARGS_X8_align = _STRUCT_ALIGN + +####################################################################### + +START_FIELDS # MB_MGR +### name size align +FIELD _args, _SHA256_ARGS_X4_size, _SHA256_ARGS_X4_align +FIELD _lens, 4*8, 8 +FIELD _unused_lanes, 8, 8 +FIELD _ldata, _LANE_DATA_size*8, _LANE_DATA_align +END_FIELDS + + _MB_MGR_size = _FIELD_OFFSET + _MB_MGR_align = _STRUCT_ALIGN + +_args_digest = _args + _digest +_args_data_ptr = _args + _data_ptr + +####################################################################### + +START_FIELDS #STACK_FRAME +### name size align +FIELD _data, 16*SZ8, 1 # transposed digest +FIELD _digest, 8*SZ8, 1 # array of pointers to data +FIELD _ytmp, 4*SZ8, 1 +FIELD _rsp, 8, 1 +END_FIELDS + + _STACK_FRAME_size = _FIELD_OFFSET + _STACK_FRAME_align = _STRUCT_ALIGN + +####################################################################### + +######################################################################## +#### Define constants +######################################################################## + +#define STS_UNKNOWN 0 +#define STS_BEING_PROCESSED 1 +#define STS_COMPLETED 2 + +######################################################################## +#### Define JOB_SHA256 structure +######################################################################## + +START_FIELDS # JOB_SHA256 + +### name size align +FIELD _buffer, 8, 8 # pointer to buffer +FIELD _len, 8, 8 # length in bytes +FIELD _result_digest, 8*4, 32 # Digest (output) +FIELD _status, 4, 4 +FIELD _user_data, 8, 8 +END_FIELDS + + _JOB_SHA256_size = _FIELD_OFFSET + _JOB_SHA256_align = _STRUCT_ALIGN diff --git a/arch/x86/crypto/sha256_ssse3_glue.c b/arch/x86/crypto/sha256_ssse3_glue.c index e04a43d9f7d5..78055bd78b31 100644 --- a/arch/x86/crypto/sha256_ssse3_glue.c +++ b/arch/x86/crypto/sha256_ssse3_glue.c @@ -41,8 +41,24 @@ #include #include +struct sha256_x8_mbctx { + u32 state[8][8]; + const u8 *input[8]; +}; + +struct sha256_reqctx { + struct sha256_state state; + struct crypto_hash_walk walk; + const u8 *input; + unsigned int total; + unsigned int next; +}; + asmlinkage void sha256_transform_ssse3(struct sha256_state *state, const u8 *data, int blocks); +asmlinkage void sha256_transform_rorx(struct sha256_state *state, + const u8 *data, int blocks); +asmlinkage void sha256_x8_avx2(struct sha256_x8_mbctx *mbctx, int blocks); static const struct x86_cpu_id module_cpu_ids[] = { #ifdef CONFIG_AS_SHA256_NI @@ -55,14 +71,69 @@ static const struct x86_cpu_id module_cpu_ids[] = { }; MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids); -static int _sha256_update(struct shash_desc *desc, const u8 *data, - unsigned int len, sha256_block_fn *sha256_xform) +static int sha256_import(struct ahash_request *req, const void *in) { - struct sha256_state *sctx = shash_desc_ctx(desc); + struct sha256_reqctx *rctx = ahash_request_ctx(req); + memcpy(&rctx->state, in, sizeof(rctx->state)); + return 0; 
+} + +static int sha256_export(struct ahash_request *req, void *out) +{ + struct sha256_reqctx *rctx = ahash_request_ctx(req); + + memcpy(out, &rctx->state, sizeof(rctx->state)); + return 0; +} + +static int sha256_ahash_init(struct ahash_request *req) +{ + struct sha256_reqctx *rctx = ahash_request_ctx(req); + struct ahash_request *r2; + + sha256_init(&rctx->state); + + if (!ahash_request_chained(req)) + return 0; + + req->base.err = 0; + list_for_each_entry(r2, &req->base.list, base.list) { + r2->base.err = 0; + rctx = ahash_request_ctx(r2); + sha256_init(&rctx->state); + } + + return 0; +} + +static int sha224_ahash_init(struct ahash_request *req) +{ + struct sha256_reqctx *rctx = ahash_request_ctx(req); + struct ahash_request *r2; + + sha224_init(&rctx->state); + + if (!ahash_request_chained(req)) + return 0; + + req->base.err = 0; + list_for_each_entry(r2, &req->base.list, base.list) { + rctx = ahash_request_ctx(r2); + sha224_init(&rctx->state); + } + + return 0; +} + +static void __sha256_update(struct sha256_state *sctx, const u8 *data, + unsigned int len, sha256_block_fn *sha256_xform) +{ if (!crypto_simd_usable() || - (sctx->count % SHA256_BLOCK_SIZE) + len < SHA256_BLOCK_SIZE) - return crypto_sha256_update(desc, data, len); + (sctx->count % SHA256_BLOCK_SIZE) + len < SHA256_BLOCK_SIZE) { + sha256_update(sctx, data, len); + return; + } /* * Make sure struct sha256_state begins directly with the SHA256 @@ -71,25 +142,97 @@ static int _sha256_update(struct shash_desc *desc, const u8 *data, BUILD_BUG_ON(offsetof(struct sha256_state, state) != 0); kernel_fpu_begin(); - sha256_base_do_update(desc, data, len, sha256_xform); + lib_sha256_base_do_update(sctx, data, len, sha256_xform); + kernel_fpu_end(); +} + +static int _sha256_update(struct shash_desc *desc, const u8 *data, + unsigned int len, sha256_block_fn *sha256_xform) +{ + __sha256_update(shash_desc_ctx(desc), data, len, sha256_xform); + return 0; +} + +static int sha256_ahash_update(struct ahash_request *req, + sha256_block_fn *sha256_xform) +{ + struct sha256_reqctx *rctx = ahash_request_ctx(req); + struct crypto_hash_walk *walk = &rctx->walk; + struct sha256_state *state = &rctx->state; + int nbytes; + + /* + * Make sure struct sha256_state begins directly with the SHA256 + * 256-bit internal state, as this is what the asm functions expect. 
+ */ + BUILD_BUG_ON(offsetof(struct sha256_state, state) != 0); + + for (nbytes = crypto_hash_walk_first(req, walk); nbytes > 0; + nbytes = crypto_hash_walk_done(walk, 0)) + __sha256_update(state, walk->data, nbytes, sha256_transform_rorx); + + return nbytes; +} + +static void _sha256_finup(struct sha256_state *state, const u8 *data, + unsigned int len, u8 *out, unsigned int ds, + sha256_block_fn *sha256_xform) +{ + if (!crypto_simd_usable()) { + sha256_update(state, data, len); + if (ds == SHA224_DIGEST_SIZE) + sha224_final(state, out); + else + sha256_final(state, out); + return; + } + + kernel_fpu_begin(); + if (len) + lib_sha256_base_do_update(state, data, len, sha256_xform); + lib_sha256_base_do_finalize(state, sha256_xform); kernel_fpu_end(); - return 0; + lib_sha256_base_finish(state, out, ds); +} + +static int sha256_ahash_finup(struct ahash_request *req, + sha256_block_fn *sha256_xform) +{ + struct sha256_reqctx *rctx = ahash_request_ctx(req); + struct crypto_hash_walk *walk = &rctx->walk; + struct sha256_state *state = &rctx->state; + unsigned int ds; + int nbytes; + + ds = crypto_ahash_digestsize(crypto_ahash_reqtfm(req)); + if (!req->nbytes) { + _sha256_finup(state, NULL, 0, req->result, + ds, sha256_transform_rorx); + return 0; + } + + for (nbytes = crypto_hash_walk_first(req, walk); nbytes > 0; + nbytes = crypto_hash_walk_done(walk, 0)) { + if (crypto_hash_walk_last(walk)) { + _sha256_finup(state, walk->data, nbytes, req->result, + ds, sha256_transform_rorx); + continue; + } + + __sha256_update(state, walk->data, nbytes, sha256_transform_rorx); + } + + return nbytes; } static int sha256_finup(struct shash_desc *desc, const u8 *data, unsigned int len, u8 *out, sha256_block_fn *sha256_xform) { - if (!crypto_simd_usable()) - return crypto_sha256_finup(desc, data, len, out); + unsigned int ds = crypto_shash_digestsize(desc->tfm); - kernel_fpu_begin(); - if (len) - sha256_base_do_update(desc, data, len, sha256_xform); - sha256_base_do_finalize(desc, sha256_xform); - kernel_fpu_end(); - - return sha256_base_finish(desc, out); + _sha256_finup(shash_desc_ctx(desc), data, len, out, ds, sha256_xform); + return 0; } static int sha256_ssse3_update(struct shash_desc *desc, const u8 *data, @@ -247,61 +390,357 @@ static void unregister_sha256_avx(void) ARRAY_SIZE(sha256_avx_algs)); } -asmlinkage void sha256_transform_rorx(struct sha256_state *state, - const u8 *data, int blocks); - -static int sha256_avx2_update(struct shash_desc *desc, const u8 *data, - unsigned int len) +static int sha256_pad2(struct ahash_request *req) { - return _sha256_update(desc, data, len, sha256_transform_rorx); + const int bit_offset = SHA256_BLOCK_SIZE - sizeof(__be64); + struct sha256_reqctx *rctx = ahash_request_ctx(req); + struct sha256_state *state = &rctx->state; + unsigned int partial = state->count; + __be64 *bits; + + if (rctx->total) + return 0; + + rctx->total = 1; + + partial %= SHA256_BLOCK_SIZE; + memset(state->buf + partial, 0, bit_offset - partial); + bits = (__be64 *)(state->buf + bit_offset); + *bits = cpu_to_be64(state->count << 3); + + return SHA256_BLOCK_SIZE; } -static int sha256_avx2_finup(struct shash_desc *desc, const u8 *data, - unsigned int len, u8 *out) +static int sha256_pad1(struct ahash_request *req, bool final) { - return sha256_finup(desc, data, len, out, sha256_transform_rorx); + const int bit_offset = SHA256_BLOCK_SIZE - sizeof(__be64); + struct sha256_reqctx *rctx = ahash_request_ctx(req); + struct sha256_state *state = &rctx->state; + unsigned int partial = state->count; + + if 
(!final) + return 0; + + rctx->total = 0; + rctx->input = state->buf; + + partial %= SHA256_BLOCK_SIZE; + state->buf[partial++] = 0x80; + + if (partial > bit_offset) { + memset(state->buf + partial, 0, SHA256_BLOCK_SIZE - partial); + return SHA256_BLOCK_SIZE; + } + + return sha256_pad2(req); } -static int sha256_avx2_final(struct shash_desc *desc, u8 *out) +static int sha256_mb_start(struct ahash_request *req, bool final) { - return sha256_avx2_finup(desc, NULL, 0, out); + struct sha256_reqctx *rctx = ahash_request_ctx(req); + struct sha256_state *state = &rctx->state; + unsigned int partial; + int nbytes; + + nbytes = crypto_hash_walk_first(req, &rctx->walk); + if (!nbytes) + return sha256_pad1(req, final); + + rctx->input = rctx->walk.data; + + partial = state->count % SHA256_BLOCK_SIZE; + while (partial + nbytes < SHA256_BLOCK_SIZE) { + memcpy(state->buf + partial, rctx->input, nbytes); + state->count += nbytes; + partial += nbytes; + + nbytes = crypto_hash_walk_done(&rctx->walk, 0); + if (!nbytes) + return sha256_pad1(req, final); + + rctx->input = rctx->walk.data; + } + + rctx->total = nbytes; + if (nbytes == 1) { + rctx->total = 0; + state->count++; + } + + if (partial) { + unsigned int offset = SHA256_BLOCK_SIZE - partial; + + memcpy(state->buf + partial, rctx->input, offset); + rctx->input = state->buf; + + return SHA256_BLOCK_SIZE; + } + + return nbytes; } -static int sha256_avx2_digest(struct shash_desc *desc, const u8 *data, - unsigned int len, u8 *out) +static int sha256_mb_next(struct ahash_request *req, unsigned int len, + bool final) { - return sha256_base_init(desc) ?: - sha256_avx2_finup(desc, data, len, out); + struct sha256_reqctx *rctx = ahash_request_ctx(req); + struct sha256_state *state = &rctx->state; + unsigned int partial; + + if (rctx->input != state->buf) { + rctx->input += len; + rctx->total -= len; + state->count += len; + } else if (rctx->total > 1) { + unsigned int offset; + + offset = SHA256_BLOCK_SIZE - state->count % SHA256_BLOCK_SIZE; + rctx->input = rctx->walk.data + offset; + rctx->total -= offset; + state->count += offset; + } else + return sha256_pad2(req); + + partial = 0; + while (partial + rctx->total < SHA256_BLOCK_SIZE) { + memcpy(state->buf + partial, rctx->input, rctx->total); + state->count += rctx->total; + partial += rctx->total; + + rctx->total = crypto_hash_walk_done(&rctx->walk, 0); + if (!rctx->total) + return sha256_pad1(req, final); + + rctx->input = rctx->walk.data; + } + + return rctx->total; } -static struct shash_alg sha256_avx2_algs[] = { { - .digestsize = SHA256_DIGEST_SIZE, - .init = sha256_base_init, +static void sha256_update_x8x1(struct list_head *list, + struct ahash_request *reqs[8], bool final) +{ + struct sha256_x8_mbctx mbctx; + unsigned int len = 0; + u32 *states[8]; + int i = 0; + + do { + struct sha256_reqctx *rctx = ahash_request_ctx(reqs[i]); + unsigned int nbytes; + + nbytes = rctx->next; + if (!i || nbytes < len) + len = nbytes; + + states[i] = rctx->state.state; + mbctx.input[i] = rctx->input; + } while (++i < 8 && reqs[i]); + + for (; i < 8; i++) { + mbctx.input[i] = mbctx.input[0]; + } + + for (i = 0; i < 8; i++) { + int j; + + for (j = 0; j < 8; j++) + mbctx.state[i][j] = states[j][i]; + } + + sha256_x8_avx2(&mbctx, len / SHA256_BLOCK_SIZE); + + for (i = 0; i < 8; i++) { + int j; + + for (j = 0; j < 8; j++) + states[i][j] = mbctx.state[j][i]; + } + + i = 0; + do { + struct sha256_reqctx *rctx = ahash_request_ctx(reqs[i]); + + rctx->next = sha256_mb_next(reqs[i], len, final); + + if (rctx->next) { + if (++i >= 
8) + break; + continue; + } + + if (i == 7 || !reqs[i + 1]) { + struct ahash_request *r2 = reqs[i]; + + reqs[i] = NULL; + do { + while (!list_is_last(&r2->base.list, list)) { + r2 = list_next_entry(r2, base.list); + r2->base.err = 0; + + rctx = ahash_request_ctx(r2); + rctx->next = sha256_mb_start(r2, final); + if (rctx->next) { + reqs[i] = r2; + break; + } + } + } while (reqs[i] && ++i < 8); + + break; + } + + memmove(reqs + i, reqs + i + 1, sizeof(reqs[0]) * (7 - i)); + reqs[7] = NULL; + } while (reqs[i]); +} + +static void sha256_update_x8(struct list_head *list, + struct ahash_request *reqs[8], + bool final) +{ + do { + sha256_update_x8x1(list, reqs, final); + } while (reqs[0]); +} + +static void sha256_chain(struct ahash_request *req, bool final) +{ + struct sha256_reqctx *rctx = ahash_request_ctx(req); + struct ahash_request *reqs[8]; + struct ahash_request *r2; + int i; + + req->base.err = 0; + reqs[0] = req; + rctx->next = sha256_mb_start(req, final); + i = !!rctx->next; + list_for_each_entry(r2, &req->base.list, base.list) { + r2->base.err = 0; + + rctx = ahash_request_ctx(r2); + rctx->next = sha256_mb_start(r2, final); + if (!rctx->next) + continue; + + reqs[i++] = r2; + if (i < 8) + continue; + + sha256_update_x8(&req->base.list, reqs, final); + i = 0; + } + + if (i) { + memset(reqs + i, 0, sizeof(reqs) * (8 - i)); + sha256_update_x8(&req->base.list, reqs, final); + } + + return; +} + +static int sha256_avx2_update(struct ahash_request *req) +{ + struct ahash_request *r2; + int err; + + if (ahash_request_chained(req) && crypto_simd_usable()) { + sha256_chain(req, false); + return 0; + } + + err = sha256_ahash_update(req, sha256_transform_rorx); + if (!ahash_request_chained(req)) + return err; + + req->base.err = err; + + list_for_each_entry(r2, &req->base.list, base.list) { + err = sha256_ahash_update(r2, sha256_transform_rorx); + r2->base.err = err; + } + + return 0; +} + +static int sha256_avx2_finup(struct ahash_request *req) +{ + struct ahash_request *r2; + int err; + + if (ahash_request_chained(req) && crypto_simd_usable()) { + sha256_chain(req, true); + return 0; + } + + err = sha256_ahash_finup(req, sha256_transform_rorx); + if (!ahash_request_chained(req)) + return err; + + req->base.err = err; + + list_for_each_entry(r2, &req->base.list, base.list) { + err = sha256_ahash_finup(r2, sha256_transform_rorx); + r2->base.err = err; + } + + return 0; +} + +static int sha256_avx2_final(struct ahash_request *req) +{ + req->nbytes = 0; + return sha256_avx2_finup(req); +} + +static int sha256_avx2_digest(struct ahash_request *req) +{ + return sha256_ahash_init(req) ?: + sha256_avx2_finup(req); +} + +static int sha224_avx2_digest(struct ahash_request *req) +{ + return sha224_ahash_init(req) ?: + sha256_avx2_finup(req); +} + +static struct ahash_alg sha256_avx2_algs[] = { { + .halg.digestsize = SHA256_DIGEST_SIZE, + .halg.statesize = sizeof(struct sha256_state), + .reqsize = sizeof(struct sha256_reqctx), + .init = sha256_ahash_init, .update = sha256_avx2_update, .final = sha256_avx2_final, .finup = sha256_avx2_finup, .digest = sha256_avx2_digest, - .descsize = sizeof(struct sha256_state), - .base = { + .import = sha256_import, + .export = sha256_export, + .halg.base = { .cra_name = "sha256", .cra_driver_name = "sha256-avx2", .cra_priority = 170, .cra_blocksize = SHA256_BLOCK_SIZE, .cra_module = THIS_MODULE, + .cra_flags = CRYPTO_ALG_REQ_CHAIN, } }, { - .digestsize = SHA224_DIGEST_SIZE, - .init = sha224_base_init, + .halg.digestsize = SHA224_DIGEST_SIZE, + .halg.statesize = 
sizeof(struct sha256_state), + .reqsize = sizeof(struct sha256_reqctx), + .init = sha224_ahash_init, .update = sha256_avx2_update, .final = sha256_avx2_final, .finup = sha256_avx2_finup, - .descsize = sizeof(struct sha256_state), - .base = { + .digest = sha224_avx2_digest, + .import = sha256_import, + .export = sha256_export, + .halg.base = { .cra_name = "sha224", .cra_driver_name = "sha224-avx2", .cra_priority = 170, .cra_blocksize = SHA224_BLOCK_SIZE, .cra_module = THIS_MODULE, + .cra_flags = CRYPTO_ALG_REQ_CHAIN, } } }; @@ -317,7 +756,7 @@ static bool avx2_usable(void) static int register_sha256_avx2(void) { if (avx2_usable()) - return crypto_register_shashes(sha256_avx2_algs, + return crypto_register_ahashes(sha256_avx2_algs, ARRAY_SIZE(sha256_avx2_algs)); return 0; } @@ -325,7 +764,7 @@ static int register_sha256_avx2(void) static void unregister_sha256_avx2(void) { if (avx2_usable()) - crypto_unregister_shashes(sha256_avx2_algs, + crypto_unregister_ahashes(sha256_avx2_algs, ARRAY_SIZE(sha256_avx2_algs)); } diff --git a/arch/x86/crypto/sha256_x8_avx2.S b/arch/x86/crypto/sha256_x8_avx2.S new file mode 100644 index 000000000000..deb891b458c8 --- /dev/null +++ b/arch/x86/crypto/sha256_x8_avx2.S @@ -0,0 +1,596 @@ +/* + * Multi-buffer SHA256 algorithm hash compute routine + * + * This file is provided under a dual BSD/GPLv2 license. When using or + * redistributing this file, you may do so under either license. + * + * GPL LICENSE SUMMARY + * + * Copyright(c) 2016 Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of version 2 of the GNU General Public License as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * Contact Information: + * Megha Dey + * + * BSD LICENSE + * + * Copyright(c) 2016 Intel Corporation. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#include +#include +#include +#include "sha256_mb_mgr_datastruct.S" + +## code to compute oct SHA256 using SSE-256 +## outer calling routine takes care of save and restore of XMM registers +## Logic designed/laid out by JDG + +## Function clobbers: rax, rcx, rdx, rbx, rsi, rdi, r9-r15; %ymm0-15 +## Linux clobbers: rax rbx rcx rdx rsi r9 r10 r11 r12 r13 r14 r15 +## Linux preserves: rdi rbp r8 +## +## clobbers %ymm0-15 + +arg1 = %rdi +arg2 = %rsi +reg3 = %rcx +reg4 = %rdx + +# Common definitions +STATE = arg1 +INP_SIZE = arg2 + +IDX = %rax +ROUND = %rbx +TBL = reg3 + +inp0 = %r9 +inp1 = %r10 +inp2 = %r11 +inp3 = %r12 +inp4 = %r13 +inp5 = %r14 +inp6 = %r15 +inp7 = reg4 + +a = %ymm0 +b = %ymm1 +c = %ymm2 +d = %ymm3 +e = %ymm4 +f = %ymm5 +g = %ymm6 +h = %ymm7 + +T1 = %ymm8 + +a0 = %ymm12 +a1 = %ymm13 +a2 = %ymm14 +TMP = %ymm15 +TMP0 = %ymm6 +TMP1 = %ymm7 + +TT0 = %ymm8 +TT1 = %ymm9 +TT2 = %ymm10 +TT3 = %ymm11 +TT4 = %ymm12 +TT5 = %ymm13 +TT6 = %ymm14 +TT7 = %ymm15 + +# Define stack usage + +# Assume stack aligned to 32 bytes before call +# Therefore FRAMESZ mod 32 must be 32-8 = 24 + +#define FRAMESZ 0x388 + +#define VMOVPS vmovups + +# TRANSPOSE8 r0, r1, r2, r3, r4, r5, r6, r7, t0, t1 +# "transpose" data in {r0...r7} using temps {t0...t1} +# Input looks like: {r0 r1 r2 r3 r4 r5 r6 r7} +# r0 = {a7 a6 a5 a4 a3 a2 a1 a0} +# r1 = {b7 b6 b5 b4 b3 b2 b1 b0} +# r2 = {c7 c6 c5 c4 c3 c2 c1 c0} +# r3 = {d7 d6 d5 d4 d3 d2 d1 d0} +# r4 = {e7 e6 e5 e4 e3 e2 e1 e0} +# r5 = {f7 f6 f5 f4 f3 f2 f1 f0} +# r6 = {g7 g6 g5 g4 g3 g2 g1 g0} +# r7 = {h7 h6 h5 h4 h3 h2 h1 h0} +# +# Output looks like: {r0 r1 r2 r3 r4 r5 r6 r7} +# r0 = {h0 g0 f0 e0 d0 c0 b0 a0} +# r1 = {h1 g1 f1 e1 d1 c1 b1 a1} +# r2 = {h2 g2 f2 e2 d2 c2 b2 a2} +# r3 = {h3 g3 f3 e3 d3 c3 b3 a3} +# r4 = {h4 g4 f4 e4 d4 c4 b4 a4} +# r5 = {h5 g5 f5 e5 d5 c5 b5 a5} +# r6 = {h6 g6 f6 e6 d6 c6 b6 a6} +# r7 = {h7 g7 f7 e7 d7 c7 b7 a7} +# + +.macro TRANSPOSE8 r0 r1 r2 r3 r4 r5 r6 r7 t0 t1 + # process top half (r0..r3) {a...d} + vshufps $0x44, \r1, \r0, \t0 # t0 = {b5 b4 a5 a4 b1 b0 a1 a0} + vshufps $0xEE, \r1, \r0, \r0 # r0 = {b7 b6 a7 a6 b3 b2 a3 a2} + vshufps $0x44, \r3, \r2, \t1 # t1 = {d5 d4 c5 c4 d1 d0 c1 c0} + vshufps $0xEE, \r3, \r2, \r2 # r2 = {d7 d6 c7 c6 d3 d2 c3 c2} + vshufps $0xDD, \t1, \t0, \r3 # r3 = {d5 c5 b5 a5 d1 c1 b1 a1} + vshufps $0x88, \r2, \r0, \r1 # r1 = {d6 c6 b6 a6 d2 c2 b2 a2} + vshufps $0xDD, \r2, \r0, \r0 # r0 = {d7 c7 b7 a7 d3 c3 b3 a3} + vshufps $0x88, \t1, \t0, \t0 # t0 = {d4 c4 b4 a4 d0 c0 b0 a0} + + # use r2 in place of t0 + # process bottom half (r4..r7) {e...h} + vshufps $0x44, \r5, \r4, \r2 # r2 = {f5 f4 e5 e4 f1 f0 e1 e0} + vshufps $0xEE, \r5, \r4, \r4 # r4 = {f7 f6 e7 e6 f3 f2 e3 e2} + vshufps $0x44, \r7, \r6, \t1 # t1 = {h5 h4 g5 g4 h1 h0 g1 g0} + vshufps $0xEE, \r7, \r6, \r6 # r6 = {h7 h6 g7 g6 h3 h2 g3 g2} + vshufps $0xDD, \t1, \r2, \r7 # r7 = {h5 g5 f5 e5 h1 g1 f1 e1} + vshufps $0x88, \r6, \r4, \r5 # r5 = {h6 g6 f6 e6 h2 g2 f2 e2} + vshufps $0xDD, \r6, \r4, \r4 # r4 = {h7 g7 f7 e7 
h3 g3 f3 e3} + vshufps $0x88, \t1, \r2, \t1 # t1 = {h4 g4 f4 e4 h0 g0 f0 e0} + + vperm2f128 $0x13, \r1, \r5, \r6 # h6...a6 + vperm2f128 $0x02, \r1, \r5, \r2 # h2...a2 + vperm2f128 $0x13, \r3, \r7, \r5 # h5...a5 + vperm2f128 $0x02, \r3, \r7, \r1 # h1...a1 + vperm2f128 $0x13, \r0, \r4, \r7 # h7...a7 + vperm2f128 $0x02, \r0, \r4, \r3 # h3...a3 + vperm2f128 $0x13, \t0, \t1, \r4 # h4...a4 + vperm2f128 $0x02, \t0, \t1, \r0 # h0...a0 + +.endm + +.macro ROTATE_ARGS +TMP_ = h +h = g +g = f +f = e +e = d +d = c +c = b +b = a +a = TMP_ +.endm + +.macro _PRORD reg imm tmp + vpslld $(32-\imm),\reg,\tmp + vpsrld $\imm,\reg, \reg + vpor \tmp,\reg, \reg +.endm + +# PRORD_nd reg, imm, tmp, src +.macro _PRORD_nd reg imm tmp src + vpslld $(32-\imm), \src, \tmp + vpsrld $\imm, \src, \reg + vpor \tmp, \reg, \reg +.endm + +# PRORD dst/src, amt +.macro PRORD reg imm + _PRORD \reg,\imm,TMP +.endm + +# PRORD_nd dst, src, amt +.macro PRORD_nd reg tmp imm + _PRORD_nd \reg, \imm, TMP, \tmp +.endm + +# arguments passed implicitly in preprocessor symbols i, a...h +.macro ROUND_00_15 _T1 i + PRORD_nd a0,e,5 # sig1: a0 = (e >> 5) + + vpxor g, f, a2 # ch: a2 = f^g + vpand e,a2, a2 # ch: a2 = (f^g)&e + vpxor g, a2, a2 # a2 = ch + + PRORD_nd a1,e,25 # sig1: a1 = (e >> 25) + + vmovdqu \_T1,(SZ8*(\i & 0xf))(%rsp) + vpaddd (TBL,ROUND,1), \_T1, \_T1 # T1 = W + K + vpxor e,a0, a0 # sig1: a0 = e ^ (e >> 5) + PRORD a0, 6 # sig1: a0 = (e >> 6) ^ (e >> 11) + vpaddd a2, h, h # h = h + ch + PRORD_nd a2,a,11 # sig0: a2 = (a >> 11) + vpaddd \_T1,h, h # h = h + ch + W + K + vpxor a1, a0, a0 # a0 = sigma1 + PRORD_nd a1,a,22 # sig0: a1 = (a >> 22) + vpxor c, a, \_T1 # maj: T1 = a^c + add $SZ8, ROUND # ROUND++ + vpand b, \_T1, \_T1 # maj: T1 = (a^c)&b + vpaddd a0, h, h + vpaddd h, d, d + vpxor a, a2, a2 # sig0: a2 = a ^ (a >> 11) + PRORD a2,2 # sig0: a2 = (a >> 2) ^ (a >> 13) + vpxor a1, a2, a2 # a2 = sig0 + vpand c, a, a1 # maj: a1 = a&c + vpor \_T1, a1, a1 # a1 = maj + vpaddd a1, h, h # h = h + ch + W + K + maj + vpaddd a2, h, h # h = h + ch + W + K + maj + sigma0 + ROTATE_ARGS +.endm + +# arguments passed implicitly in preprocessor symbols i, a...h +.macro ROUND_16_XX _T1 i + vmovdqu (SZ8*((\i-15)&0xf))(%rsp), \_T1 + vmovdqu (SZ8*((\i-2)&0xf))(%rsp), a1 + vmovdqu \_T1, a0 + PRORD \_T1,11 + vmovdqu a1, a2 + PRORD a1,2 + vpxor a0, \_T1, \_T1 + PRORD \_T1, 7 + vpxor a2, a1, a1 + PRORD a1, 17 + vpsrld $3, a0, a0 + vpxor a0, \_T1, \_T1 + vpsrld $10, a2, a2 + vpxor a2, a1, a1 + vpaddd (SZ8*((\i-16)&0xf))(%rsp), \_T1, \_T1 + vpaddd (SZ8*((\i-7)&0xf))(%rsp), a1, a1 + vpaddd a1, \_T1, \_T1 + + ROUND_00_15 \_T1,\i +.endm + +# void sha256_x8_avx2(struct sha256_mbctx *ctx, int blocks); +# +# arg 1 : ctx : pointer to array of pointers to input data +# arg 2 : blocks : size of input in blocks + # save rsp, allocate 32-byte aligned for local variables +SYM_FUNC_START(sha256_x8_avx2) + # save callee-saved clobbered registers to comply with C function ABI + push %r12 + push %r13 + push %r14 + push %r15 + + push %rbp + mov %rsp, %rbp + + sub $FRAMESZ, %rsp + and $~0x1F, %rsp + + # Load the pre-transposed incoming digest. 
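+	# (STATE holds the digest transposed: each SHA256_DIGEST_ROW_SIZE (32-byte)
+	# row is one of the eight SHA-256 state words for all eight lanes, so each
+	# 32-byte load below fills one ymm register with word i of lanes 0-7.)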
+ vmovdqu 0*SHA256_DIGEST_ROW_SIZE(STATE),a + vmovdqu 1*SHA256_DIGEST_ROW_SIZE(STATE),b + vmovdqu 2*SHA256_DIGEST_ROW_SIZE(STATE),c + vmovdqu 3*SHA256_DIGEST_ROW_SIZE(STATE),d + vmovdqu 4*SHA256_DIGEST_ROW_SIZE(STATE),e + vmovdqu 5*SHA256_DIGEST_ROW_SIZE(STATE),f + vmovdqu 6*SHA256_DIGEST_ROW_SIZE(STATE),g + vmovdqu 7*SHA256_DIGEST_ROW_SIZE(STATE),h + + lea K256_8(%rip),TBL + + # load the address of each of the 4 message lanes + # getting ready to transpose input onto stack + mov _args_data_ptr+0*PTR_SZ(STATE),inp0 + mov _args_data_ptr+1*PTR_SZ(STATE),inp1 + mov _args_data_ptr+2*PTR_SZ(STATE),inp2 + mov _args_data_ptr+3*PTR_SZ(STATE),inp3 + mov _args_data_ptr+4*PTR_SZ(STATE),inp4 + mov _args_data_ptr+5*PTR_SZ(STATE),inp5 + mov _args_data_ptr+6*PTR_SZ(STATE),inp6 + mov _args_data_ptr+7*PTR_SZ(STATE),inp7 + + xor IDX, IDX +lloop: + xor ROUND, ROUND + + # save old digest + vmovdqu a, _digest(%rsp) + vmovdqu b, _digest+1*SZ8(%rsp) + vmovdqu c, _digest+2*SZ8(%rsp) + vmovdqu d, _digest+3*SZ8(%rsp) + vmovdqu e, _digest+4*SZ8(%rsp) + vmovdqu f, _digest+5*SZ8(%rsp) + vmovdqu g, _digest+6*SZ8(%rsp) + vmovdqu h, _digest+7*SZ8(%rsp) + i = 0 +.rep 2 + VMOVPS i*32(inp0, IDX), TT0 + VMOVPS i*32(inp1, IDX), TT1 + VMOVPS i*32(inp2, IDX), TT2 + VMOVPS i*32(inp3, IDX), TT3 + VMOVPS i*32(inp4, IDX), TT4 + VMOVPS i*32(inp5, IDX), TT5 + VMOVPS i*32(inp6, IDX), TT6 + VMOVPS i*32(inp7, IDX), TT7 + vmovdqu g, _ytmp(%rsp) + vmovdqu h, _ytmp+1*SZ8(%rsp) + TRANSPOSE8 TT0, TT1, TT2, TT3, TT4, TT5, TT6, TT7, TMP0, TMP1 + vmovdqu PSHUFFLE_BYTE_FLIP_MASK(%rip), TMP1 + vmovdqu _ytmp(%rsp), g + vpshufb TMP1, TT0, TT0 + vpshufb TMP1, TT1, TT1 + vpshufb TMP1, TT2, TT2 + vpshufb TMP1, TT3, TT3 + vpshufb TMP1, TT4, TT4 + vpshufb TMP1, TT5, TT5 + vpshufb TMP1, TT6, TT6 + vpshufb TMP1, TT7, TT7 + vmovdqu _ytmp+1*SZ8(%rsp), h + vmovdqu TT4, _ytmp(%rsp) + vmovdqu TT5, _ytmp+1*SZ8(%rsp) + vmovdqu TT6, _ytmp+2*SZ8(%rsp) + vmovdqu TT7, _ytmp+3*SZ8(%rsp) + ROUND_00_15 TT0,(i*8+0) + vmovdqu _ytmp(%rsp), TT0 + ROUND_00_15 TT1,(i*8+1) + vmovdqu _ytmp+1*SZ8(%rsp), TT1 + ROUND_00_15 TT2,(i*8+2) + vmovdqu _ytmp+2*SZ8(%rsp), TT2 + ROUND_00_15 TT3,(i*8+3) + vmovdqu _ytmp+3*SZ8(%rsp), TT3 + ROUND_00_15 TT0,(i*8+4) + ROUND_00_15 TT1,(i*8+5) + ROUND_00_15 TT2,(i*8+6) + ROUND_00_15 TT3,(i*8+7) + i = (i+1) +.endr + add $64, IDX + i = (i*8) + + jmp Lrounds_16_xx +.align 16 +Lrounds_16_xx: +.rep 16 + ROUND_16_XX T1, i + i = (i+1) +.endr + + cmp $ROUNDS,ROUND + jb Lrounds_16_xx + + # add old digest + vpaddd _digest+0*SZ8(%rsp), a, a + vpaddd _digest+1*SZ8(%rsp), b, b + vpaddd _digest+2*SZ8(%rsp), c, c + vpaddd _digest+3*SZ8(%rsp), d, d + vpaddd _digest+4*SZ8(%rsp), e, e + vpaddd _digest+5*SZ8(%rsp), f, f + vpaddd _digest+6*SZ8(%rsp), g, g + vpaddd _digest+7*SZ8(%rsp), h, h + + sub $1, INP_SIZE # unit is blocks + jne lloop + + # write back to memory (state object) the transposed digest + vmovdqu a, 0*SHA256_DIGEST_ROW_SIZE(STATE) + vmovdqu b, 1*SHA256_DIGEST_ROW_SIZE(STATE) + vmovdqu c, 2*SHA256_DIGEST_ROW_SIZE(STATE) + vmovdqu d, 3*SHA256_DIGEST_ROW_SIZE(STATE) + vmovdqu e, 4*SHA256_DIGEST_ROW_SIZE(STATE) + vmovdqu f, 5*SHA256_DIGEST_ROW_SIZE(STATE) + vmovdqu g, 6*SHA256_DIGEST_ROW_SIZE(STATE) + vmovdqu h, 7*SHA256_DIGEST_ROW_SIZE(STATE) + + # update input pointers + add IDX, inp0 + mov inp0, _args_data_ptr+0*8(STATE) + add IDX, inp1 + mov inp1, _args_data_ptr+1*8(STATE) + add IDX, inp2 + mov inp2, _args_data_ptr+2*8(STATE) + add IDX, inp3 + mov inp3, _args_data_ptr+3*8(STATE) + add IDX, inp4 + mov inp4, _args_data_ptr+4*8(STATE) + add IDX, inp5 + 
mov inp5, _args_data_ptr+5*8(STATE) + add IDX, inp6 + mov inp6, _args_data_ptr+6*8(STATE) + add IDX, inp7 + mov inp7, _args_data_ptr+7*8(STATE) + + # Postamble + mov %rbp, %rsp + pop %rbp + + # restore callee-saved clobbered registers + pop %r15 + pop %r14 + pop %r13 + pop %r12 + + RET +SYM_FUNC_END(sha256_x8_avx2) + +.section .rodata.K256_8, "a", @progbits +.align 64 +K256_8: + .octa 0x428a2f98428a2f98428a2f98428a2f98 + .octa 0x428a2f98428a2f98428a2f98428a2f98 + .octa 0x71374491713744917137449171374491 + .octa 0x71374491713744917137449171374491 + .octa 0xb5c0fbcfb5c0fbcfb5c0fbcfb5c0fbcf + .octa 0xb5c0fbcfb5c0fbcfb5c0fbcfb5c0fbcf + .octa 0xe9b5dba5e9b5dba5e9b5dba5e9b5dba5 + .octa 0xe9b5dba5e9b5dba5e9b5dba5e9b5dba5 + .octa 0x3956c25b3956c25b3956c25b3956c25b + .octa 0x3956c25b3956c25b3956c25b3956c25b + .octa 0x59f111f159f111f159f111f159f111f1 + .octa 0x59f111f159f111f159f111f159f111f1 + .octa 0x923f82a4923f82a4923f82a4923f82a4 + .octa 0x923f82a4923f82a4923f82a4923f82a4 + .octa 0xab1c5ed5ab1c5ed5ab1c5ed5ab1c5ed5 + .octa 0xab1c5ed5ab1c5ed5ab1c5ed5ab1c5ed5 + .octa 0xd807aa98d807aa98d807aa98d807aa98 + .octa 0xd807aa98d807aa98d807aa98d807aa98 + .octa 0x12835b0112835b0112835b0112835b01 + .octa 0x12835b0112835b0112835b0112835b01 + .octa 0x243185be243185be243185be243185be + .octa 0x243185be243185be243185be243185be + .octa 0x550c7dc3550c7dc3550c7dc3550c7dc3 + .octa 0x550c7dc3550c7dc3550c7dc3550c7dc3 + .octa 0x72be5d7472be5d7472be5d7472be5d74 + .octa 0x72be5d7472be5d7472be5d7472be5d74 + .octa 0x80deb1fe80deb1fe80deb1fe80deb1fe + .octa 0x80deb1fe80deb1fe80deb1fe80deb1fe + .octa 0x9bdc06a79bdc06a79bdc06a79bdc06a7 + .octa 0x9bdc06a79bdc06a79bdc06a79bdc06a7 + .octa 0xc19bf174c19bf174c19bf174c19bf174 + .octa 0xc19bf174c19bf174c19bf174c19bf174 + .octa 0xe49b69c1e49b69c1e49b69c1e49b69c1 + .octa 0xe49b69c1e49b69c1e49b69c1e49b69c1 + .octa 0xefbe4786efbe4786efbe4786efbe4786 + .octa 0xefbe4786efbe4786efbe4786efbe4786 + .octa 0x0fc19dc60fc19dc60fc19dc60fc19dc6 + .octa 0x0fc19dc60fc19dc60fc19dc60fc19dc6 + .octa 0x240ca1cc240ca1cc240ca1cc240ca1cc + .octa 0x240ca1cc240ca1cc240ca1cc240ca1cc + .octa 0x2de92c6f2de92c6f2de92c6f2de92c6f + .octa 0x2de92c6f2de92c6f2de92c6f2de92c6f + .octa 0x4a7484aa4a7484aa4a7484aa4a7484aa + .octa 0x4a7484aa4a7484aa4a7484aa4a7484aa + .octa 0x5cb0a9dc5cb0a9dc5cb0a9dc5cb0a9dc + .octa 0x5cb0a9dc5cb0a9dc5cb0a9dc5cb0a9dc + .octa 0x76f988da76f988da76f988da76f988da + .octa 0x76f988da76f988da76f988da76f988da + .octa 0x983e5152983e5152983e5152983e5152 + .octa 0x983e5152983e5152983e5152983e5152 + .octa 0xa831c66da831c66da831c66da831c66d + .octa 0xa831c66da831c66da831c66da831c66d + .octa 0xb00327c8b00327c8b00327c8b00327c8 + .octa 0xb00327c8b00327c8b00327c8b00327c8 + .octa 0xbf597fc7bf597fc7bf597fc7bf597fc7 + .octa 0xbf597fc7bf597fc7bf597fc7bf597fc7 + .octa 0xc6e00bf3c6e00bf3c6e00bf3c6e00bf3 + .octa 0xc6e00bf3c6e00bf3c6e00bf3c6e00bf3 + .octa 0xd5a79147d5a79147d5a79147d5a79147 + .octa 0xd5a79147d5a79147d5a79147d5a79147 + .octa 0x06ca635106ca635106ca635106ca6351 + .octa 0x06ca635106ca635106ca635106ca6351 + .octa 0x14292967142929671429296714292967 + .octa 0x14292967142929671429296714292967 + .octa 0x27b70a8527b70a8527b70a8527b70a85 + .octa 0x27b70a8527b70a8527b70a8527b70a85 + .octa 0x2e1b21382e1b21382e1b21382e1b2138 + .octa 0x2e1b21382e1b21382e1b21382e1b2138 + .octa 0x4d2c6dfc4d2c6dfc4d2c6dfc4d2c6dfc + .octa 0x4d2c6dfc4d2c6dfc4d2c6dfc4d2c6dfc + .octa 0x53380d1353380d1353380d1353380d13 + .octa 0x53380d1353380d1353380d1353380d13 + .octa 0x650a7354650a7354650a7354650a7354 + .octa 
0x650a7354650a7354650a7354650a7354 + .octa 0x766a0abb766a0abb766a0abb766a0abb + .octa 0x766a0abb766a0abb766a0abb766a0abb + .octa 0x81c2c92e81c2c92e81c2c92e81c2c92e + .octa 0x81c2c92e81c2c92e81c2c92e81c2c92e + .octa 0x92722c8592722c8592722c8592722c85 + .octa 0x92722c8592722c8592722c8592722c85 + .octa 0xa2bfe8a1a2bfe8a1a2bfe8a1a2bfe8a1 + .octa 0xa2bfe8a1a2bfe8a1a2bfe8a1a2bfe8a1 + .octa 0xa81a664ba81a664ba81a664ba81a664b + .octa 0xa81a664ba81a664ba81a664ba81a664b + .octa 0xc24b8b70c24b8b70c24b8b70c24b8b70 + .octa 0xc24b8b70c24b8b70c24b8b70c24b8b70 + .octa 0xc76c51a3c76c51a3c76c51a3c76c51a3 + .octa 0xc76c51a3c76c51a3c76c51a3c76c51a3 + .octa 0xd192e819d192e819d192e819d192e819 + .octa 0xd192e819d192e819d192e819d192e819 + .octa 0xd6990624d6990624d6990624d6990624 + .octa 0xd6990624d6990624d6990624d6990624 + .octa 0xf40e3585f40e3585f40e3585f40e3585 + .octa 0xf40e3585f40e3585f40e3585f40e3585 + .octa 0x106aa070106aa070106aa070106aa070 + .octa 0x106aa070106aa070106aa070106aa070 + .octa 0x19a4c11619a4c11619a4c11619a4c116 + .octa 0x19a4c11619a4c11619a4c11619a4c116 + .octa 0x1e376c081e376c081e376c081e376c08 + .octa 0x1e376c081e376c081e376c081e376c08 + .octa 0x2748774c2748774c2748774c2748774c + .octa 0x2748774c2748774c2748774c2748774c + .octa 0x34b0bcb534b0bcb534b0bcb534b0bcb5 + .octa 0x34b0bcb534b0bcb534b0bcb534b0bcb5 + .octa 0x391c0cb3391c0cb3391c0cb3391c0cb3 + .octa 0x391c0cb3391c0cb3391c0cb3391c0cb3 + .octa 0x4ed8aa4a4ed8aa4a4ed8aa4a4ed8aa4a + .octa 0x4ed8aa4a4ed8aa4a4ed8aa4a4ed8aa4a + .octa 0x5b9cca4f5b9cca4f5b9cca4f5b9cca4f + .octa 0x5b9cca4f5b9cca4f5b9cca4f5b9cca4f + .octa 0x682e6ff3682e6ff3682e6ff3682e6ff3 + .octa 0x682e6ff3682e6ff3682e6ff3682e6ff3 + .octa 0x748f82ee748f82ee748f82ee748f82ee + .octa 0x748f82ee748f82ee748f82ee748f82ee + .octa 0x78a5636f78a5636f78a5636f78a5636f + .octa 0x78a5636f78a5636f78a5636f78a5636f + .octa 0x84c8781484c8781484c8781484c87814 + .octa 0x84c8781484c8781484c8781484c87814 + .octa 0x8cc702088cc702088cc702088cc70208 + .octa 0x8cc702088cc702088cc702088cc70208 + .octa 0x90befffa90befffa90befffa90befffa + .octa 0x90befffa90befffa90befffa90befffa + .octa 0xa4506ceba4506ceba4506ceba4506ceb + .octa 0xa4506ceba4506ceba4506ceba4506ceb + .octa 0xbef9a3f7bef9a3f7bef9a3f7bef9a3f7 + .octa 0xbef9a3f7bef9a3f7bef9a3f7bef9a3f7 + .octa 0xc67178f2c67178f2c67178f2c67178f2 + .octa 0xc67178f2c67178f2c67178f2c67178f2 + +.section .rodata.cst32.PSHUFFLE_BYTE_FLIP_MASK, "aM", @progbits, 32 +.align 32 +PSHUFFLE_BYTE_FLIP_MASK: +.octa 0x0c0d0e0f08090a0b0405060700010203 +.octa 0x0c0d0e0f08090a0b0405060700010203 + +.section .rodata.cst256.K256, "aM", @progbits, 256 +.align 64 +.global K256 +K256: + .int 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5 + .int 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5 + .int 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3 + .int 0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174 + .int 0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc + .int 0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da + .int 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7 + .int 0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967 + .int 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13 + .int 0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85 + .int 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3 + .int 0xd192e819,0xd6990624,0xf40e3585,0x106aa070 + .int 0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5 + .int 0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3 + .int 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208 + .int 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2